NumPy Study Notes
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
1. Introduction
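- Nearly all scientific computing in Python relies on \(Numpy\)
- It is a package that provides vector, array, and higher-dimensional data structures
- Vectors, matrices, and higher-dimensional data sets are all represented by \(array\)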
2. Creating numpy arrays
An \(array\) can be created from several sources, for example:
- \(list\), \(tuple\)
- functions dedicated to generating numpy arrays, \(arange\), \(linspace\) etc.
- reading data from files(\(csv\) etc.)
2.1 From lists
We use the \(numpy.array\) function to convert a list (or nested lists) into an array
v = np.array([1,2,3,4])
m = np.array([[1,2],[3,4]])
v, type(v)
(array([1, 2, 3, 4]), numpy.ndarray)
m, type(m)
(array([[1, 2],
[3, 4]]),
numpy.ndarray)
v.shape, m.shape
((4,), (2, 2))
np.shape(v), np.shape(m)
((4,), (2, 2))
v.size, m.size
(4, 4)
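ndarray is a class in the numpy module
shape and size are attributes of every ndarray instance
np.shape(a) and np.size(a) are functions in the numpy module that return the shape and size attributes of an ndarray object
The \(dtype\) attribute of an ndarray instance shows the type of the data stored in the array
Passing a \(dtype\) argument to the array function converts the values from the original list or tuple to that type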
M = np.array([[1,2],[3,4]], dtype = complex)
M
array([[1.+0.j, 2.+0.j],
[3.+0.j, 4.+0.j]])
Common \(dtype\) values include \(int, float, complex, bool, object\)
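A small sketch of how the dtype is inferred and converted. The variable names are illustrative only, and the exact integer width of `v.dtype` depends on the platform, so it is printed rather than stated.

```python
import numpy as np

# Integers in the input give an integer dtype (width is platform-dependent).
v = np.array([1, 2, 3, 4])
print(v.dtype)

# A single float in the input promotes the whole array to float64.
f = np.array([1, 2.5, 3])
print(f.dtype)   # float64

# astype returns a converted copy; v itself keeps its integer dtype.
c = v.astype(complex)
print(c.dtype)   # complex128
```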
2.2 Using array-generating functions
We can use functions to generate arrays of different forms
\(arange\)
Create a range
x = arange(start, stop, step)
x = np.arange(0, 10, 1)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x = np.arange(-1, 1, 0.1)
x
array([-1.00000000e+00, -9.00000000e-01, -8.00000000e-01, -7.00000000e-01,
-6.00000000e-01, -5.00000000e-01, -4.00000000e-01, -3.00000000e-01,
-2.00000000e-01, -1.00000000e-01, -2.22044605e-16, 1.00000000e-01,
2.00000000e-01, 3.00000000e-01, 4.00000000e-01, 5.00000000e-01,
6.00000000e-01, 7.00000000e-01, 8.00000000e-01, 9.00000000e-01])
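The value -2.22044605e-16 in the output above is floating-point rounding: with a non-integer step, \(arange\) does not land exactly on zero. The sketch below shows one common alternative, \(linspace\) with the endpoint excluded; treating it as the preferred choice for float steps is a suggestion, not something the notes prescribe.

```python
import numpy as np

# arange with a float step: values are only approximately multiples of 0.1.
a = np.arange(-1, 1, 0.1)

# linspace takes the number of points instead of the step, so the length is
# never in doubt; endpoint=False mimics arange's half-open interval.
b = np.linspace(-1, 1, 20, endpoint=False)

print(len(a), len(b))       # 20 20
print(np.allclose(a, b))    # True
```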
\(linspace \& logspace\)
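\(linspace\) returns N evenly spaced numbers over the interval from start to stop, both endpoints included
x = np.linspace(start, stop, N)
\(logspace\) uses N evenly spaced numbers from start to stop as exponents of base, so the results run from base**start to base**stop
x = np.logspace(start, stop, N, base = a)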
np.linspace(0, 10, 25)
array([ 0. , 0.41666667, 0.83333333, 1.25 , 1.66666667,
2.08333333, 2.5 , 2.91666667, 3.33333333, 3.75 ,
4.16666667, 4.58333333, 5. , 5.41666667, 5.83333333,
6.25 , 6.66666667, 7.08333333, 7.5 , 7.91666667,
8.33333333, 8.75 , 9.16666667, 9.58333333, 10. ])
np.logspace(0, 10, 11, base = 10)
array([1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07,
1.e+08, 1.e+09, 1.e+10])
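The relation between the two functions can be checked directly: \(logspace\) gives the same values as raising the base to the output of \(linspace\). A minimal sketch with illustrative variable names:

```python
import numpy as np

exponents = np.linspace(0, 10, 11)           # 0, 1, ..., 10
powers = np.logspace(0, 10, 11, base=10)     # 10**0, 10**1, ..., 10**10

print(np.allclose(powers, 10.0 ** exponents))   # True

# The same idea with base 2: 2**0 ... 2**4.
powers_of_two = np.logspace(0, 4, 5, base=2)
print(powers_of_two)
```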
\(mgrid\)
x, y = np.mgrid[0:5, 0:6]
x, y
(array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4]]),
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]]))
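In the output above, x varies along the rows and y along the columns, so together they hold the coordinates of every grid point. As a sketch, the same pair of arrays can be built with \(np.meshgrid\) using matrix-style ("ij") indexing:

```python
import numpy as np

x, y = np.mgrid[0:5, 0:6]

# meshgrid with indexing="ij" follows the same row/column convention as mgrid.
xx, yy = np.meshgrid(np.arange(5), np.arange(6), indexing="ij")

print(x.shape, y.shape)                                   # (5, 6) (5, 6)
print(np.array_equal(x, xx) and np.array_equal(y, yy))    # True
```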
\(random \quad data\)
\(rand\) draws samples from the uniform distribution on \([0,1)\)
np.random.rand(5,5)
array([[0.66442015, 0.90629324, 0.5467012 , 0.00513714, 0.49277189],
[0.17140787, 0.92461858, 0.67124238, 0.2545325 , 0.74407112],
[0.39156052, 0.40204662, 0.47811055, 0.81849795, 0.28357586],
[0.38806353, 0.5585088 , 0.36854102, 0.2059275 , 0.5687478 ],
[0.17182169, 0.42607232, 0.31883796, 0.05032641, 0.11960338]])
\(randn\) draws samples from the standard normal distribution
np.random.randn(5,5)
array([[ 0.01381457, -1.19134824, -0.96232556, -0.81001151, -0.20093418],
[ 0.73000021, -1.27875203, 0.71340573, 1.42708013, 1.91083278],
[-0.26139491, -1.01805302, -2.37713149, -2.74976479, -0.72777884],
[-0.36674878, -0.67727608, 1.31941451, 0.18738834, 0.29141163],
[ 0.06743741, -0.67125098, -1.02038057, -0.68247009, 0.97729812]])
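The numbers above change on every run. When reproducible values are needed, one option is a seeded generator from \(np.random.default\_rng\) (available in NumPy 1.17 and later). This is a sketch, and the seed value 0 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # arbitrary seed, chosen for illustration
u = rng.random((5, 5))                 # uniform samples on [0, 1)
n = rng.standard_normal((5, 5))        # standard normal samples

# A second generator with the same seed reproduces the same stream.
rng_again = np.random.default_rng(seed=0)
print(np.array_equal(u, rng_again.random((5, 5))))   # True
```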
\(diag\)
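\(diag(list)\) builds a diagonal matrix from the elements of \(list\); the optional argument k places them on the k-th diagonal instead of the main one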
np.diag([1,2,3])
array([[1, 0, 0],
[0, 2, 0],
[0, 0, 3]])
np.diag([1,2,3], k=1)
array([[0, 1, 0, 0],
[0, 0, 2, 0],
[0, 0, 0, 3],
[0, 0, 0, 0]])
\(zeros\&ones\)
np.zeros((3,3))
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
np.ones((3,3))
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
For a 2-D array the shape must be passed as a single tuple \((rows, columns)\); a plain integer gives a 1-D array
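A quick check of how the shape argument behaves. The reason two separate integers fail is that the second positional argument of \(np.zeros\) is the dtype, not a second dimension:

```python
import numpy as np

print(np.zeros((2, 3)).shape)   # (2, 3): 2 rows, 3 columns
print(np.zeros(3).shape)        # (3,): a single int gives a 1-D array

# Passing the dimensions as separate arguments is an error,
# because 3 is then interpreted as the dtype.
try:
    np.zeros(2, 3)
except TypeError as err:
    print("TypeError:", err)

# dtype can be given explicitly; the default is float.
ones_int = np.ones((2, 3), dtype=int)
print(ones_int.dtype)
```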
3. File I/O
3.1 Comma-separated values (CSV)
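Comma-separated files store and load the data as plain text
\(CSV\) and \(TSV\) files: \(Comma-separated\) and \(Tab-separated\) values
Use the \(np.genfromtxt\) function to read them
Use \(savetxt\) to write an array as plain text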
M = np.random.rand(3,3)
M
array([[0.28011205, 0.82406684, 0.539676 ],
[0.48132395, 0.4149462 , 0.04837249],
[0.91403304, 0.00245526, 0.41934376]])
np.savetxt('random.csv', M, fmt='%.5f')
!random.csv
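Note that \(savetxt\) separates values with a space by default, so a file that is really comma-separated needs delimiter=','. The sketch below writes and reads back such a file; the temporary directory and the sample matrix are stand-ins chosen for illustration.

```python
import os
import tempfile

import numpy as np

M = np.arange(9, dtype=float).reshape(3, 3) / 7.0    # sample data
path = os.path.join(tempfile.mkdtemp(), "random.csv")

# Write with an explicit comma delimiter and 5 decimal places.
np.savetxt(path, M, fmt="%.5f", delimiter=",")

# Read it back; the delimiter must match the one used for writing.
loaded = np.genfromtxt(path, delimiter=",")

print(loaded.shape)                          # (3, 3)
# Text storage rounds to the chosen format, so compare with a tolerance.
print(np.allclose(loaded, M, atol=1e-5))     # True
```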
3.2 Numpy's native file format
Use \(np.save\) and \(np.load\) to save and load \(.npy\) files.
np.save("random.npy", M)
N = np.load("random.npy")
N
array([[0.28011205, 0.82406684, 0.539676 ],
[0.48132395, 0.4149462 , 0.04837249],
[0.91403304, 0.00245526, 0.41934376]])
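Unlike the text format, \(.npy\) is binary and keeps the dtype and every bit of each value. A minimal round-trip sketch, using a temporary directory as an assumed location:

```python
import os
import tempfile

import numpy as np

M = np.random.rand(3, 3)
path = os.path.join(tempfile.mkdtemp(), "random.npy")

np.save(path, M)
N = np.load(path)

# Exact equality holds, not just approximate equality.
print(np.array_equal(M, N), N.dtype)   # True float64
```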
4. More properties of the numpy arrays
bytes per element
M.itemsize
8
number of bytes
M.nbytes
72
number of dimensions
M.ndim
2
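These properties are related: the total number of bytes is the number of elements times the bytes per element. A short check, with a float32 copy added for comparison:

```python
import numpy as np

M = np.random.rand(3, 3)                 # float64: 8 bytes per element
print(M.itemsize, M.size, M.nbytes)      # 8 9 72
print(M.nbytes == M.size * M.itemsize)   # True

# Halving the precision halves the memory footprint.
M32 = M.astype(np.float32)
print(M32.itemsize, M32.nbytes)          # 4 36
```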
5. Manipulating arrays
5.1 Indexing
M[0]
array([0.28011205, 0.82406684, 0.539676 ])
M[0,:]
array([0.28011205, 0.82406684, 0.539676 ])
M[1,1]
0.4149462008918443
M[:,1]
array([0.82406684, 0.4149462 , 0.00245526])
M[0,0]
0.2801120490316811
M[1,:] = 0
M
array([[0.28011205, 0.82406684, 0.539676 ],
[0. , 0. , 0. ],
[0.91403304, 0.00245526, 0.41934376]])
M[:,2] = -1
M
array([[ 0.28011205, 0.82406684, -1. ],
[ 0. , 0. , -1. ],
[ 0.91403304, 0.00245526, -1. ]])
5.2 Index slicing
Index slicing extracts part of an array using the syntax \(M[lower:upper:step]\)
A = np.arange(1,6)
A
array([1, 2, 3, 4, 5])
A[1:3]
array([2, 3])
A[0:5:2]
array([1, 3, 5])
A[1:3] = [-2,-3]
A
array([ 1, -2, -3, 4, 5])
A[-1]
5
A[-3:]
array([-3, 4, 5])
A = np.array([n+m*10 for n in range(5) for m in range(4)])
A
array([ 0, 10, 20, 30, 1, 11, 21, 31, 2, 12, 22, 32, 3, 13, 23, 33, 4,
14, 24, 34])
A = np.array([[n+m*10 for n in range(5)] for m in range(4)])
A
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34]])
A[1:4, 1:4]
array([[11, 12, 13],
[21, 22, 23],
[31, 32, 33]])
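One detail worth knowing about slices: they are views onto the original data, not copies, so writing to a slice changes the source array. A sketch using the same A as above:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(4)])

block = A[1:3, 1:3]      # a view: shares memory with A
block[0, 0] = -1
print(A[1, 1])           # -1, the original array changed too

safe = A[1:3, 1:3].copy()   # an independent copy
safe[0, 0] = 99
print(A[1, 1])           # still -1
```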
5.3 Fancy Indexing
Fancy indexing means using a \(list\) or an \(array\) as the index
row_indices = [1,2,3]
col_indices = [1,2,-1]
A[row_indices, col_indices]
array([11, 22, 34])
A[row_indices]
array([[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34]])
B = np.array([n for n in range(5)])
B
array([0, 1, 2, 3, 4])
row_mask = np.array([True, False, True, False, False])
B[row_mask]
array([0, 2])
row_mask = np.array([1,0,1,0,0], dtype = bool)
B[row_mask]
array([0, 2])
x = np.arange(0, 10, 0.5)
x
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])
mask = (5 < x) & (x < 7.5)
mask
array([False, False, False, False, False, False, False, False, False,
False, False, True, True, True, True, False, False, False,
False, False])
x[mask]
array([5.5, 6. , 6.5, 7. ])
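Two properties of boolean masks, sketched below: conditions are combined element-wise with & (and) or | (or), each wrapped in parentheses, and indexing with a mask returns a copy rather than a view.

```python
import numpy as np

x = np.arange(0, 10, 0.5)

# Element-wise "and" of two conditions; the parentheses are required.
mask = (5 < x) & (x < 7.5)
print(mask.sum())        # 4 elements satisfy both conditions

# Fancy/boolean indexing copies the selected data.
picked = x[mask]
picked[:] = 0
print(x[11])             # 5.5, x itself is unchanged
```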
6. Functions for extracting data from arrays and creating arrays
\(where\)
\(where\) returns the indices at which a boolean array is True
indices = np.where(mask)
indices
(array([11, 12, 13, 14], dtype=int64),)
x[indices]
array([5.5, 6. , 6.5, 7. ])
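\(where\) returns a tuple with one index array per dimension, which is why the output above ends in a comma. It also has a three-argument form that picks between two values element by element. A sketch of both, with illustrative variable names:

```python
import numpy as np

x = np.arange(0, 10, 0.5)
mask = (5 < x) & (x < 7.5)

# One-argument form: unpack the 1-tuple to get the index array itself.
(idx,) = np.where(mask)
print(idx)               # [11 12 13 14]

# Three-argument form: keep x where mask is True, use 0.0 elsewhere.
kept = np.where(mask, x, 0.0)
print(kept[10:16])
```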
\(diag\)
Extract the main diagonal, or another diagonal when an offset is given
A
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34]])
np.diag(A)
array([ 0, 11, 22, 33])
np.diag(A, -1)
array([10, 21, 32])
\(take\)
B
array([0, 1, 2, 3, 4])
B.take([0,2,4])
array([0, 2, 4])
np.take([0,1,2,3,4],[0,2,4])
array([0, 2, 4])
\(choose\)
which = [1, 0, 1, 0]
choices = [[-2,-2,-2,-2],[5,5,5,5]]
np.choose(which, choices)
array([ 5, -2, 5, -2])
which = [1,2,0,1]
choices = [[-2,-2,-2,-2], [5,5,5,5], [6,6,6,6]]
np.choose(which, choices)
array([ 5, 6, -2, 5])
Each entry of which decides which list in choices supplies the element at that position
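Written out as a plain loop, the rule is result[i] = choices[which[i]][i]. The sketch below compares that loop with \(np.choose\):

```python
import numpy as np

which = [1, 2, 0, 1]
choices = [[-2, -2, -2, -2], [5, 5, 5, 5], [6, 6, 6, 6]]

result = np.choose(which, choices)

# Equivalent explicit version: position i takes its value from list which[i].
manual = np.array([choices[w][i] for i, w in enumerate(which)])

print(result)                           # [ 5  6 -2  5]
print(np.array_equal(result, manual))   # True
```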
7. Linear Algebra
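Linear algebra in \(Numpy\) can be done with either of two classes. The key is to express the computation as vectorized operations on whole arrays rather than loops over elements.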
- \(array\)
- \(matrix\)
7.1 Scalar-array operation
v1 = np.arange(0,5)
v1*2
array([0, 2, 4, 6, 8])
v1+2
array([2, 3, 4, 5, 6])
A*2, A+2
(array([[ 0, 2, 4, 6, 8],
[20, 22, 24, 26, 28],
[40, 42, 44, 46, 48],
[60, 62, 64, 66, 68]]),
array([[ 2, 3, 4, 5, 6],
[12, 13, 14, 15, 16],
[22, 23, 24, 25, 26],
[32, 33, 34, 35, 36]]))
This relies on broadcasting: the scalar is applied to every element of the array
7.2 Element-wise array-array operations
Arithmetic operators between arrays act on the elements at matching positions.
This is called an element-wise operation
A * A
array([[ 0, 1, 4, 9, 16],
[ 100, 121, 144, 169, 196],
[ 400, 441, 484, 529, 576],
[ 900, 961, 1024, 1089, 1156]])
v1*v1
array([ 0, 1, 4, 9, 16])
A.shape, v1.shape
((4, 5), (5,))
A*v1
array([[ 0, 1, 4, 9, 16],
[ 0, 11, 24, 39, 56],
[ 0, 21, 44, 69, 96],
[ 0, 31, 64, 99, 136]])
A, v1
(array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34]]),
array([0, 1, 2, 3, 4]))
This again uses broadcasting: v1 has shape (5,), so it is applied to each of the 4 rows of A
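Broadcasting compares shapes from the last axis backwards; two axes are compatible when they are equal or one of them is 1. The sketch below makes the row case explicit and shows how to scale by column instead. The vector w is a made-up example.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(4)])
v1 = np.arange(0, 5)

# (4, 5) * (5,): v1 is reused for every row, same as tiling it 4 times.
row_scaled = A * v1
print(np.array_equal(row_scaled, A * np.tile(v1, (4, 1))))   # True

# To scale each row by its own factor, give the vector shape (4, 1).
w = np.arange(4)                       # hypothetical per-row factors
col_scaled = A * w[:, np.newaxis]
print(col_scaled.shape)                # (4, 5)

# Shapes (4, 5) and (4,) do not line up from the right, so this fails.
try:
    A * w
except ValueError as err:
    print("ValueError:", err)
```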
7.3 Matrix algebra
There are two ways to do matrix algebra:
- \(array\)
- \(matrix\)
\(array\)
np.dot(A,v1)
array([ 30, 130, 230, 330])
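Note that a 1-D array such as v1 has shape (5,) and is neither a row vector nor a column vector.
np.dot(A, v1) sums over the last axis of A and the only axis of v1, so v1 acts like a column vector here and the result is again a 1-D array.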
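A short sketch of how the shapes work out in a matrix-vector product with plain arrays. The @ operator (Python 3.5+) is shown as an equivalent spelling of np.dot for this case:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(4)])
v1 = np.arange(0, 5)

print(v1.shape)             # (5,): one axis only
print(np.dot(A, v1))        # [ 30 130 230 330]
print((A @ v1).shape)       # (4,): the result is 1-D as well

# An explicit column vector has shape (5, 1) and gives a (4, 1) result.
col = v1.reshape(5, 1)
print((A @ col).shape)      # (4, 1)
```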
\(matrix\)
M = np.matrix(A)
v = np.matrix(v1).T
M, v
(matrix([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34]]),
matrix([[0],
[1],
[2],
[3],
[4]]))
M*v
matrix([[ 30],
[130],
[230],
[330]])
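#M+v
This step raises an error: M has shape (4, 5) and v has shape (5, 1), and these two shapes cannot be broadcast against each other.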
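The failure can be reproduced safely inside a try block. For contrast, the sketch also adds the transposed vector, whose shape (1, 5) is compatible with (4, 5):

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(4)])
M = np.matrix(A)
v = np.matrix(np.arange(0, 5)).T      # shape (5, 1)

try:
    M + v                             # (4, 5) + (5, 1): incompatible
except ValueError as err:
    print("ValueError:", err)

# (4, 5) + (1, 5) works: the row is added to every row of M.
shifted = M + v.T
print(shifted.shape)                  # (4, 5)
```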
v.T * v
matrix([[30]])
This is the inner (dot) product of the vector with itself
np.shape(v), np.shape(M)
((5, 1), (4, 5))
7.4 Matrix computations
Inverse
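That is, compute the inverse matrix. C here is not square (4 x 5), so \(.I\) returns the Moore-Penrose pseudo-inverse rather than a true inverse.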
C = M
C
matrix([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34]])
C.I
matrix([[-0.158, -0.086, -0.014, 0.058],
[-0.082, -0.044, -0.006, 0.032],
[-0.006, -0.002, 0.002, 0.006],
[ 0.07 , 0.04 , 0.01 , -0.02 ],
[ 0.146, 0.082, 0.018, -0.046]])
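For a non-square matrix, \(.I\) gives the same result as \(np.linalg.pinv\), which satisfies C P C = C instead of C P = identity. The sketch below checks this and, for comparison, inverts a small square matrix S that is made up for illustration.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(4)], dtype=float)
C = np.matrix(A)

P = C.I
print(P.shape)                                # (5, 4)
print(np.allclose(P, np.linalg.pinv(A)))      # True
print(np.allclose(C * P * C, C))              # True: pseudo-inverse property

# A square, invertible matrix has a true inverse.
S = np.matrix([[2.0, 1.0], [1.0, 3.0]])       # hypothetical example
print(np.allclose(S * S.I, np.eye(2)))        # True
```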
Determinant
Det = v * v.T
np.linalg.det(Det)
0.0
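The determinant is zero here because v * v.T is an outer product: every row is a multiple of the same vector, so the matrix has rank 1 and is singular. A sketch with plain arrays, plus a made-up 2 x 2 matrix S whose determinant is easy to verify by hand:

```python
import numpy as np

v = np.arange(5.0).reshape(5, 1)
outer = v @ v.T                        # 5 x 5 outer product

print(np.linalg.matrix_rank(outer))                 # 1
print(np.isclose(np.linalg.det(outer), 0.0))        # True

# det([[2, 1], [1, 3]]) = 2*3 - 1*1 = 5
S = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.isclose(np.linalg.det(S), 5.0))            # True
```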
7.5 Data Processing
Here data and matrix stand for any 2-D array:
- mean: np.mean(data[:,3])
- standard deviation: np.std(data[:,3])
- variance: np.var(data[:,3])
- min: data[:,3].min()
- max: data[:,3].max()
- sum: np.sum(matrix)
- prod: np.prod(matrix)
- cumsum: np.cumsum(matrix), the cumulative (prefix) sum
- cumprod: np.cumprod(matrix), the cumulative (prefix) product
- trace: np.trace(matrix), the sum of the main-diagonal elements
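The notes do not define the array these functions are applied to, so the sketch below uses a small made-up array named data to show what they return, including the axis argument for per-column results.

```python
import numpy as np

# Hypothetical data: 3 rows, 4 columns.
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0],
                 [9.0, 10.0, 11.0, 12.0]])

col = data[:, 3]                    # last column: 4, 8, 12
print(np.mean(col))                 # 8.0
print(col.min(), col.max())         # 4.0 12.0
print(np.var(col), np.std(col))     # variance and its square root

print(np.sum(data))                 # 78.0, sum over all elements
print(np.sum(data, axis=0))         # one sum per column
print(np.cumsum(col))               # running totals: 4, 12, 24
print(np.prod(col))                 # 384.0
print(np.trace(data))               # 1 + 6 + 11 = 18.0
```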
7.6 Computations on subsets of arrays
- unique: np.unique(data[:,1]) returns the sorted distinct values in column 1
- boolean mask: mask_feb = data[:, 1] == 2 marks the rows whose column 1 equals 2
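A sketch of how such a mask selects a subset before computing a statistic. The column layout (year, month, value) and the numbers are assumptions made up for this example; the notes do not specify the data set.

```python
import numpy as np

# Hypothetical layout: column 0 = year, column 1 = month, column 2 = value.
data = np.array([[2020, 1, 3.0],
                 [2020, 2, 4.0],
                 [2021, 1, 5.0],
                 [2021, 2, 6.0]])

months = np.unique(data[:, 1])
print(months)                        # the distinct month values

# Rows where the month column equals 2.
mask_feb = data[:, 1] == 2
feb_mean = np.mean(data[mask_feb, 2])
print(feb_mean)                      # 5.0, the mean of 4.0 and 6.0
```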