Numpy Study Notes

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

1. Introduction

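  • Almost all scientific computing in Python uses \(Numpy\)
  • It is a package that provides data structures for vectors, matrices and high-dimensional data
  • The data structure that represents vectors, matrices and high-dimensional datasets is the \(array\)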

2. Creating numpy arrays

An \(array\) can be created from many sources, for example:

  • \(list\), \(tuple\)
  • functions dedicated to generating numpy arrays, \(arange\), \(linspace\) etc.
  • reading data from files (\(csv\) etc.)

2.1 From lists

We use the \(numpy.array\) function to convert a Python list (or tuple) into an array

v = np.array([1,2,3,4])
m = np.array([[1,2],[3,4]])
v, type(v)
(array([1, 2, 3, 4]), numpy.ndarray)
m, type(m)
(array([[1, 2],
        [3, 4]]),
 numpy.ndarray)
v.shape, m.shape
((4,), (2, 2))
np.shape(v), np.shape(m)
((4,), (2, 2))
v.size, m.size
(4, 4)

ndarray is a class in the numpy module.
shape and size are two attributes of every ndarray instance.
np.shape(a) and np.size(a) are functions in the numpy module that return the shape and size of a.
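A small sketch of the difference between the attributes and the module-level functions (the nested list is just an illustrative input): the functions also accept array-like objects such as plain lists, while the attributes exist only on ndarray instances.

```python
import numpy as np

v = np.array([1, 2, 3, 4])
nested = [[1, 2], [3, 4]]                  # a plain Python list, not an ndarray

print(isinstance(v, np.ndarray))           # True: v is an instance of the ndarray class
print(v.shape == np.shape(v))              # True: attribute and function agree
print(np.shape(nested), np.size(nested))   # (2, 2) 4: the functions accept array-likes
print(hasattr(nested, "shape"))            # False: a list has no shape attribute
```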

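The \(dtype\) attribute of an ndarray instance shows the type of the data stored in it.
You can also pass a \(dtype\) argument to the array function to convert the values of the original list or tuple to that type.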

M = np.array([[1,2],[3,4]], dtype = complex)
M
array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

Common values for \(dtype\) include \(int, float, complex, bool, object\)
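A short sketch of inspecting and converting dtypes. It uses astype, which returns a converted copy of an existing array; the sample values are arbitrary. Note that the default integer width depends on the platform, so only the kind ('i' for signed integer) is checked.

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.dtype.kind)                  # 'i': a signed integer type (int32 or int64 by platform)

f = np.array([1.7, -2.3])
print(f.dtype)                       # float64

i = f.astype(int)                    # float -> int truncates toward zero
print(i)                             # [ 1 -2]

b = np.array([0, 1, 2], dtype=bool)  # nonzero values become True
print(b)                             # [False  True  True]
```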

2.2 Using array-generating functions

We can use functions to generate arrays of different forms

\(arange\)

Create a range of evenly spaced values in the half-open interval [start, stop): start is included, stop is not.

x = np.arange(start, stop, step)
x = np.arange(0, 10, 1)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x = np.arange(-1, 1, 0.1)
x
array([-1.00000000e+00, -9.00000000e-01, -8.00000000e-01, -7.00000000e-01,
       -6.00000000e-01, -5.00000000e-01, -4.00000000e-01, -3.00000000e-01,
       -2.00000000e-01, -1.00000000e-01, -2.22044605e-16,  1.00000000e-01,
        2.00000000e-01,  3.00000000e-01,  4.00000000e-01,  5.00000000e-01,
        6.00000000e-01,  7.00000000e-01,  8.00000000e-01,  9.00000000e-01])
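The entry -2.22044605e-16 above, where 0 was expected, comes from floating-point rounding: 0.1 has no exact binary representation, so a fractional step accumulates a tiny error. A sketch of two common workarounds, assuming the goal is the same 20 points from -1 up to (but excluding) 1:

```python
import numpy as np

# Fractional step: may contain tiny rounding errors such as -2.22e-16
x_float = np.arange(-1, 1, 0.1)

# Workaround 1: generate integers, then scale; 0 / 10 is exactly 0.0
x_scaled = np.arange(-10, 10) / 10

# Workaround 2: request a number of points; endpoint=False excludes stop,
# matching the half-open interval of arange
x_lin = np.linspace(-1, 1, 20, endpoint=False)

print(len(x_float), len(x_scaled), len(x_lin))  # 20 20 20
print(x_scaled[10])                              # 0.0
print(np.allclose(x_float, x_lin))               # True: equal up to rounding
```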

\(linspace \& logspace\)

\(linspace\) returns N evenly spaced numbers from start to stop; both endpoints are included by default

x = np.linspace(start, stop, N)

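\(logspace\) takes N evenly spaced numbers from start to stop as exponents and raises base to each of them, so the result runs from base**start to base**stop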

x = np.logspace(start, stop, N, base = a)
np.linspace(0, 10, 25)
array([ 0.        ,  0.41666667,  0.83333333,  1.25      ,  1.66666667,
        2.08333333,  2.5       ,  2.91666667,  3.33333333,  3.75      ,
        4.16666667,  4.58333333,  5.        ,  5.41666667,  5.83333333,
        6.25      ,  6.66666667,  7.08333333,  7.5       ,  7.91666667,
        8.33333333,  8.75      ,  9.16666667,  9.58333333, 10.        ])
np.logspace(0, 10, 11, base = 10)
array([1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07,
       1.e+08, 1.e+09, 1.e+10])
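A quick sketch of that relationship: logspace gives the same values as raising base to a linspace of exponents. Base 2 and the range 0..3 are arbitrary choices that keep the numbers small.

```python
import numpy as np

exponents = np.linspace(0, 3, 4)        # [0. 1. 2. 3.]
a = np.logspace(0, 3, 4, base=2)        # 2**0, 2**1, 2**2, 2**3
b = 2.0 ** exponents                    # the same values built by hand

print(a)                                # [1. 2. 4. 8.]
print(np.allclose(a, b))                # True
```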

\(mgrid\)

x, y = np.mgrid[0:5, 0:6]
x, y
(array([[0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4, 4]]),
 array([[0, 1, 2, 3, 4, 5],
        [0, 1, 2, 3, 4, 5],
        [0, 1, 2, 3, 4, 5],
        [0, 1, 2, 3, 4, 5],
        [0, 1, 2, 3, 4, 5]]))
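In the output above, x[i, j] holds the row index i and y[i, j] the column index j. A sketch comparing mgrid with np.meshgrid (indexing="ij" is needed to get the same row/column layout) and showing the complex-step form, where 5j means "5 points, endpoint included", similar to linspace:

```python
import numpy as np

x, y = np.mgrid[0:5, 0:6]
print(x[3, 2], y[3, 2])                  # 3 2: row index and column index

# Equivalent construction with meshgrid; indexing="ij" matches mgrid's layout
xi, yj = np.meshgrid(np.arange(5), np.arange(6), indexing="ij")
print(np.array_equal(x, xi), np.array_equal(y, yj))  # True True

# A complex step gives a number of points instead of a step size
g = np.mgrid[0:1:5j]                     # 0, 0.25, 0.5, 0.75, 1
```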

\(random \quad data\)

\(rand\) draws samples from the uniform distribution on \([0,1)\)

np.random.rand(5,5)
array([[0.66442015, 0.90629324, 0.5467012 , 0.00513714, 0.49277189],
       [0.17140787, 0.92461858, 0.67124238, 0.2545325 , 0.74407112],
       [0.39156052, 0.40204662, 0.47811055, 0.81849795, 0.28357586],
       [0.38806353, 0.5585088 , 0.36854102, 0.2059275 , 0.5687478 ],
       [0.17182169, 0.42607232, 0.31883796, 0.05032641, 0.11960338]])

\(randn\) draws samples from the standard normal distribution

np.random.randn(5,5)
array([[ 0.01381457, -1.19134824, -0.96232556, -0.81001151, -0.20093418],
       [ 0.73000021, -1.27875203,  0.71340573,  1.42708013,  1.91083278],
       [-0.26139491, -1.01805302, -2.37713149, -2.74976479, -0.72777884],
       [-0.36674878, -0.67727608,  1.31941451,  0.18738834,  0.29141163],
       [ 0.06743741, -0.67125098, -1.02038057, -0.68247009,  0.97729812]])
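The numbers above change on every run. A sketch of reproducible random data using the Generator interface (np.random.default_rng, available since NumPy 1.17); the seed value 0 is arbitrary.

```python
import numpy as np

# Seeded generator: the same seed gives the same numbers on every run
rng = np.random.default_rng(seed=0)
u = rng.random((5, 5))               # uniform on [0, 1), like np.random.rand(5, 5)
z = rng.standard_normal((5, 5))      # standard normal, like np.random.randn(5, 5)

rng2 = np.random.default_rng(seed=0)
print(np.array_equal(u, rng2.random((5, 5))))  # True: reproducible
print(u.min() >= 0, u.max() < 1)               # True True
```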

\(diag\)

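\(diag(list)\) builds a diagonal matrix from the elements of \(list\); the optional argument k places them on the k-th diagonal above (k > 0) or below (k < 0) the main diagonal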

np.diag([1,2,3])
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
np.diag([1,2,3], k=1)
array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])

\(zeros\&ones\)

np.zeros((3,3))
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
np.ones((3,3))
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

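The shape must be passed as a single tuple, e.g. \((rows, cols)\); writing np.zeros(3, 3) fails because the second positional argument is interpreted as the dtype. A 1-D array only needs an integer, e.g. np.zeros(3).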
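A sketch of the shape argument in practice, including the dtype parameter and zeros_like (which copies shape and dtype from another array). The try block shows the failure mode described above.

```python
import numpy as np

a = np.zeros(3)                      # 1-D: a single integer is enough
b = np.ones((2, 3), dtype=int)       # 2 rows, 3 columns of integer ones
c = np.zeros_like(b)                 # same shape and dtype as b

try:
    np.zeros(3, 3)                   # second positional argument is dtype, not a dimension
except TypeError as err:
    print("TypeError:", err)

print(a.shape, b.shape, c.dtype == b.dtype)  # (3,) (2, 3) True
```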

3. File I/O

3.1 Comma-separated values (CSV)

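Comma-separated files store data as plain text, so they are read and written as plain characters.

\(CSV\) and \(TSV\) files: \(Comma-separated\) and \(Tab-separated\) values.
Use the \(np.genfromtxt\) function to read them.

\(savetxt\) writes an array to a plain-text file.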

M = np.random.rand(3,3)
M
array([[0.28011205, 0.82406684, 0.539676  ],
       [0.48132395, 0.4149462 , 0.04837249],
       [0.91403304, 0.00245526, 0.41934376]])
np.savetxt('random.csv', M, fmt='%.5f', delimiter=',')   # the default delimiter is a space; delimiter=',' makes it a real CSV
!random.csv
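A self-contained sketch of the full round trip, writing with savetxt and reading back with genfromtxt. It uses a temporary directory so nothing is left behind; the file name is only illustrative. Because only 5 decimals are stored, the reloaded values match the original up to about 1e-5.

```python
import os
import tempfile
import numpy as np

M = np.random.default_rng(1).random((3, 3))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "random.csv")      # illustrative file name
    # delimiter="," makes the file genuinely comma-separated
    np.savetxt(path, M, fmt="%.5f", delimiter=",")
    with open(path) as fh:
        print(fh.readline().strip())            # first row: three numbers separated by commas
    loaded = np.genfromtxt(path, delimiter=",")

print(loaded.shape)                             # (3, 3)
print(np.allclose(loaded, M, atol=1e-5))        # True: equal up to the stored precision
```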

3.2 Numpy's native file format

Use \(np.save\) and \(np.load\) to save and load \(.npy\) files.

np.save("random.npy", M)
N = np.load("random.npy")
N
array([[0.28011205, 0.82406684, 0.539676  ],
       [0.48132395, 0.4149462 , 0.04837249],
       [0.91403304, 0.00245526, 0.41934376]])
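Unlike CSV, the binary .npy format keeps the full precision and the dtype. A sketch, again in a temporary directory, which also shows np.savez for bundling several named arrays into one .npz archive (the array names are arbitrary):

```python
import os
import tempfile
import numpy as np

M = np.random.default_rng(2).random((3, 3))
ints = np.arange(5, dtype=np.int16)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "random.npy")
    np.save(npy_path, M)
    N = np.load(npy_path)

    # savez stores several arrays in one archive, keyed by the keyword names
    npz_path = os.path.join(tmp, "bundle.npz")
    np.savez(npz_path, matrix=M, ints=ints)
    with np.load(npz_path) as archive:
        restored_ints = archive["ints"]

print(np.array_equal(N, M))          # True: exact values, no rounding
print(restored_ints.dtype)           # int16: the dtype is preserved
```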

4. More properties of the numpy arrays

bytes per element

M.itemsize
8

number of bytes

M.nbytes
72

number of dimensions

M.ndim
2
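These three properties are related: nbytes is itemsize times the number of elements. A sketch, with float32 used only to show how the element type changes the storage:

```python
import numpy as np

M = np.random.default_rng(3).random((3, 3))    # float64 by default
print(M.itemsize, M.size, M.nbytes)            # 8 9 72
print(M.nbytes == M.itemsize * M.size)         # True

M32 = M.astype(np.float32)                     # 4 bytes per element instead of 8
print(M32.itemsize, M32.nbytes)                # 4 36
print(M.ndim, M32.ndim)                        # 2 2
```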

5. Manipulating arrays

5.1 Indexing

M[0]
array([0.28011205, 0.82406684, 0.539676  ])
M[0,:]
array([0.28011205, 0.82406684, 0.539676  ])
M[1,1]
0.4149462008918443
M[:,1]
array([0.82406684, 0.4149462 , 0.00245526])
M[0,0]
0.2801120490316811
M[1,:] = 0
M
array([[0.28011205, 0.82406684, 0.539676  ],
       [0.        , 0.        , 0.        ],
       [0.91403304, 0.00245526, 0.41934376]])
M[:,2] = -1
M
array([[ 0.28011205,  0.82406684, -1.        ],
       [ 0.        ,  0.        , -1.        ],
       [ 0.91403304,  0.00245526, -1.        ]])

5.2 Index slicing

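Index slicing extracts part of an array with the syntax \(M[lower:upper:step]\): lower is included, upper is excluded, and each part may be omitted (the defaults are 0, the length of the axis, and 1).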

A = np.arange(1,6)
A
array([1, 2, 3, 4, 5])
A[1:3]
array([2, 3])
A[0:5:2]
array([1, 3, 5])
A[1:3] = [-2,-3]
A
array([ 1, -2, -3,  4,  5])
A[-1]
5
A[-3:]
array([-3,  4,  5])
A = np.array([n+m*10 for n in range(5) for m in range(4)])
A
array([ 0, 10, 20, 30,  1, 11, 21, 31,  2, 12, 22, 32,  3, 13, 23, 33,  4,
       14, 24, 34])
A = np.array([[n+m*10 for n in range(5)] for m in range(4)])
A
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]])
A[1:4, 1:4]
array([[11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])
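One consequence worth knowing: a slice is a view that shares memory with the original array, so writing into the slice changes the original. A sketch using the same A as above; .copy() produces an independent array.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(4)])
sub = A[1:4, 1:4]            # a view: shares memory with A
sub[0, 0] = -99
print(A[1, 1])               # -99: the change shows up in A

safe = A[1:4, 1:4].copy()    # an independent copy
safe[0, 0] = 0
print(A[1, 1])               # still -99
print(np.shares_memory(A, sub), np.shares_memory(A, safe))  # True False
```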

5.3 Fancy Indexing

Fancy indexing uses a \(list\) or an \(array\) (of integers or booleans) as the index instead of a slice

row_indices = [1,2,3]
col_indices = [1,2,-1]
A[row_indices, col_indices]
array([11, 22, 34])
A[row_indices]
array([[10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]])
B = np.array([n for n in range(5)])
B
array([0, 1, 2, 3, 4])
row_mask = np.array([True, False, True, False, False])
B[row_mask]
array([0, 2])
row_mask = np.array([1,0,1,0,0], dtype = bool)
B[row_mask]
array([0, 2])
x = np.arange(0, 10, 0.5)
x
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])
mask = (5 < x) & (x < 7.5)   # element-wise AND of two boolean arrays
mask
array([False, False, False, False, False, False, False, False, False,
       False, False,  True,  True,  True,  True, False, False, False,
       False, False])
x[mask]
array([5.5, 6. , 6.5, 7. ])
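A sketch of two details of boolean masks: conditions are combined with & and |, and the parentheses around each comparison are required because & binds more tightly than < ; also, indexing with a mask returns a copy (unlike slicing), while assigning through a mask modifies the array in place.

```python
import numpy as np

x = np.arange(0, 10, 0.5)
mask = (5 < x) & (x < 7.5)
print(x[mask])                      # [5.5 6.  6.5 7. ]

# Reading through a mask returns a copy
picked = x[mask]
picked[:] = 0
print(x[11])                        # 5.5: x is unchanged

# Assigning through a mask writes into x
x[mask] = -1
print(x[11:15])                     # [-1. -1. -1. -1.]
```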

6. Functions for extracting data from arrays and creating arrays

\(where\)

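\(where\) called with only a boolean array returns the indices of its True elements, as a tuple with one index array per dimension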

indices = np.where(mask)
indices
(array([11, 12, 13, 14], dtype=int64),)
x[indices]
array([5.5, 6. , 6.5, 7. ])
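Besides the one-argument form above, np.where also has a three-argument form, np.where(condition, a, b), which picks from a where the condition is True and from b elsewhere. A sketch, with 0.0 as an arbitrary fill value:

```python
import numpy as np

x = np.arange(0, 10, 0.5)
mask = (5 < x) & (x < 7.5)

# One argument: indices of the True entries (unpack the 1-element tuple)
(idx,) = np.where(mask)
print(idx)                          # [11 12 13 14]

# Three arguments: keep x where mask is True, use 0.0 elsewhere
kept = np.where(mask, x, 0.0)
```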

\(diag\)

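Extract the main diagonal; an optional second argument selects an offset diagonal (negative values are below the main diagonal)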

A
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]])
np.diag(A)
array([ 0, 11, 22, 33])
np.diag(A, -1)
array([10, 21, 32])
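np.diag works in both directions: on a 2-D array it extracts a diagonal, on a 1-D array it builds a diagonal matrix. A sketch, which also checks that np.trace equals the sum of the main diagonal (np.trace accepts non-square arrays such as this 4×5 A):

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(4)])
main = np.diag(A)                   # [ 0 11 22 33]
print(main.sum(), np.trace(A))      # 66 66

D = np.diag(main)                   # 1-D input: builds a 4x4 diagonal matrix
print(D.shape)                      # (4, 4)
```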

\(take\)

B
array([0, 1, 2, 3, 4])
B.take([0,2,4])
array([0, 2, 4])
np.take([0,1,2,3,4],[0,2,4])
array([0, 2, 4])

\(choose\)

which = [1, 0, 1, 0]
choices = [[-2,-2,-2,-2],[5,5,5,5]]
np.choose(which, choices)
array([ 5, -2,  5, -2])
which = [1,2,0,1]
choices = [[-2,-2,-2,-2], [5,5,5,5], [6,6,6,6]]
np.choose(which, choices)
array([ 5,  6, -2,  5])

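\(which\) decides, for each position i, which list in \(choices\) the element comes from: the result at i is choices[which[i]][i].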
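The rule "result[i] = choices[which[i]][i]" can be checked against plain fancy indexing. A sketch using the second example above:

```python
import numpy as np

which = np.array([1, 2, 0, 1])
choices = np.array([[-2, -2, -2, -2], [5, 5, 5, 5], [6, 6, 6, 6]])

result = np.choose(which, choices)
# Same thing with fancy indexing: row which[i], column i
manual = choices[which, np.arange(len(which))]
print(result, manual)               # [ 5  6 -2  5] [ 5  6 -2  5]
```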

7. Linear Algebra

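Linear algebra can be done with either of two classes in the \(Numpy\) module. The key to linear-algebra computation is expressing operations in terms of vectors and whole arrays (vectorization) rather than element-by-element loops.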

  • \(array\)
  • \(matrix\)

7.1 Scalar-array operation

v1 = np.arange(0,5)
v1*2
array([0, 2, 4, 6, 8])
v1+2
array([2, 3, 4, 5, 6])
A*2, A+2
(array([[ 0,  2,  4,  6,  8],
        [20, 22, 24, 26, 28],
        [40, 42, 44, 46, 48],
        [60, 62, 64, 66, 68]]),
 array([[ 2,  3,  4,  5,  6],
        [12, 13, 14, 15, 16],
        [22, 23, 24, 25, 26],
        [32, 33, 34, 35, 36]]))

This uses broadcasting: the scalar is combined with every element of the array.

7.2 Element-wise array-array operations

Arithmetic operators between arrays operate on the elements at corresponding positions;
this is called an element-wise operation.

A * A
array([[   0,    1,    4,    9,   16],
       [ 100,  121,  144,  169,  196],
       [ 400,  441,  484,  529,  576],
       [ 900,  961, 1024, 1089, 1156]])
v1*v1
array([ 0,  1,  4,  9, 16])
A.shape, v1.shape
((4, 5), (5,))
A*v1
array([[  0,   1,   4,   9,  16],
       [  0,  11,  24,  39,  56],
       [  0,  21,  44,  69,  96],
       [  0,  31,  64,  99, 136]])
A, v1
(array([[ 0,  1,  2,  3,  4],
        [10, 11, 12, 13, 14],
        [20, 21, 22, 23, 24],
        [30, 31, 32, 33, 34]]),
 array([0, 1, 2, 3, 4]))

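A*v1 also relies on broadcasting: v1 has shape (5,), which matches the last dimension of A's shape (4, 5), so v1 is multiplied element-wise into each row of A.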
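A sketch of the general broadcasting rule: shapes are compared from the rightmost dimension, and each pair of sizes must be equal or one of them must be 1 (missing leading dimensions count as 1). The column vector col is an illustrative extra array.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(4)])  # shape (4, 5)
v1 = np.arange(0, 5)                                                # shape (5,)
col = np.arange(4).reshape(4, 1)                                    # shape (4, 1)

print((A * v1).shape)        # (4, 5): v1 is reused for every row
print((A + col).shape)       # (4, 5): col is reused for every column
print((col + v1).shape)      # (4, 5): both operands are stretched

try:
    A + np.arange(4)         # (4, 5) vs (4,): trailing sizes 5 and 4 differ
except ValueError as err:
    print("ValueError:", err)
```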

7.3 Matrix algebra

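For matrix algebra there are two options (note that the NumPy documentation no longer recommends the matrix class, even for linear algebra, and suggests regular arrays instead):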

  • \(array\)
  • \(matrix\)

\(array\)

np.dot(A,v1)
array([ 30, 130, 230, 330])

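Note: v1 is a 1-D array with shape (5,); it is neither a row vector nor a column vector.
np.dot(A, v1) treats it as a column vector here, and the result is again a 1-D array with shape (4,).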
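When an explicit row or column vector is needed, reshape the 1-D array. A sketch, also using the @ operator (matrix multiplication, Python 3.5+), which gives the same result as np.dot for these shapes:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(4)])
v1 = np.arange(0, 5)

r1 = A @ v1                       # same as np.dot(A, v1)
print(r1, r1.shape)               # [ 30 130 230 330] (4,)

col = v1.reshape(-1, 1)           # explicit column vector, shape (5, 1)
r2 = A @ col
print(r2.shape)                   # (4, 1): the result is a column as well

row = v1.reshape(1, -1)           # explicit row vector, shape (1, 5)
print(row.shape, v1.T.shape)      # (1, 5) (5,): .T does nothing to a 1-D array
```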

\(matrix\)

M = np.matrix(A)
v = np.matrix(v1).T
M, v
(matrix([[ 0,  1,  2,  3,  4],
         [10, 11, 12, 13, 14],
         [20, 21, 22, 23, 24],
         [30, 31, 32, 33, 34]]),
 matrix([[0],
         [1],
         [2],
         [3],
         [4]]))
M*v
matrix([[ 30],
        [130],
        [230],
        [330]])
#M+v

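This line would raise a ValueError: NumPy cannot broadcast the shapes (4, 5) and (5, 1) together. Mathematically, matrix addition requires both matrices to have exactly the same dimensions; NumPy is more permissive and also accepts broadcast-compatible shapes, so M + v.T (shapes (4, 5) and (1, 5)) would work.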

v.T * v
matrix([[30]])

This computes the inner (dot) product of v with itself: 0 + 1 + 4 + 9 + 16 = 30.
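With plain 1-D arrays the same inner product needs no transposes. A sketch showing three equivalent spellings, plus np.outer for the outer product:

```python
import numpy as np

v1 = np.arange(0, 5)
print(v1 @ v1, np.dot(v1, v1), np.inner(v1, v1))   # 30 30 30

outer = np.outer(v1, v1)            # 5x5 matrix with entries v1[i] * v1[j]
print(outer.shape, outer[2, 3])     # (5, 5) 6
```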

np.shape(v), np.shape(M)
((5, 1), (4, 5))

7.4 Matrix computations

Inverse

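C.I computes the inverse of a square, non-singular matrix. C here is 4×5: it is not square, so it has no inverse, and for a non-square matrix the .I attribute returns the Moore-Penrose pseudo-inverse instead, which is why the result below is 5×4.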

C = M
C
matrix([[ 0,  1,  2,  3,  4],
        [10, 11, 12, 13, 14],
        [20, 21, 22, 23, 24],
        [30, 31, 32, 33, 34]])
C.I
matrix([[-0.158, -0.086, -0.014,  0.058],
        [-0.082, -0.044, -0.006,  0.032],
        [-0.006, -0.002,  0.002,  0.006],
        [ 0.07 ,  0.04 ,  0.01 , -0.02 ],
        [ 0.146,  0.082,  0.018, -0.046]])
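With plain arrays, the two cases are handled by separate functions: np.linalg.inv for square matrices and np.linalg.pinv for any shape. A sketch; the matrices S and B are arbitrary full-rank examples chosen so the checks are numerically well behaved.

```python
import numpy as np

S = np.array([[4.0, 7.0], [2.0, 6.0]])       # square, det = 10, so invertible
S_inv = np.linalg.inv(S)
print(np.allclose(S @ S_inv, np.eye(2)))     # True

B = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2x3, not square
B_pinv = np.linalg.pinv(B)                   # pseudo-inverse, shape (3, 2)
print(B_pinv.shape)                          # (3, 2)
print(np.allclose(B @ B_pinv @ B, B))        # True: defining property of the pseudo-inverse

# np.linalg.inv(B) would raise LinAlgError because B is not square
```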

Determinant

Det = v * v.T
np.linalg.det(Det)
0.0
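The determinant is 0 here because v * v.T is the 5×5 outer product of v with itself: every row is a multiple of v.T, so the matrix has rank 1 and is singular. A sketch with plain arrays, plus a non-singular 2×2 matrix for comparison:

```python
import numpy as np

v = np.arange(0, 5).reshape(-1, 1)           # column vector, shape (5, 1)
outer = v @ v.T                              # 5x5, every row is a multiple of v.T
print(np.linalg.matrix_rank(outer))          # 1
print(np.isclose(np.linalg.det(outer), 0.0)) # True: singular

S = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.linalg.det(S))                      # about -2.0 (1*4 - 2*3), up to rounding
```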

7.5 Data Processing

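Common statistics and reductions (data stands for a 2-D dataset and data[:,3] for one of its columns):

  • mean: np.mean(data[:,3])
  • standard deviation: np.std(data[:,3])
  • variance: np.var(data[:,3])
  • min: data[:,3].min()
  • max: data[:,3].max()
  • sum: np.sum(matrix)
  • product: np.prod(matrix)
  • cumsum: np.cumsum(matrix), the cumulative (prefix) sum
  • cumprod: np.cumprod(matrix), the cumulative (prefix) product
  • trace: np.trace(matrix), the sum of the main diagonal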
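A sketch of these functions on a tiny made-up dataset (the four hypothetical records below are year, month, day, temperature; they are not from any real source). Without an axis argument, sum, prod, cumsum and cumprod work on the flattened array.

```python
import numpy as np

# Hypothetical records: year, month, day, temperature
data = np.array([
    [2020, 1, 1, -3.0],
    [2020, 1, 2, -1.0],
    [2020, 2, 1,  2.0],
    [2020, 2, 2,  4.0],
])
t = data[:, 3]

print(np.mean(t), t.min(), t.max())   # 0.5 -3.0 4.0
print(np.var(t), np.std(t))           # 7.25 and its square root (about 2.69)
print(np.cumsum(t))                   # running totals: -3, -4, -2, 2

m = np.array([[1, 2], [3, 4]])
print(np.sum(m), np.prod(m), np.trace(m))  # 10 24 5
print(np.cumprod(m))                  # [ 1  2  6 24]: flattened when no axis is given
```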

7.6 Computations on subsets of arrays

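  • unique: np.unique(data[:,1]) returns the sorted distinct values of column 1
  • mask_feb = data[:, 1] == 2 builds a boolean mask selecting the rows whose column 1 equals 2 (February, if column 1 holds the month); data[mask_feb, 3] then selects column 3 for those rows only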
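A sketch combining unique and masks to compute a statistic per group, here the mean temperature per month. The data are hypothetical records (year, month, day, temperature), not taken from any real dataset.

```python
import numpy as np

# Hypothetical records: year, month, day, temperature
data = np.array([
    [2020, 1, 1, -3.0],
    [2020, 1, 2, -1.0],
    [2020, 2, 1,  2.0],
    [2020, 2, 2,  4.0],
])

months = np.unique(data[:, 1])              # sorted distinct months
print(months)                               # [1. 2.]

mask_feb = data[:, 1] == 2                  # True for the February rows
print(data[mask_feb, 3].mean())             # 3.0

# One mask per unique month; float() keeps the printed dict readable
monthly = {int(m): float(data[data[:, 1] == m, 3].mean()) for m in months}
print(monthly)                              # {1: -2.0, 2: 3.0}
```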


posted @ 2020-02-29 12:54  Xiaojian_xiang