NumPy Tutorial

This post is compiled from: Machine Learning Plus

NumPy is the most fundamental and powerful package for scientific computing and data manipulation in Python.

NumPy provides the excellent ndarray object, short for n-dimensional array.

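An 'ndarray' object, also known as an 'array', can store multiple items of the same data type. It is the tools built around the array object that make NumPy so convenient for mathematical operations and data manipulation.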

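You might be thinking: 'I can store numbers and other objects in a Python list and do all sorts of computations and manipulations with list comprehensions, for loops and so on. What do I need a numpy array for?'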

Well, numpy arrays have some very significant advantages over lists.

How to create a numpy array

Reference: numpy notes (part 1): creating multidimensional arrays

1. Creating from existing data:

```python
import numpy as np

# Create from a list
a = np.array([1, 2, 3, 4])
print('a is:', a)

# Create from a tuple
# A tuple can't be changed after initialization.
# A one-element tuple needs a trailing comma: (1,) is a tuple, while (1) is just the int 1.
b = np.array((1, 2, 3, 4))
print('b is:', b)

# Load from a file
# Binary .npy format: the saved file is not human-readable
origin_array = np.array([1, 2, 3, 4])
np.save('/tmp/array', origin_array)   # np.save appends the .npy extension

array_from_file = np.load('/tmp/array.npy')
print(array_from_file)

# Text format: the saved file is human-readable
origin_array = np.array([1, 2, 3, 4])
np.savetxt('/tmp/array.txt', origin_array)

array_from_file = np.loadtxt('/tmp/array.txt')
print(array_from_file)

# Read from a string
# Best practice is to indicate the dtype explicitly
array = np.fromstring('1 2 3 4', dtype=float, sep=' ')
```
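
The original version of the snippet above imported `TemporaryFile` without using it. A minimal sketch of what it is presumably for, assuming you don't want to hard-code `/tmp` (which doesn't exist on Windows): `np.save` and `np.load` also accept an open file object, so a temporary file can be the target. The variable names are only illustrative.

```python
import numpy as np
from tempfile import TemporaryFile

origin_array = np.array([1, 2, 3, 4])

# np.save / np.load accept an open binary file object as well as a path
with TemporaryFile() as outfile:
    np.save(outfile, origin_array)
    outfile.seek(0)                      # rewind before reading it back
    array_from_file = np.load(outfile)   # data is read fully into memory

print(array_from_file)                               # [1 2 3 4]
print(np.array_equal(origin_array, array_from_file))  # True
```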

2. Creating matrices

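```python
import numpy as np

# Create an array of the given shape with every element set to 1
print(np.ones((3, 4)))

# Create an array of the given shape with every element set to 0
print(np.zeros((3, 4)))

# Create an array of the given shape without initializing it;
# the element values are arbitrary (whatever happened to be in memory)
print(np.empty((3, 4)))

# Create an array of the given shape with every element set to a given value
print(np.full((3, 4), 17))

# Create arrays from a numerical range
# arange(start, stop[, step]) creates a 1-D array; stop itself is excluded
print(np.arange(10))          # 0, 1, ..., 9
print(np.arange(9, -1, -1))   # 9, 8, ..., 0

# Evenly spaced (arithmetic) sequence over an interval:
# np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
#   endpoint: whether to include stop as the last value
#   retstep:  whether to also return the spacing between values
print(np.linspace(0, 1, num=5))                # 0, 0.25, 0.5, 0.75, 1
print(np.linspace(0, 1, num=5, retstep=True))  # (the array, 0.25)

# Geometric sequence over an interval:
# np.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)
# The sequence starts at base ** start and ends at base ** stop (when endpoint=True).
print(np.logspace(0, 3, num=4))                # 1, 10, 100, 1000

# Create matrices (2-D arrays)
# np.eye(N, M=None, k=0, dtype=float): ones on one diagonal, zeros elsewhere.
# k is the index of the diagonal: 0 (the default) is the main diagonal,
# a positive value an upper diagonal, and a negative value a lower diagonal.
print(np.eye(3))
print(np.eye(3, 4, k=1))

# Create an identity matrix: np.identity(n, dtype=None)
print(np.identity(3))

# Create a diagonal matrix (or a super-/sub-diagonal matrix): np.diag(v, k=0)
# It differs from eye in that:
#   - the diagonal values are specified by you instead of all being 1
#   - the shape is not given directly; it follows from the diagonal values (and k)
# v: if v is a 1-D array, return a 2-D array with v on the k-th diagonal;
#    if v is a 2-D array, return a copy of its k-th diagonal.
# k: the diagonal in question (default 0); use k > 0 for diagonals above
#    the main diagonal and k < 0 for diagonals below it.
print(np.diag([1, 2, 3]))
print(np.diag([1, 2, 3], k=1))               # 4x4 matrix
print(np.diag(np.arange(9).reshape(3, 3)))   # [0 4 8]
```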
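
To make the `logspace` comment and the `eye`/`diag` comparison concrete, here is a short check: `logspace(start, stop, num, base=b)` gives the same values as `b ** linspace(start, stop, num)`, and `eye(n, k=...)` is the same matrix as `diag` with a vector of ones on that diagonal. The specific numbers are arbitrary.

```python
import numpy as np

# logspace is base raised to evenly spaced exponents
exponents = np.linspace(0, 3, num=4)          # 0, 1, 2, 3
geo = np.logspace(0, 3, num=4)                # base=10 by default
print(geo)                                    # values 1, 10, 100, 1000
print(np.allclose(geo, 10.0 ** exponents))    # True

# With base=2 the sequence runs from 2**0 to 2**4
print(np.logspace(0, 4, num=5, base=2))       # values 1, 2, 4, 8, 16

# eye with an offset k vs. diag with ones on the same diagonal (both 3x3)
upper = np.eye(3, k=1)
print(np.array_equal(upper, np.diag([1.0, 1.0], k=1)))  # True

# identity(n) is eye(n) with k=0
print(np.array_equal(np.identity(3), np.eye(3)))         # True
```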

The main differences between an ndarray and a Python list are:

Arrays support vectorized operations, while lists do not.

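```python
import numpy as np

# [1, 2, 3, 4] + 2          # TypeError: a list can't be added to an int
np.array([1, 2, 3, 4]) + 2  # adds 2 to every element: array([3, 4, 5, 6])
```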
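
With a plain list, the same element-wise operation needs an explicit loop or list comprehension, while with an ndarray it is a single expression. A small sketch; the names are illustrative:

```python
import numpy as np

list1 = [1, 2, 3, 4]
arr1 = np.array(list1)

# List: `list1 + 2` raises TypeError, so loop explicitly
list_plus_2 = [x + 2 for x in list1]   # [3, 4, 5, 6]

# ndarray: the operation is applied to every element
arr_plus_2 = arr1 + 2                  # array([3, 4, 5, 6])

# Note: `+` between two lists concatenates instead of adding element-wise
print(list1 + list1)                   # [1, 2, 3, 4, 1, 2, 3, 4]
print(arr1 + arr1)                     # [2 4 6 8]

try:
    list1 + 2
except TypeError as err:
    print('TypeError:', err)
```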

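Once an array is created, you can't change its size. You have to create a new array or overwrite the existing one.
Each array has exactly one dtype, and all of its items must be of that dtype.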
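```
import numpy as np
arr_float = np.array([1.5, 2.7, 3.0])
# How to change the data type; astype returns a new array
arr_int = arr_float.astype('int')  # array([1, 2, 3]): the decimals are truncated
```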
An equivalent numpy array occupies much less memory than a Python list of lists.
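
A rough way to see the memory difference, assuming CPython: a list stores pointers to separate int objects, while an array stores raw fixed-size values. `sys.getsizeof` only measures the list's own pointer table, so the element objects are added separately. Exact byte counts depend on the platform and Python version.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n)

# List: the list object itself plus each int object it points to
# (CPython caches small ints, so this slightly overestimates)
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# Array: raw element storage only
arr_bytes = arr.nbytes   # n * arr.itemsize

print('list :', list_bytes, 'bytes')
print('array:', arr_bytes, 'bytes')
```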

```python
import numpy as np

# Create an object array to hold numbers as well as strings
arr1d_obj = np.array([1, 'a'], dtype='object')

# Convert an array back to a list
arr1d_obj.tolist()
```
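
Because an array has only one dtype, mixing numbers and strings without `dtype='object'` silently converts everything to strings. A quick comparison:

```python
import numpy as np

# Without dtype='object', NumPy picks one common dtype: here a Unicode string type
mixed = np.array([1, 'a'])
print(mixed.dtype)          # a string dtype such as <U21
print(mixed[0] == '1')      # True: the number 1 became the string '1'

# With dtype='object', each item stays a Python object of its own type
arr1d_obj = np.array([1, 'a'], dtype='object')
print(type(arr1d_obj[0]))   # <class 'int'>
print(arr1d_obj.tolist())   # [1, 'a']
```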

Getting the attributes of an ndarray

Attributes of an ndarray:

`T`	Same as self.transpose(), except that self is returned if self.ndim < 2.
`data`	Python buffer object pointing to the start of the array’s data.
`dtype`	Data-type of the array’s elements.
`flags`	Information about the memory layout of the array.
`flat`	A 1-D iterator over the array.
`imag`	The imaginary part of the array.
`real`	The real part of the array.
`size`	Number of elements in the array (a single integer).
`itemsize`	Length of one array element in bytes.
`nbytes`	Total bytes consumed by the elements of the array.
`ndim`	Number of array dimensions, e.g. 2 for a 2-D matrix.
`shape`	Tuple of array dimensions, e.g. (3, 2).
`strides`	Tuple of bytes to step in each dimension when traversing an array.
`ctypes`	An object to simplify the interaction of the array with the ctypes module.
`base`	Base object if memory is from some other object.

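```python
import numpy as np

# Example: read an attribute from an array instance (not from the ndarray class)
arr = np.array([[1, 2], [3, 4], [5, 6]])
print(arr.shape)  # (3, 2)
```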
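
A few of the attributes above on a small 2-D array, to show how they relate (for example `nbytes == size * itemsize`). The dtype is fixed to `int64` here so the byte counts are predictable; the array itself is just an illustration.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int64)

print(arr.shape)     # (3, 2)
print(arr.ndim)      # 2
print(arr.size)      # 6
print(arr.dtype)     # int64
print(arr.itemsize)  # 8 bytes per element
print(arr.nbytes)    # 48 = size * itemsize
print(arr.strides)   # (16, 8): 16 bytes to the next row, 8 to the next column
print(arr.T.shape)   # (2, 3)
```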

Getting specific elements

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')
arr2

#> array([[ 1.,  2.,  3.,  4.],
#>        [ 3.,  4.,  5.,  6.],
#>        [ 5.,  6.,  7.,  8.]])

# Extract the first 2 rows and columns
arr2[:2, :2]

#> array([[ 1.,  2.],
#>        [ 3.,  4.]])

# Get the boolean output by applying the condition to each element
b = arr2 > 4
b

#> array([[False, False, False, False],
#>        [False, False,  True,  True],
#>        [ True,  True,  True,  True]], dtype=bool)

# Reverse only the row positions
arr2[::-1, ]

# Reverse the row and column positions
arr2[::-1, ::-1]

# Insert a nan and an inf (the array must have a float dtype)
arr2[1, 1] = np.nan  # not a number
arr2[1, 2] = np.inf  # infinite
arr2

#> array([[  1.,   2.,   3.,   4.],
#>        [  3.,  nan,  inf,   6.],
#>        [  5.,   6.,   7.,   8.]])

# Replace nan and inf with -1.
# Don't use arr2 == np.nan: nan never compares equal to anything, not even itself.
missing_bool = np.isnan(arr2) | np.isinf(arr2)
arr2[missing_bool] = -1
arr2

#> array([[ 1.,  2.,  3.,  4.],
#>        [ 3., -1., -1.,  6.],
#>        [ 5.,  6.,  7.,  8.]])
```
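
The boolean array `b` can be used directly as an index to pull out the matching elements, and lists of integers select specific rows or columns. A sketch on a fresh array holding the original `arr2` values:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [3, 4, 5, 6],
                 [5, 6, 7, 8]], dtype='float')

# Boolean mask indexing returns a 1-D array of the elements where the mask is True
b = arr2 > 4
print(arr2[b])              # [5. 6. 5. 6. 7. 8.]

# Integer (fancy) indexing: rows 0 and 2, then columns 1 and 3
print(arr2[[0, 2]])
print(arr2[:, [1, 3]])

# A single element: row 1, column 2
print(arr2[1, 2])           # 5.0
```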

Copying an ndarray

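If you just assign a portion (a slice) of an array to another array, the new array actually refers to the parent array in memory: it is a view, not a copy.

This means that if you make changes to the new array, they are reflected in the parent array as well.

So to avoid disturbing the parent array, you need to copy it with copy(). Every numpy array comes with a copy() method.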

```python
# Assign a portion of arr2 to arr2a. This doesn't really create a new array.
arr2a = arr2[:2, :2]
arr2a[:1, :1] = 100    # 100 will be reflected in arr2
arr2

#> array([[ 100.,    2.,    3.,    4.],
#>        [   3.,   -1.,   -1.,    6.],
#>        [   5.,    6.,    7.,    8.]])

# Copy a portion of arr2 to arr2b
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101    # 101 will not be reflected in arr2
arr2

#> array([[ 100.,    2.,    3.,    4.],
#>        [   3.,   -1.,   -1.,    6.],
#>        [   5.,    6.,    7.,    8.]])
```
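
To check whether an array is a view or an independent copy, `np.shares_memory` and the `base` attribute (from the attribute table above) can help. A sketch using a stand-in array with the same values as `arr2` at this point:

```python
import numpy as np

# Stand-in for arr2; np.array(...) creates an array that owns its data
arr2 = np.array([[1., 2., 3., 4.],
                 [3., -1., -1., 6.],
                 [5., 6., 7., 8.]])

view = arr2[:2, :2]            # basic slicing returns a view
copied = arr2[:2, :2].copy()   # copy() allocates new memory

print(np.shares_memory(arr2, view))    # True
print(np.shares_memory(arr2, copied))  # False

# A view records the array it borrows memory from; a copy owns its data
print(view.base is not None)   # True
print(copied.base is None)     # True
```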

Reshaping and flattening multidimensional arrays

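The difference between ravel and flatten is that flatten always returns a copy, whereas the array created by ravel is, whenever possible, a reference (view) to the parent array. So any change to the raveled array also affects the parent. But because no copy is made, ravel is very memory-efficient.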
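
A small sketch of reshape, flatten and ravel; modifying each result shows which one shares memory with the parent. (`ravel` can only return a view when the data is laid out contiguously; for something like a transposed array it has to copy.)

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# reshape: same data, new shape (the total size must match: 2*3 == 3*2)
print(arr.reshape(3, 2))

# flatten always returns a copy: changing it leaves arr untouched
flat = arr.flatten()
flat[0] = 100
print(arr[0, 0])        # 1

# ravel returns a view here: changing it changes arr
rav = arr.ravel()
rav[0] = 101
print(arr[0, 0])        # 101
```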

Generating data

np.tile repeats the whole list or array n times, whereas np.repeat repeats each element n times.

```python
import numpy as np

a = [1, 2, 3]

# Repeat whole of 'a' two times
print('Tile:   ', np.tile(a, 2))

# Repeat each element of 'a' two times
print('Repeat: ', np.repeat(a, 2))

#> Tile:    [1 2 3 1 2 3]
#> Repeat:  [1 1 2 2 3 3]
```
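
The same two functions on a 2-D array: `np.tile` takes a tuple giving how many copies to lay out along each axis, and `np.repeat` takes an `axis` argument (without it, the input is flattened first). `repeat` also accepts a separate count for each element. The matrix `m` is just an example.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

# tile with a tuple of repetitions: 2 copies down, 1 across
print(np.tile(m, (2, 1)))
# [[1 2]
#  [3 4]
#  [1 2]
#  [3 4]]

# repeat along an axis: each row repeated twice
print(np.repeat(m, 2, axis=0))
# [[1 2]
#  [1 2]
#  [3 4]
#  [3 4]]

# Without axis, repeat works on the flattened array
print(np.repeat(m, 2))                   # [1 1 2 2 3 3 4 4]

# repeat with a per-element count
print(np.repeat([1, 2, 3], [1, 2, 3]))   # [1 2 2 3 3 3]
```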

Generating random numbers:

```python
import numpy as np

# Random numbers between [0,1) of shape 2,2
print(np.random.rand(2,2))

# Normal distribution with mean=0 and variance=1 of shape 2,2
print(np.random.randn(2,2))

# Random integers between [0, 10) of shape 2,2
print(np.random.randint(0, 10, size=[2,2]))

# One random number between [0,1)
print(np.random.random())

# Random numbers between [0,1) of shape 2,2
print(np.random.random(size=[2,2]))

# Pick 10 items from a given list, with equal probability
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))

# Pick 10 items from a given list with a predefined probability 'p'
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, .1, 0.1, 0.4, 0.1]))  # picks more o's

# Sample output (random, so yours will differ):

#> [[ 0.84  0.7 ]
#>  [ 0.52  0.8 ]]

#> [[-0.06 -1.55]
#>  [ 0.47 -0.04]]

#> [[4 0]
#>  [8 7]]

#> 0.08737272424956832

#> [[ 0.45  0.78]
#>  [ 0.03  0.74]]

#> ['i' 'a' 'e' 'e' 'a' 'u' 'o' 'e' 'i' 'u']
#> ['o' 'a' 'e' 'a' 'a' 'o' 'o' 'o' 'a' 'o']
```
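
Since these outputs are random, each run differs. To make runs reproducible you can seed the generator. A sketch using the `Generator` API (`np.random.default_rng`, available since NumPy 1.17) alongside the legacy `np.random.seed` that pairs with the functions above; the seed value 100 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(100)
rng_b = np.random.default_rng(100)

x = rng_a.random((2, 2))              # like np.random.rand(2, 2)
y = rng_b.random((2, 2))
print(np.array_equal(x, y))           # True

# Generator counterparts of the calls above
rng = np.random.default_rng(100)
print(rng.standard_normal((2, 2)))       # like np.random.randn(2, 2)
print(rng.integers(0, 10, size=(2, 2)))  # like np.random.randint(0, 10, size=[2, 2])
print(rng.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, .1, 0.1, 0.4, 0.1]))

# Legacy global-state API: seeding again repeats the same numbers
np.random.seed(100)
first = np.random.rand(2, 2)
np.random.seed(100)
second = np.random.rand(2, 2)
print(np.array_equal(first, second))  # True
```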

posted @ 2018-03-13 16:46 limerick1718

