NumPy tutorial

This article is compiled from Machine Learning Plus.

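NumPy is the most fundamental and powerful package for scientific computing and data manipulation in Python.

NumPy provides the excellent ndarray object, short for n-dimensional array.

An 'ndarray' object, also simply called an 'array', stores multiple items of the same data type. The tools built around this array object are what make NumPy convenient for mathematical and data operations.

You might wonder: 'I can store numbers and other objects in a Python list and do all kinds of computations and manipulations with list comprehensions, for loops and so on. What do I need a numpy array for?'

Well, numpy arrays have some very significant advantages over lists.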

How to create numpy arrays

Reference: "numpy notes (1): creating multidimensional arrays"

1. Creating arrays from existing data:

import numpy as np

# create from a list
a = np.array([1, 2, 3, 4])
print('a is:', a)

# create from a tuple
# (a tuple cannot be changed after initialization; a one-element tuple
#  needs a trailing comma: (1,) is a tuple, while (1) is just the integer 1)
b = np.array((1, 2, 3, 4))
print('b is:', b)

# load from a file
# np.save writes a binary .npy file, which is not human-readable
origin_array = np.array([1, 2, 3, 4])
np.save('/tmp/array', origin_array)  # the '.npy' extension is appended automatically

array_from_file = np.load('/tmp/array.npy')
print(array_from_file)

# np.savetxt writes a human-readable text file
origin_array = np.array([1, 2, 3, 4])
np.savetxt('/tmp/array.txt', origin_array)

array_from_file = np.loadtxt('/tmp/array.txt')  # values come back as floats
print(array_from_file)

# read from a string
# best practice is to state the dtype explicitly
array = np.fromstring('1 2 3 4', dtype=float, sep=' ')
print(array)

2. Creating arrays and matrices with built-in functions

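import numpy as np

# Arrays of a given shape
# create an array of the given shape with every element set to 1
print(np.ones((3, 4)))

# create an array of the given shape with every element set to 0
print(np.zeros((3, 4)))

# create an array of the given shape without initializing it; the element values are arbitrary
print(np.empty((3, 4)))

# create an array of the given shape with every element set to the given value
print(np.full((3, 4), 17))

# Arrays from a numerical range
# create a 1-D array: arange([start,] stop[, step]); stop itself is excluded
print(np.arange(10))         # 0, 1, ..., 9
print(np.arange(9, -1, -1))  # 9, 8, ..., 0

# return evenly spaced numbers over an interval (an arithmetic sequence)
# np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
#   endpoint: whether stop is included as the last value
#   retstep:  whether the spacing between values is also returned
print(np.linspace(0, 1, num=5))

# return numbers spaced evenly on a log scale (a geometric sequence)
# np.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)
# the sequence starts at base ** start and ends at base ** stop (see endpoint)
print(np.logspace(0, 3, num=4))

# Matrices (2-D arrays)
# create a 2-D array with ones on the main diagonal or on a super/sub-diagonal
# np.eye(N, M=None, k=0, dtype=float)
#   k: int, optional. Index of the diagonal: 0 (the default) refers to the main
#      diagonal, a positive value to an upper diagonal, a negative value to a lower one.
print(np.eye(3, k=1))

# create an identity matrix (always square)
# np.identity(n, dtype=None)
print(np.identity(3))

# create a diagonal matrix (or a super/sub-diagonal matrix), or extract a diagonal
# Differences from eye:
#   - the values on the diagonal are not all 1; you specify them
#   - you do not specify the shape; it follows from the values you pass in
# np.diag(v, k=0)
#   v: array_like. If v is a 2-D array, return a copy of its k-th diagonal.
#      If v is a 1-D array, return a 2-D array with v on the k-th diagonal.
#   k: int, optional. The diagonal in question; the default is 0. Use k > 0 for
#      diagonals above the main diagonal and k < 0 for diagonals below it.
print(np.diag([1, 2, 3]))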
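To see what endpoint, retstep, base and k actually do, here is a small check. The arguments (0 to 1 in 5 steps, powers of 10 from 0 to 3, a 3 x 3 matrix) are arbitrary illustration values, not taken from the original tutorial.

```python
import numpy as np

# linspace: 5 evenly spaced values from 0 to 1, stop included (endpoint=True);
# retstep=True also returns the spacing
values, step = np.linspace(0, 1, num=5, retstep=True)
print(values, step)  # values 0, 0.25, 0.5, 0.75, 1.0 and step 0.25

# endpoint=False leaves out stop, so the spacing becomes 1/5
no_end = np.linspace(0, 1, num=5, endpoint=False)
print(no_end)  # approximately 0, 0.2, 0.4, 0.6, 0.8

# logspace: starts at base ** start and ends at base ** stop
powers = np.logspace(0, 3, num=4, base=10.0)
print(powers)  # 1, 10, 100, 1000

# eye with k=1 puts the ones on the first diagonal above the main diagonal
upper = np.eye(3, k=1)
print(upper)

# diag builds a matrix from a 1-D array, and extracts the diagonal from a 2-D array
d = np.diag([1, 2, 3])
print(np.diag(d))  # [1 2 3]
```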

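The main differences between an ndarray and a Python list:

  1. Arrays support vectorized operations; lists do not.
    list + 2     # TypeError
    ndarray + 2  # adds 2 to every element
  2. Once an array is created, its size cannot be changed. You have to create a new array or overwrite the existing one.
  3. Every array has exactly one dtype, and all of its items must be of that dtype.
    # how to change the data type (returns a new array)
    ndarray1.astype('int')

  4. An equivalent numpy array takes up much less memory than a Python list of lists.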
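The four points above can be checked in a few lines. This is a minimal sketch: the list contents are arbitrary, and the memory comparison uses sys.getsizeof, which only approximates the real footprint of a list and varies with platform and Python version.

```python
import sys
import numpy as np

lst = [1, 2, 3, 4]
arr = np.array(lst)

# 1. Vectorized arithmetic: the operation applies to every element
print(arr + 2)  # [3 4 5 6]

# for a list, + means concatenation, so list + int raises TypeError
try:
    lst + 2
except TypeError as exc:
    print('list + 2 failed:', exc)

# 2. Fixed size: np.append does not grow arr in place; it returns a new array
longer = np.append(arr, 5)
print(arr.size, longer.size)  # 4 5

# 3. One dtype per array: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# astype returns a new array; float -> int truncates toward zero
as_int = mixed.astype('int')
print(as_int)  # [1 2 3]

# 4. Memory: the array stores fixed-size raw values, while the list stores
#    pointers to separate Python int objects (exact sizes are platform-dependent)
big = list(range(1000))
list_bytes = sys.getsizeof(big) + sum(sys.getsizeof(x) for x in big)
array_bytes = np.array(big).nbytes
print('list:', list_bytes, 'bytes; array:', array_bytes, 'bytes')
```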
 
# Create an object array to hold numbers as well as strings
arr1d_obj = np.array([1, 'a'], dtype='object')

# Convert an array back to a list
arr1d_obj.tolist()

Getting ndarray attributes

ndarray attributes:

T         Same as self.transpose(), except that self is returned if self.ndim < 2.
data      Python buffer object pointing to the start of the array's data.
dtype     Data type of the array's elements.
flags     Information about the memory layout of the array.
flat      A 1-D iterator over the array.
imag      The imaginary part of the array.
real      The real part of the array.
size      Number of elements in the array (a single integer).
itemsize  Length of one array element in bytes.
nbytes    Total bytes consumed by the elements of the array.
ndim      Number of array dimensions, e.g. 2 for a 2-D matrix.
shape     Tuple of array dimensions, e.g. (3, 2).
strides   Tuple of bytes to step in each dimension when traversing an array.
ctypes    An object to simplify the interaction of the array with the ctypes module.
base      Base object if memory is from some other object.
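# example: read attributes from an array instance, not from the ndarray class
arr = np.array([[1, 2], [3, 4], [5, 6]])
print(arr.shape)  # (3, 2)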
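A short sketch of the most commonly used attributes on a concrete 3 x 2 float64 array (the array itself is an arbitrary example). It also shows the relation nbytes = size * itemsize and what strides means for a row-major array.

```python
import numpy as np

# a 3 x 2 array of 64-bit floats, stored row by row (C order)
arr = np.array([[1, 2], [3, 4], [5, 6]], dtype='float64')

print(arr.shape)     # (3, 2)
print(arr.ndim)      # 2
print(arr.size)      # 6
print(arr.dtype)     # float64
print(arr.itemsize)  # 8 bytes per element
print(arr.nbytes)    # 48, equal to size * itemsize
print(arr.T.shape)   # (2, 3)
print(arr.strides)   # (16, 8): 16 bytes to the next row, 8 to the next column
```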

Getting specific elements

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')
arr2

#> array([[ 1.,  2.,  3.,  4.],
#>        [ 3.,  4.,  5.,  6.],
#>        [ 5.,  6.,  7.,  8.]])

 

# Extract the first 2 rows and columns
arr2[:2, :2]
#> array([[ 1.,  2.],
#>        [ 3.,  4.]])
# Get the boolean output by applying the condition to each element.
b = arr2 > 4
b

#> array([[False, False, False, False],
#>        [False, False,  True,  True],
#>        [ True,  True,  True,  True]], dtype=bool)

# Reverse only the row positions
arr2[::-1, ]
# Reverse the row and column positions
arr2[::-1, ::-1]
# Insert a nan and an inf
arr2[1,1] = np.nan  # not a number
arr2[1,2] = np.inf  # infinite
arr2

#> array([[  1.,   2.,   3.,   4.],
#>        [  3.,  nan,  inf,   6.],
#>        [  5.,   6.,   7.,   8.]])

# Replace nan and inf with -1. Don't use arr2 == np.nan: nan never compares equal to anything, even itself
missing_bool = np.isnan(arr2) | np.isinf(arr2)
arr2[missing_bool] = -1  
arr2

#> array([[ 1.,  2.,  3.,  4.],
#>        [ 3., -1., -1.,  6.],
#>        [ 5.,  6.,  7.,  8.]])
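Why not arr2 == np.nan? Under IEEE 754 floating point, nan compares unequal to every value, including itself, so an == test against nan is False everywhere and finds nothing. A minimal sketch on a small 1-D array with arbitrary values:

```python
import numpy as np

arr = np.array([1.0, np.nan, np.inf, 4.0])

# nan is not equal to anything, so this mask is all False
print(arr == np.nan)  # [False False False False]

# isnan / isinf test each element properly
missing = np.isnan(arr) | np.isinf(arr)
print(missing)  # [False  True  True False]

# boolean-mask assignment replaces only the flagged positions
arr[missing] = -1
print(arr)  # [ 1. -1. -1.  4.]
```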

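Copying an ndarray

Slicing a portion of an array and assigning it to another variable does not create an independent array: the result is a view that refers to the parent array's memory.

This means that any change made to the new array is also reflected in the parent array.

So, to avoid disturbing the parent array, make a copy with copy(). Every numpy array comes with a copy() method.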

# Assign portion of arr2 to arr2a. Doesn't really create a new array.
arr2a = arr2[:2, :2]
arr2a[:1, :1] = 100     # 100 will reflect in arr2
arr2

#> array([[ 100.,    2.,    3.,    4.],
#>        [   3.,   -1.,   -1.,    6.],
#>        [   5.,    6.,    7.,    8.]])

# Copy portion of arr2 to arr2b
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101     # 101 will not reflect in arr2
arr2

#> array([[ 100.,    2.,    3.,    4.],
#>        [   3.,   -1.,   -1.,    6.],
#>        [   5.,    6.,    7.,    8.]])
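np.shares_memory offers a direct way to check whether two arrays overlap in memory, which makes the view/copy distinction above visible. A minimal sketch with an arbitrary 3 x 4 array; the last lines also show that indexing with a list of indices (fancy indexing) already returns a copy, unlike slicing.

```python
import numpy as np

parent = np.arange(12, dtype=float).reshape(3, 4)

# basic slicing returns a view that shares memory with parent
view = parent[:2, :2]
print(np.shares_memory(view, parent))  # True
view[0, 0] = 100
print(parent[0, 0])  # 100.0: the change shows up in parent

# .copy() allocates new memory, so writes stay local
independent = parent[:2, :2].copy()
print(np.shares_memory(independent, parent))  # False
independent[0, 0] = 101
print(parent[0, 0])  # still 100.0

# a copy owns its data, so its base attribute is None
print(independent.base is None)  # True

# fancy indexing (a list of row indices) returns a copy, not a view
fancy = parent[[0, 1]]
print(np.shares_memory(fancy, parent))  # False
```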

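Reshaping and flattening multidimensional arrays

The difference between ravel and flatten is that the array returned by ravel is, whenever the memory layout allows it, a view that refers to the parent array, so any change to it also affects the parent. Because no copy is made, ravel is memory-efficient. flatten always returns a new copy.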
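The section names reshape but only describes ravel and flatten, so here is a minimal sketch of all three on an arbitrary 6-element array. Note the qualification: ravel returns a view only when the memory layout allows it; for a transposed (non-contiguous) array it has to return a copy, just like flatten.

```python
import numpy as np

arr = np.arange(6)        # [0 1 2 3 4 5]
grid = arr.reshape(2, 3)  # 2 rows, 3 columns
print(grid)

# -1 lets numpy infer one dimension from the total size
print(arr.reshape(3, -1).shape)  # (3, 2)

# flatten always returns a new copy
flat = grid.flatten()
flat[0] = 99
print(grid[0, 0])  # 0: grid is unchanged

# ravel returns a view when it can (grid is contiguous here)
rav = grid.ravel()
rav[0] = 99
print(grid[0, 0])  # 99: the change reaches grid

# the transpose is not contiguous in row order, so ravel must copy
rav_t = grid.T.ravel()
print(np.shares_memory(rav_t, grid))  # False
```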

 

Generating data

np.tile repeats the whole list or array n times, while np.repeat repeats each element n times.

a = [1,2,3] 

# Repeat whole of 'a' two times
print('Tile:   ', np.tile(a, 2))

# Repeat each element of 'a' two times
print('Repeat: ', np.repeat(a, 2))

#> Tile:    [1 2 3 1 2 3]
#> Repeat:  [1 1 2 2 3 3]
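The same two functions also work on 2-D arrays. As a sketch: for np.tile, reps can be a tuple giving the repeat count per axis; np.repeat accepts an axis argument, and without one it flattens the input first. The 2 x 2 array below is an arbitrary example.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

# tile with reps=(2, 1): the whole block twice vertically, once horizontally
print(np.tile(m, (2, 1)))
# [[1 2]
#  [3 4]
#  [1 2]
#  [3 4]]

# repeat along axis=0: each row is repeated twice
print(np.repeat(m, 2, axis=0))
# [[1 2]
#  [1 2]
#  [3 4]
#  [3 4]]

# without axis, repeat flattens the input first
print(np.repeat(m, 2))  # [1 1 2 2 3 3 4 4]
```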

Generating random numbers:

# Random numbers between [0,1) of shape 2,2
print(np.random.rand(2,2))

# Normal distribution with mean=0 and variance=1 of shape 2,2
print(np.random.randn(2,2))

# Random integers between [0, 10) of shape 2,2
print(np.random.randint(0, 10, size=[2,2]))

# One random number between [0,1)
print(np.random.random())

# Random numbers between [0,1) of shape 2,2
print(np.random.random(size=[2,2]))

# Pick 10 items from a given list, with equal probability
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))  

# Pick 10 items from a given list with a predefined probability 'p'
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, .1, 0.1, 0.4, 0.1]))  # picks more o's

#> [[ 0.84  0.7 ]
#>  [ 0.52  0.8 ]]

#> [[-0.06 -1.55]
#>  [ 0.47 -0.04]]

#> [[4 0]
#>  [8 7]]

#> 0.08737272424956832

#> [[ 0.45  0.78]
#>  [ 0.03  0.74]]

#> ['i' 'a' 'e' 'e' 'a' 'u' 'o' 'e' 'i' 'u']
#> ['o' 'a' 'e' 'a' 'a' 'o' 'o' 'o' 'a' 'o']
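Since the outputs above are random, they will differ on every run. Not covered in the original: fixing a seed makes draws reproducible. The sketch below uses np.random.seed for the legacy functions shown above, and np.random.default_rng, the Generator API available since NumPy 1.17; the seeds 100 and 42 are arbitrary.

```python
import numpy as np

# legacy global seed: the same seed gives the same sequence of draws
np.random.seed(100)
first = np.random.rand(2, 2)
np.random.seed(100)
second = np.random.rand(2, 2)
print(np.array_equal(first, second))  # True

# Generator API: each generator carries its own independent state
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
ints_a = rng_a.integers(0, 10, size=(2, 2))  # integers in [0, 10)
ints_b = rng_b.integers(0, 10, size=(2, 2))
print(np.array_equal(ints_a, ints_b))  # True

# Generator.choice also accepts a probability vector p
print(rng_a.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, 0.1, 0.1, 0.4, 0.1]))
```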

 

posted @ 2018-03-13 16:46  limerick1718