Python for Data Analysis: NumPy Basics (2)
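NumPy data types include:
int8, uint8, int16, uint16, int32, uint32, int64, uint64, float16, float32, float64, float128, complex64, complex128, complex256, bool, object, string_, unicode_
(float128 and complex256 are only available on some platforms. string_ and unicode_ were removed in NumPy 2.0; the current names are bytes_ and str_.)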
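The sketch below shows how a dtype is requested when an array is created, and how to check the dtype and per-element size. The arrays are illustrative values chosen for this example.

```python
import numpy as np

# Request a specific dtype when creating an array
a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([1.5, 2.5], dtype="float16")   # the dtype can also be given as a string

print(a.dtype, a.itemsize)   # itemsize = bytes per element (4 for int32)
print(b.dtype, b.itemsize)   # 2 bytes for float16

# Floating-point input defaults to float64
c = np.array([1.0, 2.0])
print(c.dtype)

# Sizes of a few other types from the list above
print(np.dtype(np.complex128).itemsize)  # 16 bytes: two float64 parts
print(np.dtype(bool).itemsize)           # 1 byte
```

The default integer dtype depends on the platform, so the sketch avoids relying on it.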
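astype
astype explicitly converts an array to another data type. It returns a new array and leaves the original unchanged.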
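Here is a minimal sketch of astype in use. The input values are made up for the illustration. Converting float to int truncates toward zero, and numeric strings can be parsed into floats.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
f = arr.astype(np.float64)          # int -> float, returned as a new array

# float -> int discards the fractional part (truncation toward zero)
g = np.array([3.7, -1.2, 0.5]).astype(np.int32)

# Strings that look like numbers can be parsed into floats
s = np.array(["1.25", "-9.6", "42"])
parsed = s.astype(float)

print(f.dtype, g, parsed)
print(arr.dtype.kind)               # 'i': the original array is still an integer array
```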
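Indexing and slicing NumPy arrays
Indexing
For a one-dimensional array, indexing works essentially the same way as with a Python list. For a multidimensional array you can chain indices as with nested lists (arr[i][j]), or pass them together separated by commas (arr[i, j]).
Slicing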
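A short sketch comparing list-style indexing with NumPy's comma form. The arrays are illustrative.

```python
import numpy as np

arr = np.arange(10)          # 1-D: indexed just like a list
first = arr[5]
last = arr[-1]               # negative indices count from the end

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = arr2d[2]               # a single index on a 2-D array returns a whole row
chained = arr2d[0][2]        # chained indexing, as with nested lists
comma = arr2d[0, 2]          # NumPy's comma form selects the same element

print(first, last, row, chained, comma)
```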
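Slicing a NumPy array does not copy the data. A slice is a view on the source array, not a new copy, so changing a value in the slice also changes the source array. In the session below, after arr[1,1] = 0, arr changes and so does the corresponding element of data (data[3,2]). Note that arr = 0 (In [105]) does not write into the array at all. It only rebinds the name arr to the integer 0, which is why data is unchanged at that point; to assign a value to every element of the slice, use arr[:] = 0. Likewise, arr == 0 (In [110]) is only an element-wise comparison and modifies nothing.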
In [101]: data = np.random.randn(4,4)
In [102]: data
Out[102]:
array([[-1.68867271, -0.89369286, -0.0288363 , 0.73855122],
[-0.13084603, 0.43972144, 0.73542583, 1.99925332],
[ 0.04291022, -0.91963212, 3.09214837, -0.6070068 ],
[-0.01416294, -1.46576298, 1.42196278, 0.84758994]])
In [103]: arr = data[2:,1:]
In [104]: arr
Out[104]:
array([[-0.91963212, 3.09214837, -0.6070068 ],
[-1.46576298, 1.42196278, 0.84758994]])
In [105]: arr = 0
In [106]: data
Out[106]:
array([[-1.68867271, -0.89369286, -0.0288363 , 0.73855122],
[-0.13084603, 0.43972144, 0.73542583, 1.99925332],
[ 0.04291022, -0.91963212, 3.09214837, -0.6070068 ],
[-0.01416294, -1.46576298, 1.42196278, 0.84758994]])
In [107]: arr
Out[107]: 0
In [108]: arr = data[2:,1:]
In [109]: arr
Out[109]:
array([[-0.91963212, 3.09214837, -0.6070068 ],
[-1.46576298, 1.42196278, 0.84758994]])
In [110]: arr == 0
Out[110]:
array([[False, False, False],
[False, False, False]], dtype=bool)
In [111]: arr
Out[111]:
array([[-0.91963212, 3.09214837, -0.6070068 ],
[-1.46576298, 1.42196278, 0.84758994]])
In [112]: arr[1,1]=0
In [113]: arr
Out[113]:
array([[-0.91963212, 3.09214837, -0.6070068 ],
[-1.46576298, 0. , 0.84758994]])
In [114]: data
Out[114]:
array([[-1.68867271, -0.89369286, -0.0288363 , 0.73855122],
[-0.13084603, 0.43972144, 0.73542583, 1.99925332],
[ 0.04291022, -0.91963212, 3.09214837, -0.6070068 ],
[-0.01416294, -1.46576298, 0. , 0.84758994]])
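To make the view behaviour reproducible, here is a sketch that uses np.arange in place of the random data above, which is an assumption made for determinism. It contrasts writing through a view with merely rebinding a name.

```python
import numpy as np

data = np.arange(16, dtype=float).reshape(4, 4)
view = data[2:, 1:]                    # basic slice -> a view, no data copied

view[1, 1] = 0                         # writes through to data[3, 2]
print(data[3, 2])                      # 0.0
print(np.shares_memory(view, data))    # True

view[:] = -1                           # assign into every element of the view
print(data)                            # rows 2-3, columns 1-3 are now -1.0

view = 0                               # only rebinds the name; data is untouched
print(data[2, 1])                      # still -1.0
```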
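To get an independent copy of a slice, copy it explicitly with the copy() method, e.g. data[2:,1:].copy(). In the session below, note that arr = data does not copy anything; it just makes arr a second name for the same array. np.copy(data) (In [119]) is what returns a new array.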
In [116]: data
Out[116]:
array([[-1.68867271, -0.89369286, -0.0288363 , 0.73855122],
[-0.13084603, 0.43972144, 0.73542583, 1.99925332],
[ 0.04291022, -0.91963212, 3.09214837, -0.6070068 ],
[-0.01416294, -1.46576298, 0. , 0.84758994]])
In [117]: arr = data
In [118]: arr
Out[118]:
array([[-1.68867271, -0.89369286, -0.0288363 , 0.73855122],
[-0.13084603, 0.43972144, 0.73542583, 1.99925332],
[ 0.04291022, -0.91963212, 3.09214837, -0.6070068 ],
[-0.01416294, -1.46576298, 0. , 0.84758994]])
In [119]: arr = np.copy(data)
In [120]: arr
Out[120]:
array([[-1.68867271, -0.89369286, -0.0288363 , 0.73855122],
[-0.13084603, 0.43972144, 0.73542583, 1.99925332],
[ 0.04291022, -0.91963212, 3.09214837, -0.6070068 ],
[-0.01416294, -1.46576298, 0. , 0.84758994]])
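The following sketch contrasts a plain name binding with real copies. It uses deterministic np.arange data as a stand-in for the random array, and the variable names are illustrative.

```python
import numpy as np

data = np.arange(16, dtype=float).reshape(4, 4)

alias = data                  # no copy: a second name for the same array
piece = data[2:, 1:].copy()   # independent copy of a slice (method form)
whole = np.copy(data)         # independent copy of the whole array (function form)

piece[1, 1] = 0               # does not affect data[3, 2]
whole[0, 0] = 100             # does not affect data[0, 0]
print(data[3, 2], data[0, 0]) # 14.0 0.0

alias[0, 0] = -5              # changes data, because alias *is* data
print(data[0, 0])             # -5.0
```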
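Boolean indexing
Suppose each string in names corresponds to one row of the data array. Note that the boolean array must have the same length as the axis it indexes.
Selecting values with a boolean index works like this: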
In [140]: names = np.array(['aaa','bbb','ccc','ddd','eee','fff'])
In [141]: data = np.random.randn(6,4)
In [142]: names
Out[142]:
array(['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'],
dtype='<U3')
In [143]: data
Out[143]:
array([[ 0.49394026, -0.65887621, -0.26946242, 0.22042355],
[-1.11606179, -1.94945158, -0.4866134 , 0.67712409],
[-2.33792045, 0.01639887, -0.46020647, 0.84180777],
[-1.99622938, 1.937877 , -0.17134376, 0.56915872],
[ 1.50980905, 0.07244016, -0.95650922, 1.23508517],
[ 0.74706519, -0.03149619, -0.38235363, 0.69786257]])
In [144]: names == 'aaa'
Out[144]: array([ True, False, False, False, False, False], dtype=bool)
In [145]: data[names=='aaa']
Out[145]: array([[ 0.49394026, -0.65887621, -0.26946242, 0.22042355]])
In [146]: names =='ccc'
Out[146]: array([False, False, True, False, False, False], dtype=bool)
In [147]: data[names=='ccc']
Out[147]: array([[-2.33792045, 0.01639887, -0.46020647, 0.84180777]])
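Here is a reproducible version of the session above. It assumes np.arange(24).reshape(6, 4) as a stand-in for the random data. The sketch also shows that boolean indexing returns a copy, and that recent NumPy versions reject a mask whose length does not match the axis.

```python
import numpy as np

names = np.array(['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'])
# Deterministic stand-in for np.random.randn(6, 4), so results are reproducible
data = np.arange(24).reshape(6, 4)

mask = names == 'ccc'        # element-wise comparison -> boolean array
rows = data[mask]            # keep only the rows where mask is True
print(mask)
print(rows)

rows[0, 0] = -1              # boolean indexing returns a copy, so data is unchanged

# A mask whose length does not match axis 0 is rejected with IndexError
mismatch_raised = False
try:
    data[np.array([True, False])]
except IndexError:
    mismatch_raised = True
print(mismatch_raised)
```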
Combining a boolean index with a slice or integer index on the columns:
In [148]: data[names=='aaa',2]
Out[148]: array([-0.26946242])
In [149]: data[names=='aaa',2:]
Out[149]: array([[-0.26946242, 0.22042355]])
In [150]: data[names=='aaa',1:]
Out[150]: array([[-0.65887621, -0.26946242, 0.22042355]])
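The outputs above differ in dimensionality. An integer column index drops that axis, while a column slice keeps it. The sketch below checks the shapes, using the same stand-in data as before (an assumption, not the original random values).

```python
import numpy as np

names = np.array(['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'])
data = np.arange(24).reshape(6, 4)   # stand-in for random data

col = data[names == 'aaa', 2]        # integer column index -> 1-D result
block = data[names == 'aaa', 2:]     # column slice -> 2-D result
print(col.shape, col)
print(block.shape, block)
```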
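Negated selection (!= selects every row except the matching ones; ~(names == 'aaa') does the same by inverting a mask)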
In [155]: names !='aaa'
Out[155]: array([False, True, True, True, True, True], dtype=bool)
In [156]: data[names!='aaa']
Out[156]:
array([[-1.11606179, -1.94945158, -0.4866134 , 0.67712409],
[-2.33792045, 0.01639887, -0.46020647, 0.84180777],
[-1.99622938, 1.937877 , -0.17134376, 0.56915872],
[ 1.50980905, 0.07244016, -0.95650922, 1.23508517],
[ 0.74706519, -0.03149619, -0.38235363, 0.69786257]])
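Combining conditions (use & for "and" and | for "or", with each comparison in parentheses; Python's and/or keywords do not work on boolean arrays)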
In [171]: mask = (names == 'aaa')|(names == 'ccc')
In [172]: mask
Out[172]: array([ True, False, True, False, False, False], dtype=bool)
In [173]: data[mask]
Out[173]:
array([[ 0.49394026, -0.65887621, -0.26946242, 0.22042355],
[-2.33792045, 0.01639887, -0.46020647, 0.84180777]])
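A sketch of negation with ~ and of combining masks with | and &. It uses the same deterministic stand-in data, and the second condition on column 0 is purely illustrative.

```python
import numpy as np

names = np.array(['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'])
data = np.arange(24).reshape(6, 4)   # stand-in for random data

not_aaa = data[~(names == 'aaa')]                    # same rows as names != 'aaa'
either = data[(names == 'aaa') | (names == 'ccc')]   # OR: parentheses are required
both = data[(names != 'aaa') & (data[:, 0] > 10)]    # AND across two conditions

print(not_aaa.shape)
print(either[:, 0])
print(both[:, 0])
```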
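Fancy indexing
Fancy indexing simply means indexing with a list or array of integers. Unlike slicing, fancy indexing always copies the data into a new array.
Integer lists
Create a two-dimensional array arr and pass [3,1]: this selects the rows arr[3,:] and arr[1,:], in that order.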
In [203]: arr = np.array(([1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]))
In [204]: arr
Out[204]:
array([[ 1, 2, 3, 4],
[ 2, 3, 4, 5],
[ 3, 4, 5, 6],
[ 7, 8, 9, 10]])
In [205]: arr[[3,1]]
Out[205]:
array([[ 7, 8, 9, 10],
[ 2, 3, 4, 5]])
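This sketch repeats the row selection above and checks that the result is a copy. Writing to it leaves arr unchanged.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [7, 8, 9, 10]])

picked = arr[[3, 1]]      # rows 3 then 1, in the order given
print(picked)

picked[0, 0] = 0          # fancy indexing returned a copy...
print(arr[3, 0])          # ...so arr still holds 7
print(np.shares_memory(picked, arr))   # False
```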
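Passing multiple integer arrays
When several integer arrays are passed at once (one per axis), their entries are paired element-wise and one element is selected per pair. With one-dimensional index arrays, the result is a one-dimensional array regardless of how many axes the source array has.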
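A small sketch on an illustrative 8×4 array. It shows the element-wise pairing of index arrays. It also shows one way (two indexing steps) to get a rectangular block of rows × columns instead, which is a different operation.

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)

# Index arrays are paired element-wise: picks (1,0), (5,3), (7,1), (2,2)
elems = arr[[1, 5, 7, 2], [0, 3, 1, 2]]
print(elems)              # a 1-D array of four elements

# For a rectangular block (rows 1 and 5, columns 0 and 3), index in two steps
block = arr[[1, 5]][:, [0, 3]]
print(block)
```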
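Transposing arrays and swapping axes
Transposing an array A means exchanging its rows and columns. In NumPy this does not copy any data: the transposed array is a view of the original.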
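That is, element (i, j) of the transpose is element (j, i) of the original, $(A^{T})_{ij} = A_{ji}$. For example, a 2×3 matrix becomes a 3×2 matrix:
$$\begin{pmatrix}1 & 2 & 3\\ 4 & 5 & 6\end{pmatrix}^{T} = \begin{pmatrix}1 & 4\\ 2 & 5\\ 3 & 6\end{pmatrix}$$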
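Method 1: the T attribute
For a one-dimensional array .T has no effect, as the first part of the session shows; for a two-dimensional array it swaps rows and columns.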
In [227]: arr = np.random.randn(10)
In [228]: arr
Out[228]:
array([-1.42853867, 1.54300781, -0.74079757, -1.20272388, -1.00416459,
-0.59571731, 1.16744662, 0.05739806, 1.01660691, -0.84625494])
In [229]: arr.T
Out[229]:
array([-1.42853867, 1.54300781, -0.74079757, -1.20272388, -1.00416459,
-0.59571731, 1.16744662, 0.05739806, 1.01660691, -0.84625494])
In [230]: arr = np.random.randn(3,5)
In [231]: arr
Out[231]:
array([[ 1.36114118, 0.48455027, 0.64847485, 0.01691785, -0.03622465],
[-2.31302164, 1.14992892, -1.47836923, 1.08003907, -1.33663009],
[-0.38005499, 1.3517217 , 2.52024026, -0.3576492 , 0.46016645]])
In [232]: arr.T
Out[232]:
array([[ 1.36114118, -2.31302164, -0.38005499],
[ 0.48455027, 1.14992892, 1.3517217 ],
[ 0.64847485, -1.47836923, 2.52024026],
[ 0.01691785, 1.08003907, -0.3576492 ],
[-0.03622465, -1.33663009, 0.46016645]])
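A sketch with small deterministic arrays (illustrative, not the random ones above). It checks three things: .T leaves a 1-D array's shape alone, it maps element (i, j) to (j, i), and it is a view.

```python
import numpy as np

v = np.arange(5)
print(v.T.shape)              # (5,): .T has no effect on a 1-D array

m = np.arange(6).reshape(2, 3)
mt = m.T
print(mt.shape)               # (3, 2)
print(mt[2, 0] == m[0, 2])    # True: element (i, j) moves to (j, i)

mt[0, 1] = 99                 # .T is a view, so m changes too
print(m[1, 0])                # 99
```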
Method 2: transpose
A three-dimensional array arr, made up of four 3×4 arrays:
In [275]: arr = np.arange(48).reshape(4,3,4)
In [276]: arr
Out[276]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]],
[[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]],
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]])
The arguments to transpose are best understood as indices (axis numbers) into the shape tuple.
In [278]: arr.shape
Out[278]: (4, 3, 4)
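The axes of arr are numbered 0, 1 and 2.
Below, the axes are reordered as 2, 0, 1: the new axis 0 is the old axis 2, the new axis 1 is the old axis 0, and the new axis 2 is the old axis 1, so the shape (4, 3, 4) becomes (4, 4, 3).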
In [277]: arr.transpose(2,0,1)
Out[277]:
array([[[ 0, 4, 8],
[12, 16, 20],
[24, 28, 32],
[36, 40, 44]],
[[ 1, 5, 9],
[13, 17, 21],
[25, 29, 33],
[37, 41, 45]],
[[ 2, 6, 10],
[14, 18, 22],
[26, 30, 34],
[38, 42, 46]],
[[ 3, 7, 11],
[15, 19, 23],
[27, 31, 35],
[39, 43, 47]]])
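transpose does not modify arr in place; it returns a new view. So arr.transpose(0,1,2) below is simply the identity ordering, and it shows arr in its original, unchanged layout. To undo the (2,0,1) reordering on the transposed result itself, you would apply the inverse ordering (1,2,0).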
In [279]: arr.transpose(0,1,2)
Out[279]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]],
[[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]],
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]])
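A sketch that checks the axis-reordering rule for arr.transpose(2, 0, 1) on the same 4×3×4 array. It then undoes the reordering with its inverse. Using np.argsort to compute the inverse ordering is our own choice here, not something the session above does.

```python
import numpy as np

arr = np.arange(48).reshape(4, 3, 4)

t = arr.transpose(2, 0, 1)
print(t.shape)                       # (4, 4, 3): shape entries reordered the same way
print(t[0, 1, 0] == arr[1, 0, 0])    # True: t[a, b, c] == arr[b, c, a]

# The inverse ordering is the argsort of the forward ordering: (1, 2, 0)
inverse = tuple(int(i) for i in np.argsort((2, 0, 1)))
back = t.transpose(inverse)
print(inverse, np.array_equal(back, arr))
```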
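Method 3: swapaxes
Like T and transpose, swapaxes returns a view of the source array without copying data.
Whereas transpose takes the complete ordering of axis numbers, swapaxes takes just a pair of axis numbers and exchanges those two axes.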
In [283]: arr.swapaxes(2,1)
Out[283]:
array([[[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]],
[[12, 16, 20],
[13, 17, 21],
[14, 18, 22],
[15, 19, 23]],
[[24, 28, 32],
[25, 29, 33],
[26, 30, 34],
[27, 31, 35]],
[[36, 40, 44],
[37, 41, 45],
[38, 42, 46],
[39, 43, 47]]])
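A sketch confirming two things on the same 4×3×4 array. First, swapping axes 1 and 2 matches transpose with those two entries exchanged. Second, the result is a view, so writes show up in arr.

```python
import numpy as np

arr = np.arange(48).reshape(4, 3, 4)

s = arr.swapaxes(2, 1)                   # exchange axes 1 and 2 only
print(s.shape)                           # (4, 4, 3)
print(np.array_equal(s, arr.transpose(0, 2, 1)))   # True

s[0, 0, 0] = -1                          # a view: the change shows up in arr
print(arr[0, 0, 0])                      # -1
```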
This article is from cnblogs (博客园). Author: 江雪独钓翁. When reposting, please credit the original link: https://www.cnblogs.com/zhouwp/p/8425164.html