数据分析之numpy使用

使用numpy生成数字

生成的类型是ndarray类型

t1 = np.array([1,2,3,4,5])
print(t1,type(t1))        # 类型为ndarray

t2 = np.array(range(10))
print(t2)

t3 = np.arange(10)   # 相当于array+range
print(t3,t3.dtype)  # dtype 可以查看数组内的数据类型

t4 = np.arange(10,dtype="f2")   # 制定数据类型
print(t4.dtype)

t5 = np.array([random.random() for i in range(10)])  # 10个小数
print(t5)

t6 = np.round(t5,2)  # 取小数后两位
print(t6)

[1 2 3 4 5] <class 'numpy.ndarray'>
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9] int32
float16
[0.71127883 0.16178949 0.57974356 0.92394061 0.29455775 0.44950361
 0.30519271 0.23295048 0.24572958 0.85217598]
[0.71 0.16 0.58 0.92 0.29 0.45 0.31 0.23 0.25 0.85]

numpy常见的数据类型

查看数组的形状(几行几列)

一维数组

a1 = np.arange(12)
print(a1)
a1.shape

[ 0  1  2  3  4  5  6  7  8  9 10 11]

二维数组

a2 = np.array([[1,2,3],[4,5,6]])
print(a2)
a2.shape

[[1 2 3]
 [4 5 6]]

三维数组

a3 = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
print(a3)
a3.shape

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]

改变数组的形状

b1 = np.arange(12)
b1.reshape(3,4)   # 将原数组形状变成3行4列的二维数组

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

b2 = np.arange(24).reshape((2,3,4))   # 2表示块儿数 (3,4)表示每一块的形状
print(b2) 

b2.reshape(4,6)  # 将b2变形为4行6列的2维数组    reshape是有返回值的  不会改变b2原来的数据

# 将b2变形为1维数的两种方式
b2.flatten()
b2.reshape((24,))  


[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

数组的计算

数组和数字进行计算(广播机制)

c1 = np.arange(12)
print(c1)

# (广播机制) 当我们把数组与数字进行计算的时候 它会把计算的过程应用到数组的每一个数字 然后分别计算

c1+2   [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13]
c1*2   [ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22]
c1/2   [0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5]
c1/0   [nan, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]

数组与数组之间的计算(形状相同)

c2 = np.arange(24)
c3 = np.arange(100,124)
print(c2,c3)

# 当数组中的数据长度相同时
# 两个数组中的数据一一对应进行计算
c2+c3   [100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124,
       126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146]

c2*c3   [   0,  101,  204,  309,  416,  525,  636,  749,  864,  981, 1100,
       1221, 1344, 1469, 1596, 1725, 1856, 1989, 2124, 2261, 2400, 2541,
       3 2684, 2829]

c2/c3   [0.        , 0.00990099, 0.01960784, 0.02912621, 0.03846154,
       0.04761905, 0.05660377, 0.06542056, 0.07407407, 0.08256881,
       0.09090909, 0.0990991 , 0.10714286, 0.11504425, 0.12280702,
       0.13043478, 0.13793103, 0.14529915, 0.15254237, 0.15966387,
       0.16666667, 0.17355372, 0.18032787, 0.18699187]

数组和形状不一样的数组进行计算

# 当他们在某一维度形状一样时是可以进行计算的
n1 = np.arange(12).reshape((4,3))
n2 = np.arange(4).reshape((4,1))
print(n1)
print(n2)
n1+n2        # n1与n2行数相同

array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10],
       [12, 13, 14]])

总结:

两个不同形状的数组只要在某一维度相同就是可以计算的
　　- 如果所有维度都不相同是不可以计算的

numpy读取数据

行列转换

s1 = np.arange(24).reshape(4,6)
print(s1)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]

方法一

np.loadtxt(frame,unpack=True)  # loadtxt中的unpack设置为True也是可以将从文件读取出来的数据进行行列转换的

方法二

s1.transpose()

方法三

s1.T

方法四

# 0代表x轴，1代表y轴
s1.swapaxes(1,0)  # 交换轴

numpy的索引和切片

z1 = np.arange(24).reshape(4,6)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]

行操作

# 取行
print(z1[2])   # 中括号就是取行

# 连续取多行
print(z1[[1,2]])
print(z1[1:])

# 取不连续多行
print(z1[[1,3]])

列操作

# 取列
print(z1[:,1])

# 取连续多列
print(z1[:,3:])

# 取不连续多列
print(z1[:,[1,3,5]])

取行和列

# 取行和列的值  取第3行4列的值      这里注意我们在取值时用的都是索引,行和列都是从0开始, 而我们正常都是从1开始数行和列的
print(z1[2,3])


z2 = np.arange(100).reshape(10,10)
print(z2)
# 取多行和多列  取第3行到第6行  第2列到第5列的结果
print(z2[2:6,1:5])   # 取得是行和列交叉点得位置


# 取多个不相邻得值
# print(z2[[1,2],[2,4]])  # 分别取第2行的第3列  和 第3行的第5列的值   选出来的点就是(1,2)  (2,4)

print(z2[[6,7,8],[6,7,8]])  # 选出来的点是(6,6) (7,7) (8,8)

numpy中数值修改

重新赋值

# 取到值后重新赋值即可
res = np.arange(12).reshape(3,4)
# 将6修改为100
res[1,2]=100
print(res)

# 修改多个值
res[1:2]=[3,3,3,4]
print(res)

[[  0   1   2   3]
 [  4   5 100   7]
 [  8   9  10  11]]
[[ 0  1  2  3]
 [ 3  3  3  4]
 [ 8  9 10 11]]

根据范围取值

ret = np.arange(100).reshape(10,10)
print(ret)

# 根据范围取值
ret[ret<50] = 666
print(ret)


[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]

[[666 666 666 666 666 666 666 666 666 666]
 [666 666 666 666 666 666 666 666 666 666]
 [666 666 666 666 666 666 666 666 666 666]
 [666 666 666 666 666 666 666 666 666 666]
 [666 666 666 666 666 666 666 666 666 666]
 [ 50  51  52  53  54  55  56  57  58  59]
 [ 60  61  62  63  64  65  66  67  68  69]
 [ 70  71  72  73  74  75  76  77  78  79]
 [ 80  81  82  83  84  85  86  87  88  89]
 [ 90  91  92  93  94  95  96  97  98  99]]

三元运算

tt = np.arange(100).reshape(10,10)
np.where(tt<50,0,1)   # 所有小于50的替换成0,大于50的替换成1

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

numpy中的clip(裁剪)

w = np.arange(100).reshape(10,10)
w.clip(50,60)     # 将小于50的替换成50,大于60的替换成60

array([[50, 50, 50, 50, 50, 50, 50, 50, 50, 50],
       [50, 50, 50, 50, 50, 50, 50, 50, 50, 50],
       [50, 50, 50, 50, 50, 50, 50, 50, 50, 50],
       [50, 50, 50, 50, 50, 50, 50, 50, 50, 50],
       [50, 50, 50, 50, 50, 50, 50, 50, 50, 50],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 60, 60, 60, 60, 60, 60, 60, 60, 60],
       [60, 60, 60, 60, 60, 60, 60, 60, 60, 60],
       [60, 60, 60, 60, 60, 60, 60, 60, 60, 60],
       [60, 60, 60, 60, 60, 60, 60, 60, 60, 60]])

数组的拼接

数组的行列转换

numpy生成随机数

numpy中的nan和inf

nan注意点

axis=0 取的是列上的每一行数据

axis=1 取得是行上得每一列数据

什么是中值？

[1,2,3,4,5]    # 中值为3
[1,2,3,4,5,6]  # 中值为 (3+4)/2 = 3.5

numpy中常用得统计函数

numpy中的删除操作

posted @ 2019-09-17 15:57 叫我大表哥阅读(663) 评论(0) 编辑收藏举报

刷新页面返回顶部

叫我大表哥