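Python Data Analysis with NumPy
Installation and Overview
- At its core, NumPy provides the ndarray, a one-dimensional or multidimensional array
- Install: pip install numpy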
...
Ways to Obtain Arrays in NumPy
- Import:
import numpy as np
- Creating basic arrays and multidimensional data:
- np.array():
array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

Create an array.

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an object whose
    __array__ method returns an array, or any (nested) sequence.
dtype : data-type, optional
    The desired data-type for the array. If not given, then the type will
    be determined as the minimum type required to hold the objects in the
    sequence. This argument can only be used to 'upcast' the array. For
    downcasting, use the .astype(t) method.
copy : bool, optional
    If true (default), then the object is copied. Otherwise, a copy will
    only be made if __array__ returns a copy, if obj is a nested sequence,
    or if a copy is needed to satisfy any of the other requirements
    (`dtype`, `order`, etc.).
order : {'K', 'A', 'C', 'F'}, optional
    Specify the memory layout of the array. If object is not an array, the
    newly created array will be in C order (row major) unless 'F' is
    specified, in which case it will be in Fortran order (column major).
    If object is an array the following holds.

    ===== ========= ===================================================
    order  no copy                     copy=True
    ===== ========= ===================================================
    'K'   unchanged F & C order preserved, otherwise most similar order
    'A'   unchanged F order if input is F and not C, otherwise C order
    'C'   C order   C order
    'F'   F order   F order
    ===== ========= ===================================================

    When ``copy=False`` and a copy is made for other reasons, the result is
    the same as if ``copy=True``, with some exceptions for `A`, see the
    Notes section. The default order is 'K'.
subok : bool, optional
    If True, then sub-classes will be passed-through, otherwise the
    returned array will be forced to be a base-class array (default).
ndmin : int, optional
    Specifies the minimum number of dimensions that the resulting array
    should have. Ones will be pre-pended to the shape as needed to meet
    this requirement.

Returns
-------
out : ndarray
    An array object satisfying the specified requirements.

See Also
--------
empty, empty_like, zeros, zeros_like, ones, ones_like, full, full_like

Notes
-----
When order is 'A' and `object` is an array in neither 'C' nor 'F' order,
and a copy is forced by a change in dtype, then the order of the result is
not necessarily 'C' as expected. This is likely a bug.

Examples
--------
>>> np.array([1, 2, 3])
array([1, 2, 3])

Upcasting:

>>> np.array([1, 2, 3.0])
array([ 1., 2., 3.])

More than one dimension:

>>> np.array([[1, 2], [3, 4]])
array([[1, 2],
       [3, 4]])

Minimum dimensions 2:

>>> np.array([1, 2, 3], ndmin=2)
array([[1, 2, 3]])

Type provided:

>>> np.array([1, 2, 3], dtype=complex)
array([ 1.+0.j, 2.+0.j, 3.+0.j])

Data-type consisting of more than one element:

>>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
>>> x['a']
array([1, 3])

Creating an array from sub-classes:

>>> np.array(np.mat('1 2; 3 4'))
array([[1, 2],
       [3, 4]])

>>> np.array(np.mat('1 2; 3 4'), subok=True)
matrix([[1, 2],
        [3, 4]])

Type: builtin_function_or_method
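- array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0): parameter notes:
- object: the input data, usually a list; keep the types of the list's elements consistent where possible;
- dtype: the desired data type of the array; if omitted, NumPy picks the minimum type that can hold the input;
- copy: if True (the default), the input object is copied;
- order: the memory layout, 'C' (row-major) or 'F' (column-major); the default 'K' preserves the input's layout where possible;
- subok: if True, subclasses are passed through; otherwise the result is forced to a base-class ndarray (the default);
- ndmin: the minimum number of dimensions of the result; ones are prepended to the shape as needed.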
- Creating a one-dimensional array:
# Create a one-dimensional array
a = np.array([1, 2, 3, 4, 5])
print(a)
print(type(a))
"""
[1 2 3 4 5]
<class 'numpy.ndarray'>
"""
- Creating a two-dimensional array:
# Create a two-dimensional array
a2 = np.array([[1, 2, 3], [4, 5, 6]])
print(a2)
print(type(a2))
"""
[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>
"""
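- Note: by default, all elements of a NumPy ndarray instance have the same type.
If the list passed in contains different types, they are all converted to a single type, with the priority str > float > int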
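The type priority described above can be checked with a short sketch. The variable names are illustrative only, and the exact string width NumPy chooses may vary, so only the dtype kind is inspected for the string case.

```python
import numpy as np

# int mixed with float: everything is upcast to float
ints_and_float = np.array([1, 2, 3.5])
print(ints_and_float.dtype)        # float64

# numbers mixed with str: everything becomes a Unicode string
mixed_with_str = np.array([1, 2.5, "a"])
print(mixed_with_str.dtype.kind)   # U
print(mixed_with_str)
```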
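- Attributes of a numpy.ndarray instance:
- ndarray.shape: the shape of the array;
- Using shape: shape returns the shape of the ndarray as a tuple. The number of values in that tuple is the number of dimensions of the array;
- Example:
a = np.array([1, 2, 3, 4, 5])
print(a.shape)
"""
(5,)
"""
a2 = np.array([[1, 2, 3], [4, 5, 6]])
print(a2.shape)
"""
(2, 3)
"""
- ndarray.ndim: the number of dimensions
- ndarray.size: the total number of elements in the array
- ndarray.dtype: the data type of the elements in the array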
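To round out the shape example, here is a small sketch that reads all four attributes from one array. The printed dtype is left without an expected value because the default integer type depends on the platform.

```python
import numpy as np

a2 = np.array([[1, 2, 3], [4, 5, 6]])

print(a2.shape)  # (2, 3): 2 rows, 3 columns
print(a2.ndim)   # 2: the length of the shape tuple
print(a2.size)   # 6: total number of elements
print(a2.dtype)  # an integer type; the exact width depends on the platform
```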
- Getting an array object through matplotlib.pyplot;
- Code example:
import numpy as np
import matplotlib.pyplot as plt

img_array = plt.imread("./icon.png")
# Print the array
print(img_array)
# Check the type of the array
print(type(img_array))
# Check the shape of the array
print(img_array.shape)
# Display the image
plt.imshow(img_array)
- Example output: run in Jupyter
- Creating arrays with NumPy's routines functions:
- np.ones(shape, dtype=None, order='C')
- shape: the shape of the array, usually a tuple or a list;
- return: an array with the given shape, filled with the value 1;
- Example:
a1 = np.ones([2])
print(a1)
print("\n")
a1 = np.ones([2, 1])
print(a1)
print("\n")
a1 = np.ones((2,))
print(a1)
print("\n")
a1 = np.ones((2, 3))
print(a1)
print("\n")
"""
[1. 1.]


[[1.]
 [1.]]


[1. 1.]


[[1. 1. 1.]
 [1. 1. 1.]]
"""
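- np.zeros(shape, dtype=None, order='C')
- return: an array whose values are all 0; it is used the same way as ones
- np.full(shape, fill_value, dtype=None, order='C')
- shape: the shape of the array
- fill_value: the value every element of the array is set to;
- return: an array with the given shape, filled with fill_value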
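A brief sketch of np.zeros and np.full, mirroring the np.ones example. The shapes and the fill value 7 are arbitrary choices for illustration.

```python
import numpy as np

# 2 rows, 3 columns, every element 0.0 (float by default)
z = np.zeros((2, 3))
print(z)

# 2 rows, 2 columns, every element set to the fill value
f = np.full((2, 2), 7)
print(f)
```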
- np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None): arithmetic progression (evenly spaced values)
- start: the first value;
- stop: the last value, which is included because endpoint=True by default;
- num: the number of evenly spaced values to generate from start to stop, 50 by default
- return: a one-dimensional array holding the arithmetic progression
- Example:
a1 = np.linspace(0, 2)
print(a1)
print(a1.size)
"""
[0. 0.04081633 0.08163265 0.12244898 0.16326531 0.20408163
 0.24489796 0.28571429 0.32653061 0.36734694 0.40816327 0.44897959
 0.48979592 0.53061224 0.57142857 0.6122449 0.65306122 0.69387755
 0.73469388 0.7755102 0.81632653 0.85714286 0.89795918 0.93877551
 0.97959184 1.02040816 1.06122449 1.10204082 1.14285714 1.18367347
 1.2244898 1.26530612 1.30612245 1.34693878 1.3877551 1.42857143
 1.46938776 1.51020408 1.55102041 1.59183673 1.63265306 1.67346939
 1.71428571 1.75510204 1.79591837 1.83673469 1.87755102 1.91836735
 1.95918367 2. ]
50
"""
- np.arange([start, ]stop, [step, ]dtype=None)
- arange is used the same way as range: a start value, a stop value (excluded), and a step
- return: a one-dimensional array
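The parallel with range can be seen in a short sketch. The float step in the last call is an extra not covered by the notes above: unlike range, arange also accepts non-integer steps.

```python
import numpy as np

only_stop = np.arange(5)            # start defaults to 0
with_step = np.arange(2, 10, 2)     # start, stop (excluded), step
float_step = np.arange(0, 1, 0.25)  # float steps are allowed, unlike range

print(only_stop)   # [0 1 2 3 4]
print(with_step)   # [2 4 6 8]
print(float_step)  # [0.   0.25 0.5  0.75]
```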
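- np.random.randint(low, high=None, size=None, dtype='l')
- low: the lower bound (included)
- high: the upper bound (excluded)
- size: the shape of the array, given as a tuple or a list
- return: an array of the given shape filled with random integers drawn from the given range
- np.random.randn(d0, d1, ..., dn)
- The number of arguments is the number of dimensions of the result, and each argument is the length of the corresponding axis; the values are drawn from the standard normal distribution
- return: a randomly generated multidimensional array
- np.random.random(size=None)
- Generates random numbers between 0 and 1, in the half-open interval [0, 1)
- np.random.seed():
- Fixes the seed so that the "random" values are no longer random from run to run: the same seed reproduces the same sequence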
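The random routines above can be tried together in one sketch. The seed value 42 and the shapes are arbitrary choices for illustration. No concrete random values are shown, since they depend on the NumPy version and seed.

```python
import numpy as np

# Seeding twice with the same value reproduces the same draws
np.random.seed(42)
first = np.random.randint(0, 10, size=(2, 3))
np.random.seed(42)
second = np.random.randint(0, 10, size=(2, 3))
print(np.array_equal(first, second))  # True

# randn: two arguments give a 2-D array of shape (2, 3),
# with values drawn from the standard normal distribution
normal = np.random.randn(2, 3)
print(normal.shape)  # (2, 3)

# random: four floats in the half-open interval [0, 1)
uniform = np.random.random(size=4)
print(uniform)
```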
Basic NumPy Operations
- Indexing:
- Works like indexing a Python list
- Values can be read and modified by index
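A minimal sketch of reading and writing by index. The comma form arr[1, 2] is NumPy's way of indexing several axes at once. It is equivalent to arr[1][2] here, though the notes above do not mention it explicitly.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

first_row = arr[0]   # first row, like indexing a list
element = arr[1, 2]  # row 1, column 2
print(first_row)     # [1 2 3]
print(element)       # 6

# Modify a single element in place
arr[0, 0] = 100
print(arr)
```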
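- Slicing:
- Basic operations:
arr = np.random.randint(0, 100, size=(6, 7))
# Get the first two rows of the 2-D array
print(arr[0:2])
# Get the first two columns of the first two rows
print(arr[0:2, 0:2])
# Get the first two columns of the 2-D array
print(arr[:, 0:2])
# Reverse the order of the rows
print(arr[::-1])
# Reverse the order of the columns
print(arr[:, ::-1])
# Reverse both rows and columns
print(arr[::-1, ::-1])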
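- Reshaping: arr.reshape()
- The argument is a tuple giving the new shape; the total number of elements must stay the same;
- Turning a one-dimensional array into a multidimensional array
a1 = np.linspace(0, 20, 16)
print(a1)
a2 = a1.reshape((2, 8))
print(a2)
- Turning a multidimensional array into a one-dimensional array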
???
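A minimal sketch of going from a multidimensional array back to one dimension, assuming reshape is the intended tool here. The -1 shorthand, which lets NumPy infer the length, is an addition not mentioned in the notes above.

```python
import numpy as np

a2 = np.arange(16).reshape((2, 8))  # a 2-D array with 16 elements

# Give the full length explicitly...
flat = a2.reshape((16,))
# ...or let NumPy infer it with -1
flat_auto = a2.reshape(-1)

print(flat.shape)       # (16,)
print(flat_auto.shape)  # (16,)
```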
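- Concatenation: np.concatenate()
- Notes:
The arrays to concatenate are passed as one sequence: they must be wrapped in square brackets (a list) or parentheses (a tuple)
The arrays must have the same number of dimensions
The shapes must be compatible: with the number of dimensions equal, horizontal concatenation (axis=1) requires the arrays to have the same number of rows, and vertical concatenation (axis=0) requires them to have the same number of columns.
The axis parameter changes the direction of concatenation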
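The notes above can be illustrated with two small 2×2 arrays. The array contents are arbitrary.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Vertical (axis=0): column counts must match; rows are stacked
vertical = np.concatenate((a, b), axis=0)
print(vertical.shape)    # (4, 2)

# Horizontal (axis=1): row counts must match; columns are joined
horizontal = np.concatenate((a, b), axis=1)
print(horizontal.shape)  # (2, 4)
```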
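- Splitting:
- As with concatenation, three functions do the splitting:
- np.split(arr, row/column indices, axis): the second argument is a list of the indices at which to split
- np.vsplit: splits vertically, along the rows (axis=0)
- np.hsplit: splits horizontally, along the columns (axis=1)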
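A short sketch of the three splitting functions on a 4×6 array. The split points are arbitrary. Passing an integer instead of a list, as in the vsplit call, asks for that many equal parts.

```python
import numpy as np

arr = np.arange(24).reshape((4, 6))

# Split the rows at index 1: rows [0:1] and rows [1:4]
top, bottom = np.split(arr, [1], axis=0)
print(top.shape, bottom.shape)   # (1, 6) (3, 6)

# Split the columns at indices 2 and 4: three blocks of two columns
left, middle, right = np.hsplit(arr, [2, 4])
print(left.shape)                # (4, 2)

# An integer means "this many equal parts"
upper, lower = np.vsplit(arr, 2)
print(upper.shape, lower.shape)  # (2, 6) (2, 6)
```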
- Copies:
- Assignment never creates a copy of any element of an ndarray. Operations on the newly assigned name also take effect on the original object.
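The aliasing behaviour can be checked directly. The sketch also uses ndarray.copy(), which is not named in the notes above, as one way to get an independent array.

```python
import numpy as np

a = np.array([1, 2, 3])

# Plain assignment: both names refer to the same array
alias = a
alias[0] = 99
print(a)  # [99  2  3]

# An explicit copy is independent of the original
independent = a.copy()
independent[1] = -1
print(a)  # [99  2  3]
```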
Aggregation in NumPy
- Sum: np.sum
- Maximum / minimum: np.max / np.min
- Mean: np.mean
- Other operations:
Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
np.power: exponentiation (applied element by element)
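A few of the aggregations from the table, sketched on a small array. The axis argument is an addition not covered in the notes above: axis=0 aggregates down the columns and axis=1 across the rows.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(arr)             # over all elements
col_sums = np.sum(arr, axis=0)  # one sum per column
row_max = np.max(arr, axis=1)   # one maximum per row
mean = np.mean(arr)

print(total)     # 21
print(col_sums)  # [5 7 9]
print(row_max)   # [3 6]
print(mean)      # 3.5

# The NaN-safe versions skip NaN instead of propagating it
with_nan = np.array([1.0, np.nan, 3.0])
plain = np.sum(with_nan)    # nan
safe = np.nansum(with_nan)  # 4.0
print(plain, safe)
```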
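Broadcasting
The three rules of ndarray broadcasting: an array with fewer dimensions is padded up to the number of dimensions of the array it is combined with, and its missing elements are filled in by repeating the elements it already has.
Rule 1: missing dimensions are filled in with length 1 (the shape of the lower-dimensional array is padded with ones on the left until both arrays have the same number of dimensions)
Rule 2: missing elements are filled in with existing values (an axis of length 1 is stretched by repeating its values to match the other array)
Rule 3: only an axis of length 1 can be stretched, such as a single row or a single column; if the lengths along an axis differ and neither is 1, the operation raises an error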
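The three rules can be walked through with a 2×3 array. The values are arbitrary, and the `failed` flag is a hypothetical helper used only to record that the last case raises.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))  # shape (2, 3): [[0 1 2] [3 4 5]]

# Rule 1 + rule 2: shape (3,) is treated as (1, 3),
# then the single row is repeated to give (2, 3)
row = np.array([10, 20, 30])
print(a + row)  # [[10 21 32] [13 24 35]]

# Rule 2: shape (2, 1), the single column is repeated to give (2, 3)
col = np.array([[100], [200]])
print(a + col)  # [[100 101 102] [203 204 205]]

# Rule 3: shape (2,) becomes (1, 2); 2 does not match 3 and neither is 1
failed = False
try:
    a + np.array([1, 2])
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```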
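Sorting
- Quick sort
Both np.sort() and ndarray.sort() work, but they differ:
- np.sort() does not modify its input; it returns a sorted copy
- ndarray.sort() sorts in place, so it takes no extra space for a copy, but it modifies the input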
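The difference between the two can be seen side by side. The sample values are arbitrary.

```python
import numpy as np

data = np.array([3, 1, 2])

# np.sort returns a new sorted array; data keeps its order
sorted_copy = np.sort(data)
print(sorted_copy)  # [1 2 3]
print(data)         # [3 1 2]

# ndarray.sort sorts in place and returns None
returned = data.sort()
print(returned)     # None
print(data)         # [1 2 3]
```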
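- Partial sorting
np.partition(a, k)
Sometimes we are not interested in all of the data, only in the smallest or the largest part of it.
- When k is positive, we want the k smallest values: they end up in the first k positions of the result, in no particular order
- When k is negative, we want the |k| largest values: they end up in the last |k| positions of the result, in no particular order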
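A minimal sketch of both cases with k = 3 and k = -3. The order inside each selected group is not guaranteed, so the groups are sorted before printing. The sample data is arbitrary.

```python
import numpy as np

data = np.array([7, 2, 9, 1, 5, 8, 3])

# k = 3: the first 3 positions hold the 3 smallest values
smallest = np.partition(data, 3)[:3]
print(sorted(smallest.tolist()))  # [1, 2, 3]

# k = -3: the last 3 positions hold the 3 largest values
largest = np.partition(data, -3)[-3:]
print(sorted(largest.tolist()))   # [7, 8, 9]
```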