NumPy
See the Wikipedia article on NumPy.
NumPy
Type: module
Provides
- An array object of arbitrary homogeneous items
- Fast mathematical operations over arrays
- Linear Algebra, Fourier Transforms, Random Number Generation
How to use the documentation
Documentation is available in two forms: docstrings provided
with the code, and a loose standing reference guide, available from
the NumPy homepage <http://www.scipy.org>.
We recommend exploring the docstrings using
IPython <http://ipython.scipy.org>, an advanced Python shell with
TAB-completion and introspection capabilities.
For some objects, np.info(obj)
may provide additional help (it retrieves information about functions, classes, and modules). This is
particularly true if you see the line "Help on ufunc object:" at the top
of the help() page. Ufuncs are implemented in C, not Python, for speed.
The native Python help() does not know how to view their help, but our
np.info() function does.
To search for documents containing a keyword, do::
import numpy as np
np.lookfor('keyword')
General-purpose documents like a glossary and help on the basic concepts
of numpy are available under the doc
sub-module::
from numpy import doc
help(doc)
Available subpackages
---------------------
doc
Topical documentation on broadcasting, indexing, etc.
lib
Basic functions used by several sub-packages.
random
Core Random Tools
linalg
Core Linear Algebra Tools
fft
Core FFT routines
polynomial
Polynomial tools
testing
NumPy testing tools
f2py
Fortran to Python Interface Generator.
distutils
Enhancements to distutils with support for
Fortran compilers and more.
Utilities
---------
test
Run numpy unittests
show_config
Show numpy build configuration
dual
Overwrite certain functions with high-performance Scipy tools
matlib
Make everything matrices.
__version__
NumPy version string
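A few examples:
import numpy as np
from numpy import doc  # needs a NumPy release that still ships the numpy.doc topics
help(doc)
help(doc.creation)
doc.basics?  # IPython syntax
help(np.lib)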
ndarray
Overview
Translated from the Quickstart tutorial
NumPy's main object is the homogeneous multidimensional array. In NumPy, dimensions are called axes, and the number of axes is the rank.
For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of rank 1 whose single axis has length 3. The array
[[ 1., 0., 0.],
[ 0., 1., 2.]]
has rank 2 (it is 2-dimensional). Its first dimension (axis) has length 2 and its second has length 3.
NumPy's array class is called ndarray.
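- ndarray.ndim: the number of axes (dimensions) of the array.
- ndarray.shape: the dimensions of the array, a tuple of integers giving the length of each dimension. For a matrix with n rows and m columns, shape is (n, m).
- ndarray.size: the total number of elements in the array, equal to the product of the elements of shape.
- ndarray.dtype: an object describing the type of the elements in the array.
- ndarray.itemsize: the size in bytes of each element of the array. For example, an array of float64 elements has itemsize 8 (=64/8), and an array of complex64 elements also has itemsize 8 (=64/8).
- ndarray.data: the buffer containing the actual elements of the array. Normally, we won't need to use this attribute because we will access the elements in an array using indexing facilities.
Some examples: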
z = np.array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
t = np.array([z, 2 * z + 1])
t
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[ 1, 3, 5, 7, 9],
[11, 13, 15, 17, 19],
[21, 23, 25, 27, 29]]])
print('z.ndim = ', z.ndim)
print('t.ndim = ', t.ndim)
z.ndim = 2
t.ndim = 3
print('z.shape = ',z.shape)
print('t.shape = ',t.shape)
z.shape = (3, 5)
t.shape = (2, 3, 5)
print('z.size = ',z.size)
print('t.size = ',t.size)
z.size = 15
t.size = 30
t.dtype.name
'int32'
t.itemsize
4
type(t)
numpy.ndarray
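The 'int32' shown above depends on the platform's default integer. As a minimal sketch, assuming explicitly chosen dtypes (float64 and complex64 are picked here only for illustration), the attributes can be checked in a platform-independent way:

```python
import numpy as np

# Explicit dtypes make the result independent of the platform default integer.
a = np.arange(15, dtype=np.float64).reshape(3, 5)
c = a.astype(np.complex64)

print(a.ndim, a.shape, a.size)    # number of axes, length per axis, element count
print(a.dtype.name, a.itemsize)   # float64 uses 8 bytes per element
print(c.dtype.name, c.itemsize)   # complex64 uses 8 bytes (two float32 parts)

# The total buffer size is the element count times the bytes per element.
print(a.nbytes == a.size * a.itemsize)
```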
ndarray
Indexing
z
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
z[0] # the first row
array([0, 1, 2, 3, 4])
z[0, 2] # the third element of the first row
2
t[0]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
t[0][2]
array([10, 11, 12, 13, 14])
t[0, 2]
array([10, 11, 12, 13, 14])
t[0, 2, 3]
13
t[0, :2, 2:4]
array([[2, 3],
[7, 8]])
For lists
e = [1, 2, 3, 4]
p = [e, e]
p[0][0]
1
p[0,0] # this syntax is invalid for lists
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-300-d527d1725556> in <module>()
----> 1 p[0,0] # this syntax is invalid for lists
TypeError: list indices must be integers or slices, not tuple
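The difference between the two indexing styles can be sketched as follows. The variable names are illustrative; the point is only that converting the nested list to an ndarray makes the comma-separated form valid:

```python
import numpy as np

p = [[1, 2, 3, 4], [1, 2, 3, 4]]   # nested list, as in the example above

first = p[0][0]        # lists need one pair of brackets per nesting level
arr = np.array(p)      # converting to an ndarray enables tuple indexing
same = arr[0, 0]
column = arr[:, 1]     # a whole column, which a nested list cannot slice directly

print(first, same, column)
```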
ndarray
supports vectorized operations
Operations applied to every element
z
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
z.sum() # sum of all elements
105
z.sum(axis = 0) # sum along axis 0, i.e. column-wise sum; the result is like a row vector of the matrix
array([15, 18, 21, 24, 27])
z.sum(axis = 1) # sum along axis 1, i.e. row-wise sum; the result is like a column vector of the matrix
array([10, 35, 60])
z.std() # standard deviation of all elements
4.3204937989385739
z.std(axis = 0)
array([ 4.0824829, 4.0824829, 4.0824829, 4.0824829, 4.0824829])
z.cumsum() # cumulative sum of all elements
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78,
91, 105], dtype=int32)
z * 2 # like scalar multiplication of a matrix
array([[ 0, 2, 4, 6, 8],
[10, 12, 14, 16, 18],
[20, 22, 24, 26, 28]])
z ** 2
array([[ 0, 1, 4, 9, 16],
[ 25, 36, 49, 64, 81],
[100, 121, 144, 169, 196]], dtype=int32)
np.sqrt(z)
array([[ 0. , 1. , 1.41421356, 1.73205081, 2. ],
[ 2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ],
[ 3.16227766, 3.31662479, 3.46410162, 3.60555128, 3.74165739]])
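The axis argument used in the reductions above names the axis that is collapsed. A small sketch, rebuilding z with np.arange and using keepdims as an optional extra not shown in the original, compares the reductions with an explicit sum of rows:

```python
import numpy as np

z = np.arange(15).reshape(3, 5)

# axis=0 collapses the rows: one result per column.
col_sums = z.sum(axis=0)
manual_col_sums = z[0] + z[1] + z[2]

# axis=1 collapses the columns: one result per row.
row_sums = z.sum(axis=1)

# keepdims=True keeps the reduced axis with length 1, which helps broadcasting.
row_sums_2d = z.sum(axis=1, keepdims=True)

print(col_sums, row_sums, row_sums_2d.shape)
```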
y = np.arange(10) # like Python's range, but returns an array
y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a = np.array([1, 2, 3, 6])
b = np.linspace(0, 2, 4) # build an array of 4 evenly spaced points from 0 to 2 (endpoints included)
c = a - b
c
array([ 1. , 1.33333333, 1.66666667, 4. ])
# module-level functions
a = np.linspace(-np.pi, np.pi, 100)
b = np.sin(a)
c = np.cos(a)
b = np.array([1,2,3,4])
a = np.array([4,5,6,7])
print('a + b = ', a + b)
print('a - b = ', a - b)
print('a * b = ', a * b)
print('a / b = ', a / b)
print('a // b = ', a // b)
print('a % b = ', a % b)
a + b = [ 5 7 9 11]
a - b = [3 3 3 3]
a * b = [ 4 10 18 28]
a / b = [ 4. 2.5 2. 1.75]
a // b = [4 2 2 1]
a % b = [0 1 0 3]
For non-numeric arrays
a = np.array(list('python'))
a
array(['p', 'y', 't', 'h', 'o', 'n'],
dtype='<U1')
b = np.array(list('numpy'))
b
array(['n', 'u', 'm', 'p', 'y'],
dtype='<U1')
a + b
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-153-f96fb8f649b6> in <module>()
----> 1 a + b
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
list(a) + list(b)
['p', 'y', 't', 'h', 'o', 'n', 'n', 'u', 'm', 'p', 'y']
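Staying inside NumPy is also possible. The sketch below assumes np.concatenate for joining the arrays end to end and np.char.add for element-wise string concatenation; the slice a[:5] is only there to give both operands the same length:

```python
import numpy as np

a = np.array(list('python'))
b = np.array(list('numpy'))

# Joining the two arrays end to end mirrors list(a) + list(b).
joined = np.concatenate([a, b])

# Element-wise string concatenation needs broadcast-compatible shapes,
# so only the first five letters of `a` are paired with `b` here.
pairs = np.char.add(a[:5], b)

print(joined)
print(pairs)
```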
Linear algebra
from numpy.random import rand
from numpy.linalg import solve, inv
a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
a.transpose()
array([[ 1. , 3. , 5. ],
[ 2. , 4. , 9. ],
[ 3. , 6.7, 5. ]])
inv(a)
array([[-2.27683616, 0.96045198, 0.07909605],
[ 1.04519774, -0.56497175, 0.1299435 ],
[ 0.39548023, 0.05649718, -0.11299435]])
b = np.array([3, 2, 1])
solve(a, b) # solve the equation ax = b
array([-4.83050847, 2.13559322, 1.18644068])
c = rand(3, 3) # build a 3x3 random matrix
c
array([[ 0.98539238, 0.62602057, 0.63592577],
[ 0.84697864, 0.86223698, 0.20982139],
[ 0.15532627, 0.53992238, 0.65312854]])
np.dot(a, c) # matrix multiplication
array([[ 3.14532847, 3.97026167, 3.01495417],
[ 7.38477771, 8.94448958, 7.1230241 ],
[ 13.32640097, 13.58984759, 8.33366406]])
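A quick way to sanity-check results like these is to substitute them back. The following sketch reuses the same a and b and relies on np.allclose for a tolerance-based comparison, since floating-point results are rarely exact:

```python
import numpy as np
from numpy.linalg import solve, inv

a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
b = np.array([3, 2, 1])

x = solve(a, b)

# Substituting x back should reproduce b up to rounding error.
residual_ok = np.allclose(a @ x, b)

# A matrix times its inverse should be close to the identity.
inverse_ok = np.allclose(a @ inv(a), np.eye(3))

# solve(a, b) and inv(a) @ b agree, though solve is usually preferred.
same_solution = np.allclose(inv(a) @ b, x)

print(residual_ok, inverse_ok, same_solution)
```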
Array creation
See np.doc.creation?
There are 5 general mechanisms for creating arrays:
- Conversion from other Python structures (e.g., lists, tuples)
- Intrinsic numpy array creation objects (e.g., arange, ones, zeros, etc.)
- Reading arrays from disk, either from standard or custom formats
- Creating arrays from raw bytes through the use of strings or buffers
- Use of special library functions (e.g., random)
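The five mechanisms can be sketched one by one. This is only an illustration under a few assumptions: an in-memory io.StringIO buffer stands in for a file on disk, and np.random.default_rng is used as the example of a special library function.

```python
import io
import numpy as np

# 1. Conversion from other Python structures.
from_list = np.array([(1, 2), (3, 4)])

# 2. Intrinsic creation functions.
intrinsic = np.zeros((2, 2))

# 3. Reading from disk; an in-memory text buffer stands in for a file here.
from_text = np.loadtxt(io.StringIO("1 2\n3 4"))

# 4. Raw bytes through a buffer.
raw = np.array([1, 2, 3], dtype=np.int16).tobytes()
from_bytes = np.frombuffer(raw, dtype=np.int16)

# 5. Special library functions such as random number generators.
rng = np.random.default_rng(0)
from_random = rng.random((2, 2))

print(from_list, intrinsic, from_text, from_bytes, from_random, sep='\n')
```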
import numpy as np
x = np.array([2,3,1,0])
x1 = np.array([[1,2.0],[0,0],(1+1j,3.)]) # note mix of tuple and lists, and types
x2 = np.array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]])
y = np.zeros((2, 3))
y1 = np.ones((2,3))
y2 = np.arange(10)
y3 = np.arange(2, 10, dtype=float) # np.float was removed in NumPy 1.24; the builtin float is equivalent
y4 = np.arange(2, 10, 0.2)
y5 = np.linspace(1., 4., 6) # 6 evenly spaced points from 1 to 4 (endpoints included)
z = np.indices((3,3))
r = [x, x1, x2, y, y1, y2, y3, y4, y5, z]
s = 'x, x1, x2, y, y1, y2, y3, y4, y5, z'.split(', ')
for name, value in zip(s, r):
    print('%s = ' % name)
    print('')
    print(value)
    print(75 * '=')
x =
[2 3 1 0]
===========================================================================
x1 =
[[ 1.+0.j 2.+0.j]
[ 0.+0.j 0.+0.j]
[ 1.+1.j 3.+0.j]]
===========================================================================
x2 =
[[ 1.+0.j 2.+0.j]
[ 0.+0.j 0.+0.j]
[ 1.+1.j 3.+0.j]]
===========================================================================
y =
[[ 0. 0. 0.]
[ 0. 0. 0.]]
===========================================================================
y1 =
[[ 1. 1. 1.]
[ 1. 1. 1.]]
===========================================================================
y2 =
[0 1 2 3 4 5 6 7 8 9]
===========================================================================
y3 =
[ 2. 3. 4. 5. 6. 7. 8. 9.]
===========================================================================
y4 =
[ 2. 2.2 2.4 2.6 2.8 3. 3.2 3.4 3.6 3.8 4. 4.2 4.4 4.6 4.8
5. 5.2 5.4 5.6 5.8 6. 6.2 6.4 6.6 6.8 7. 7.2 7.4 7.6 7.8
8. 8.2 8.4 8.6 8.8 9. 9.2 9.4 9.6 9.8]
===========================================================================
y5 =
[ 1. 1.6 2.2 2.8 3.4 4. ]
===========================================================================
z =
[[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
===========================================================================
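The outputs for y4 and y5 above hint at how arange and linspace differ: arange is driven by a step and leaves out the stop value, while linspace is driven by a point count and includes both ends by default. A short sketch with values chosen only for illustration:

```python
import numpy as np

# arange takes a step and excludes the stop value.
by_step = np.arange(2, 10, 2)

# linspace takes a point count and includes both endpoints by default,
# so 6 points from 1 to 4 are spaced (4 - 1) / (6 - 1) = 0.6 apart.
by_count = np.linspace(1.0, 4.0, 6)

# endpoint=False drops the stop value, giving a half-open range like arange.
half_open = np.linspace(0.0, 1.0, 4, endpoint=False)

print(by_step, by_count, half_open)
```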
Tip: the order parameter
order specifies the order in which elements are stored in memory: C means C-like (row-major) and F means Fortran-like (column-major).
g = np.ones((2,3,4), dtype = 'i', order = 'C') # np.zeros() works the same way
g
array([[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]]], dtype=int32)
# Pass another array as the argument to get an array of ones with the same `shape`
h = np.ones_like(g, dtype = 'float16', order = 'C') # np.zeros_like() works the same way
h
array([[[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]],
[[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]]], dtype=float16)
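The printed arrays look the same whichever order is chosen, so the effect is easier to see by flattening. The sketch below uses a small 2x3 array of distinct values (an illustrative choice) and the contiguity flags:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

# Reading the elements out row by row (C) or column by column (F).
row_major = m.ravel(order='C')
col_major = m.ravel(order='F')

# The order argument of a creation function sets the memory layout,
# which the flags report; values and shape are unaffected.
c_ones = np.ones((2, 3), order='C')
f_ones = np.ones((2, 3), order='F')

print(row_major, col_major)
print(c_ones.flags['C_CONTIGUOUS'], f_ones.flags['F_CONTIGUOUS'])
```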
Notes:
- The composition, length, and size of an array are homogeneous within any dimension.
- Only one data type (numpy.dtype) is allowed for the whole array.
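The single-dtype rule has visible consequences. As a small sketch with illustrative values: mixed inputs are stored under one common dtype, and values assigned later are cast to the dtype the array already has.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])            # ints and floats end up as one float dtype
ints = np.array([1, 2, 3], dtype=np.int64)
ints[0] = 7.9                            # the float is cast to the array's integer dtype

as_float = ints.astype(np.float64)       # astype returns a converted copy

print(mixed.dtype, ints, as_float.dtype)
```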
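NumPy dtype objects
dtype | Description | Example |
---|---|---|
t | bit field | t4 (4 bits) |
b | Boolean | b (True or False) |
i | integer | i8 (64 bits) |
u | unsigned integer | u8 (64 bits) |
f | floating point | f8 (64 bits) |
c | complex floating point | c16 (128 bits) |
O | object | O (pointer to an object) |
S, a | string | S24 (24 characters) |
U | Unicode | U24 (24 Unicode characters) |
V | other | V12 (12-byte data block) |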
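Several of these type strings can be passed straight to np.dtype. The sketch below covers only a subset of the codes and inspects the kind character and the item size in bytes; note that for U the number counts characters, each of which NumPy stores in 4 bytes.

```python
import numpy as np

codes = ['i8', 'u8', 'f8', 'c16', 'S24', 'U24', 'V12']

# Each type string pairs a kind character with a size.
for code in codes:
    dt = np.dtype(code)
    print(code, dt.kind, dt.itemsize, dt.name)

sizes = {code: np.dtype(code).itemsize for code in codes}
```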
Structured arrays
let us use a different NumPy data type at least for each column.
np.info(np.dtype)
dtype()
dtype(obj, align=False, copy=False)
Create a data type object.
A numpy array is homogeneous, and contains elements described by a
dtype object. A dtype object can be constructed from different
combinations of fundamental numeric types.
Parameters
----------
obj
Object to be converted to a data type object.
align : bool, optional
Add padding to the fields to match what a C compiler would output
for a similar C-struct. Can be ``True`` only if `obj` is a dictionary
or a comma-separated string. If a struct dtype is being created,
this also sets a sticky alignment flag ``isalignedstruct``.
copy : bool, optional
Make a new copy of the data-type object. If ``False``, the result
may just be a reference to a built-in data-type object.
See also
--------
result_type
Examples
--------
Using array-scalar type:
>>> np.dtype(np.int16)
dtype('int16')
Structured type, one field name 'f1', containing int16:
>>> np.dtype([('f1', np.int16)])
dtype([('f1', '<i2')])
Structured type, one field named 'f1', in itself containing a structured
type with one field:
>>> np.dtype([('f1', [('f1', np.int16)])])
dtype([('f1', [('f1', '<i2')])])
Structured type, two fields: the first field contains an unsigned int, the
second an int32:
>>> np.dtype([('f1', np.uint), ('f2', np.int32)])
dtype([('f1', '<u4'), ('f2', '<i4')])
Using array-protocol type strings:
>>> np.dtype([('a','f8'),('b','S10')])
dtype([('a', '<f8'), ('b', '|S10')])
Using comma-separated field formats. The shape is (2,3):
>>> np.dtype("i4, (2,3)f8")
dtype([('f0', '<i4'), ('f1', '<f8', (2, 3))])
Using tuples. ``int`` is a fixed type, 3 the field's shape. ``void``
is a flexible type, here of size 10:
>>> np.dtype([('hello',(np.int,3)),('world',np.void,10)])
dtype([('hello', '<i4', 3), ('world', '|V10')])
Subdivide ``int16`` into 2 ``int8``'s, called x and y. 0 and 1 are
the offsets in bytes:
>>> np.dtype((np.int16, {'x':(np.int8,0), 'y':(np.int8,1)}))
dtype(('<i2', [('x', '|i1'), ('y', '|i1')]))
Using dictionaries. Two fields named 'gender' and 'age':
>>> np.dtype({'names':['gender','age'], 'formats':['S1',np.uint8]})
dtype([('gender', '|S1'), ('age', '|u1')])
Offsets in bytes, here 0 and 25:
>>> np.dtype({'surname':('S25',0),'age':(np.uint8,25)})
dtype([('surname', '|S25'), ('age', '|u1')])
Methods:
newbyteorder -- newbyteorder(new_order='S')
dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
('Height', 'f'), ('Children/Pets', 'i4', 2)])
s = np.array([('Smith', 45, 1.83, (0, 1)),
('Jones', 53, 1.72, (2, 2))], dtype=dt)
s
array([(b'Smith', 45, 1.83000004, [0, 1]),
(b'Jones', 53, 1.72000003, [2, 2])],
dtype=[('Name', 'S10'), ('Age', '<i4'), ('Height', '<f4'), ('Children/Pets', '<i4', (2,))])
s['Name']
array([b'Smith', b'Jones'],
dtype='|S10')
s['Age']
array([45, 53])
s["Height"].mean()
1.7750001
s[1]
(b'Jones', 53, 1.72000003, [2, 2])
s[1]['Age']
53
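Field access combines with the usual array tools. The following sketch uses a reduced version of the dtype above (the Children/Pets field is left out for brevity) and shows filtering by one field, sorting by another, and updating a field in place:

```python
import numpy as np

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'), ('Height', 'f4')])
people = np.array([(b'Smith', 45, 1.83), (b'Jones', 53, 1.72)], dtype=dt)

older = people[people['Age'] > 50]            # boolean mask built from one field
by_height = np.sort(people, order='Height')   # sort records by a named field
people['Age'] += 1                            # a field is a view, so this updates in place

print(older['Name'], by_height['Name'], people['Age'])
```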
Code vectorization
r = np.array([[1,2,3],[2,3,4],[3,4,5],[4,5,6]])
s = np.array([[2,3,4],[3,4,5],[4,5,6],[6,7,8]])
Simple arithmetic
r + s
array([[ 3, 5, 7],
[ 5, 7, 9],
[ 7, 9, 11],
[10, 12, 14]])
r * s
array([[ 2, 6, 12],
[ 6, 12, 20],
[12, 20, 30],
[24, 35, 48]])
r % s
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]], dtype=int32)
s // r
array([[2, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]], dtype=int32)
Broadcasting is supported
For more, see http://www.cnblogs.com/lyon2014/p/4696989.html
r
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
2 * r + 3
array([[ 5, 7, 9],
[ 7, 9, 11],
[ 9, 11, 13],
[11, 13, 15]])
f = np.array([9,8,7])
f
array([9, 8, 7])
r + f
array([[10, 10, 10],
[11, 11, 11],
[12, 12, 12],
[13, 13, 13]])
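The addition above works because shapes are compared from the last axis backwards, and (4, 3) lines up with (3,). A sketch of the rule, with a hypothetical col vector added to show the case that does not line up:

```python
import numpy as np

r = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])   # shape (4, 3)
row = np.array([9, 8, 7])                                     # shape (3,)
col = np.array([10, 20, 30, 40])                              # shape (4,)

# (4, 3) and (3,) match on the last axis, so row is added to every row of r.
plus_row = r + row

# (4, 3) and (4,) do not match; adding an axis of length 1 makes col (4, 1),
# which is then stretched across the three columns.
plus_col = r + col[:, np.newaxis]

# Without the extra axis the shapes are incompatible.
try:
    r + col
    mismatch_raises = False
except ValueError:
    mismatch_raises = True

print(plus_row, plus_col, mismatch_raises, sep='\n')
```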
# r.T is the transpose, the same as r.transpose()
np.shape(r.T)
(3, 4)
def f(x):
    return 3 * x + 5
f(r.T)
array([[ 8, 11, 14, 17],
[11, 14, 17, 20],
[14, 17, 20, 23]])
np.sin(r)
array([[ 0.84147098, 0.90929743, 0.14112001],
[ 0.90929743, 0.14112001, -0.7568025 ],
[ 0.14112001, -0.7568025 , -0.95892427],
[-0.7568025 , -0.95892427, -0.2794155 ]])
np.sin(np.pi)
1.2246467991473532e-16
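The result is tiny but not zero because np.pi is only the nearest floating-point value to pi. A minimal sketch of comparing with a tolerance instead of exact equality, using np.isclose with its default tolerances:

```python
import numpy as np

value = np.sin(np.pi)

# Exact comparison fails because of rounding in np.pi itself.
exactly_zero = bool(value == 0.0)

# np.isclose compares within a tolerance (default absolute tolerance 1e-8).
close_to_zero = bool(np.isclose(value, 0.0))

print(exactly_zero, close_to_zero)
```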
ufunc
http://docs.scipy.org/doc/numpy/reference/ufuncs.html
Memory Layout
x = np.random.standard_normal((5, 10000000))
y = 2 * x + 3 # linear equation y = a * x + b
C = np.array((x, y), order='C')
F = np.array((x, y), order='F')
x = 0.0; y = 0.0 # memory clean-up
C[:2].round(2)
array([[[ 0.67, 0.29, 1.54, ..., 0.07, 2.64, -0.65],
[ 0.4 , -0.63, 1.43, ..., 1.11, 0.93, -0.52],
[-0.41, 2.23, -1.16, ..., -1.66, 0.07, 0.21],
[ 1.46, 1.22, 0.2 , ..., -0.56, 2.36, -1.65],
[-0.39, 1.73, -0.24, ..., -1.45, 0.43, -0.41]],
[[ 4.34, 3.58, 6.08, ..., 3.15, 8.28, 1.69],
[ 3.79, 1.73, 5.86, ..., 5.22, 4.87, 1.97],
[ 2.17, 7.46, 0.67, ..., -0.32, 3.15, 3.42],
[ 5.93, 5.44, 3.4 , ..., 1.89, 7.72, -0.3 ],
[ 2.22, 6.46, 2.51, ..., 0.1 , 3.85, 2.18]]])
%timeit C.sum()
135 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit F.sum()
134 ms ± 499 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
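When summing all elements of the array, the two memory layouts show no significant difference. The following cases, however, differ markedly.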
%timeit C[0].sum(axis=0)
128 ms ± 894 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit C[0].sum(axis=1)
66.5 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit F.sum(axis=0)
1.06 s ± 48.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit F.sum(axis=1)
2.12 s ± 35.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
F = 0.0; C = 0.0 # memory clean-up
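The results above show that:
Operating on a few large vectors performs better than operating on many small vectors.
The elements of the few large vectors are stored in adjacent memory locations, which explains the relative performance advantage.
Overall, however, operations on the Fortran-like layout are much slower than on the C-like variant.
Choosing a suitable memory layout can make code run many times faster; the timings above differ by roughly one order of magnitude.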
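The layout difference behind these timings can be inspected through strides, the byte step to the next element along each axis. The sketch below uses a far smaller array than the one timed above (5 x 1000, an illustrative size) and np.ascontiguousarray as one possible way to convert between layouts; it does not attempt to reproduce the timings.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 1000))   # much smaller than the array timed above
y = 2 * x + 3

C = np.array((x, y), order='C')
F = np.array((x, y), order='F')

# Same values, different strides: in C order the last axis is contiguous,
# in F order the first axis is.
print(C.strides, F.strides)

# Converting to the layout that matches the access pattern costs one copy.
F_as_C = np.ascontiguousarray(F)
print(F_as_C.flags['C_CONTIGUOUS'])
```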
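Conclusion:
- Basic data types (integers, floats, strings) provide the primitive data types.
- Standard data structures (tuples, lists, dictionaries, sets) provide a variety of operations on collections of data.
- Arrays (the numpy.ndarray class) provide vectorized operations, which make code more concise, convenient, and fast.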
Useful references:
- Python essentials: http://www.python.org/doc/
- NumPy documentation: http://docs.scipy.org/doc/
- SciPy lecture notes: http://www.scipy-lectures.org/index.html