Go Python 1: First Encounter with Python and NumPy

What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
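To make that list of routines concrete, here is a minimal sketch that calls one routine from several of the categories above on a small array. The array `x` and its values are made up for illustration. `np.random.default_rng` assumes NumPy 1.17 or newer.

```python
import numpy as np

# A small 2x3 array of floats to operate on (illustrative values)
x = np.array([[3.0, 1.0, 2.0],
              [6.0, 5.0, 4.0]])

roots = np.sqrt(x)                 # mathematical: element-wise square root
mask = x > 2                       # logical: boolean mask, same shape as x
xt = x.T                           # shape manipulation: transpose to 3x2
sorted_rows = np.sort(x, axis=1)   # sorting: sort each row independently
selected = x[mask]                 # selecting: elements where mask is True
gram = x @ x.T                     # basic linear algebra: (2x3) @ (3x2) -> 2x2
col_means = x.mean(axis=0)         # basic statistics: mean of each column
spectrum = np.fft.fft(x[0])        # discrete Fourier transform of row 0

# Random simulation: 5 draws from a standard normal with a fixed seed
rng = np.random.default_rng(seed=0)
samples = rng.standard_normal(5)
```

Every call operates on the whole array at once, without an explicit Python loop.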

At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

  • NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.
  • The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.
  • NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
  • A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient - one also needs to know how to use NumPy arrays.
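Here is a short sketch of the first two points, using tiny arrays chosen only for illustration. `np.append` returns a new array instead of resizing the original in place. A list mixing ints and floats is stored with a single `float64` dtype. `dtype=object` is the exception mentioned in the second bullet.

```python
import numpy as np

# Fixed size: np.append does not grow `a` in place;
# it returns a brand-new, larger array and leaves the original untouched.
a = np.array([1, 2, 3])
b = np.append(a, 4)
print(a)  # [1 2 3]
print(b)  # [1 2 3 4]

# Homogeneous type: mixing ints and floats upcasts every element to float64,
# so each element occupies the same number of bytes.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype, mixed.itemsize)  # float64 8

# The exception: an object array holds references to arbitrary Python objects,
# so its elements can differ in type and size.
objs = np.array([1, "two", 3.0], dtype=object)
print(objs.dtype)  # object
```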

The points about sequence size and speed are particularly important in scientific computing. As a simple example, consider the case of multiplying each element in a 1-D sequence with the corresponding element in another sequence of the same length. If the data are stored in two Python lists, a and b, we could iterate over each element:

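# a and b are two Python lists of the same length (sample values)
a = [1, 2, 3]
b = [4, 5, 6]
c = []
for i in range(len(a)):
    c.append(a[i] * b[i])  # element-by-element product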

This produces the correct answer, but if a and b each contain millions of numbers, we will pay the price for the inefficiencies of looping in Python. We could accomplish the same task much more quickly in C by writing (for clarity, we neglect variable declarations and initializations, memory allocation, etc.):

for (i = 0; i < rows; i++) {
  c[i] = a[i]*b[i];
}

This saves all the overhead involved in interpreting the Python code and manipulating Python objects, but at the expense of the benefits gained from coding in Python. Furthermore, the coding work required increases with the dimensionality of our data. In the case of a 2-D array, for example, the C code (abridged as before) expands to

for (i = 0; i < rows; i++) {
  for (j = 0; j < columns; j++) {
    c[i][j] = a[i][j]*b[i][j];
  }
}

NumPy gives us the best of both worlds: element-by-element operations are the “default mode” when an ndarray is involved, but the element-by-element operation is speedily executed by pre-compiled C code. In NumPy

c = a * b

does what the earlier examples do, at near-C speeds, but with the code simplicity we expect from something based on Python. Indeed, the NumPy idiom is even simpler! This last example illustrates two of NumPy’s features which are the basis of much of its power: vectorization and broadcasting.
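The point that `c = a * b` does not change with the dimensionality of the data can be checked directly. The sketch below uses small made-up lists. It reproduces the Python loop result and then applies the identical expression to 2-D arrays. The last lines give a one-line illustration of broadcasting, which the paragraph above names but does not define. When shapes differ, NumPy stretches the smaller array (here a row of length 3) across the larger one without making an explicit copy.

```python
import numpy as np

# Python-list version of the 1-D element-wise product (illustrative data)
a_list = [1, 2, 3, 4]
b_list = [10, 20, 30, 40]
c_list = []
for i in range(len(a_list)):
    c_list.append(a_list[i] * b_list[i])

# NumPy version: the same product with no explicit loop
a = np.array(a_list)
b = np.array(b_list)
c = a * b
print(c)  # [ 10  40  90 160]

# The expression is unchanged for 2-D data, unlike the nested C loops
A = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
B = np.full((2, 3), 2)           # a 2x3 array filled with 2
C = A * B                        # [[0, 2, 4], [6, 8, 10]]

# Broadcasting: the length-3 row is applied to each row of A
row = np.array([1, 10, 100])
D = A * row                      # [[0, 10, 200], [3, 40, 500]]
```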

Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” in optimized, pre-compiled C code. Vectorized code has many advantages, among which are:

  • vectorized code is more concise and easier to read
  • fewer lines of code generally means fewer bugs
  • the code more closely resembles standard mathematical notation (making it easier, typically, to correctly code mathematical constructs)
  • vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient and difficult-to-read for loops.

Source: https://docs.scipy.org/doc/numpy-1.10.1/user/whatisnumpy.html
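The third bullet above (vectorized code resembles mathematical notation) can be illustrated by computing the mean squared error, MSE = (1/n) * sum over i of (y_i - yhat_i)^2, in two ways. The arrays `y` and `y_hat` are hypothetical values chosen so the result is easy to check by hand.

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.0, 4.5])

# Loop version: explicit indexing and accumulation
total = 0.0
for i in range(len(y)):
    total += (y[i] - y_hat[i]) ** 2
mse_loop = total / len(y)

# Vectorized version: reads like the formula (1/n) * sum((y - y_hat)^2)
mse_vec = np.mean((y - y_hat) ** 2)

print(mse_loop, mse_vec)  # 0.375 0.375
```

The vectorized line maps term by term onto the formula, while the loop hides it behind index bookkeeping.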

Why is NumPy faster than regular Python?

 
import numpy as np

Let's run through an example showing how powerful NumPy is. Suppose we have two lists, a and b, each consisting of the first 100,000 non-negative integers (0 to 99,999), and we want to create a new list c whose i-th element is a[i] + 2 * b[i].

Without NumPy:

 
%%time
a = [i for i in range(100000)]
b = [i for i in range(100000)]
CPU times: user 5.13 ms, sys: 1.03 ms, total: 6.16 ms
Wall time: 6.07 ms
 
%%time
c = []
for i in range(len(a)):
    c.append(a[i] + 2 * b[i])
CPU times: user 52.2 ms, sys: 1.59 ms, total: 53.8 ms
Wall time: 52.3 ms

With NumPy:

 
%%time
a = np.arange(100000)
b = np.arange(100000)
CPU times: user 392 µs, sys: 1.13 ms, total: 1.52 ms
Wall time: 688 µs
 
%%time
c = a + 2 * b
CPU times: user 2.25 ms, sys: 200 µs, total: 2.45 ms
Wall time: 1.02 ms

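In these runs, creating the data was roughly 9 times faster with NumPy (6.07 ms vs 688 µs wall time) and the computation itself roughly 50 times faster (52.3 ms vs 1.02 ms). We also needed fewer lines of code, and the code itself is more intuitive! The exact numbers will vary from machine to machine.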
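The `%%time` cells above only work inside Jupyter/IPython. Below is a minimal plain-Python sketch of the same comparison. It uses the standard `timeit` module and also checks that both versions compute identical results. The function names `with_lists` and `with_numpy` are made up for this example, and the printed timings will differ from machine to machine.

```python
import timeit
import numpy as np

N = 100_000

def with_lists():
    # Pure Python: build two lists, then loop to compute a[i] + 2 * b[i]
    a = [i for i in range(N)]
    b = [i for i in range(N)]
    c = []
    for i in range(len(a)):
        c.append(a[i] + 2 * b[i])
    return c

def with_numpy():
    # NumPy: build two arrays, then evaluate one vectorized expression
    a = np.arange(N)
    b = np.arange(N)
    return a + 2 * b

# Both approaches must produce the same values
assert with_numpy().tolist() == with_lists()

# Best of 3 runs of 5 calls each, reported per call
t_list = min(timeit.repeat(with_lists, number=5, repeat=3)) / 5
t_numpy = min(timeit.repeat(with_numpy, number=5, repeat=3)) / 5
print(f"lists: {t_list * 1e3:.2f} ms, numpy: {t_numpy * 1e3:.2f} ms, "
      f"speedup: {t_list / t_numpy:.1f}x")
```

Taking the minimum over several repeats reduces noise from other processes, which is the usual way `timeit` results are read.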

Regular Python is much slower because of dynamic type checking and the other overhead involved in interpreting code and supporting Python's abstractions.

For example, when we add numbers in a Python loop, the interpreter checks the types of the operands on every iteration, which takes many more instructions than the addition itself. NumPy runs optimized, pre-compiled C code over arrays that share a single data type, so it avoids most of this overhead.
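One way to see the difference is to compare how a list and an array store the same 1,000 integers. In a list, every element is a separate Python object carrying its own type information, which the interpreter must inspect during each operation. In an array, a single dtype describes the whole buffer. This is a rough, CPython-specific illustration: `sys.getsizeof` reports sizes that vary by Python version, so only the relative comparison is meaningful.

```python
import sys
import numpy as np

N = 1000

# A Python list stores pointers to separate int objects,
# each with its own type pointer and reference count.
py_list = list(range(N))
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# A NumPy array stores N raw machine integers of one dtype in a single buffer,
# so the element type is known once for the whole array.
arr = np.arange(N, dtype=np.int64)
print(arr.dtype, arr.itemsize, arr.nbytes)  # int64 8 8000
print(list_bytes > arr.nbytes)              # True
```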

Jupyter

Start the notebook server:

jupyter notebook

Learning Resources

Tutorials:
1. Python Numpy Tutorial: http://cs231n.github.io/python-numpy-tutorial/
2. CS228 Python Tutorial: https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb
3. The Python Tutorial: https://docs.python.org/3.5/tutorial/index.html
4. Python Basics with Numpy: https://hub.coursera-notebooks.org/user/mifonxumuslonynjimbkxm/notebooks/Week%202/Python%20Basics%20with%20Numpy/Python%20Basics%20With%20Numpy%20v3.ipynb
5. Python tutorials:
   https://www.tutorialspoint.com/python/index.htm
   https://www.py4e.com/

Examples:
1. CS 231n Python & NumPy Tutorial: http://cs231n.stanford.edu/notebooks/python_numpy_tutorial.ipynb
2. Logistic Regression with a Neural Network mindset: https://hub.coursera-notebooks.org/user/mifonxumuslonynjimbkxm/notebooks/Week%202/Logistic%20Regression%20as%20a%20Neural%20Network/Logistic%20Regression%20with%20a%20Neural%20Network%20mindset%20v5.ipynb

Docs

Python:
1. Python official site: https://www.python.org/
2. Python docs: https://docs.python.org/3/

SciPy:
1. SciPy docs: https://docs.scipy.org/doc/scipy/reference/index.html

NumPy:
1. NumPy official site: http://www.numpy.org/

Jupyter:
1. Jupyter Notebook docs: http://jupyter-notebook.readthedocs.io/en/stable/index.html
2. Jupyter shortcuts: https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/pdf_bw/

posted @ 2018-06-08 15:09  wordchao