Python Data Analysis and Data Mining for Beginners, Chapter 2: pandas, Section 2: Python Language Basics, IPython, and Jupyter Notebooks
Python Language Basics, IPython, and Jupyter Notebooks
import numpy as np  # import NumPy
np.random.seed(12345)  # seed the random number generator so results are reproducible
np.set_printoptions(precision=4, suppress=True)  # configure how arrays are printed
Signature: np.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, suppress=None, nanstr=None, infstr=None, formatter=None, sign=None, floatmode=None, **kwarg) Docstring: Set printing options.
These options determine the way floating point numbers, arrays and other NumPy objects are displayed.
The Python Interpreter
$ python
Python 3.6.0 | packaged by conda-forge | (default, Jan 13 2017, 23:17:12)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 5
>>> print(a)
5
print('Hello world')
$ python hello_world.py
Hello world
$ ipython
Python 3.6.0 | packaged by conda-forge | (default, Jan 13 2017, 23:17:12)
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: %run hello_world.py
Hello world
In [2]:
IPython Basics
Running the IPython Shell
$
import numpy as np
data = {i : np.random.randn() for i in range(7)}
data
Return a sample (or samples) from the "standard normal" distribution. The code above draws 7 standard normal random numbers and stores them in the dict data under the keys 0 through 6.
from numpy.random import randn
data = {i : randn() for i in range(7)}
print(data)
{0: -1.5948255432744511, 1: 0.10569006472787983, 2: 1.972367135977295, 3: 0.15455217573074576, 4: -0.24058577449429575, 5: -1.2904897053651216, 6: 0.3308507317325902}
Running the Jupyter Notebook
$ jupyter notebook
[I 15:20:52.739 NotebookApp] Serving notebooks from local directory:
/home/wesm/code/pydata-book
[I 15:20:52.739 NotebookApp] 0 active kernels
[I 15:20:52.739 NotebookApp] The Jupyter Notebook is running at:
http://localhost:8888/
[I 15:20:52.740 NotebookApp] Use Control-C to stop this server and shut down
all kernels (twice to skip confirmation).
Created new window in existing browser session.
Tab Completion
Pressing the Tab key brings up completion suggestions.
In [1]: an_apple = 27
In [2]: an_example = 42
In [3]: an
In [3]: b = [1, 2, 3]
In [4]: b.
In [1]: import datetime
In [2]: datetime.
In [7]: datasets/movielens/
Introspection
Appending a question mark to an object displays its help information.
In [8]: b = [1, 2, 3]
In [9]: b?
Type: list
String Form:[1, 2, 3]
Length: 3
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items
In [10]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type: builtin_function_or_method
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
In [11]: add_numbers?
Signature: add_numbers(a, b)
Docstring:
Add two numbers together
Returns
-------
the_sum : type of arguments
File: <ipython-input-9-6a548a216e27>
Type: function
In [12]: add_numbers??
Signature: add_numbers(a, b)
Source:
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
File: <ipython-input-9-6a548a216e27>
Type: function
In [13]: np.*load*?
np.__loader__
np.load
np.loads
np.loadtxt
np.pkgload
np.*load*? searches the top-level NumPy namespace for every name containing load.
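Outside IPython, similar information can be retrieved with the standard library. The following is a minimal sketch, not a description of how IPython works internally: it uses inspect.signature and inspect.getdoc to get roughly what add_numbers? prints, and fnmatch.filter over dir() as a stand-in for a wildcard search. The json module is used here only as a convenient example namespace.

```python
import fnmatch
import inspect
import json


def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b


# Roughly what `add_numbers?` displays: the signature and the docstring.
signature = str(inspect.signature(add_numbers))
docstring = inspect.getdoc(add_numbers)
print("Signature: add_numbers" + signature)
print(docstring)

# Rough stand-in for a wildcard search such as `json.*load*?`.
matches = fnmatch.filter(dir(json), "*load*")
print(matches)
```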
The %run Command
def f(x, y, z):
    return (x + y) / z

a = 5
b = 6
c = 7.5

result = f(a, b, c)
In [14]: %run ipython_script_test.py
In [15]: c
Out[15]: 7.5
In [16]: result
Out[16]: 1.4666666666666666
>>> %load ipython_script_test.py
def f(x, y, z):
    return (x + y) / z

a = 5
b = 6
c = 7.5

result = f(a, b, c)
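As a rough analogy for %run (execute a script in a fresh namespace, then make its variables available), the standard library's runpy.run_path executes a file and returns the resulting globals as a dict. This is only a sketch: the temporary directory is an assumption made to keep the example self-contained, and it does not describe IPython's implementation.

```python
import os
import runpy
import tempfile

script_source = """\
def f(x, y, z):
    return (x + y) / z

a = 5
b = 6
c = 7.5

result = f(a, b, c)
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the script to a throwaway location, then run it.
    script_path = os.path.join(tmp_dir, "ipython_script_test.py")
    with open(script_path, "w") as handle:
        handle.write(script_source)
    namespace = runpy.run_path(script_path)

# The script's top-level names are now available as dict entries.
print(namespace["c"])
print(namespace["result"])
```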
Interrupting running code: press Ctrl-C
Executing code from the clipboard
x = 5
y = 7
if x > 5:
    x += 1
    y = 8
In [17]: %paste
x = 5
y = 7
if x > 5:
    x += 1
    y = 8
## -- End pasted text --
In [18]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:x = 5
:y = 7
:if x > 5:
:    x += 1
:
:    y = 8
:--
Terminal Keyboard Shortcuts
About Magic Commands
In [20]: a = np.random.randn(100, 100)
In [20]: %timeit np.dot(a, a)
10000 loops, best of 3: 20.9 µs per loop
The %timeit magic command measures the execution time of a statement.
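Outside IPython, the standard timeit module offers comparable timing. A minimal sketch, timing a pure-Python statement instead of np.dot so that it has no third-party dependencies; the loop and repeat counts are arbitrary choices:

```python
import timeit

# Run the statement 1000 times per trial, for 3 trials.
timings = timeit.repeat("sum(range(1000))", repeat=3, number=1000)

# Report the best (smallest) trial, converted to time per loop.
best_per_loop = min(timings) / 1000
print("best of 3: {:.2f} µs per loop".format(best_per_loop * 1e6))
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to run during a trial.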
In [21]: %debug?
Docstring:
::
%debug [--breakpoint FILE:LINE] [statement [statement ...]]
Activate the interactive debugger.
This magic command support two ways of activating debugger.
One is to activate debugger before executing code. This way, you
can set a break point, to step through the code from the point.
You can use this mode by giving statements to execute and optionally
a breakpoint.
The other one is to activate debugger in post-mortem mode. You can
activate this mode simply running %debug without any argument.
If an exception has just occurred, this lets you inspect its stack
frames interactively. Note that this will always work only on the last
traceback that occurred, so you must call this quickly after an
exception that you wish to inspect has fired, because if another one
occurs, it clobbers the previous one.
If you want IPython to automatically do this on every exception, see
the %pdb magic for more details.
positional arguments:
statement Code to run in debugger. You can omit this in cell
magic mode.
optional arguments:
--breakpoint <FILE:LINE>, -b <FILE:LINE>
Set break point at LINE in FILE.
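The %debug magic command activates the interactive debugger. It supports two modes.
In the first mode, the debugger is activated before the code runs: pass the statements to execute and, optionally, a breakpoint, then step through the code from that point.
In the second mode (post-mortem), run %debug with no arguments right after an exception to inspect its stack frames interactively. This works only on the most recent traceback, so call it before another exception replaces the one you want to inspect.
To have IPython do this automatically on every exception, see the %pdb magic.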
In [22]: %pwd
Out[22]: '/home/wesm/code/pydata-book'
In [23]: foo = %pwd
In [24]: foo
Out[24]: '/home/wesm/code/pydata-book'
The %pwd magic command returns the current working directory; its result can also be assigned to a variable.
Matplotlib Integration
In [26]: %matplotlib
Using matplotlib backend: Qt4Agg
In [26]: %matplotlib inline
%matplotlib inline makes matplotlib plots display inside the notebook.
Python Language Basics
Language Semantics
Python uses indentation to express the logical structure of code.
for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)
a = 5; b = 6; c = 7
Everything is an object
Any text on a line after a hash mark (#) is a comment and is not executed.
results = []
for line in file_handle:
    # keep the empty lines for now
    # if len(line) == 0:
    #     continue
    results.append(line.replace('foo', 'bar'))
print("Reached this line")  # Simple status report
Function and object method calls
result = f(x, y, z)
g()
obj.some_method(x, y, z)
The general form is object.method(arg1, arg2, arg3).
result = f(a, b, c, d=5, e='foo')
Variables and argument passing
a = [1, 2, 3]
b = a
a.append(4)
b  # [1, 2, 3, 4]: b refers to the same list as a
b.append(5)
a
A custom append function
def append_element(some_list, element):
    some_list.append(element)
In [27]: data = [1, 2, 3]
In [28]: append_element(data, 4)
In [29]: data
Out[29]: [1, 2, 3, 4]
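A short sketch of the distinction that matters here: mutating an object passed to a function is visible to the caller, while rebinding the parameter name inside the function is not. The helper rebind is hypothetical, added only for contrast.

```python
def append_element(some_list, element):
    some_list.append(element)  # mutates the caller's list in place


def rebind(some_list):
    some_list = [0]  # rebinds the local name only; the caller's list is untouched
    return some_list


data = [1, 2, 3]
alias = data               # a second name for the same list object
independent = list(data)   # a new list with the same contents

append_element(data, 4)
rebound = rebind(data)

print(alias is data, alias)
print(independent)
print(rebound, data)
```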
Dynamic references, strong types
a = 5
type(a)
a = 'foo'
type(a)
'5' + 5
a = 4.5
b = 2
# String formatting, to be visited later
print('a is {0}, b is {1}'.format(type(a), type(b)))
a / b
a = 5
isinstance(a, int)
a = 5; b = 4.5
isinstance(a, (int, float))
isinstance(b, (int, float))
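To make the "strong types" point concrete, the sketch below shows that Python refuses to combine a str and an int implicitly, while explicit conversions and int-to-float promotion work as expected. The variable names are illustrative only.

```python
error_message = None
try:
    '5' + 5  # str + int is never converted implicitly
except TypeError as exc:
    error_message = str(exc)

as_text = '5' + str(5)     # explicit conversion to str
as_number = int('5') + 5   # explicit conversion to int
quotient = 4.5 / 2         # the int operand is promoted to float

print(error_message)
print(as_text, as_number, quotient)
```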
Attributes and methods
In [1]: a = 'foo'
In [2]: a.<Press Tab to see attribute and method suggestions>
a.capitalize a.format a.isupper a.rindex a.strip
a.center a.index a.join a.rjust a.swapcase
a.count a.isalnum a.ljust a.rpartition a.title
a.decode a.isalpha a.lower a.rsplit a.translate
a.encode a.isdigit a.lstrip a.rstrip a.upper
a.endswith a.islower a.partition a.split a.zfill
a.expandtabs a.isspace a.replace a.splitlines
a.find a.istitle a.rfind a.startswith
a = 'foo'
getattr(a, 'split')
Docstring: getattr(object, name[, default]) -> value
Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y. When a default argument is given, it is returned when the attribute doesn't exist; without it, an exception is raised in that case. Type: builtin_function_or_method
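A small sketch of accessing attributes by name. The attribute name no_such_attribute is made up to show the default argument; hasattr is the related built-in for testing existence.

```python
a = 'foo'

upper_method = getattr(a, 'upper')  # equivalent to a.upper
shouted = upper_method()

has_split = hasattr(a, 'split')

# With a default, a missing attribute does not raise AttributeError.
missing = getattr(a, 'no_such_attribute', None)

print(shouted, has_split, missing)
```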
Duck typing
Duck typing is used widely in Python. The Python glossary defines it as follows:
Pythonic programming style that determines an object's type by inspection of its method or attribute signature rather than by explicit relationship to some type object ("If it looks like a duck and quacks like a duck, it must be a duck.") By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution. Duck-typing avoids tests using type() or isinstance(). Instead, it typically employs the EAFP (Easier to Ask Forgiveness than Permission) style of programming.
It's easier to ask forgiveness than it is to get permission. Variant: If it's a good idea, go ahead and do it. It is much easier to apologize than it is to get permission.---Grace Hopper - Wikiquote
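In Python, the most typical example of duck typing is file-like classes. These classes can implement some or all of the methods of file and can be used wherever a file normally would be. For example, GzipFile implements a file-like object for accessing gzip-compressed data, and cStringIO allows a Python string to be treated as a file. Sockets also share many methods with files. However, sockets lack the tell() method and so cannot be used everywhere GzipFile can. This illustrates the flexibility of duck typing: a file-like object implements only the methods it is able to, and it can be used only in situations where that makes sense.
The EAFP principle describes the use of exception handling. For example, instead of checking whether a supposedly Duck-like object has a quack() method (using if hasattr(mallard, "quack"): ...), it is usually preferable to wrap the attempted call to quack in exception handling:
try:
    mallard.quack()
except (AttributeError, TypeError):
    print("mallard has no quack() method")
The advantage of this style is that it encourages structured handling of other errors raised by the class. For example, a Duck subclass that cannot quack can raise a "QuackException", which can simply be added to the wrapping code without affecting the rest of the logic. It also handles name clashes caused by incompatible members on objects of unrelated classes: if a medical professional Mallard has a boolean attribute classifying him as "quack=True", calling Mallard.quack() raises a TypeError.
In more practical cases of implementing file-like behavior, Python's exception handling is the preferred way to deal with the many I/O errors that can arise from environmental and operating system problems outside the programmer's control. There, exceptions produced by duck typing can be caught in their own clauses, separately from operating system, I/O, and other possible errors, which avoids complicated detection and error-checking logic.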
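The ideas above can be sketched in Python 3 as follows. This is an illustration under stated assumptions: io.StringIO stands in for the Python 2 cStringIO mentioned above, and the classes Duck and Doctor and the helpers count_lines and try_quack are hypothetical names invented for the example.

```python
import io


def count_lines(file_like):
    """Count lines in any object that can be iterated line by line."""
    return sum(1 for _ in file_like)


class Duck:
    def quack(self):
        return "Quack!"


class Doctor:
    quack = True  # an attribute named quack that is not callable


def try_quack(obj):
    # EAFP: attempt the call and handle the failure cases.
    try:
        return obj.quack()
    except (AttributeError, TypeError):
        return None


# A StringIO object is not a real file, but it iterates like one.
line_count = count_lines(io.StringIO("first\nsecond\n"))

# Doctor().quack() raises TypeError; (42).quack raises AttributeError.
results = [try_quack(Duck()), try_quack(Doctor()), try_quack(42)]

print(line_count)
print(results)
```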
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False
Docstring: iter(iterable) -> iterator iter(callable, sentinel) -> iterator
Get an iterator from an object. In the first form, the argument must supply its own iterator, or be a sequence. In the second form, the callable is called until it returns the sentinel. Type: builtin_function_or_method
isiterable('a string')
isiterable([1, 2, 3])
isiterable(5)
if not isinstance(x, list) and isiterable(x):
    x = list(x)
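A self-contained sketch of how this check is typically used: a function that accepts any iterable and normalizes it to a list. The wrapper name ensure_list is hypothetical.

```python
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False


def ensure_list(x):
    # Convert any non-list iterable to a list; leave everything else alone.
    if not isinstance(x, list) and isiterable(x):
        x = list(x)
    return x


original = [1, 2, 3]
from_string = ensure_list('abc')
from_tuple = ensure_list((1, 2))
from_int = ensure_list(5)
same_list = ensure_list(original)

print(from_string, from_tuple, from_int)
```

Note that a string is iterable, so it is split into characters; whether that is desirable depends on the caller.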
Imports
# some_module.py
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b

import some_module
result = some_module.f(5)
pi = some_module.PI

from some_module import f, g, PI
result = g(5, PI)

import some_module as sm
from some_module import PI as pi, g as gf

r1 = sm.f(pi)
r2 = gf(6, pi)
Binary operators and comparisons
5 - 7
12 + 21.5
5 <= 2
a = [1, 2, 3]
b = a
c = list(a)
a is b
a is not c
a == c
a = None
a is None
Mutable and immutable objects
a_list = ['foo', 2, [4, 5]]
a_list[2] = (3, 4)
a_list
a_tuple = (3, 5, (4, 5))  # tuple elements cannot be reassigned
a_tuple[1] = 'four'
a_tuple = (3, 5, [4, 5])  # but a mutable object inside a tuple can still be modified
a_tuple[2][0] = 'four'
a_tuple
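The two tuple cases can be checked in one runnable sketch: assigning to a tuple slot raises TypeError, while mutating a list stored inside the tuple succeeds because the tuple still refers to the same list object.

```python
a_tuple = (3, 5, [4, 5])
inner = a_tuple[2]

assignment_failed = False
try:
    a_tuple[1] = 'four'  # tuples do not support item assignment
except TypeError:
    assignment_failed = True

a_tuple[2].append(6)  # mutates the inner list in place

print(assignment_failed)
print(a_tuple)
```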
Scalar Types
Numeric types
ival = 17239871
ival ** 6
fval = 7.243
fval2 = 6.78e-5
3 / 2
3 // 2  # "//" is floor division: it returns the quotient rounded down to an integer
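A brief sketch of the division operators, including the negative case where "rounded down" differs from truncation toward zero. It also shows that Python integers have arbitrary precision.

```python
true_division = 3 / 2        # always produces a float
floor_division = 3 // 2      # rounds toward negative infinity
negative_floor = -3 // 2     # -2, not -1
quotient, remainder = divmod(7, 2)

big = 2 ** 100               # no overflow: ints grow as needed

print(true_division, floor_division, negative_floor)
print(quotient, remainder)
print(big)
```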
Strings
a = 'one way of writing a string'
b = "another way"
c = """
This is a longer string that
spans multiple lines
"""
c.count('\n')  # number of newline characters in c
a = 'this is a string'
a[10] = 'f'
b = a.replace('string', 'longer string')
b
Docstring: S.replace(old, new[, count]) -> str
Return a copy of S with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
a
a = 5.6
s = str(a)
print(s)
print(type(s))
s = 'python'
l = list(s)
s[:3]
print(type(l))
print(type(s))
Escape characters and raw strings
s = '12\\34'
s1 = '12\34'
s2 = r'12\34'
print(s)
print(s1)
print(s2)
s = r'this\has\no\special\characters'
s
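The three literals above differ in a way that is easy to miss when printed. A small sketch that inspects them: in a normal literal \\ is a single backslash, \34 is an octal escape for one control character, and the r prefix turns off escape processing.

```python
s = '12\\34'   # escaped backslash: characters 1, 2, \, 3, 4
s1 = '12\34'   # \34 is an octal escape: a single character with code 28
s2 = r'12\34'  # raw string: the backslash is kept literally

print(len(s), len(s1), len(s2))
print(repr(s1))
print(s == s2)
```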
String concatenation
a = 'this is the first half '
b = 'and this is the second half'
a + b
Formatted output
template = '{0:.2f} {1:s} are worth US${2:d}'
template.format(4.5560, 'Argentine Pesos', 1)
S.format(*args, **kwargs) -> str
Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces ('{' and '}').
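A runnable sketch of the template above: {0:.2f} formats the first argument as a float with two decimal places, {1:s} the second as a string, and {2:d} the third as an integer. The f-string variant (Python 3.6+) is shown for comparison; the variable names are illustrative.

```python
template = '{0:.2f} {1:s} are worth US${2:d}'
formatted = template.format(4.5560, 'Argentine Pesos', 1)

# The same format specifications work inside an f-string.
amount, currency, dollars = 4.5560, 'Argentine Pesos', 1
with_fstring = f'{amount:.2f} {currency:s} are worth US${dollars:d}'

print(formatted)
print(with_fstring == formatted)
```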
Bytes and Unicode
val = "español"
print(val)
print(type(val))
val_utf8 = val.encode('utf-8')
val_utf8
type(val_utf8)
val_utf8.decode('utf-8')
val.encode('latin1')
val.encode('utf-16')
val.encode('utf-16le')
bytes_val = b'this is bytes'
bytes_val
decoded = bytes_val.decode('utf8')
decoded # this is str (Unicode) now
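The encodings above produce different byte sequences for the same text. The sketch below compares their lengths: ñ takes two bytes in UTF-8 and one in Latin-1, and the plain utf-16 codec prepends a two-byte byte order mark.

```python
val = "español"

val_utf8 = val.encode('utf-8')
val_latin1 = val.encode('latin1')
val_utf16 = val.encode('utf-16')      # includes a byte order mark
val_utf16le = val.encode('utf-16le')  # explicit byte order, no mark

print(len(val), len(val_utf8), len(val_latin1), len(val_utf16), len(val_utf16le))

# Decoding with the matching codec recovers the original str.
round_trip = val_utf8.decode('utf-8')
print(round_trip == val)
```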
Booleans
True and True
False or True
Type casting
s = '3.14159'
fval = float(s)
type(fval)
int(fval)
bool(fval)
bool(0)
None
a = None
a is None
b = 5
b is not None
def add_and_maybe_multiply(a, b, c=None):
    result = a + b
    if c is not None:
        result = result * c
    return result
type(None)
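A usage sketch for add_and_maybe_multiply, showing why the test is written as c is not None instead of a plain truthiness check: a legitimate argument of 0 is falsy but must still be applied.

```python
def add_and_maybe_multiply(a, b, c=None):
    result = a + b
    if c is not None:
        result = result * c
    return result


without_c = add_and_maybe_multiply(2, 3)
with_c = add_and_maybe_multiply(2, 3, 4)
with_zero = add_and_maybe_multiply(2, 3, 0)  # 0 is falsy, yet it is not None

print(without_c, with_c, with_zero)
```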
Dates and times
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
dt.day
dt.minute
Init signature: datetime(self, /, *args, **kwargs) Docstring:
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])
The year, month, and day arguments are required. tzinfo may be None, or an instance of a tzinfo subclass. The remaining arguments may be ints.
dt.date()
dt.time()
dt.strftime('%m/%d/%Y %H:%M')  # format the datetime as a string
datetime.strptime('20091031', '%Y%m%d')
dt.replace(minute=0, second=0)
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
delta
type(delta)
A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.
dt
dt + delta
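The datetime operations above, collected into one runnable sketch with the values they produce. Note that replace returns a new object, since datetime instances are immutable.

```python
from datetime import datetime, timedelta

dt = datetime(2011, 10, 29, 20, 30, 21)
dt2 = datetime(2011, 11, 15, 22, 30)

formatted = dt.strftime('%m/%d/%Y %H:%M')         # datetime -> str
parsed = datetime.strptime('20091031', '%Y%m%d')  # str -> datetime
truncated = dt.replace(minute=0, second=0)        # new object; dt is unchanged

delta = dt2 - dt   # subtracting datetimes gives a timedelta
shifted = dt + delta

print(formatted)
print(parsed)
print(truncated, dt)
print(delta)
print(shifted == dt2)
```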
Control Flow
if, elif, and else
if x < 0:
    print("It's negative")

if x < 0:
    print("It's negative")
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')
a = 5; b = 7
c = 8; d = 4
if a < b or c > d:
print('Made it')
4 > 3 > 2 > 1
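A sketch that wraps the branches in a function so each case can be exercised, and that shows the short-circuit behavior of or: the right-hand operand is not evaluated when the left-hand one is already true. The helpers describe and record are hypothetical names used only for this illustration.

```python
def describe(x):
    if x < 0:
        return "It's negative"
    elif x == 0:
        return 'Equal to zero'
    elif 0 < x < 5:  # chained comparison
        return 'Positive but smaller than 5'
    else:
        return 'Positive and larger than or equal to 5'


labels = [describe(value) for value in (-1, 0, 3, 5)]

evaluated = []


def record(name, value):
    # Remember which operands were actually evaluated.
    evaluated.append(name)
    return value


made_it = record('left', 5 < 7) or record('right', 8 > 4)

print(labels)
print(made_it, evaluated)
```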
for loops
for value in collection:
    # do something with value

sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value

sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value

for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))
Init signature: range(self, /, *args, **kwargs) Docstring:
range(stop) -> range object range(start, stop[, step]) -> range object
Return an object that produces a sequence of integers from start (inclusive) to stop (exclusive) by step. range(i, j) produces i, i+1, i+2, ..., j-1. start defaults to 0, and stop is omitted! range(4) produces 0, 1, 2, 3. These are exactly the valid indices for a list of 4 elements. When step is given, it specifies the increment (or decrement).
for a, b, c in iterator:
    # do something
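The loop forms in this section, gathered into one runnable sketch with their results collected into variables. The list records is made-up data used to demonstrate tuple unpacking in the loop header.

```python
# continue: skip None values while summing.
sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value

# break: stop at the first 5.
sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value

# break only exits the innermost loop.
pairs = []
for i in range(4):
    for j in range(4):
        if j > i:
            break
        pairs.append((i, j))

# Each element is unpacked into a, b, c.
records = [(1, 2, 3), (4, 5, 6)]
sums = []
for a, b, c in records:
    sums.append(a + b + c)

print(total, total_until_5)
print(pairs)
print(sums)
```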
while loops
x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2
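Tracing this loop makes the exit condition concrete. The sketch below records (total, x) after each pass; the list steps is added only for inspection. The loop ends through break once total exceeds 500, before x reaches 0.

```python
x = 256
total = 0
steps = []
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2
    steps.append((total, x))

print(steps)
print(total, x)
```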
pass
if x < 0:
    print('negative!')
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print('positive!')
range
range(10)
list(range(10))
Init signature: list(self, /, *args, **kwargs) Docstring:
list() -> new empty list list(iterable) -> new list initialized from iterable's items
list(range(0, 20, 2))
list(range(5, 0, -1))
seq = [1, 2, 3, 4]
for i in range(len(seq)):
    val = seq[i]

sum = 0
for i in range(100000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        sum += i
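A sketch of the range behavior described above. A range object is lazy, so even a very large range uses little memory; list() materializes the values. The accumulator is named total here, an illustrative choice that avoids shadowing the built-in sum.

```python
evens = list(range(0, 20, 2))
countdown = list(range(5, 0, -1))

# A range stores only start, stop, and step, not the individual values.
huge = range(10 ** 9)
huge_length = len(huge)

total = 0
for i in range(100000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i

print(evens)
print(countdown)
print(huge_length)
print(total)
```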
Ternary expressions
value = true-expr if condition else false-expr
x = 5
'Non-negative' if x >= 0 else 'Negative'
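A minimal sketch comparing the ternary expression with the equivalent if/else block. Both function names are hypothetical; only the branch whose condition holds is evaluated.

```python
def label_ternary(x):
    return 'Non-negative' if x >= 0 else 'Negative'


def label_block(x):
    # The same logic written as a full if/else statement.
    if x >= 0:
        value = 'Non-negative'
    else:
        value = 'Negative'
    return value


samples = [5, 0, -3]
ternary_labels = [label_ternary(x) for x in samples]
block_labels = [label_block(x) for x in samples]

print(ternary_labels)
```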