Reading Notes on "Python Data Analysis (2nd Edition)" by Armando Fandango, Chapter 1: Jupyter and Common Libraries

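"Python Data Analysis (2nd Edition)" by Armando Fandango.

The book is a study guide to data analysis with Python. Its 12 chapters start with an introduction to the Python libraries, NumPy arrays, and Pandas, then move on to retrieving, processing, and storing data and to data visualization. It also covers signal processing and time series, working with databases, analyzing textual data and social media, predictive analytics and machine learning, the environment outside the Python ecosystem and cloud computing, and performance tuning, profiling, and concurrency. Three appendices at the end of the book supply key concepts, useful functions, and online resources.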

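About the author:

Armando Fandango is Chief Data Scientist at Epic Engineering and Consulting Group, where he works on confidential projects related to defense and government agencies. Armando is an accomplished technologist with hands-on and senior management experience at startups and large companies around the world. His work has spanned fintech, stock exchanges, banking, bioinformatics, genomics, ad tech, infrastructure, transportation, energy, human resources, and entertainment.
Armando has worked for more than ten years on projects involving predictive analytics, data science, machine learning, big data, product engineering, high-performance computing, and cloud infrastructure. His research interests span machine learning, deep learning, and scientific computing.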

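Preface
Data analysis has a long history in the natural, biomedical, and social sciences. With the rise of data science it has become popular and now reaches almost every part of industry. Like data science, data analysis aims to extract useful information from data. To do so, we draw on techniques from statistics, machine learning, signal processing, natural language processing, and computer science.
Chapter 1 presents a mind map of the Python software related to data analysis. The first thing to know is that the Python ecosystem is mature, with well-known packages such as NumPy, SciPy, and Matplotlib. That is hardly surprising, since Python dates back to 1989 (work on it began that year; the first public release followed in 1991). Python is easy to learn and use, and compared with other programming languages its syntax is concise and highly readable. Even someone who has never touched Python can pick up the basics in a few days, especially with experience in another language. You can read this book comfortably without much background knowledge. There are also plenty of books, courses, and online tutorials on Python.
What this book covers
Chapter 1, "Getting Started with Python Libraries", walks the reader through installing and configuring Python and the basic Python numerical analysis libraries. It also shows how to write a small program with NumPy and how to draw a simple plot with Matplotlib.
Chapter 2, "NumPy Arrays", introduces the basics of NumPy and arrays. After reading it, the reader should have a basic grasp of NumPy arrays and the related functions.
Chapter 3, "Getting Started with Pandas", describes the basic features of Pandas, including its data structures and the operations on them.
Chapter 4, "Statistics and Linear Algebra", gives a brief review of linear algebra and statistical functions.
Chapter 5, "Retrieving, Processing, and Storing Data", explains how to obtain data in different formats and how to clean and store raw data.
Chapter 6, "Data Visualization", shows how to visualize data with the plotting functions of Matplotlib and Pandas.
Chapter 7, "Signal Processing and Time Series", uses sunspot cycle data as a worked example of time series and signal processing, and introduces some related statistical models. The main tools in this chapter are NumPy and SciPy.
Chapter 8, "Working with Databases", covers various databases and their APIs, including relational and NoSQL databases.
Chapter 9, "Analyzing Textual Data and Social Media", examines sentiment analysis and topic extraction on text data. It also presents an example of network analysis.
Chapter 10, "Predictive Analytics and Machine Learning", uses an example to show how artificial intelligence applies to weather forecasting, mostly with scikit-learn. Some machine learning algorithms are not implemented in scikit-learn, so other APIs are sometimes needed.
Chapter 11, "Environments Outside the Python Ecosystem and Cloud Computing", gives examples of integrating existing code that is not written in Python. It also demonstrates how to use Python in the cloud.
Chapter 12, "Performance Tuning, Profiling, and Concurrency", introduces techniques for improving performance through key tools such as profiling and Cython, along with frameworks for multicore and distributed systems.
Appendix A, "Key Concepts", briefly describes the important concepts used in the book.
Appendix B, "Useful Functions", summarizes functions from the libraries used in the book for easy reference.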

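First, it helps to know the common libraries that the Python ecosystem offers data analysts and data scientists.

☆☆☆☆☆NumPy: a general-purpose library that supports the usual numerical arrays and provides functions for processing them efficiently.

☆☆☆☆☆SciPy: Python's scientific computing library. It extends NumPy considerably, and some of its functionality overlaps with NumPy's. NumPy and SciPy once shared their code base and later went their separate ways.

☆☆☆☆☆Pandas: a data manipulation library that offers rich data structures as well as functions for handling tables and time series.

☆☆☆☆Matplotlib: a 2D plotting library with good support for drawing figures and images. Matplotlib is part of the SciPy ecosystem (not of the SciPy library itself) and works with NumPy arrays.

☆☆☆☆IPython: provides a powerful interactive shell for Python, supplies the kernel for Jupyter, and supports interactive data visualization. The IPython shell is covered later in the chapter.

☆☆☆☆Jupyter Notebook: provides a web-based interactive shell for creating and sharing documents that combine live code and visualizations. Through the kernels supplied by IPython, Jupyter Notebook supports multiple versions of Python.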

Official documentation:

The main documentation site for NumPy and SciPy is http://docs.scipy.org/doc/. It hosts the user guides and reference guides for both libraries, together with some tutorials.

Pandas http://pandas.pydata.org/pandas-docs/stable/

Matplotlib http://matplotlib.org/contents.html

IPython http://ipython.readthedocs.io/en/stable/

Jupyter Notebook http://jupyter-notebook.readthedocs.io/en/latest/

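1. Installing Python (omitted).

The common libraries can mostly be installed with pip install XX. Once they are installed, the following script prints each library's version and a short description of its subpackages: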

import pkgutil as pu
import pydoc

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl

print("NumPy version", np.__version__)
print("SciPy version", sp.__version__)
print("pandas version", pd.__version__)
print("Matplotlib version", mpl.__version__)

def clean(astr):
    """Collapse runs of whitespace and drop '=' underline characters."""
    s = ' '.join(astr.split())
    return s.replace('=', '')

def print_desc(prefix, pkg_path):
    """Print the start of the DESCRIPTION section for each subpackage."""
    for pkg in pu.iter_modules(path=pkg_path):
        name = prefix + "." + pkg.name

        if pkg.ispkg:  # report subpackages only, not plain modules
            try:
                docstr = pydoc.plain(pydoc.render_doc(name))
                docstr = clean(docstr)
                start = docstr.find("DESCRIPTION")
                docstr = docstr[start: start + 140]
                print(name, docstr)
            except Exception:
                # skip subpackages whose documentation cannot be rendered
                continue

print("\n")
print_desc("numpy", np.__path__)
print("\n")
print_desc("scipy", sp.__path__)
print("\n")
print_desc("pandas", pd.__path__)
print("\n")
print_desc("matplotlib", mpl.__path__)

NumPy version 1.18.2
SciPy version 1.4.0
pandas version 1.0.3
Matplotlib version 3.2.1

...
numpy.compat DESCRIPTION This module contains duplicated code from Python itself or 3rd party extensions, which may be included for the following reasons
numpy.tests 


scipy._build_utils 
...
scipy.stats DESCRIPTION  Statistical functions (:mod:`scipy.stats`)  .. currentmodule:: scipy.stats This module contains a large number of probability d


pandas._config DESCRIPTION pandas._config is considered explicitly upstream of everything else in pandas, should have no intra-pandas dependencies. importi
...
pandas.tests 
pandas.tseries 
pandas.util 


matplotlib.axes 
matplotlib.backends 
matplotlib.cbook DESCRIPTION A collection of utility functions and classes. Originally, many (but not all) were from the Python Cookbook -- hence the name cb
matplotlib.compat 
matplotlib.projections 
matplotlib.sphinxext 
matplotlib.style 
matplotlib.testing 
matplotlib.tri
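
The listing above reads each package's `__version__` attribute, which requires importing the package first. As a lighter alternative, here is a minimal sketch that asks the installation metadata instead, assuming Python 3.8 or later; the helper name `installed_version` is made up for this note.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the four libraries used in this chapter without importing them.
    for name in ("numpy", "scipy", "pandas", "matplotlib"):
        print(name, installed_version(name) or "not installed")
```

Because nothing is imported, this also works as a quick check for packages that are slow or impossible to import in the current environment.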

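Tip: a quick way to view the help for a function:

import numpy as np
print(help(np.loadtxt))

help() prints the documentation itself and returns None, so the surrounding print() only adds the final None line. The output is as follows: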

Help on function loadtxt in module numpy:

loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None)
    Load data from a text file.
    
    Each row in the text file must have the same number of values.
    
    Parameters
    ----------
    fname : file, str, or pathlib.Path
        File, filename, or generator to read.  If the filename extension is
        ``.gz`` or ``.bz2``, the file is first decompressed. Note that
        generators should return byte strings.
    dtype : data-type, optional
        Data-type of the resulting array; default: float.  If this is a
        structured data-type, the resulting array will be 1-dimensional, and
        each row will be interpreted as an element of the array.  In this
        case, the number of columns used must match the number of fields in
        the data-type.
    comments : str or sequence of str, optional
        The characters or list of characters used to indicate the start of a
        comment. None implies no comments. For backwards compatibility, byte
        strings will be decoded as 'latin1'. The default is '#'.
    delimiter : str, optional
        The string used to separate values. For backwards compatibility, byte
        strings will be decoded as 'latin1'. The default is whitespace.
    converters : dict, optional
        A dictionary mapping column number to a function that will parse the
        column string into the desired value.  E.g., if column 0 is a date
        string: ``converters = {0: datestr2num}``.  Converters can also be
        used to provide a default value for missing data (but see also
        `genfromtxt`): ``converters = {3: lambda s: float(s.strip() or 0)}``.
        Default: None.
    skiprows : int, optional
        Skip the first `skiprows` lines, including comments; default: 0.
    usecols : int or sequence, optional
        Which columns to read, with 0 being the first. For example,
        ``usecols = (1,4,5)`` will extract the 2nd, 5th and 6th columns.
        The default, None, results in all columns being read.
    
        .. versionchanged:: 1.11.0
            When a single column has to be read it is possible to use
            an integer instead of a tuple. E.g ``usecols = 3`` reads the
            fourth column the same way as ``usecols = (3,)`` would.
    unpack : bool, optional
        If True, the returned array is transposed, so that arguments may be
        unpacked using ``x, y, z = loadtxt(...)``.  When used with a structured
        data-type, arrays are returned for each field.  Default is False.
    ndmin : int, optional
        The returned array will have at least `ndmin` dimensions.
        Otherwise mono-dimensional axes will be squeezed.
        Legal values: 0 (default), 1 or 2.
    
        .. versionadded:: 1.6.0
    encoding : str, optional
        Encoding used to decode the inputfile. Does not apply to input streams.
        The special value 'bytes' enables backward compatibility workarounds
        that ensures you receive byte arrays as results if possible and passes
        'latin1' encoded strings to converters. Override this value to receive
        unicode arrays and pass strings as input to converters.  If set to None
        the system default is used. The default value is 'bytes'.
    
        .. versionadded:: 1.14.0
    max_rows : int, optional
        Read `max_rows` lines of content after `skiprows` lines. The default
        is to read all the lines.
    
        .. versionadded:: 1.16.0
    
    Returns
    -------
    out : ndarray
        Data read from the text file.
    
    See Also
    --------
    load, fromstring, fromregex
    genfromtxt : Load data with missing values handled as specified.
    scipy.io.loadmat : reads MATLAB data files
    
    Notes
    -----
    This function aims to be a fast reader for simply formatted files.  The
    `genfromtxt` function provides more sophisticated handling of, e.g.,
    lines with missing values.
    
    .. versionadded:: 1.10.0
    
    The strings produced by the Python float.hex method can be used as
    input for floats.
    
    Examples
    --------
    >>> from io import StringIO   # StringIO behaves like a file object
    >>> c = StringIO(u"0 1\n2 3")
    >>> np.loadtxt(c)
    array([[0., 1.],
           [2., 3.]])
    
    >>> d = StringIO(u"M 21 72\nF 35 58")
    >>> np.loadtxt(d, dtype={'names': ('gender', 'age', 'weight'),
    ...                      'formats': ('S1', 'i4', 'f4')})
    array([(b'M', 21, 72.), (b'F', 35, 58.)],
          dtype=[('gender', 'S1'), ('age', '<i4'), ('weight', '<f4')])
    
    >>> c = StringIO(u"1,0,2\n3,0,4")
    >>> x, y = np.loadtxt(c, delimiter=',', usecols=(0, 2), unpack=True)
    >>> x
    array([1., 3.])
    >>> y
    array([2., 4.])

None
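
help() is meant for interactive reading. When only a one-line summary is wanted, for example to skim many functions at once, the docstring can be read programmatically. The following is a small sketch using the standard `inspect` module; the helper name `first_doc_line` is hypothetical.

```python
import inspect


def first_doc_line(obj):
    """Return the first line of an object's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj)  # normalizes indentation; may return None
    return doc.splitlines()[0] if doc else ""


if __name__ == "__main__":
    # Works for any documented object; with NumPy installed,
    # first_doc_line(np.loadtxt) gives the summary line shown in the help text.
    print(first_doc_line(len))
```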

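2. Installing Jupyter.

Jupyter Notebook (formerly IPython Notebook) is an interactive notebook that supports more than 40 programming languages. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and more.

There are two common ways to install it.

1) Install Anaconda3, then start it from the "Jupyter Notebook (Anaconda3)" shortcut.

2) Install it with pip from within Eclipse.

pip install jupyter

Installation output: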

/* 
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting jupyter
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/83/df/0f5dd132200728a86190397e1ea87cd76244e42d39ec5e88efd25b2abd7e/jupyter-1.0.0-py2.py3-none-any.whl
Collecting notebook
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f5/69/d2ffaf7efc20ce47469187e3a41e6e03e17b45de5a6559f4e7ab3eace5e1/notebook-6.0.2-py3-none-any.whl (9.7MB)
Collecting ipykernel
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/e1/92/8fec943b5b81078399f969f00557804d884c96fcd0bc296e81a2ed4fd270/ipykernel-5.1.3-py3-none-any.whl
Collecting jupyter-console
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/cb/ee/6374ae8c21b7d0847f9c3722dcdfac986b8e54fa9ad9ea66e1eb6320d2b8/jupyter_console-6.0.0-py2.py3-none-any.whl
Collecting qtconsole
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/7c/57/3528b84ffa753e2089908bbf74bb5ae60653eb7a63797b6234e88b847d67/qtconsole-4.6.0-py2.py3-none-any.whl
Collecting ipywidgets
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/56/a0/dbcf5881bb2f51e8db678211907f16ea0a182b232c591a6d6f276985ca95/ipywidgets-7.5.1-py2.py3-none-any.whl (121kB)
Collecting nbconvert
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/79/6c/05a569e9f703d18aacb89b7ad6075b404e8a4afde2c26b73ca77bb644b14/nbconvert-5.6.1-py2.py3-none-any.whl (455kB)
Collecting prometheus-client
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b3/23/41a5a24b502d35a4ad50a5bb7202a5e1d9a0364d0c12f56db3dbf7aca76d/prometheus_client-0.7.1.tar.gz
Collecting traitlets>=4.2.1
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ca/ab/872a23e29cec3cf2594af7e857f18b687ad21039c1f9b922fac5b9b142d5/traitlets-4.3.3-py2.py3-none-any.whl (75kB)
Collecting nbformat
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/da/27/9a654d2b6cc1eaa517d1c5a4405166c7f6d72f04f6e7eea41855fe808a46/nbformat-4.4.0-py2.py3-none-any.whl (155kB)
Collecting ipython-genutils
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/fa/bc/9bd3b5c2b4774d5f33b2d544f1460be9df7df2fe42f352135381c347c69a/ipython_genutils-0.2.0-py2.py3-none-any.whl
Collecting pyzmq>=17
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/e8/be/9cbcdf37890942a9f8f09102903dd69d275258752a530b87fe7273fa26ba/pyzmq-18.1.1-cp37-cp37m-win_amd64.whl (1.0MB)
Collecting jupyter-core>=4.6.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/fb/82/86437f661875e30682e99d04c13ba6c216f86f5f6ca6ef212d3ee8b6ca11/jupyter_core-4.6.1-py2.py3-none-any.whl (82kB)
Collecting terminado>=0.8.1
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ff/96/1d9a2c23990aea8f8e0b5c3b6627d03196a73771a17a2d9860bbe9823ab6/terminado-0.8.3-py2.py3-none-any.whl
Requirement already satisfied: tornado>=5.0 in d:\tools\python37\lib\site-packages (from notebook->jupyter) (6.0.3)
Collecting Send2Trash
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/49/46/c3dc27481d1cc57b9385aff41c474ceb7714f7935b1247194adae45db714/Send2Trash-1.5.0-py3-none-any.whl
Requirement already satisfied: jinja2 in d:\tools\python37\lib\site-packages (from notebook->jupyter) (2.10.3)
Collecting jupyter-client>=5.3.4
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/13/81/fe0eee1bcf949851a120254b1f530ae1e01bdde2d3ab9710c6ff81525061/jupyter_client-5.3.4-py2.py3-none-any.whl
Collecting ipython>=5.0.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/05/d7/77b7a1988c99227f52402f93fb0f7e88c97239960516f53907ebbc44149c/ipython-7.11.0-py3-none-any.whl (777kB)
Collecting pygments
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/be/39/32da3184734730c0e4d3fa3b2b5872104668ad6dc1b5a73d8e477e5fe967/Pygments-2.5.2-py2.py3-none-any.whl (896kB)
Collecting prompt-toolkit<2.1.0,>=2.0.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/87/61/2dfea88583d5454e3a64f9308a686071d58d59a55db638268a6413e1eb6d/prompt_toolkit-2.0.10-py3-none-any.whl (340kB)
Collecting widgetsnbextension~=3.5.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/6c/7b/7ac231c20d2d33c445eaacf8a433f4e22c60677eb9776c7c5262d7ddee2d/widgetsnbextension-3.5.1-py2.py3-none-any.whl (2.2MB)
Collecting entrypoints>=0.2.2
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ac/c6/44694103f8c221443ee6b0041f69e2740d89a25641e62fb4f2ee568f2f9c/entrypoints-0.3-py2.py3-none-any.whl
Collecting bleach
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ab/05/27e1466475e816d3001efb6e0a85a819be17411420494a1e602c36f8299d/bleach-3.1.0-py2.py3-none-any.whl (157kB)
Collecting testpath
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1b/9e/1a170feaa54f22aeb5a5d16c9015e82234275a3c8ab630b552493f9cb8a9/testpath-0.4.4-py2.py3-none-any.whl (163kB)
Collecting pandocfilters>=1.4.1
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/4c/ea/236e2584af67bb6df960832731a6e5325fd4441de001767da328c33368ce/pandocfilters-1.4.2.tar.gz
Collecting mistune<2,>=0.8.1
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/09/ec/4b43dae793655b7d8a25f76119624350b4d65eb663459eb9603d7f1f0345/mistune-0.8.4-py2.py3-none-any.whl
Collecting defusedxml
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/06/74/9b387472866358ebc08732de3da6dc48e44b0aacd2ddaa5cb85ab7e986a2/defusedxml-0.6.0-py2.py3-none-any.whl
Requirement already satisfied: decorator in d:\tools\python37\lib\site-packages (from traitlets>=4.2.1->notebook->jupyter) (4.4.1)
Requirement already satisfied: six in d:\tools\python37\lib\site-packages (from traitlets>=4.2.1->notebook->jupyter) (1.12.0)
Collecting jsonschema!=2.5.0,>=2.4
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c5/8f/51e89ce52a085483359217bc72cdbf6e75ee595d5b1d4b5ade40c7e018b8/jsonschema-3.2.0-py2.py3-none-any.whl (56kB)
Collecting pywin32>=1.0; sys_platform == "win32"
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/bb/23/00fe4fbf9963f3bcb34a443eba0d0283fc51e5887d4045552c87490394e4/pywin32-227-cp37-cp37m-win_amd64.whl (9.1MB)
Collecting pywinpty>=0.5; os_name == "nt"
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7b/de/c69772738f10140d531b46b7462fc1dccb4175987daaa851a8cda2326251/pywinpty-0.5.7-cp37-cp37m-win_amd64.whl (1.3MB)
Requirement already satisfied: MarkupSafe>=0.23 in d:\tools\python37\lib\site-packages (from jinja2->notebook->jupyter) (1.1.1)
Requirement already satisfied: python-dateutil>=2.1 in d:\tools\python37\lib\site-packages (from jupyter-client>=5.3.4->notebook->jupyter) (2.8.1)
Requirement already satisfied: setuptools>=18.5 in d:\tools\python37\lib\site-packages (from ipython>=5.0.0->ipykernel->jupyter) (41.2.0)
Collecting pickleshare
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9a/41/220f49aaea88bc6fa6cba8d05ecf24676326156c23b991e80b3f2fc24c77/pickleshare-0.7.5-py2.py3-none-any.whl
Collecting colorama; sys_platform == "win32"
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c9/dc/45cdef1b4d119eb96316b3117e6d5708a08029992b2fee2c143c7a0a5cc5/colorama-0.4.3-py2.py3-none-any.whl
Collecting jedi>=0.10
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/e9/97/55e575a5b49e5c3df9eb3c116c61021d7badf556c816be13bbd7baf55234/jedi-0.15.2-py2.py3-none-any.whl (1.1MB)
Collecting backcall
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/84/71/c8ca4f5bb1e08401b916c68003acf0a0655df935d74d93bf3f3364b310e0/backcall-0.1.0.tar.gz
Collecting wcwidth
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/7e/9f/526a6947247599b084ee5232e4f9190a38f398d7300d866af3ab571a5bfe/wcwidth-0.1.7-py2.py3-none-any.whl
Requirement already satisfied: webencodings in d:\tools\python37\lib\site-packages (from bleach->nbconvert->jupyter) (0.5.1)
Collecting attrs>=17.4.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a2/db/4313ab3be961f7a763066401fb77f7748373b6094076ae2bda2806988af6/attrs-19.3.0-py2.py3-none-any.whl
Collecting pyrsistent>=0.14.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/6c/6f/c1a2e8da80a0029f6b618d7e20e1a6f2a61dd04e2e54225309c2cc4268f7/pyrsistent-0.15.6.tar.gz (107kB)
Collecting importlib-metadata; python_version < "3.8"
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/e9/71/1a1e0ed0981bb6a67bce55a210f168126b7ebd2065958673797ea66489ca/importlib_metadata-1.3.0-py2.py3-none-any.whl
Collecting parso>=0.5.2
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9b/b0/90353a5ece0987279837835224dead0c424833a224195683e188d384e06b/parso-0.5.2-py2.py3-none-any.whl (99kB)
Collecting zipp>=0.5
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/74/3d/1ee25a26411ba0401b43c6376d2316a71addcc72ef8690b101b4ea56d76a/zipp-0.6.0-py2.py3-none-any.whl
Collecting more-itertools
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/68/03/0604cec1ea13c9f063dd50f900d1a36160334dd3cfb01fd0e638f61b46ba/more_itertools-8.0.2-py3-none-any.whl (40kB)
Building wheels for collected packages: prometheus-client, pandocfilters, backcall, pyrsistent
  Building wheel for prometheus-client (setup.py): started
  Building wheel for prometheus-client (setup.py): finished with status 'done'
  Created wheel for prometheus-client: filename=prometheus_client-0.7.1-cp37-none-any.whl size=41407 sha256=c20f43706024995078fe8c037005d905c614e3cf5d6c3919ef7db82bdd9a4435
  Stored in directory: C:\Users\tony zhang\AppData\Local\pip\Cache\wheels\9d\21\d1\2b2a9a083573001599e830a30085eed8e18abb66255fd9ca31
  Building wheel for pandocfilters (setup.py): started
  Building wheel for pandocfilters (setup.py): finished with status 'done'
  Created wheel for pandocfilters: filename=pandocfilters-1.4.2-cp37-none-any.whl size=7862 sha256=c34d2510ee20c461b08609ca3145b5b81d06aaf57371e96d83c3e9221465dcd2
  Stored in directory: C:\Users\tony zhang\AppData\Local\pip\Cache\wheels\2b\37\58\486b9403bb31231ad05667e3f7f738e1a9bb9cfc03b50a01c6
  Building wheel for backcall (setup.py): started
  Building wheel for backcall (setup.py): finished with status 'done'
  Created wheel for backcall: filename=backcall-0.1.0-cp37-none-any.whl size=10418 sha256=682309dc5afe04018f3d6c7272d7ce4e794b5026d8407d26c381c5928adda585
  Stored in directory: C:\Users\tony zhang\AppData\Local\pip\Cache\wheels\80\42\9d\a372415f5bfc53fa21d72e1d5925595cd5808e9bc1fd0e31a4
  Building wheel for pyrsistent (setup.py): started
  Building wheel for pyrsistent (setup.py): finished with status 'done'
  Created wheel for pyrsistent: filename=pyrsistent-0.15.6-cp37-cp37m-win_amd64.whl size=56465 sha256=8859424fe353ae5b7b0e16500abf63a54e7743f3f66394e41eaf7a0af083e18d
  Stored in directory: C:\Users\tony zhang\AppData\Local\pip\Cache\wheels\4f\d6\5d\18980adb7b24443cb907e7015b112e47e5b32a199690a196ee
Successfully built prometheus-client pandocfilters backcall pyrsistent
Installing collected packages: prometheus-client, ipython-genutils, traitlets, pygments, pickleshare, colorama, parso, jedi, backcall, wcwidth, prompt-toolkit, ipython, pywin32, jupyter-core, pyzmq, jupyter-client, ipykernel, attrs, pyrsistent, more-itertools, zipp, importlib-metadata, jsonschema, nbformat, pywinpty, terminado, Send2Trash, entrypoints, bleach, testpath, pandocfilters, mistune, defusedxml, nbconvert, notebook, jupyter-console, qtconsole, widgetsnbextension, ipywidgets, jupyter
Successfully installed Send2Trash-1.5.0 attrs-19.3.0 backcall-0.1.0 bleach-3.1.0 colorama-0.4.3 defusedxml-0.6.0 entrypoints-0.3 importlib-metadata-1.3.0 ipykernel-5.1.3 ipython-7.11.0 ipython-genutils-0.2.0 ipywidgets-7.5.1 jedi-0.15.2 jsonschema-3.2.0 jupyter-1.0.0 jupyter-client-5.3.4 jupyter-console-6.0.0 jupyter-core-4.6.1 mistune-0.8.4 more-itertools-8.0.2 nbconvert-5.6.1 nbformat-4.4.0 notebook-6.0.2 pandocfilters-1.4.2 parso-0.5.2 pickleshare-0.7.5 prometheus-client-0.7.1 prompt-toolkit-2.0.10 pygments-2.5.2 pyrsistent-0.15.6 pywin32-227 pywinpty-0.5.7 pyzmq-18.1.1 qtconsole-4.6.0 terminado-0.8.3 testpath-0.4.4 traitlets-4.3.3 wcwidth-0.1.7 widgetsnbextension-3.5.1 zipp-0.6.0
FINISHED */


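Once installation finishes, open PowerShell or cmd as administrator, change to the directory that holds your notebook files, and run jupyter notebook. A browser window opens automatically at http://localhost:8888/tree. For example: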

d:
cd D:\Java2018\PythonDataAnalysis2\Chapter02
jupyter notebook

Open the corresponding .ipynb file to enter the Jupyter interface.

The interface is quite friendly; thumbs up.

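3. Common Jupyter tips:

1) Putting %%writefile filename.py on the first line of a cell writes the rest of the cell to a file named filename.py in the current working directory, for example:

%%writefile filename.py
print("hello jupyter")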
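2) Installing code auto-completion

# install the extension package
pip install jupyter_contrib_nbextensions


# install the extensions' JavaScript and CSS files for the current user
jupyter contrib nbextension install --user --skip-running-check

After Jupyter starts, open the "Nbextensions" tab next to Files and tick "Hinterland". This completion is far weaker than Eclipse's code hints, but it is adequate for simple code.

3) More tips can be found here: https://www.jianshu.com/p/bb0eab1b2535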

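4. A simple comparison of NumPy and plain Python array operations

NumPy is fast at array operations. But how fast? The program below reports the elapsed time of the two functions numpysum() and pythonsum() in μs (microseconds). It also prints the last two elements of the vector sum, so we can check whether Python and NumPy give the same answer.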

from datetime import datetime
import numpy as np

def pythonsum(n):
    # Build the squares and cubes in plain lists, then add them element by element.
    a = list(range(n))
    b = list(range(n))
    c = []
    for i in range(len(a)):
        a[i] = i ** 2
        b[i] = i ** 3
        c.append(a[i] + b[i])
    return c

def numpysum(n):
    # Vectorized version. np.arange uses the platform's default integer type,
    # which is 32-bit on Windows with NumPy 1.x and can overflow for large n.
    a = np.arange(n) ** 2
    b = np.arange(n) ** 3
    c = a + b
    return c

size = 10000

start = datetime.now()
c = pythonsum(size)
delta = datetime.now() - start
print("The last 2 elements of the sum", c[-2:])
# .microseconds is only the sub-second part of the timedelta,
# which is sufficient while a run takes less than one second.
print("PythonSum elapsed time in microseconds", delta.microseconds)

start = datetime.now()
c = numpysum(size)
delta = datetime.now() - start
print("The last 2 elements of the sum", c[-2:])
print("NumPySum elapsed time in microseconds", delta.microseconds)

/*
The last 2 elements of the sum [999500079996, 999800010000]
PythonSum elapsed time in microseconds 14463
The last 2 elements of the sum [-1227299972  -927369968]
NumPySum elapsed time in microseconds 998
*/
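
A single datetime difference is a noisy measurement, and `delta.microseconds` drops whole seconds for longer runs. The following is a minimal sketch of a steadier measurement with the standard `timeit` module. Only the pure-Python side is timed so the sketch runs without NumPy; the names `python_sum` and `best_time_us` are made up for this note.

```python
import timeit


def python_sum(n):
    """Pure-Python version: i**2 + i**3 for every i in range(n)."""
    return [i ** 2 + i ** 3 for i in range(n)]


def best_time_us(func, *args, repeat=5, number=10):
    """Best average time per call, in microseconds, over `repeat` rounds."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number * 1e6


elapsed = best_time_us(python_sum, 10000)
print(f"python_sum: {elapsed:.1f} microseconds per call")
```

The same helper can wrap numpysum from the listing above, for example best_time_us(numpysum, 10000), to compare the two under identical conditions.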

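Timings vary from machine to machine, but the gap is consistently large. The answers, however, are not the same: the NumPy values are negative because, on this Windows setup with NumPy 1.18, np.arange defaults to 32-bit integers, which overflow for values this large. Passing dtype=np.int64 to np.arange avoids the overflow.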
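
The negative NumPy values in the output above are consistent with signed 32-bit wraparound. This can be checked by hand with plain Python integers, which never overflow. The sketch below assumes two's-complement wrapping; `wrap_int32` is a hypothetical helper, not a NumPy function. Since wrapping commutes with addition and multiplication, wrapping the exact sum gives the same result as wrapping at every intermediate step.

```python
def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range, as an overflow would."""
    return (value + 2 ** 31) % 2 ** 32 - 2 ** 31


# The last two elements of the sum for size = 10000, computed exactly.
exact = [i ** 2 + i ** 3 for i in (9998, 9999)]
wrapped = [wrap_int32(v) for v in exact]

print(exact)    # [999500079996, 999800010000]
print(wrapped)  # [-1227299972, -927369968]
```

The wrapped values match the NumPy line of the recorded output, and the exact values match the pure-Python line.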

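5. A simple Matplotlib plot.

from sklearn.datasets import load_iris
# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs an older scikit-learn release.
from sklearn.datasets import load_boston
from matplotlib import pyplot as plt

# Load the iris dataset, print its description, and plot column 1 (sepal length) as x against column 2 (sepal width) as y.
iris = load_iris()
print(iris.DESCR)

data = iris.data
plt.plot(data[:, 0], data[:, 1], ".")
plt.show()

# Load the Boston dataset, print its description, and plot column 3 (proportion of non-retail business) as x against column 5 (nitric oxide concentration) as y, marking each point with "+".
boston = load_boston()
print(boston.DESCR)

data = boston.data
plt.plot(data[:, 2], data[:, 4], "+")
plt.show()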

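Summary:

In this chapter we installed the libraries used later in the book: NumPy, SciPy, Pandas, Matplotlib, IPython, and Jupyter Notebook. A vector addition program gave a first taste of NumPy's performance. We also looked at the relevant documentation and online resources, ran code to discover the modules inside each library, loaded a few sample datasets, and drew some simple plots with Matplotlib.

Chapter 2 continues with NumPy and explores basic concepts such as arrays and data types.

邀月's takeaway: compared with the previous hands-on book, this one is much simpler, but what matters is understanding the basic concepts, so I treat it as consolidation of, and iteration on, the previous stage.

End of Chapter 1.

Python data analysis personal reading notes: table of contents

Official download of the book's source code:
https://www.ptpress.com.cn/shopping/buy?bookId=bae24ecb-a1a1-41c7-be7c-d913b163c111

The download is free after logging in.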

posted @ 2020-04-01 16:42 邀月
