Python Web Scraping and Data Analysis: Python Syntax, Dictionaries, Tuples, and Lists

Posted on 2019-04-10 20:03 by 南城故梦

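I have been using Python for almost three years and never found the time to organize what I have learned. My current project is in a quiet phase and I have spare time, so I started this series to describe my experience learning Python. I hope it is useful to everyone!

Series contents:

Python Web Scraping and Data Analysis: Python tutorial videos and source code sharing

Python Web Scraping and Data Analysis, basic tutorial: Python syntax, dictionaries, tuples, lists

Python Web Scraping and Data Analysis, advanced tutorial: file operations, lambda expressions, recursion, yield generators

Python Web Scraping and Data Analysis, modules: built-in modules, open-source modules, custom modules

Python Web Scraping and Data Analysis, scraping skills: the urllib library, XPath selectors, regular expressions

Python Web Scraping and Data Analysis, JD.com scraper in practice: scraping JD.com products and storing them in a sqlite3 database

Python Web Scraping and Data Analysis: collecting and analyzing data from used-car platforms

Python Web Scraping and Data Analysis: a roundup of open-source Python scraper projects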

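Introductory Odds and Ends

I. Scope

A variable can be used by any later code once its assignment has executed and the variable exists in memory. An if block does not create a new scope in Python, so a name assigned inside it remains visible afterwards:

if 1 == 1:
    name = 'wupeiqi'
print(name)

How scopes nest:

Variables defined in an outer scope can be used in an inner scope

Variables defined in an inner scope (for example, inside a function) cannot be used in an outer scope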
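
The two scope rules can be illustrated with functions, which do create their own scope. This is a minimal sketch written so that it runs on Python 3; the function names show_name and make_local and the variable secret are hypothetical and chosen only for illustration.

```python
name = 'wupeiqi'              # defined in the outer (module) scope


def show_name():
    # The inner scope can read a variable from the outer scope.
    return name


def make_local():
    secret = 'alex'           # exists only inside make_local
    return secret


print(show_name())
make_local()

try:
    print(secret)             # the outer scope cannot see the inner variable
except NameError as exc:
    outer_error = str(exc)
    print(outer_error)
```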

II. Ternary Expression

result = value1 if condition else value2

If the condition is true: result = value1
If the condition is false: result = value2
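
As a small illustration of the form above, the sketch below picks one of two strings based on a condition. The function name classify and the threshold of 18 are assumptions made up for this example.

```python
def classify(age):
    # value1 if condition else value2
    return 'adult' if age >= 18 else 'minor'


print(classify(18))
print(classify(7))
```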

III. Number Bases

Binary: digits 01
Octal: digits 01234567
Decimal: digits 0123456789
Hexadecimal: digits 0123456789ABCDEF
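
Python's built-ins can convert between these bases. The following is a brief sketch using bin, hex, and int with an explicit base; the value 255 is an arbitrary choice for illustration.

```python
n = 255

as_binary = bin(n)                 # decimal -> binary string
as_hex = hex(n)                    # decimal -> hexadecimal string

# int() accepts a base as its second argument.
from_binary = int('11111111', 2)
from_octal = int('377', 8)
from_hex = int('FF', 16)

print(as_binary)
print(as_hex)
print(from_binary, from_octal, from_hex)
```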

Python Basics

I. Integers

For example: 18, 73, 84

Common integer functions:
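
As a rough illustration of the kind of integer functionality meant here, the sketch below uses a few built-ins that work on integers. The selection is an assumption about what is most useful, not a complete list.

```python
absolute = abs(-18)                 # absolute value
quotient, remainder = divmod(73, 18)  # floor division and remainder together
power = pow(2, 10)                  # exponentiation
parsed = int('84')                  # string -> integer
bits = (18).bit_length()            # bits needed to represent 18 (0b10010)

print(absolute, quotient, remainder, power, parsed, bits)
```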

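II. Long Integers

For example (depending on the platform): 2147483649, 9223372036854775807

In Python 2, a value beyond the platform's int range becomes a long automatically; Python 3 has a single int type of unlimited size.

Common long integer functions: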
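
Large values support the same operations as ordinary integers. A minimal sketch, written for Python 3 where int has no fixed upper bound:

```python
largest_signed_64 = 9223372036854775807   # 2**63 - 1
big = largest_signed_64 + 1               # no overflow; the value simply grows

print(big)
print(big == 2 ** 63)
print(big.bit_length())
```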

IV. Strings

For example: 'wupeiqi', 'alex'

Common string functions:

Note: encoding; string multiplication; string formatting
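
The three topics in the note can be sketched as follows. This example is written for Python 3, where text is str and encoded data is bytes; the sample values are arbitrary.

```python
name = 'wupeiqi'

# String multiplication repeats the string.
line = '-' * 10

# Two common formatting styles.
with_percent = 'name: %s, age: %d' % (name, 18)
with_format = 'name: {0}, age: {1}'.format(name, 18)

# Encoding turns text into bytes; decoding reverses it.
encoded = u'中文'.encode('utf-8')
decoded = encoded.decode('utf-8')

print(line)
print(with_percent)
print(len(encoded), decoded)
```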

V. Lists

For example: [11, 22, 33], ['wupeiqi', 'alex']

Common list functions:

Note: sorting
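
For the sorting note, here is a short sketch contrasting list.sort, which sorts in place, with sorted, which returns a new list. The sample data is arbitrary.

```python
numbers = [33, 11, 22]
numbers.sort()                                # sorts the list in place

descending = sorted(numbers, reverse=True)    # returns a new list

names = ['wupeiqi', 'alex']
by_length = sorted(names, key=len)            # sort by a computed key

print(numbers)
print(descending)
print(by_length)
```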

VI. Tuples

For example: (11, 22, 33), ('wupeiqi', 'alex')

Common tuple functions:
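
Tuples offer only a few methods because they cannot be modified. A minimal sketch of count, index, unpacking, and the immutability check; the variable names are chosen for illustration.

```python
t = (11, 22, 33, 22)

count_22 = t.count(22)        # how many times 22 occurs
index_33 = t.index(33)        # position of the first 33

try:
    t[0] = 99                 # tuples do not support item assignment
    is_immutable = False
except TypeError:
    is_immutable = True

first, second = ('wupeiqi', 'alex')   # unpacking

print(count_22, index_33, is_immutable, first, second)
```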

VII. Dictionaries

For example: {'name': 'wupeiqi', 'age': 18}, {'host': '2.2.2.2', 'port': 80}

PS: looping over a dictionary iterates over its keys by default

Common dictionary functions:
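
The sketch below shows that a plain loop over a dictionary yields keys, along with a few frequently used methods. The 'email' key and its fallback value are hypothetical.

```python
user = {'name': 'wupeiqi', 'age': 18}

keys = [key for key in user]            # iterating yields keys only
pairs = sorted(user.items())            # key/value pairs
email = user.get('email', 'n/a')        # default for a missing key

user['age'] = 19                        # update an existing key

print(sorted(keys))
print(pairs)
print(email, user['age'])
```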

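Exercise: classifying elements

Given the collection of values [11,22,33,44,55,66,77,88,99,90...], store every value greater than 66 under the first key of a dictionary and every value less than 66 under the second key.

That is: {'k1': values greater than 66, 'k2': values less than 66}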

VIII. Sets

A set is an unordered collection of unique elements. Its common functions are as follows:

class set(object):
    """
    set() -> new empty set object
    set(iterable) -> new set object

    Build an unordered collection of unique elements.
    """
    def add(self, *args, **kwargs): # real signature unknown
        # Add an element
        """
        Add an element to a set.

        This has no effect if the element is already present.
        """
        pass

    def clear(self, *args, **kwargs): # real signature unknown
        """ Remove all elements from this set. """
        pass

    def copy(self, *args, **kwargs): # real signature unknown
        """ Return a shallow copy of a set. """
        pass

    def difference(self, *args, **kwargs): # real signature unknown
        """
        Return the difference of two or more sets as a new set.

        (i.e. all elements that are in this set but not the others.)
        """
        pass

    def difference_update(self, *args, **kwargs): # real signature unknown
        # Remove from the current set every element contained in the other set
        """ Remove all elements of another set from this set. """
        pass

    def discard(self, *args, **kwargs): # real signature unknown
        # Remove an element; no error if it is absent
        """
        Remove an element from a set if it is a member.

        If the element is not a member, do nothing.
        """
        pass

    def intersection(self, *args, **kwargs): # real signature unknown
        # Intersection; creates a new set
        """
        Return the intersection of two or more sets as a new set.

        (i.e. elements that are common to all of the sets.)
        """
        pass

    def intersection_update(self, *args, **kwargs): # real signature unknown
        # Intersection; modifies the original set
        """ Update a set with the intersection of itself and another. """
        pass

    def isdisjoint(self, *args, **kwargs): # real signature unknown
        # Returns True if there is no intersection
        """ Return True if two sets have a null intersection. """
        pass

    def issubset(self, *args, **kwargs): # real signature unknown
        # Is this set a subset of the other?
        """ Report whether another set contains this set. """
        pass

    def issuperset(self, *args, **kwargs): # real signature unknown
        # Is this set a superset of the other?
        """ Report whether this set contains another set. """
        pass

    def pop(self, *args, **kwargs): # real signature unknown
        # Remove and return an arbitrary element
        """
        Remove and return an arbitrary set element.
        Raises KeyError if the set is empty.
        """
        pass

    def remove(self, *args, **kwargs): # real signature unknown
        # Remove a given element; error if it is absent
        """
        Remove an element from a set; it must be a member.

        If the element is not a member, raise a KeyError.
        """
        pass

    def symmetric_difference(self, *args, **kwargs): # real signature unknown
        # Symmetric difference; creates a new set
        """
        Return the symmetric difference of two sets as a new set.

        (i.e. all elements that are in exactly one of the sets.)
        """
        pass

    def symmetric_difference_update(self, *args, **kwargs): # real signature unknown
        # Symmetric difference; modifies the original set
        """ Update a set with the symmetric difference of itself and another. """
        pass

    def union(self, *args, **kwargs): # real signature unknown
        # Union
        """
        Return the union of sets as a new set.

        (i.e. all elements that are in either set.)
        """
        pass

    def update(self, *args, **kwargs): # real signature unknown
        # Update with the union
        """ Update a set with the union of itself and others. """
        pass

    def __len__(self): # real signature unknown; restored from __doc__
        """ x.__len__() <==> len(x) """
        pass
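
To make the method list concrete, here is a short sketch of the most common set operations using both operators and methods. The sets a and b are arbitrary sample data.

```python
a = {1, 2, 3}
b = {3, 4}

common = a & b                 # same as a.intersection(b)
either = a | b                 # same as a.union(b)
only_a = a - b                 # same as a.difference(b)
exactly_one = a ^ b            # same as a.symmetric_difference(b)

a.discard(99)                  # absent element: silently ignored
try:
    a.remove(99)               # absent element: raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True

print(common, either, only_a, exactly_one, remove_failed)
```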

Exercise: finding differences

# Existing records in the database
old_dict = {
    "#1": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 80},
    "#2": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 80},
    "#3": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 80},
}

# Data newly reported by the CMDB
new_dict = {
    "#1": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 800},
    "#3": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 80},
    "#4": {'hostname': 'c2', 'cpu_count': 2, 'mem_capicity': 80},
}

To delete: ?

To create: ?

To update: ? Note: there is no need to check whether the inner values changed; any key that exists both in the old data and in the new report counts as needing an update
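
One possible approach to this exercise treats the dictionary keys as sets. The sketch below is only one way to do it; the names to_delete, to_create, and to_update are hypothetical, and the hostnames are assumed to be strings.

```python
old_dict = {
    "#1": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 80},
    "#2": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 80},
    "#3": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 80},
}
new_dict = {
    "#1": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 800},
    "#3": {'hostname': 'c1', 'cpu_count': 2, 'mem_capicity': 80},
    "#4": {'hostname': 'c2', 'cpu_count': 2, 'mem_capicity': 80},
}

old_keys = set(old_dict)           # iterating a dict yields its keys
new_keys = set(new_dict)

to_delete = old_keys - new_keys    # only in the old data
to_create = new_keys - old_keys    # only in the new report
to_update = old_keys & new_keys    # present in both

print(sorted(to_delete), sorted(to_create), sorted(to_update))
```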

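IX. The collections Module

1. Counter

Counter extends the dictionary type and is used to track how many times each value occurs.

PS: it has all the functionality of a dictionary plus its own

from collections import Counter

c = Counter('abcdeabcdabcaba')
print(c)

Output: Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})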
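
Beyond counting at construction time, a Counter has a few methods of its own. A minimal sketch of most_common and update, plus the behavior for missing keys:

```python
from collections import Counter

c = Counter('abcdeabcdabcaba')

top_two = c.most_common(2)     # the two most frequent items with their counts
missing = c['z']               # a missing key counts as 0 instead of raising

c.update('aab')                # add more observations to the existing counts

print(top_two, missing, c['a'], c['b'])
```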

2. Ordered dictionary (OrderedDict)

OrderedDict extends the dictionary type: it remembers the order in which items were added
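
A short sketch of the ordering guarantee follows. The keys k1, k3, k2 are arbitrary and inserted out of alphabetical order on purpose. Note that since Python 3.7 plain dict also preserves insertion order, so OrderedDict matters most on older versions or when its extra methods are needed.

```python
from collections import OrderedDict

d = OrderedDict()
d['k1'] = 1
d['k3'] = 3
d['k2'] = 2

keys_in_order = list(d.keys())     # insertion order, not sorted order
last_item = d.popitem()            # removes the most recently added item

print(keys_in_order, last_item)
```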

3. Default dictionary (defaultdict)

Motivating requirement:

Given the collection of values [11,22,33,44,55,66,77,88,99,90...], store every value greater than 66 under the first key of a dictionary and every value less than 66 under the second key. That is: {'k1': values greater than 66, 'k2': values less than 66}

With a plain dictionary, every append must first check whether the key exists:

values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 90]

my_dict = {}

for value in values:
    if value > 66:
        if 'k1' in my_dict:
            my_dict['k1'].append(value)
        else:
            my_dict['k1'] = [value]
    else:
        # everything else, including 66 itself, goes to k2
        if 'k2' in my_dict:
            my_dict['k2'].append(value)
        else:
            my_dict['k2'] = [value]

With defaultdict the existence check disappears:

from collections import defaultdict

values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 90]

my_dict = defaultdict(list)

for value in values:
    if value > 66:
        my_dict['k1'].append(value)
    else:
        my_dict['k2'].append(value)

defaultdict extends the dictionary type: it supplies a default type for the dictionary's values, so a missing key is created with an empty value of that type on first access.

class defaultdict(dict):
    """
    defaultdict(default_factory[, ...]) --> dict with default factory
    """
    def copy(self): # real signature unknown; restored from __doc__
        """ D.copy() -> a shallow copy of D. """
        pass
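
The default factory does not have to be list. As a hedged illustration, the sketch below uses int to count words and shows that merely reading a missing key creates it; the word list and the key 'missing' are made up for this example.

```python
from collections import defaultdict

counts = defaultdict(int)          # missing keys start at int() == 0
for word in ['a', 'b', 'a']:
    counts[word] += 1

groups = defaultdict(list)
empty = groups['missing']          # reading a missing key inserts [] for it

print(dict(counts), empty, 'missing' in groups)
```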

4. Named tuple (namedtuple)

namedtuple creates a type that has everything a tuple has plus additional features, such as access to fields by name.

import collections

Mytuple = collections.namedtuple('Mytuple', ['x', 'y'])

class Mytuple(__builtin__.tuple)
 |  Mytuple(x, y)
 |  
 |  Method resolution order:
 |      Mytuple
 |      __builtin__.tuple
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  __getnewargs__(self)
 |      Return self as a plain tuple.  Used by copy and pickle.
 |  
 |  __getstate__(self)
 |      Exclude the OrderedDict from pickling
 |  
 |  __repr__(self)
 |      Return a nicely formatted representation string
 |  
 |  _asdict(self)
 |      Return a new OrderedDict which maps field names to their values
 |  
 |  _replace(_self, **kwds)
 |      Return a new Mytuple object replacing specified fields with new values
 |  
 |  
 |  Static methods defined here:
 |  
 |  __new__(_cls, x, y)
 |      Create new instance of Mytuple(x, y)
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      Return a new OrderedDict which maps field names to their values
 |  
 |  x
 |      Alias for field number 0
 |  
 |  y
 |      Alias for field number 1
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  _fields = ('x', 'y')
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from __builtin__.tuple:
 |  
 |  __add__(...)
 |      x.__add__(y) <==> x+y
 |  
 |  __contains__(...)
 |      x.__contains__(y) <==> y in x
 |  
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |  
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |
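
In use, a named tuple behaves like a tuple whose fields can also be read by name. A minimal sketch with a two-field type; the values 11, 22, and 33 are arbitrary.

```python
import collections

Mytuple = collections.namedtuple('Mytuple', ['x', 'y'])

p = Mytuple(11, 22)

by_name = p.x                  # field access by name
by_index = p[1]                # ordinary tuple indexing still works
x, y = p                       # unpacking works like any tuple

moved = p._replace(x=33)       # returns a new instance; p is unchanged
as_dict = dict(p._asdict())    # field names mapped to values

print(by_name, by_index, moved, as_dict)
```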

5. Double-ended queue (deque)

A thread-safe double-ended queue
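
A deque supports adding and removing items at both ends. The following is a brief sketch of those operations on arbitrary sample data:

```python
from collections import deque

d = deque([1, 2, 3])

d.append(4)                    # add on the right
d.appendleft(0)                # add on the left
after_appends = list(d)

right = d.pop()                # remove from the right
left = d.popleft()             # remove from the left

d.rotate(1)                    # shift every item one step to the right
after_rotate = list(d)

print(after_appends, right, left, after_rotate)
```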

Note: besides the double-ended queue there is also a one-way queue (first in, first out, FIFO)
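
The sketch below assumes the one-way queue meant here is the standard library's Queue class, which lives in the queue module on Python 3 and in the Queue module on Python 2. The maxsize of 3 and the sample values are arbitrary.

```python
try:
    import queue               # Python 3
except ImportError:
    import Queue as queue      # Python 2

q = queue.Queue(maxsize=3)

q.put(11)
q.put(22)
q.put(33)
was_full = q.full()            # the queue holds maxsize items now

first = q.get()                # items leave in the order they arrived
remaining = q.qsize()

print(was_full, first, remaining)
```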

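Iterators and Generators

I. Iterators

When a for loop runs over a Python list, it works like this internally: it asks whether a next element exists; if so, it fetches that element, and if not, a StopIteration exception is raised. (Python handles this exception internally, which is how the loop ends.)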
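
What the for loop does internally can be written out by hand with iter and next. This is a simplified sketch of the protocol, not the interpreter's actual implementation; the list contents are arbitrary.

```python
li = [11, 22, 33]

it = iter(li)                  # ask the list for an iterator
collected = []

while True:
    try:
        item = next(it)        # fetch the next element if there is one
    except StopIteration:
        break                  # no more elements: leave the loop
    collected.append(item)

print(collected)
```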

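II. Generators

In Python 2, range is not lazy: it builds the whole list up front. xrange produces its values lazily, on demand, as a generator does.

Likewise, readlines reads every line into a list, whereas xreadlines yields lines lazily.

>>> print range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> print xrange(10)
xrange(10)

Generators are built on yield: a value is produced only when it is requested, which avoids wasting memory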
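
To show how yield produces values on demand, here is a minimal sketch of a generator that counts from zero. The function name my_range is hypothetical and is not a standard library function.

```python
def my_range(stop):
    # Each value is computed only when the caller asks for the next one.
    current = 0
    while current < stop:
        yield current
        current += 1


gen = my_range(3)

first = next(gen)              # runs the body up to the first yield
rest = list(gen)               # consumes the remaining values
leftover = list(gen)           # a generator cannot be restarted

print(first, rest, leftover)
```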

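Exercise:

Given the following list:

[13, 22, 6, 99, 11]

Process it according to the following rules:

Compare 13 and 22 and put the larger value on the right: [13, 22, 6, 99, 11]

Compare 22 and 6 and put the larger value on the right: [13, 6, 22, 99, 11]

Compare 22 and 99 and put the larger value on the right: [13, 6, 22, 99, 11]

Compare 99 and 11 and put the larger value on the right: [13, 6, 22, 11, 99]

Compare 13 and 6 and put the larger value on the right: [6, 13, 22, 11, 99]

...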

li = [13, 22, 6, 99, 11]

for m in range(len(li) - 1):
    # compare position m with every later position and swap when out of order
    for n in range(m + 1, len(li)):
        if li[m] > li[n]:
            li[m], li[n] = li[n], li[m]

print(li)
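
The rules in the exercise compare neighboring elements, which is the classic bubble sort. As an alternative sketch that follows those rules step by step, the function below (bubble_sort, a hypothetical name) swaps adjacent pairs and works on a copy of the input.

```python
def bubble_sort(values):
    items = list(values)                       # leave the input untouched
    # After each pass the largest remaining value has moved to the right end.
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                # put the larger of the two neighbors on the right
                items[i], items[i + 1] = items[i + 1], items[i]
    return items


result = bubble_sort([13, 22, 6, 99, 11])
print(result)
```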

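Homework

Develop a simple calculator program
* It should parse addition, subtraction, multiplication, division, and parenthesized precedence, and compute the correct result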
