Review Blog 2

The sorted() function
The sorted() function sorts any iterable object.

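Difference between sort and sorted
sort is a method of list, whereas sorted works on any iterable object.
The list sort method sorts the existing list in place and returns None; the built-in sorted function returns a new list and leaves the original untouched.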
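A minimal sketch of the difference, using a made-up list named nums (the variable names are only for illustration):

```python
nums = [3, 1, 2]

# sorted() returns a new list and leaves the original untouched
new_nums = sorted(nums)
print(nums)      # [3, 1, 2]
print(new_nums)  # [1, 2, 3]

# list.sort() sorts in place and returns None
result = nums.sort()
print(nums)    # [1, 2, 3]
print(result)  # None

# sorted() accepts any iterable (tuple, string, ...) and always returns a list
from_tuple = sorted((2, 3, 1))  # [1, 2, 3]
from_str = sorted("cab")        # ['a', 'b', 'c']
```

A common mistake that follows from this is writing nums = nums.sort(), which replaces the list with None.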

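Syntax

sorted syntax (Python 2):

sorted(iterable[, cmp[, key[, reverse]]])

In Python 3 the cmp parameter was removed and the signature is sorted(iterable, *, key=None, reverse=False).

Parameters:

iterable -- the iterable object to sort.
cmp -- (Python 2 only) a comparison function that takes two arguments, both drawn from the iterable. It must return a positive number (for example 1) if the first argument is greater, a negative number (for example -1) if it is smaller, and 0 if the two are equal.
key -- a function of one argument that is called on each element of the iterable and returns the value to compare by.
reverse -- sort order: reverse=True sorts in descending order, reverse=False in ascending order (the default).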
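Python 3 has no cmp parameter, but an old-style comparison function can still be used by wrapping it with functools.cmp_to_key. The sketch below assumes Python 3; by_second is a hypothetical helper name chosen for illustration.

```python
from functools import cmp_to_key

pairs = [('b', 2), ('a', 1), ('c', 3), ('d', 4)]

def by_second(x, y):
    """Old-style comparison: negative if x < y, zero if equal, positive if x > y."""
    return (x[1] > y[1]) - (x[1] < y[1])

# cmp_to_key turns the two-argument comparison into a key function
ascending = sorted(pairs, key=cmp_to_key(by_second))
descending = sorted(pairs, key=cmp_to_key(by_second), reverse=True)
print(ascending)   # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
print(descending)  # [('d', 4), ('c', 3), ('b', 2), ('a', 1)]
```

For a simple case like this one, a plain key function (key=lambda x: x[1]) is shorter and usually faster; cmp_to_key is mainly useful when porting existing comparison functions.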

Return value
Returns a new, sorted list.

Example:

>>> a = [5,7,6,3,4,1,2]
>>> b = sorted(a)       # the original list is left unchanged
>>> a 
[5, 7, 6, 3, 4, 1, 2]
>>> b
[1, 2, 3, 4, 5, 6, 7]

>>> L=[('b',2),('a',1),('c',3),('d',4)]
>>> sorted(L, cmp=lambda x,y:cmp(x[1],y[1]))   # using a cmp function (Python 2 only)
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> sorted(L, key=lambda x:x[1])               # using key
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
 
 
>>> students = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
>>> sorted(students, key=lambda s: s[2])            # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
 
>>> sorted(students, key=lambda s: s[2], reverse=True)       # descending order
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

The collections module
1. Counter: counts how many times each element occurs and returns the result as a dict-like object of the form {element: count}
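A small sketch of Counter on a made-up list (words and counts are illustrative names, not part of the exercise that follows):

```python
from collections import Counter

words = ['a', 'b', 'a', 'c', 'a', 'b']
counts = Counter(words)

print(counts)                 # Counter({'a': 3, 'b': 2, 'c': 1})
print(counts['a'])            # 3
print(counts['missing'])      # 0, a missing key counts as zero instead of raising KeyError
print(counts.most_common(2))  # [('a', 3), ('b', 2)]
```

Because Counter is a dict subclass, the usual dict methods such as items() and keys() work on it as well.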

Write a Python script that analyzes the file xx.log and counts the number of visits per domain.

Contents of xx.log:
https://www.sogo.com/ale.html
https://www.qq.com/3asd.html
https://www.sogo.com/teoans.html
https://www.bilibili.com/2
https://www.sogo.com/asd_sa.html
https://y.qq.com/
https://www.bilibili.com/1
https://dig.chouti.com/
https://www.bilibili.com/imd.html
https://www.bilibili.com/

Output:
4 www.bilibili.com
3 www.sogo.com
1 www.qq.com
1 y.qq.com
1 dig.chouti.com

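Method 1:
Using the built-in sorted() function
import re

# 1. Read the contents of the file
with open('xx.log', 'r', encoding='utf-8') as f:
    data = f.read()

# 2. Extract the domain names (everything between https:// and the next /)
ret = re.findall(r'https://(.*?)/', data)

# 3. Count the occurrences of each domain
dic = {}
for i in ret:
    if i not in dic:
        dic[i] = 1
    else:
        dic[i] += 1

# 4. Sort the domains by count, highest first
ret2 = sorted(dic, key=lambda x: dic[x], reverse=True)

for k in ret2:
    print(dic[k], k)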


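Method 2:
Using Counter(ret)
import re
from collections import Counter

# 1. Read the contents of the file
with open('xx.log', 'r', encoding='utf-8') as f:
    data = f.read()

# 2. Extract the domain names (everything between https:// and the next /)
ret = re.findall(r'https://(.*?)/', data)

# 3. Count the occurrences of each domain
dic = Counter(ret)
print(dic)

# 4. Sort the (domain, count) pairs by count, highest first
ret2 = sorted(dic.items(), key=lambda x: x[1], reverse=True)
for k, v in ret2:
    print(v, k)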
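As a possible variation (not the approach used above), the domain can be taken with urllib.parse.urlparse instead of a regular expression, and Counter.most_common() can replace the explicit sorted() call. The sketch below keeps the sample URLs in a list named lines so that it runs without xx.log; in a real script they would be read from the file.

```python
from collections import Counter
from urllib.parse import urlparse

# Sample lines copied from the xx.log listing
lines = [
    'https://www.sogo.com/ale.html',
    'https://www.qq.com/3asd.html',
    'https://www.sogo.com/teoans.html',
    'https://www.bilibili.com/2',
    'https://www.sogo.com/asd_sa.html',
    'https://y.qq.com/',
    'https://www.bilibili.com/1',
    'https://dig.chouti.com/',
    'https://www.bilibili.com/imd.html',
    'https://www.bilibili.com/',
]

# urlparse(...).netloc is the host part of each URL
counts = Counter(urlparse(line.strip()).netloc for line in lines)

# most_common() yields (domain, count) pairs, highest count first
for domain, n in counts.most_common():
    print(n, domain)
```

Domains with equal counts come out in the order they were first seen, which matches the expected output listed earlier.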

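File handling

File operations: the default mode for opening a file is rt, where r means read-only and t means text mode

The three modes for working with files: r, w, a

r mode: read-only

f.read(): reads the entire contents of the file into memory at once (not recommended for very large files)

f.readline(): reads one line per call, so it is normally called in a loop. Each line keeps its trailing newline, so pass end='' to print() to avoid blank lines in the output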
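The two reading styles can be sketched as follows. To keep the example self-contained it first creates a throwaway file in a temporary directory; the file name demo.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('line 1\nline 2\nline 3\n')

# read(): the whole file comes back as one string
with open(path, 'r', encoding='utf-8') as f:
    whole = f.read()

# readline(): one line per call; an empty string signals end of file
lines_seen = []
with open(path, 'r', encoding='utf-8') as f:
    line = f.readline()
    while line:
        print(line, end='')  # the line already ends with '\n'
        lines_seen.append(line)
        line = f.readline()
```

Iterating over the file object directly (for line in f:) reads line by line in the same way and is the more common idiom.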

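w mode: write-only (if the file exists it is emptied; if it does not exist, an empty file is created)

f.write(): writes a string to the file

a mode: append-only (if the file does not exist it is created; if it exists, the file position moves to the end of the file)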
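A short sketch of how w and a behave, again using a throwaway file in a temporary directory (the name modes.txt is made up for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

# w mode: the file does not exist yet, so it is created
with open(path, 'w', encoding='utf-8') as f:
    f.write('first\n')

# w mode again: the existing contents are discarded before writing
with open(path, 'w', encoding='utf-8') as f:
    f.write('second\n')

# a mode: the existing contents are kept and the new text goes at the end
with open(path, 'a', encoding='utf-8') as f:
    f.write('third\n')

with open(path, 'r', encoding='utf-8') as f:
    content = f.read()
print(content, end='')  # prints "second" and "third" on separate lines
```

The line 'first' is gone because the second open() in w mode emptied the file.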

posted on 2018-06-07 19:52 muzinianhua