3-2 Recording where each word occurs and appending each location to that word's list
Example 3-2 from the dictionaries chapter of Fluent Python
Build a mapping from each word to the list of places where it occurs
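```python
import sys
import re

WORD_RE = re.compile(r'\w+')  # a "word" is a run of letters, digits or underscores

index = {}  # word -> list of (line_no, column_no) tuples

with open(sys.argv[1], encoding='utf-8') as fp:  # file named on the command line
    for line_no, line in enumerate(fp, 1):  # number lines starting from 1
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1  # start() is 0-based
            location = (line_no, column_no)
            # fetch the current list (or a new empty one), append, store it back
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences

# print the words in case-insensitive alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])
```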
【Key points】
- What does sys.argv[1] do, and how is it used?
- What does the enumerate() function do, and how is it used?
- How many times is the dictionary looked up for each occurrence of word?
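The first two questions can be checked with a small sketch. The list `lines` is a hypothetical stand-in for the lines of a file such as aa.txt; iterating over an open file yields its lines in the same way.

```python
import sys

# sys.argv is the list of command-line arguments: sys.argv[0] is the script
# name and sys.argv[1] is the first argument after it. Running
# `python index0.py aa.txt` gives sys.argv == ['index0.py', 'aa.txt'],
# so sys.argv[1] is the name of the file to index.
print(sys.argv)

# Hypothetical sample lines standing in for the lines of a file
lines = ["hello world\n", "hello again\n"]

# enumerate(iterable, 1) yields (count, item) pairs, counting from 1
# instead of the default 0, so the count matches human line numbers.
numbered = list(enumerate(lines, 1))
print(numbered)  # [(1, 'hello world\n'), (2, 'hello again\n')]
```

If the script is started without an argument, sys.argv has only one element and sys.argv[1] raises IndexError.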
【Output】
```text
# Create a file aa.txt in the same directory as index0.py and put some text in it
F:\python_interface_test\python_interface_test\prepare_data>python index0.py aa.txt
16 [(14, 19)]
2006 [(11, 93)]
21 [(1, 51)]
21st [(8, 29)]
27 [(1, 41)]
a [(2, 60), (5, 30), (6, 73)]
against [(4, 61)]
```
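Each output line is a word followed by its list of (line number, column number) pairs; for example, `16 [(14, 19)]` means "16" occurs once, at line 14, column 19. The sketch below uses made-up sample strings (not the contents of aa.txt) to show where the 1-based column comes from and why the words appear in case-insensitive order with numbers first.

```python
import re

WORD_RE = re.compile(r'\w+')

# Column number: match.start() is 0-based, so +1 gives a 1-based column.
match = WORD_RE.search("  hello")
print(match.group(), match.start() + 1)  # hello 3

# Output order: key=str.upper compares words case-insensitively.
# Words are compared as strings, so digits sort before letters.
words = ["banana", "Cherry", "apple", "21"]
print(sorted(words))                 # ['21', 'Cherry', 'apple', 'banana']
print(sorted(words, key=str.upper))  # ['21', 'apple', 'banana', 'Cherry']
```

Because the words are compared as strings, "2006" is listed before "21" in the output above even though it is the larger number.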
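【Optimization】
Use dict.setdefault, which returns the list stored under word (first inserting an empty list if the key is missing). The fetch, append, and store-back steps then collapse into a single line: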
```python
import sys
import re

WORD_RE = re.compile(r'\w+')

index = {}

with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # setdefault returns the existing list for word, or inserts and
            # returns a new empty list, so one call replaces get + assignment
            index.setdefault(word, []).append(location)

for word in sorted(index, key=str.upper):
    print(word, index[word])
```
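To see what the optimization saves, the sketch below builds the index both ways from hypothetical in-memory lines and counts dictionary method calls. `CountingDict`, `build_with_get`, and `build_with_setdefault` are helpers invented for this illustration. They count calls to `get`, `setdefault`, and item assignment, which is a rough proxy for key lookups, not a measurement of the hash-table probes CPython performs internally.

```python
import re

WORD_RE = re.compile(r'\w+')


class CountingDict(dict):
    """Hypothetical dict subclass that counts get/setdefault/assignment calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def get(self, key, default=None):
        self.calls += 1
        return super().get(key, default)

    def setdefault(self, key, default=None):
        self.calls += 1
        return super().setdefault(key, default)

    def __setitem__(self, key, value):
        self.calls += 1
        super().__setitem__(key, value)


def build_with_get(lines):
    index = CountingDict()
    for line_no, line in enumerate(lines, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            occurrences = index.get(word, [])  # call 1
            occurrences.append((line_no, match.start() + 1))
            index[word] = occurrences  # call 2
    return index


def build_with_setdefault(lines):
    index = CountingDict()
    for line_no, line in enumerate(lines, 1):
        for match in WORD_RE.finditer(line):
            # a single call fetches (or creates) the list
            index.setdefault(match.group(), []).append((line_no, match.start() + 1))
    return index


sample = ["a cat and a dog\n", "a bird\n"]  # 7 words in total
g = build_with_get(sample)
s = build_with_setdefault(sample)
print(g == s)            # True: both build the same index
print(g.calls, s.calls)  # 14 7
```

Both versions produce identical indexes, but the get-based one makes two dictionary calls per word occurrence while the setdefault one makes one.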