Character frequency
Link: http://www.codewars.com/kata/53e895e28f9e66a56900011a/train/python
Write a function that takes a piece of text in the form of a string and returns the letter frequency count for the text. This count excludes numbers, spaces and all punctuation marks. Upper and lower case versions of a character are equivalent and the result should all be in lowercase.
The function should return a list of tuples sorted by the most frequent letters first. Letters with the same frequency are ordered alphabetically.
For example:
letter_frequency('aaAabb dddDD hhcc')
will return
[('d',5), ('a',4), ('b',2), ('c',2), ('h',2)]
Letter frequency analysis is often used to analyse simple substitution cipher texts like those created by the Caesar cipher.
Code, with fairly detailed comments:

def letter_frequency(text):
    ans = []
    dic = {}
    # Convert to lowercase once, up front, instead of on every iteration
    text = text.lower()
    # Compute the length once, outside the loop
    lenOfText = len(text)
    for i in range(0, lenOfText):
        alp = text[i]
        # Skip anything that is not a letter
        if not alp.isalpha():
            continue
        # Count each letter in a dictionary
        if alp in dic:
            dic[alp] += 1
        else:
            dic[alp] = 1
    # Flip each dictionary item to (count, letter) and store it in a list
    for k, v in dic.items():
        ans.append((v, k))
    # Sort by frequency, highest first
    ans.sort(reverse=True)
    # For equal frequencies, order alphabetically
    lenOfAns = len(ans)
    for i in range(0, lenOfAns - 1):
        for j in range(i + 1, lenOfAns):
            if ans[i][0] == ans[j][0] and ans[i][1] > ans[j][1]:
                ans[i], ans[j] = ans[j], ans[i]
    # Swap each pair back to (letter, count)
    nans = []
    for i in range(0, lenOfAns):
        nans.append((ans[i][1], ans[i][0]))
    return nans
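For comparison, the same ordering rule (most frequent first, ties broken alphabetically) can probably be expressed more compactly with the standard library. The following is only a sketch, not the kata's reference solution; the name letter_frequency_compact is made up here so that it does not clash with the function above.

```python
from collections import Counter


def letter_frequency_compact(text):
    # Hypothetical name, chosen only for this illustration.
    # Lowercase once, then count only the alphabetic characters.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    # Negating the count lets one ascending sort give "highest count first,
    # then alphabetical order" in a single pass.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(letter_frequency_compact('aaAabb dddDD hhcc'))
# [('d', 5), ('a', 4), ('b', 2), ('c', 2), ('h', 2)]
```

Sorting on the key (-count, letter) removes the need for the second pass that reorders letters of equal frequency, and for the two rounds of swapping tuple positions.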