Character frequency

Link: http://www.codewars.com/kata/53e895e28f9e66a56900011a/train/python

 

Write a function that takes a piece of text in the form of a string and returns the letter frequency count for the text. This count excludes numbers, spaces and all punctuation marks. Upper and lower case versions of a character are equivalent and the result should all be in lowercase.

The function should return a list of tuples sorted by the most frequent letters first. Letters with the same frequency are ordered alphabetically. 
For example:
 letter_frequency('aaAabb dddDD hhcc')
will return
 [('d',5), ('a',4), ('b',2), ('c',2), ('h',2)]

Letter frequency analysis is often used to analyse simple substitution cipher texts like those created by the Caesar cipher.

 

The code, with fairly detailed comments:

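def letter_frequency(text):
    ans = []
    dic = {}
    # Lowercase once up front rather than on every iteration
    text = text.lower()
    # Computing the length inside the loop would be inefficient
    lenOfText = len(text)

    for i in range(0, lenOfText):
        alp = text[i]

        # Skip anything that is not a letter
        if not alp.isalpha():
            continue

        # Count each letter in a dictionary
        # (dict.has_key() no longer exists in Python 3; use "in")
        if alp in dic:
            dic[alp] += 1
        else:
            dic[alp] = 1

    # Flip each dictionary item to (count, letter) and store it in a list
    for k, v in dic.items():
        ans.append((v, k))
    # Sort by frequency, highest first
    ans.sort(reverse=True)

    # For equal frequencies, order alphabetically
    lenOfAns = len(ans)
    for i in range(0, lenOfAns - 1):
        for j in range(i + 1, lenOfAns):
            if ans[i][0] == ans[j][0] and ans[i][1] > ans[j][1]:
                ans[i], ans[j] = ans[j], ans[i]
    # Swap back to (letter, count)
    nans = []
    for i in range(0, lenOfAns):
        nans.append((ans[i][1], ans[i][0]))

    return nans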
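
For comparison, the same result can probably be reached more compactly with the standard library. The sketch below is only one possible alternative, not the kata's reference solution: it counts letters with collections.Counter and sorts once with a composite key (negative count first, then the letter). The name letter_frequency_compact is hypothetical, chosen only so it does not clash with the function above.

```python
from collections import Counter


def letter_frequency_compact(text):
    # Hypothetical name, used only for this illustration.
    # Count alphabetic characters only, ignoring case.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    # Sort by descending count; ties fall back to ascending letter.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(letter_frequency_compact('aaAabb dddDD hhcc'))
# [('d', 5), ('a', 4), ('b', 2), ('c', 2), ('h', 2)]
```

Negating the count in the sort key handles both ordering rules in a single pass, so no second tie-breaking loop and no tuple flipping are needed.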

  

posted @ 2014-10-11 09:49  开开甲