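Sorting a Python dict
Python's built-in dictionary type maps keys to values and does not keep its items in sorted order (before Python 3.7 it guaranteed no order at all; since 3.7 it preserves insertion order). Sometimes, though, we need to output the items of a dictionary in sorted order, either by key or by value. How many ways are there to do this? Below is a collection of some of the neater solutions.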
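Python offers two ways to sort the data in a container: the list's own sort() method and the built-in sorted() function.
The main difference is that sort() sorts the list in place, while sorted() builds and returns a new sorted list; sorted() also accepts any iterable, not just a list.
Take a simple list, L = [5, 2, 3, 1, 4]: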
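(1) L.sort(cmp=None, key=None, reverse=False) --> sorts in place and returns None
(2) sorted(iterable, cmp=None, key=None, reverse=False) --> returns a new sorted list
These are the Python 2 signatures; Python 3 removed the cmp parameter, and key and reverse are keyword-only there.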
- cmp specifies a custom comparison function of two arguments (iterable elements) which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument: cmp=lambda x,y: cmp(x.lower(), y.lower()). The default value is None.
- key specifies a function of one argument that is used to extract a comparison key from each list element: key=str.lower. The default value is None (compare the elements directly).
- reverse is a boolean value. If set to True, then the list elements are sorted as if each comparison were reversed.
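As a quick illustration of the two calls, the sketch below sorts the list L from above both ways. It uses Python 3 syntax, and the variable names new_list and result are only there for illustration.

```python
L = [5, 2, 3, 1, 4]

# sorted() leaves its input untouched and returns a new list.
new_list = sorted(L)
print(new_list)  # [1, 2, 3, 4, 5]
print(L)         # [5, 2, 3, 1, 4]

# list.sort() reorders the list in place and returns None.
result = L.sort(reverse=True)
print(result)    # None
print(L)         # [5, 4, 3, 2, 1]

# sorted() accepts any iterable; iterating a dict yields its keys.
print(sorted({"b": 1, "a": 2}))  # ['a', 'b']
```

Because L.sort() returns None, writing L = L.sort() is a common mistake that throws the list away.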
In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function. This is because cmp is called multiple times for each list element while key and reverse touch each element only once. Use functools.cmp_to_key() to convert an old-style cmp function to a key function.
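One way to see the difference is to count the calls. The sketch below is a small experiment rather than a benchmark: compare_lower and lower_key are hypothetical helpers written for this example, and the comparison function is adapted with functools.cmp_to_key() so that it also runs on Python 3.

```python
from functools import cmp_to_key

calls = {"cmp": 0, "key": 0}

def compare_lower(x, y):
    """Old-style comparison: returns a negative, zero or positive number."""
    calls["cmp"] += 1
    a, b = x.lower(), y.lower()
    return (a > b) - (a < b)  # stands in for the cmp() builtin, which Python 3 removed

def lower_key(s):
    """Key function: called once per element."""
    calls["key"] += 1
    return s.lower()

words = "This is a test string from Andrew".split()
by_cmp = sorted(words, key=cmp_to_key(compare_lower))
by_key = sorted(words, key=lower_key)

print(by_cmp == by_key)  # True
print(calls["key"])      # 7, one call per word
print(calls["cmp"])      # one call per pairwise comparison, at least len(words) - 1
```

The gap between the two counters grows with the length of the input, since the number of comparisons grows faster than the number of elements.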
The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal; this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).
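A minimal sketch of such a multi-pass sort follows. The employees data is made up for illustration; the point is that the secondary key (salary grade) is sorted first and the primary key (department) last, so that stability keeps the grades ordered inside each department.

```python
# (name, department, salary grade); invented sample data
employees = [
    ("alice", "sales", 3),
    ("bob", "dev", 2),
    ("carol", "sales", 1),
    ("dave", "dev", 1),
]

# Pass 1: sort by the secondary key (salary grade).
by_grade = sorted(employees, key=lambda e: e[2])

# Pass 2: sort by the primary key (department). Because the sort is stable,
# employees in the same department keep their grade order from pass 1.
by_dept_then_grade = sorted(by_grade, key=lambda e: e[1])

for row in by_dept_then_grade:
    print(row)
# ('dave', 'dev', 1)
# ('bob', 'dev', 2)
# ('carol', 'sales', 1)
# ('alice', 'sales', 3)
```

The same result can be had in one pass with key=lambda e: (e[1], e[2]); the two-pass form is mainly useful when the passes need different sort directions.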
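Stability is also what guarantees the correctness of radix sort: once the low-order digits have been sorted, sorting on the high-order digit groups equal high digits together, and it must keep the relative order that the low-order pass established.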
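To make the role of stability concrete, here is a rough sketch of a least-significant-digit radix sort for non-negative integers. As a simplification it uses the stable sorted() for each digit pass instead of the counting or bucket step a real radix sort would use; the function name radix_sort and the base parameter are choices made for this example.

```python
def radix_sort(numbers, base=10):
    """LSD radix sort for non-negative integers, one stable pass per digit."""
    result = list(numbers)
    if not result:
        return result
    largest = max(result)
    digit_place = 1  # 1, base, base**2, ...
    while digit_place <= largest:
        # Each pass orders by a single digit. Stability keeps the order of
        # the lower digits (sorted in earlier passes) among equal digits.
        result = sorted(result, key=lambda n: (n // digit_place) % base)
        digit_place *= base
    return result

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

If the per-digit pass were not stable, numbers sharing a high digit could come out in the wrong order relative to their low digits.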
1 Sorting by key
# The simplest approach: sort the items by key, then return the values
def sortedDictValues1(adict):
    items = sorted(adict.items())  # (key, value) pairs, ordered by key
    return [value for key, value in items]
# Another way to sort by key, apparently a little faster than the previous one
def sortedDictValues2(adict):
    keys = sorted(adict.keys())
    return [adict[key] for key in keys]
# Sorting by key again, reportedly faster still, and it works just as well when the keys are tuples
def sortedDictValues3(adict):
    keys = sorted(adict.keys())
    return list(map(adict.get, keys))
# As a one-liner (di is the dict to sort):
[(k, di[k]) for k in sorted(di.keys())]
# Using the key parameter of sorted() to sort the items by key:
print(sorted(dict1.items(), key=lambda d: d[0]))
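All of the snippets above return lists. If a dict ordered by key is wanted instead, one option is to rebuild it from the sorted keys. This sketch assumes Python 3.7 or later, where a plain dict preserves insertion order; the prices data is invented for the example.

```python
prices = {"pear": 3, "apple": 5, "fig": 1}

# Rebuild the dict, inserting the keys in sorted order (needs Python 3.7+).
by_key = {k: prices[k] for k in sorted(prices)}
print(list(by_key.items()))
# [('apple', 5), ('fig', 1), ('pear', 3)]

# The same idea in descending key order.
by_key_desc = {k: prices[k] for k in sorted(prices, reverse=True)}
print(list(by_key_desc))
# ['pear', 'fig', 'apple']
```

The new dict is only sorted at the moment it is built; keys added later go to the end.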
2 Sorting by value
# Sorting by value: swap the key and value of each item into a list of [value, key] pairs,
# sort that list (it orders by the first element, i.e. the original value), and return the keys
def sort_by_value(d):
    backitems = [[value, key] for key, value in d.items()]
    backitems.sort()
    return [key for value, key in backitems]
# Again as a one-liner (this returns only the sorted values):
sorted(di.values())
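# More flexible: sort with a lambda comparison function (Python 2 only; Python 3 removed cmp):
sorted(d.items(), cmp=lambda x, y: cmp(x[1], y[1]))
# or in descending order:
sorted(d.items(), cmp=lambda x, y: cmp(x[1], y[1]), reverse=True)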
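The two calls above pass a comparison function, which only works on Python 2. A possible Python 3 equivalent wraps the comparison with functools.cmp_to_key(); compare_values is a hypothetical helper named for this example, and the sample dict is invented.

```python
from functools import cmp_to_key

d = {"a": 3, "b": 1, "c": 2}

def compare_values(x, y):
    """Compare two (key, value) items by value, old cmp style."""
    return (x[1] > y[1]) - (x[1] < y[1])

ascending = sorted(d.items(), key=cmp_to_key(compare_values))
descending = sorted(d.items(), key=cmp_to_key(compare_values), reverse=True)

print(ascending)   # [('b', 1), ('c', 2), ('a', 3)]
print(descending)  # [('a', 3), ('c', 2), ('b', 1)]
```

A comparison function is mostly worth keeping when the ordering rule cannot easily be expressed as a key; for a plain sort by value, a key function is shorter.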
# Using the key parameter of sorted() to sort the items by value:
print(sorted(dict1.items(), key=lambda d: d[1]))
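Two related idioms are sketched below: passing the dict's own get method as the key to obtain the keys ordered by value, and slicing a descending sort to pick the top entries. The scores data is made up for illustration.

```python
scores = {"bob": 2, "alice": 3, "carol": 1}

# Iterating a dict yields its keys; scores.get maps each key to its value,
# so this returns the keys ordered by their values.
keys_by_value = sorted(scores, key=scores.get)
print(keys_by_value)  # ['carol', 'bob', 'alice']

# The two entries with the highest values.
top_two = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top_two)  # [('alice', 3), ('bob', 2)]
```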
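3 Extended usage: Key Function
Starting with Python 2.4, both list.sort() and sorted() accept a 'key' parameter that specifies a function to be called on each list element before making comparisons.
Example 1: case-insensitive string sorting: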
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
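The value of key should be a function that takes a single argument and returns the key to sort by. This is fast because the key function is called exactly once for each input record.
Example 2: for more complex objects, use one of the object's indices as the key: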
>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]
>>> sorted(student_tuples, key=lambda student: student[2]) # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Example 3: use one of the object's attributes as the key:
>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
>>>
>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]
>>> sorted(student_objects, key=lambda student: student.age) # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
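When each element of the list itself holds more than one item (for example, a list of tuples), we may want to sort by the first field and then, among elements with equal first fields, by the second. Tuples compare element by element, so a plain sort already does this, and an explicit tuple key gives the same result: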
>>> list2 = [('d', 3), ('a', 5), ('d', 1), ('c', 2), ('d', 2)]
>>> list2
[('d', 3), ('a', 5), ('d', 1), ('c', 2), ('d', 2)]
>>> list2.sort()
>>> list2
[('a', 5), ('c', 2), ('d', 1), ('d', 2), ('d', 3)]
>>> list3 = [('d', 3), ('a', 5), ('d', 1), ('c', 2), ('d', 2)]
>>> sorted(list3, key = lambda x:(x[0],x[1]))
[('a', 5), ('c', 2), ('d', 1), ('d', 2), ('d', 3)]
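A tuple key also allows the two fields to be sorted in different directions. The sketch below, which assumes the second field is numeric so that it can be negated, sorts the letters ascending and the numbers descending; list4 is just a fresh copy of the sample data.

```python
list4 = [('d', 3), ('a', 5), ('d', 1), ('c', 2), ('d', 2)]

# First field ascending, second field descending (negation flips the order).
mixed = sorted(list4, key=lambda x: (x[0], -x[1]))
print(mixed)
# [('a', 5), ('c', 2), ('d', 3), ('d', 2), ('d', 1)]
```

Negation only works for numbers; for other types, two stable passes with different reverse settings are an alternative.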
4 Operator Module Functions
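The operator module provides:
operator.itemgetter() ----- fetch by index (or key)
operator.attrgetter() ----- fetch by attribute name
operator.methodcaller() ----- introduced in Python 2.6, described in detail below
With these functions, the Key Function examples above become simpler and faster.
operator.itemgetter() and operator.attrgetter() are easier to understand when introduced together.
For example: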
>>> from operator import itemgetter, attrgetter
>>>
>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
>>>
>>> sorted(student_objects, key=attrgetter('age'))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
The operator module functions also allow sorting on multiple levels, for example sorting by grade first and then by age.
For example:
>>> sorted(student_tuples, key=itemgetter(1,2))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
>>>
>>> sorted(student_objects, key=attrgetter('grade', 'age'))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
Looking back, the two-field sorting problem from the previous section can also be solved with operator.itemgetter:
>>> list4 = [('d',3),('a',5),('d',1),('c',2),('d',2)]
>>> from operator import itemgetter
>>> sorted(list4, key=itemgetter(0,1))
[('a', 5), ('c', 2), ('d', 1), ('d', 2), ('d', 3)]
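Even so, the approach from section 3 (Key Function) is still the one recommended here, since importing a module just for this one sort is hardly worth it.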
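For completeness, itemgetter applies just as well to the dict sorting this article started with. The following is a small sketch with invented data: items() yields (key, value) pairs, so index 0 selects the key and index 1 the value, and for a list of dicts itemgetter can take a dictionary key instead of an index.

```python
from operator import itemgetter

stock = {"pear": 3, "apple": 5, "fig": 1}

by_key = sorted(stock.items(), key=itemgetter(0))
by_value = sorted(stock.items(), key=itemgetter(1))
print(by_key)    # [('apple', 5), ('fig', 1), ('pear', 3)]
print(by_value)  # [('fig', 1), ('pear', 3), ('apple', 5)]

# A list of dicts, sorted by the value stored under the 'age' key.
rows = [{"name": "john", "age": 15}, {"name": "dave", "age": 10}]
by_age = sorted(rows, key=itemgetter("age"))
print([row["name"] for row in by_age])  # ['dave', 'john']
```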
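Next, the operator.methodcaller() function:
methodcaller() builds a key function that calls a named method with fixed arguments on each object. For example, str.count() counts how many times a given substring occurs in a string, so counting the occurrences of one character can be used to determine the sort priority: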
>>> from operator import methodcaller
>>> messages = ['critical!!!', 'hurry!', 'standby', 'immediate!!']
>>> sorted(messages, key=methodcaller('count', '!'))
['standby', 'hurry!', 'immediate!!', 'critical!!!']
5 Python collections
This module implements specialized container datatypes providing alternatives to Python's general purpose built-in containers, dict, list, set, and tuple.
namedtuple() ----- factory function for creating tuple subclasses with named fields
deque ----- list-like container with fast appends and pops on either end
Counter ----- dict subclass for counting hashable objects
OrderedDict ----- dict subclass that remembers the order entries were added
defaultdict ----- dict subclass that calls a factory function to supply missing values
In addition to the concrete container classes, the collections module provides abstract base classes that can be used to test whether a class provides a particular interface, for example, whether it is hashable or a mapping.
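Two of these containers tie in directly with sorting. As a brief sketch with invented sample data: an OrderedDict built from sorted items keeps them in that order, and Counter.most_common() returns the counted items ordered from most to least frequent.

```python
from collections import Counter, OrderedDict

stock = {"pear": 3, "apple": 5, "fig": 1}

# An OrderedDict remembers insertion order, so inserting the items
# in sorted order gives a mapping that iterates in that order.
ordered = OrderedDict(sorted(stock.items(), key=lambda kv: kv[1]))
print(list(ordered))  # ['fig', 'pear', 'apple']

# Counter counts hashable objects; most_common() sorts by count, descending.
counts = Counter(["a", "b", "a", "c", "a", "b"])
print(counts.most_common())  # [('a', 3), ('b', 2), ('c', 1)]
```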