Sorting Practice (1): List Sorting with Insertion Sort, and Sorting Basics

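1.1 Insertion sort:

Suppose you have a data sequence that is already sorted and need to insert one more number so that the sequence stays sorted. That is the job of insertion sort. Its basic operation is to insert one item into data that is already in order, producing a new sorted sequence that is one element longer. The algorithm suits small amounts of data, has O(n^2) time complexity, and is a stable sort. Insertion sort splits the array to be sorted into two parts: the first part holds every element except the last, and the second part holds only that last element. Once the first part is sorted, the last element is inserted at the correct position in the now-sorted first part.

Assume the front part of the array is already in order. Scanning that part from its end back toward the start, every number larger than the current element of the outer loop is moved one position back, until the current element is greater than or equal to the element in front of it. After the first k numbers have been processed, the algorithm guarantees that a[1…k] is sorted, which is what makes each insertion correct.

⒈ Start with the sorted sequence {a1} and the unsorted sequence {a2, a3, …, an}.

⒉ When processing the i-th element (i = 2, 3, …, n), the sequence {a1, a2, …, ai-1} is already sorted and the sequence {ai, ai+1, …, an} is not. Compare ai with ai-1, ai-2, …, a1 to find the right position, and insert ai there.

⒊ Repeat step 2, for n-1 insertions in total, until the whole sequence is sorted.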

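Insertion sort is usually implemented in place on an array. The algorithm goes as follows:

⒈ Start from the first element, which can be considered already sorted.

⒉ Take the next element and scan the sorted elements from back to front.

⒊ If a (sorted) element is greater than the new element, move that element one position back.

⒋ Repeat step 3 until you reach a sorted element that is less than or equal to the new element.

⒌ Insert the new element in the position right after that element.

⒍ Repeat from step 2 until every element has been processed.

If comparisons cost more than moving elements, binary search can be used to cut down the number of comparisons. This variant of insertion sort is known as binary insertion sort.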
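As a minimal sketch of that variant, the following uses bisect_right from the standard bisect module to find the insertion point inside the sorted prefix. The function name binary_insertion_sort is my own choice, not a library function, and moving the elements still costs O(n) per insertion, so only the number of comparisons goes down.

```python
from bisect import bisect_right


def binary_insertion_sort(values):
    """Return a new ascending list (hypothetical helper, for illustration only)."""
    result = list(values)
    for i in range(1, len(result)):
        key = result[i]
        # Binary search inside the sorted prefix result[0:i].
        # bisect_right places equal elements after existing ones, which keeps the sort stable.
        pos = bisect_right(result, key, 0, i)
        # Shift result[pos:i] one slot to the right, then drop key into the gap.
        result[pos + 1:i + 1] = result[pos:i]
        result[pos] = key
    return result


print(binary_insertion_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```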

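If the goal is to sort a sequence of n elements in ascending order, insertion sort has a best case and a worst case. The best case is a sequence that is already ascending, which needs only (n-1) comparisons. The worst case is a sequence in descending order, which needs n(n-1)/2 comparisons. Counting each shift and each final placement as one assignment, the number of assignments is at most the number of comparisons plus (n-1). On average the time complexity of insertion sort is O(n^2), so it is a poor fit for sorting large amounts of data. If the data to sort is small, say fewer than about a thousand elements, insertion sort is still a reasonable choice.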
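The best-case and worst-case comparison counts can be checked with a small experiment. This is only a sketch: count_comparisons is a hypothetical helper that runs an insertion sort on a copy of the input and counts the element-to-element comparisons.

```python
def count_comparisons(values):
    """Insertion-sort a copy of values and return how many element comparisons were made."""
    data = list(values)
    comparisons = 0
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison of data[j] against key
            if data[j] <= key:
                break                 # found the insertion point
            data[j + 1] = data[j]     # shift the larger element right
            j -= 1
        data[j + 1] = key
    return comparisons


n = 10
best = count_comparisons(range(n))           # already ascending
worst = count_comparisons(range(n, 0, -1))   # descending
print(best, n - 1)                # 9 9
print(worst, n * (n - 1) // 2)    # 45 45
```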

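Today I meant to look into sorting numbers, but things went sideways: the example below turned out to sort file names made of letters and digits instead. Here is the program: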

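#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

def insertion_sort(sort_list):
    """Sort sort_list in place in ascending order and return it."""
    iter_len = len(sort_list)
    if iter_len < 2:
        return sort_list
    for i in range(1, iter_len):
        key = sort_list[i]          # element to insert into the sorted prefix
        j = i - 1
        # Shift every larger element of the sorted prefix one slot to the right.
        while j >= 0 and sort_list[j] > key:
            sort_list[j+1] = sort_list[j]
            j = j - 1
        sort_list[j+1] = key        # drop the element into the gap
    return sort_list

if __name__ == "__main__":
    # Despite the prompt text, the input is split on single spaces,
    # and every item stays a string.
    num = input("enter the numbers & Comma Separated Value: ").split(" ")
    print(num)
    print("the sorted nums are :")
    print(insertion_sort(num))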

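I tested the data sets below and the results look reasonable. For real use there would certainly be plenty left to improve; the original purpose of this program was simply to do an insertion sort.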

>>>
enter the numbers & Comma Separated Value: 2 4 32 64 34 78 23 2345 2345 12 1 3
['2', '4', '32', '64', '34', '78', '23', '2345', '2345', '12', '1', '3']
the sorted nums are :
['1', '12', '2', '23', '2345', '2345', '3', '32', '34', '4', '64', '78']
>>> ================================ RESTART ================================
>>>
enter the numbers & Comma Separated Value: a c d w ai ui o p
['a', 'c', 'd', 'w', 'ai', 'ui', 'o', 'p']
the sorted nums are :
['a', 'ai', 'c', 'd', 'o', 'p', 'ui', 'w']
>>> ================================ RESTART ================================
>>>
enter the numbers & Comma Separated Value: 34 we uy ia a der jb 78 34 fg qwe gh
['34', 'we', 'uy', 'ia', 'a', 'der', 'jb', '78', '34', 'fg', 'qwe', 'gh', '']
the sorted nums are :
['', '34', '34', '78', 'a', 'der', 'fg', 'gh', 'ia', 'jb', 'qwe', 'uy', 'we']
>>> ================================ RESTART ================================
>>>
enter the numbers & Comma Separated Value: awe yur awq as zx za er ea eh
['awe', 'yur', 'awq', 'as', 'zx', 'za', 'er', 'ea', 'eh']
the sorted nums are :
['as', 'awe', 'awq', 'ea', 'eh', 'er', 'yur', 'za', 'zx']
>>> ================================ RESTART ================================
>>>
enter the numbers & Comma Separated Value: AW aw ef Ef bs bS
['AW', 'aw', 'ef', 'Ef', 'bs', 'bS']
the sorted nums are :
['AW', 'Ef', 'aw', 'bS', 'bs', 'ef']
>>> ================================ RESTART ================================
>>>
enter the numbers & Comma Separated Value: 2-3 2-1 2a-7 2a-1 2a -3
['2-3', '2-1', '2a-7', '2a-1', '2a', '-3']
the sorted nums are :
['-3', '2-1', '2-3', '2a', '2a-1', '2a-7']
>>>

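These experiments show that during the comparisons, in fact from the moment of input, num is treated as a list of string elements rather than numbers, so the final order follows the ASCII codes of the characters.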
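A few one-liners make the effect visible. This is just an illustration using built-in string comparison and ord(); the variable name mixed is mine.

```python
# Strings are compared character by character, using the character codes.
print('12' < '2')            # True, because '1' sorts before '2'
print(ord('A'), ord('a'))    # 65 97: uppercase letters come before lowercase ones

mixed = sorted(['aw', 'AW', 'ef', 'Ef'])
print(mixed)                 # ['AW', 'Ef', 'aw', 'ef']
```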

The code below sorts a given list made up of numbers:

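#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

def insertion_sort(sort_list):
    """Sort sort_list in place in ascending order and return it."""
    iter_len = len(sort_list)
    if iter_len < 2:
        return sort_list
    for i in range(1, iter_len):
        key = sort_list[i]          # element to insert into the sorted prefix
        j = i - 1
        # Shift every larger element of the sorted prefix one slot to the right.
        while j >= 0 and sort_list[j] > key:
            sort_list[j+1] = sort_list[j]
            j = j - 1
        sort_list[j+1] = key        # drop the element into the gap
    return sort_list


if __name__ == "__main__":
    # The items are ints this time, so they compare numerically.
    num = [2, 4, 32, 64, 34, 78, 23, 2345, 2345, 12, 1, 3]
    print(num)
    print("the sorted nums are :")
    print(insertion_sort(num))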

Output:

>>>
[2, 4, 32, 64, 34, 78, 23, 2345, 2345, 12, 1, 3]
the sorted nums are :
[1, 2, 3, 4, 12, 23, 32, 34, 64, 78, 2345, 2345]
>>>
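Comparing the two runs above, the difference comes down to what type of data is being sorted.

So what should I do if I want to sort numbers that are typed in after the program starts?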

if __name__ == "__main__":

    num = input("enter the numbers : ").split(' ')
    print(num)
    print("the sorted nums are :")

    # The items are still strings; key=int only converts them for the comparisons.
    num.sort(key=int)
    print(num)

>>>
enter the numbers : 12 35 87 8 3 2
['12', '35', '87', '8', '3', '2']
the sorted nums are :
['2', '3', '8', '12', '35', '87']
>>>
This still treats the input as strings and only applies a type conversion while sorting, which is a decent approach too.
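The same idea can be applied to the hand-written insertion sort. The sketch below is an assumption of mine rather than part of the original program: insertion_sort_by is a hypothetical variant that takes a key function, in the spirit of the key parameter of list.sort().

```python
def insertion_sort_by(items, key=lambda item: item):
    """Return a new list sorted by key(item), using insertion sort (illustrative helper)."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        current_key = key(current)
        j = i - 1
        # Compare by key, but move the original items.
        while j >= 0 and key(result[j]) > current_key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


tokens = "12 35 87 8 3 2".split()
print(insertion_sort_by(tokens, key=int))  # ['2', '3', '8', '12', '35', '87']
print(insertion_sort_by(tokens))           # ['12', '2', '3', '35', '8', '87']
```

For simplicity the key function is re-evaluated on every comparison here, unlike the built-in sort, which computes each key once.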

Python list sorting basics

The elements of a list can be all sorts of things: strings, dictionaries, instances of your own classes, and so on.

Python lists have a built-in sort() method that modifies the list in-place and a sorted() built-in function that builds a new sorted list from an iterable.

There are many ways to use them to sort data and there doesn't appear to be a single, central place in the various manuals describing them, so I'll do so here.

Sorting Basics

A simple ascending sort is very easy -- just call the sorted() function. It returns a new sorted list:

>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]

You can also use the list.sort() method of a list. It modifies the list in-place (and returns None to avoid confusion). Usually it's less convenient than sorted() - but if you don't need the original list, it's slightly more efficient.

>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> a
[1, 2, 3, 4, 5]

Another difference is that the list.sort() method is only defined for lists. In contrast, the sorted() function accepts any iterable.

>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
[1, 2, 3, 4, 5]
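The two differences can be seen side by side in a short script. This is a small sketch using only built-ins; the variable names are mine.

```python
a = [5, 2, 3, 1, 4]

b = sorted(a)        # builds a new list, a is left untouched
print(a, b)          # [5, 2, 3, 1, 4] [1, 2, 3, 4, 5]

ret = a.sort()       # sorts a in place and returns None
print(a, ret)        # [1, 2, 3, 4, 5] None

# sorted() accepts any iterable, not only lists.
chars = sorted("python")
from_tuple = sorted((3, 1, 2))
print(chars)         # ['h', 'n', 'o', 'p', 't', 'y']
print(from_tuple)    # [1, 2, 3]
```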

Key Functions

Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.

For example, here's a case-insensitive string comparison:

>>> sorted("This is a test string from Andrew".split(), key=str.lower)        
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.
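To see the "exactly once" behaviour, one can wrap the key function so that it records its calls. The wrapper recording_lower below is a hypothetical helper written for this check.

```python
calls = []


def recording_lower(word):
    """Key function that remembers every word it was called with."""
    calls.append(word)
    return word.lower()


words = "This is a test string from Andrew".split()
result = sorted(words, key=recording_lower)

print(result)                    # ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
print(len(calls) == len(words))  # True: one call per element
```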

A common pattern is to sort complex objects using some of the object's indices as a key. For example:

>>> student_tuples = [
        ('john', 'A', 15),
        ('jane', 'B', 12),
        ('dave', 'B', 10),
]
>>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The same technique works for objects with named attributes. For example:

>>> class Student:
        def __init__(self, name, grade, age):
                self.name = name
                self.grade = grade
                self.age = age
        def __repr__(self):
                return repr((self.name, self.grade, self.age))

>>> student_objects = [
        Student('john', 'A', 15),
        Student('jane', 'B', 12),
        Student('dave', 'B', 10),
]
>>> sorted(student_objects, key=lambda student: student.age)   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

Operator Module Functions

The key-function patterns shown above are very common, so Python provides convenience functions to make accessor functions easier and faster. The operator module has itemgetter, attrgetter, and starting in Python 2.6 a methodcaller function.

Using those functions, the above examples become simpler and faster.

>>> from operator import itemgetter, attrgetter

>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> sorted(student_objects, key=attrgetter('age'))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The operator module functions allow multiple levels of sorting. For example, to sort by grade then by age:

>>> sorted(student_tuples, key=itemgetter(1,2))   # sort by item 1 (grade), then item 2 (age)
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

>>> sorted(student_objects, key=attrgetter('grade', 'age'))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
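methodcaller is mentioned above but not demonstrated. As a small sketch, it can build a key that calls a method on each element; the messages list is made-up sample data.

```python
from operator import methodcaller

# Sample data for illustration: sort messages by how many '!' they contain.
messages = ['critical!!!', 'hurry!', 'standby', 'immediate!!']

# methodcaller('count', '!') returns a callable f such that f(s) == s.count('!').
by_urgency = sorted(messages, key=methodcaller('count', '!'))
print(by_urgency)  # ['standby', 'hurry!', 'immediate!!', 'critical!!!']
```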

Ascending and Descending

Both list.sort() and sorted() accept a reverse parameter with a boolean value. This is used to flag descending sorts. For example, to get the student data in reverse age order:

>>> sorted(student_tuples, key=itemgetter(2), reverse=True)
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

>>> sorted(student_objects, key=attrgetter('age'), reverse=True)
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

Sort Stability and Complex Sorts

Starting with Python 2.2, sorts are guaranteed to be stable. That means that when multiple records have the same key, their original order is preserved.

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
>>> sorted(data, key=itemgetter(0))
[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]

Notice how the two records for 'blue' retain their original order so that ('blue', 1) is guaranteed to precede ('blue', 2).

>>> data = [('red', 1), ('blue', 2), ('red', 2), ('blue', 1)]

>>> from operator import itemgetter, attrgetter
>>> sorted(data, key=itemgetter(0))
[('blue', 2), ('blue', 1), ('red', 1), ('red', 2)]

The original order is preserved here as well.

This wonderful property lets you build complex sorts in a series of sorting steps. For example, to sort the student data by descending grade and then ascending age, do the age sort first and then sort again using grade:

>>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key   
>>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The Timsort algorithm used in Python does multiple sorts efficiently because it can take advantage of any ordering already present in a dataset.
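The multi-pass approach can be wrapped in a small helper. This is a sketch under my own naming: multisort and its specs argument (a list of (index, reverse) pairs, primary key first) are assumptions for illustration.

```python
from operator import itemgetter


def multisort(records, specs):
    """Sort by several keys; specs lists (index, reverse) pairs, primary key first."""
    result = list(records)
    # Apply the passes from the least significant key to the most significant one.
    # Stability keeps the earlier ordering among records whose later key is equal.
    for index, reverse in reversed(specs):
        result.sort(key=itemgetter(index), reverse=reverse)
    return result


student_tuples = [
    ('john', 'A', 15),
    ('jane', 'B', 12),
    ('dave', 'B', 10),
]

# Descending grade (index 1), then ascending age (index 2).
ordered = multisort(student_tuples, [(1, True), (2, False)])
print(ordered)  # [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
```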

The Old Way Using Decorate-Sort-Undecorate

This idiom is called Decorate-Sort-Undecorate after its three steps:

First, the initial list is decorated with new values that control the sort order.
Second, the decorated list is sorted.
Finally, the decorations are removed, creating a list that contains only the initial values in the new order.

For example, to sort the student data by grade using the DSU approach:

>>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
>>> decorated.sort()
>>> [student for grade, i, student in decorated]               # undecorate
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

This idiom works because tuples are compared lexicographically; the first items are compared; if they are the same then the second items are compared, and so on.

It is not strictly necessary in all cases to include the index i in the decorated list. Including it gives two benefits:

The sort is stable - if two items have the same key, their order will be preserved in the sorted list.
The original items do not have to be comparable because the ordering of the decorated tuples will be determined by at most the first two items. So for example the original list could contain complex numbers which cannot be sorted directly.
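The complex-number case can be sketched as follows. Sorting by magnitude is my own choice of key for the illustration; the point is that the index breaks ties, so the complex values themselves are never compared.

```python
values = [3 + 4j, 1 + 0j, 0 + 2j, 0 - 2j]

# Decorate: (sort key, original position, original value).
decorated = [(abs(z), i, z) for i, z in enumerate(values)]

# Sort: 2j and -2j share the key 2.0, so the index decides and z is never compared.
decorated.sort()

# Undecorate: keep only the original values, now in the new order.
by_magnitude = [z for _, _, z in decorated]
print(by_magnitude)  # [(1+0j), 2j, -2j, (3+4j)]
```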

Another name for this idiom is Schwartzian transform, after Randal L. Schwartz, who popularized it among Perl programmers.

For large lists and lists where the comparison information is expensive to calculate, and Python versions before 2.4, DSU is likely to be the fastest way to sort the list. For 2.4 and later, key functions provide the same functionality.

The Old Way Using the cmp Parameter

Many constructs given in this HOWTO assume Python 2.4 or later. Before that, there was no sorted() builtin and list.sort() took no keyword arguments. Instead, all of the Py2.x versions supported a cmp parameter to handle user specified comparison functions.

In Py3.0, the cmp parameter was removed entirely (as part of a larger effort to simplify and unify the language, eliminating the conflict between rich comparisons and the __cmp__ methods).

The cmp function: the sorting approach commonly used in other languages

In Py2.x, sort allowed an optional function which can be called for doing the comparisons. That function should take two arguments to be compared and then return a negative value for less-than, return zero if they are equal, or return a positive value for greater-than. For example, we can do:

>>> def numeric_compare(x, y):
        return x - y
>>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare)
[1, 2, 3, 4, 5]

Or you can reverse the order of comparison with:

>>> def reverse_numeric(x, y):
        return y - x
>>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric)
[5, 4, 3, 2, 1]

When porting code from Python 2.x to 3.x, the situation can arise when you have the user supplying a comparison function and you need to convert that to a key function. The following wrapper makes that easy to do:

def cmp_to_key(mycmp):
    'Convert a cmp= function into a key= function'
    class K(object):
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        def __ne__(self, other):
            return mycmp(self.obj, other.obj) != 0
    return K

To convert to a key function, just wrap the old comparison function:

>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
[5, 4, 3, 2, 1]

In Python 2.7, the cmp_to_key() tool was added to the functools module.
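With the standard-library version, the hand-written wrapper is no longer needed. A minimal sketch, reusing the reverse_numeric comparison function from the example above:

```python
from functools import cmp_to_key


def reverse_numeric(x, y):
    """Old-style comparison: negative if x sorts first, positive if y sorts first."""
    return y - x


descending = sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
print(descending)  # [5, 4, 3, 2, 1]
```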

Odds and Ends

For locale aware sorting, use locale.strxfrm() for a key function or locale.strcoll() for a comparison function.

The reverse parameter still maintains sort stability (i.e. records with equal keys retain the original order). Interestingly, that effect can be simulated without the parameter by using the built-in reversed function twice:

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
>>> assert sorted(data, reverse=True) == list(reversed(sorted(reversed(data))))

To create a standard sort order for a class, just add the appropriate rich comparison methods:

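Internally, sorting compares elements through their comparison methods (the rich comparison __lt__; Python 2 could also fall back to __cmp__), so adding such methods to a class makes its instances sortable.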

>>> Student.__eq__ = lambda self, other: self.age == other.age
>>> Student.__ne__ = lambda self, other: self.age != other.age
>>> Student.__lt__ = lambda self, other: self.age < other.age
>>> Student.__le__ = lambda self, other: self.age <= other.age
>>> Student.__gt__ = lambda self, other: self.age > other.age
>>> Student.__ge__ = lambda self, other: self.age >= other.age
>>> sorted(student_objects)
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

For general purpose comparisons, the recommended approach is to define all six rich comparison operators. The functools.total_ordering class decorator makes this easy to implement.
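As a sketch of that decorator, the Student class can define only __eq__ and __lt__ and let total_ordering derive the rest. Ordering students by age alone is an assumption carried over from the lambda-based example above.

```python
from functools import total_ordering


@total_ordering
class Student:
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age

    def __repr__(self):
        return repr((self.name, self.grade, self.age))

    # Only __eq__ and __lt__ are written by hand; total_ordering
    # fills in __le__, __gt__ and __ge__ from them.
    def __eq__(self, other):
        if not isinstance(other, Student):
            return NotImplemented
        return self.age == other.age

    def __lt__(self, other):
        if not isinstance(other, Student):
            return NotImplemented
        return self.age < other.age


student_objects = [
    Student('john', 'A', 15),
    Student('jane', 'B', 12),
    Student('dave', 'B', 10),
]

ordered = sorted(student_objects)
print(ordered)                                   # [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
print(student_objects[0] >= student_objects[1])  # True: john (15) >= jane (12)
```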

Key functions need not access data internal to objects being sorted. A key function can also access external resources. For instance, if the student grades are stored in a dictionary, they can be used to sort a separate list of student names:

>>> students = ['dave', 'john', 'jane']
>>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
>>> sorted(students, key=newgrades.__getitem__)
['jane', 'dave', 'john']

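When you need to keep data ordered while it is still being processed, sort(), sorted() and bisect.insort() are not the best tools. In that situation a heap, a red-black tree or a treap can be used instead.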
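For the heap option, the standard heapq module is one readily available choice. The sketch below interleaves insertions and removals; the numbers are arbitrary sample data.

```python
import heapq

heap = []
for value in [5, 2, 4]:
    heapq.heappush(heap, value)      # O(log n) per insertion

smallest = heapq.heappop(heap)       # always removes the current minimum
print(smallest)                      # 2

# More data arrives while processing is under way.
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)

drained = [heapq.heappop(heap) for _ in range(len(heap))]
print(drained)                       # [1, 3, 4, 5]
```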

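A single sorting algorithm has led to all this text, and there is still more to say. Since this post is already long, I will continue the discussion and practice in the next one.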

posted on 2022-07-05 18:13 by 我在全球村