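Data Structures (Python Language Description), Chapter 3: Searching, Sorting, and Complexity Analysis
- Evaluating the performance of algorithms
- Evaluation criteria
- Correctness
- Readability and ease of maintenance
- Run-time performance
- Space performance (memory)
- Measuring the run time of an algorithm
- Example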
-
"""
Print the running times for problem sizes that double,
using a single loop
"""
import time

problemSize = 1000000
print("%12s%16s" % ("Problem Size", "Seconds"))
for count in range(5):
    start = time.time()
    # The start of the algorithm
    work = 1
    for x in range(problemSize):
        work += 1
        work -= 1
    # The end of the algorithm
    elapsed = time.time() - start
    print("%12d%16.3f" % (problemSize, elapsed))
    problemSize *= 2
- Output
-
Problem Size         Seconds
     1000000           1.065
     2000000           2.078
     4000000           4.433
     8000000           7.733
    16000000          18.676
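- Measured times depend on the hardware and software platform
- Counting instructions
- The number of iterations executed by a nested loop
- Examples
- Nested loop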
-
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:Lijunjie
"""
File: counting.py
Prints the number of iterations for problem sizes that double,
using a nested loop.
"""
problemSize = 1000
print("%12s%15s" % ("Problem Size", "Iteration"))
for counter in range(5):
    number = 0
    # The start of the algorithm
    work = 1
    for j in range(problemSize):
        for k in range(problemSize):
            number += 1
            work += 1
            work -= 1
    # The end of the algorithm
    print("%12d%15d" % (problemSize, number))
    problemSize *= 2
- Output
-
Problem Size      Iteration
        1000        1000000
        2000        4000000
        4000       16000000
        8000       64000000
       16000      256000000
- Fibonacci sequence
-
"""
Print the number of calls of a recursive Fibonacci
function for problem sizes that double
"""
from counter import Counter

def fibonacci(n, counter):
    """Count the number of calls of the fibonacci function."""
    counter.increment()
    if n < 3:
        return 1
    else:
        return fibonacci(n - 2, counter) + fibonacci(n - 1, counter)

problemSize = 2
print("%12s%15s" % ("Problem Size", "Calls"))
for count in range(5):
    counter = Counter()
    # The start of the algorithm
    fibonacci(problemSize, counter)
    # The end of the algorithm
    print("%12d%15s" % (problemSize, counter))
    problemSize *= 2
- Output
-
Problem Size          Calls
           2              1
           4              5
           8             41
          16           1973
          32        4356617
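- Counting instructions shows the rate at which an algorithm's work grows or shrinks, and the result is independent of the hardware and software platform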
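The Fibonacci listing above imports `Counter` from a `counter` module that these notes do not show. The following is only a minimal sketch of what such a class might look like, assuming it needs nothing more than `increment()` and a string form; the attribute name `_value` and the `amount` parameter are assumptions made for illustration.

```python
class Counter:
    """Hypothetical stand-in for the Counter class in the counter module."""

    def __init__(self):
        self._value = 0

    def increment(self, amount=1):
        """Add amount (1 by default) to the count."""
        self._value += amount

    def __str__(self):
        """Return the count as a string, so "%s" formatting works."""
        return str(self._value)


def fibonacci(n, counter):
    """Recursive Fibonacci that records every call in counter."""
    counter.increment()
    if n < 3:
        return 1
    return fibonacci(n - 2, counter) + fibonacci(n - 1, counter)


calls = Counter()
fibonacci(8, calls)
print(calls)  # prints 41, matching the row for problem size 8
```

Because the count depends only on the algorithm and the problem size, it comes out the same on any machine.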
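- Measuring the memory used by an algorithm
- Complexity analysis
- Notation
- The rate at which an algorithm's work grows can be expressed as a function of the problem size.
- Complexity analysis examines the algorithm's code to derive these expressions
- These expressions let a programmer predict how well or how poorly an algorithm will perform on any computer
- Orders of complexity and big-O notation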
- Search algorithms
- Searching for the minimum
- The complexity is O(n)
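A minimal sketch of the minimum search, assuming a non-empty list; the function name `indexOfMin` is chosen here for illustration. Every item is visited exactly once, which is where the O(n) bound comes from.

```python
def indexOfMin(lyst):
    """Return the index of the smallest item in a non-empty list."""
    minIndex = 0
    currentIndex = 1
    while currentIndex < len(lyst):
        # One comparison per remaining item: n - 1 comparisons in total
        if lyst[currentIndex] < lyst[minIndex]:
            minIndex = currentIndex
        currentIndex += 1
    return minIndex


print(indexOfMin([7, 3, 9, 1, 4]))  # prints 3
```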
- Sequential search of a list
- Best case
- The complexity is O(1)
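A sketch of sequential search, with the function and parameter names assumed for illustration. The best case is a target in the first position (one comparison, O(1)); if the target is last or absent, all n items are examined, so the worst case is O(n).

```python
def sequentialSearch(target, lyst):
    """Return the position of target in lyst, or -1 if it is absent."""
    position = 0
    while position < len(lyst):
        if lyst[position] == target:
            return position
        position += 1
    return -1


print(sequentialSearch(4, [4, 8, 15, 16]))   # best case: found at position 0
print(sequentialSearch(99, [4, 8, 15, 16]))  # worst case: -1 after 4 comparisons
```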
- Binary search of a sorted list
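A sketch of binary search, assuming the list is sorted in ascending order; the names are illustrative. Each comparison halves the remaining search range, so the worst case takes about log2(n) steps, that is O(log n).

```python
def binarySearch(target, sortedLyst):
    """Return the position of target in sortedLyst, or -1 if it is absent."""
    left = 0
    right = len(sortedLyst) - 1
    while left <= right:
        midpoint = (left + right) // 2
        if target == sortedLyst[midpoint]:
            return midpoint
        elif target < sortedLyst[midpoint]:
            right = midpoint - 1   # continue in the left half
        else:
            left = midpoint + 1    # continue in the right half
    return -1


print(binarySearch(23, [2, 5, 8, 12, 16, 23, 38]))  # prints 5
```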
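- Comparing data items
- Both binary search and the search for the minimum assume that the items in the list can be compared with one another
- In Python, this means that the items are of the same type and that they recognize the comparison operators ==, < and >
- To let algorithms use the comparison operators ==, > and < with objects of a new class, define the __eq__, __lt__ and __gt__ methods in that class. The header of the __lt__ method is as follows:
-
def __lt__(self, other):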
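As an illustration of the three methods in context, here is a minimal sketch using a hypothetical `Account` class whose instances are ordered by name. The class and its attributes are assumptions made for this example, not part of the notes.

```python
class Account:
    """Hypothetical class whose instances are compared by name."""

    def __init__(self, name, balance=0.0):
        self.name = name
        self.balance = balance

    def __eq__(self, other):
        """Two accounts are equal if they have the same name."""
        if self is other:
            return True
        if type(self) != type(other):
            return False
        return self.name == other.name

    def __lt__(self, other):
        """True if self's name comes before other's name."""
        return self.name < other.name

    def __gt__(self, other):
        """True if self's name comes after other's name."""
        return self.name > other.name


accounts = [Account("Wilma", 500.0), Account("Ann", 200.0), Account("Ken", 50.0)]
print(min(accounts).name)  # prints Ann, because min() relies on __lt__
```

With these methods defined, the search and sort functions in this chapter can work on lists of such objects without modification.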
- Basic sort algorithms
- Selection sort
- On each pass, find the smallest remaining item, then move it to its target position
-
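def swap(lyst, i, j):
    """Exchange the items at positions i and j (helper used by the sorts in this chapter)"""
    lyst[i], lyst[j] = lyst[j], lyst[i]

def selectionSort(lyst):
    """The selection sort algorithm"""
    i = 0
    while i < len(lyst) - 1:
        minIndex = i
        j = i + 1
        while j < len(lyst):
            if lyst[j] < lyst[minIndex]:
                minIndex = j
            j += 1
        if minIndex != i:  # Exchange if needed
            swap(lyst, i, minIndex)
        i += 1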
- Complexity analysis
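The notes leave this analysis blank. As a hedged sketch of the usual argument: the inner loop runs n - 1 times on the first pass, n - 2 on the second, and so on, so selection sort makes (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2 comparisons regardless of the input order, which is O(n^2), and at most n - 1 exchanges. The instrumented function below (its name is an assumption) checks the comparison count empirically.

```python
def selectionSortComparisons(lyst):
    """Sort lyst in place with selection sort; return the number of comparisons."""
    comparisons = 0
    for i in range(len(lyst) - 1):
        minIndex = i
        for j in range(i + 1, len(lyst)):
            comparisons += 1
            if lyst[j] < lyst[minIndex]:
                minIndex = j
        if minIndex != i:
            lyst[i], lyst[minIndex] = lyst[minIndex], lyst[i]
    return comparisons


for n in (10, 100, 1000):
    data = list(range(n, 0, -1))  # reversed input
    count = selectionSortComparisons(data)
    # The last column is True when the count equals n(n - 1)/2
    print(n, count, count == n * (n - 1) // 2)
```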
- Bubble sort
- On each pass, swap the largest remaining item toward the end of the list
-
def bubbleSort(lyst):
    n = len(lyst)
    while n > 1:
        swapped = False
        i = 1  # Start each bubble
        while i < n:
            if lyst[i] < lyst[i - 1]:
                swap(lyst, i, i - 1)  # Exchange if needed
                swapped = True
            i += 1
        if not swapped:
            return  # No exchanges on this pass: the list is sorted
        n -= 1
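- Refinement: if a pass makes no exchanges, the list is already sorted and the function can return early. This improves best-case behavior (an already sorted list needs only one pass) but does not change the average complexity
- Complexity
- Insertion sort
- Similar to the way people arrange a hand of playing cards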
-
def insertionSort(lyst):
    i = 1
    while i < len(lyst):
        itemToInsert = lyst[i]
        j = i - 1
        while j >= 0:
            if lyst[j] > itemToInsert:
                lyst[j + 1] = lyst[j]
                j -= 1
            else:
                break
        lyst[j + 1] = itemToInsert
        i += 1
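- Complexity analysis
- Best-case, worst-case, and average-case performance
- Some algorithms perform similarly in the best case and the average case, but their performance can degrade in the worst case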
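Insertion sort is a convenient way to see how far these cases can lie apart. The sketch below (the function name is an assumption) counts comparisons: an already sorted list needs one comparison per item, n - 1 in total, while a reversed list needs n(n - 1)/2.

```python
def insertionSortComparisons(lyst):
    """Sort lyst in place with insertion sort; return the number of comparisons."""
    comparisons = 0
    for i in range(1, len(lyst)):
        itemToInsert = lyst[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if lyst[j] > itemToInsert:
                lyst[j + 1] = lyst[j]  # shift the larger item one place right
                j -= 1
            else:
                break
        lyst[j + 1] = itemToInsert
    return comparisons


n = 1000
print(insertionSortComparisons(list(range(n))))         # sorted input: n - 1
print(insertionSortComparisons(list(range(n, 0, -1))))  # reversed input: n(n - 1)/2
```

The first call is the linear best case and the second is the quadratic worst case of the same algorithm.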
- Faster sorting
- Quicksort
- Rearrange the items around a pivot item, then recursively sort the sublists on either side of the pivot
- Code
-
def quicksort(lyst):
    """The quicksort algorithm"""
    quicksortHelper(lyst, 0, len(lyst) - 1)

def quicksortHelper(lyst, left, right):
    """The quicksort helper function"""
    if left < right:
        pivotLocation = partition(lyst, left, right)
        quicksortHelper(lyst, left, pivotLocation - 1)
        quicksortHelper(lyst, pivotLocation + 1, right)

def partition(lyst, left, right):
    """Partition the list around the middle item"""
    # Find the pivot item and exchange it with the last item
    middle = (left + right) // 2
    pivot = lyst[middle]
    swap(lyst, middle, right)
    # Set the boundary point to the first position
    boundary = left
    # Move items less than the pivot to the left
    for index in range(left, right):
        if lyst[index] < pivot:
            swap(lyst, index, boundary)
            boundary += 1
    # Exchange the pivot item and the boundary item, then return the boundary
    swap(lyst, right, boundary)
    return boundary
- Complexity analysis
- Space analysis
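The notes leave both analyses blank. The summary below is the standard textbook argument, offered as a sketch and not as the author's own derivation. Here T(n) denotes the running time on a sublist of n items, and each call to partition does O(n) work on its sublist.

```latex
\begin{aligned}
\text{Best case (pivot splits evenly):}\quad & T(n) = 2\,T(n/2) + O(n) \;\Rightarrow\; T(n) = O(n \log n) \\
\text{Worst case (one side always empty):}\quad & T(n) = T(n-1) + O(n) \;\Rightarrow\; T(n) = O(n^2) \\
\text{Call-stack depth (extra space):}\quad & O(\log n)\ \text{with even splits}, \qquad O(n)\ \text{in the worst case}
\end{aligned}
```

Choosing the middle item as the pivot, as the partition function above does, means an already sorted list splits evenly and does not trigger the worst case.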
- Merge sort
- Split the list in half, recursively sort each half, then merge the results
- Code
-
from arrays import Array

def mergeSort(lyst):
    """The mergeSort algorithm."""
    # copyBuffer: temporary space needed during the merge
    copyBuffer = Array(len(lyst))
    mergeSortHelper(lyst, copyBuffer, 0, len(lyst) - 1)

def mergeSortHelper(lyst, copyBuffer, left, right):
    """The helper function of the mergeSort algorithm"""
    if left < right:
        middle = (left + right) // 2
        mergeSortHelper(lyst, copyBuffer, left, middle)
        mergeSortHelper(lyst, copyBuffer, middle + 1, right)
        merge(lyst, copyBuffer, left, middle, right)

def merge(lyst, copyBuffer, left, middle, right):
    """Merge two sorted sublists into one sorted sublist"""
    # i1: the start of sublist 1
    # i2: the start of sublist 2
    i1 = left
    i2 = middle + 1
    for i in range(left, right + 1):
        if i1 > middle:
            copyBuffer[i] = lyst[i2]  # sublist 1 is exhausted
            i2 += 1
        elif i2 > right:
            copyBuffer[i] = lyst[i1]  # sublist 2 is exhausted
            i1 += 1
        elif lyst[i1] < lyst[i2]:
            copyBuffer[i] = lyst[i1]
            i1 += 1
        else:
            copyBuffer[i] = lyst[i2]
            i2 += 1
    # Copy the merged items back into the list
    for i in range(left, right + 1):
        lyst[i] = copyBuffer[i]
- arrays is a custom module, shown below:
-
"""
File: arrays.py
An Array is a restricted list whose clients can use
only [], len, iter, and str.
To instantiate, use
<variable> = Array(<capacity>, <optional fill value>)
The fill value is None by default.
"""

class Array(object):
    """Represents an array."""

    def __init__(self, capacity, fillValue = None):
        """Capacity is the static size of the array.
        fillValue is placed at each position."""
        self._items = list()
        for count in range(capacity):
            self._items.append(fillValue)

    def __len__(self):
        """-> The capacity of the array."""
        return len(self._items)

    def __str__(self):
        """-> The string representation of the array."""
        return str(self._items)

    def __iter__(self):
        """Supports iteration over a view of an array."""
        return iter(self._items)

    def __getitem__(self, index):
        """Subscript operator for access at index."""
        return self._items[index]

    def __setitem__(self, index, newItem):
        """Subscript operator for replacement at index."""
        self._items[index] = newItem
- An exponential algorithm: recursive Fibonacci
- Call tree
- Converting Fibonacci to a linear algorithm
-
def fibonacci(n):
    """The linear algorithm for Fibonacci"""
    sum = 1  # Result for n < 3, when the loop body never runs
    first = 1
    second = 1
    count = 3
    while count <= n:
        sum = first + second
        first = second
        second = sum
        count += 1
    return sum
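- Profiler
- Request
- Write a program that lets a programmer profile different sort algorithms
- Analysis
- Define each sort function so that its header includes a Profiler object as a parameter
- Wherever the sort code performs a comparison or an exchange that should be counted, call the Profiler object's comparison() or exchange() method
- Interface of the Profiler class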
-
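| Profiler method | What it does |
| --- | --- |
| `p.test(function, lyst = None, size = 10, unique = True, comp = True, exch = True, trace = False)` | Runs function with the given settings and prints the results |
| `p.comparison()` | Increments the number of comparisons |
| `p.exchange()` | Increments the number of exchanges |
| `p.__str__()` | Same as str(p) |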
- Design
- Two modules
- Profiler: defines the Profiler class
- Algorithms: defines the sort functions that are set up for profiling
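To make the design concrete, here is a minimal sketch of how the two pieces might fit together. Only the method names and the parameter list of `test` come from the interface above; the internals (the private attribute names, how test lists are generated, the use of `time.perf_counter`, and the report format) are assumptions made for illustration, and both parts are shown in one file for brevity.

```python
import random
import time


class Profiler:
    """Sketch of the Profiler interface; the internals are assumptions."""

    def __init__(self):
        self._comp = self._exch = True
        self._cmpCount = self._exchCount = 0
        self._size = 0
        self._elapsed = 0.0

    def test(self, function, lyst=None, size=10, unique=True,
             comp=True, exch=True, trace=False):
        """Run function(lyst, self) with the given settings and print the results."""
        self._comp, self._exch = comp, exch
        self._cmpCount = self._exchCount = 0
        if lyst is None:
            if unique:
                lyst = list(range(1, size + 1))
                random.shuffle(lyst)
            else:
                lyst = [random.randint(1, size) for _ in range(size)]
        self._size = len(lyst)
        if trace:
            print(lyst)
        start = time.perf_counter()
        function(lyst, self)
        self._elapsed = time.perf_counter() - start
        if trace:
            print(lyst)
        print(self)

    def comparison(self):
        """Increment the number of comparisons, if they are being counted."""
        if self._comp:
            self._cmpCount += 1

    def exchange(self):
        """Increment the number of exchanges, if they are being counted."""
        if self._exch:
            self._exchCount += 1

    def __str__(self):
        result = "Problem size: %d\nElapsed time: %.6f s\n" % (self._size, self._elapsed)
        if self._comp:
            result += "Comparisons:  %d\n" % self._cmpCount
        if self._exch:
            result += "Exchanges:    %d\n" % self._exchCount
        return result


def selectionSort(lyst, profiler):
    """Selection sort that reports its comparisons and exchanges to profiler."""
    i = 0
    while i < len(lyst) - 1:
        minIndex = i
        j = i + 1
        while j < len(lyst):
            profiler.comparison()
            if lyst[j] < lyst[minIndex]:
                minIndex = j
            j += 1
        if minIndex != i:
            profiler.exchange()
            lyst[i], lyst[minIndex] = lyst[minIndex], lyst[i]
        i += 1


p = Profiler()
p.test(selectionSort, size=15)
```

Passing the profiler as a parameter keeps the counting logic out of the sort functions themselves, so the same Profiler object can be reused to compare several algorithms on lists of the same size.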