Study Notes: Problem Solving with Algorithms and Data Structures using Python
Problem Solving with Algorithms and Data Structures using Python
- By Brad Miller and David Ranum, Luther College
- http://interactivepython.org/runestone/static/pythonds/index.html
- https://runestone.academy/runestone/static/pythonds/index.html
Introduction · python-data-structure-cn
- https://facert.gitbooks.io/python-data-structure-cn/
Problem Solving with Algorithms and Data Structures Using Python SECOND EDITION: Bradley N. Miller, David L. Ranum: 9781590282571: Amazon.com: Books
- https://www.amazon.com/Problem-Solving-Algorithms-Structures-Python/dp/1590282574
- THIS TEXTBOOK is about computer science. It is also about Python. However, there is much more. The study of algorithms and data structures is central to understanding what computer science is all about. Learning computer science is not unlike learning any other type of difficult subject matter. The only way to be successful is through deliberate and incremental exposure to the fundamental ideas. A beginning computer scientist needs practice so that there is a thorough understanding before continuing on to the more complex parts of the curriculum. In addition, a beginner needs to be given the opportunity to be successful and gain confidence. This textbook is designed to serve as a text for a first course on data structures and algorithms, typically taught as the second course in the computer science curriculum. Even though the second course is considered more advanced than the first course, this book assumes you are beginners at this level. You may still be struggling with some of the basic ideas and skills from a first computer science course and yet be ready to further explore the discipline and continue to practice problem solving. We cover abstract data types and data structures, writing algorithms, and solving problems. We look at a number of data structures and solve classic problems that arise. The tools and techniques that you learn here will be applied over and over as you continue your study of computer science.
1. Introduction
1.8.2. Built-in Collection Data Types
Method Name | Use | Explanation |
---|---|---|
append | alist.append(item) | Adds a new item to the end of a list |
insert | alist.insert(i,item) | Inserts an item at the ith position in a list |
pop | alist.pop() | Removes and returns the last item in a list |
pop | alist.pop(i) | Removes and returns the ith item in a list |
sort | alist.sort() | Modifies a list to be sorted |
reverse | alist.reverse() | Modifies a list to be in reverse order |
del | del alist[i] | Deletes the item in the ith position |
index | alist.index(item) | Returns the index of the first occurrence of item |
count | alist.count(item) | Returns the number of occurrences of item |
remove | alist.remove(item) | Removes the first occurrence of item |
Method Name | Use | Explanation |
---|---|---|
center | astring.center(w) | Returns a string centered in a field of size w |
count | astring.count(item) | Returns the number of occurrences of item in the string |
ljust | astring.ljust(w) | Returns a string left-justified in a field of size w |
lower | astring.lower() | Returns a string in all lowercase |
rjust | astring.rjust(w) | Returns a string right-justified in a field of size w |
find | astring.find(item) | Returns the index of the first occurrence of item |
split | astring.split(schar) | Splits a string into substrings at schar |
Operation Name | Operator | Explanation |
---|---|---|
membership | in | Set membership |
length | len | Returns the cardinality of the set |
\| | aset \| otherset | Returns a new set with all elements from both sets |
& | aset & otherset | Returns a new set with only those elements common to both sets |
- | aset - otherset | Returns a new set with all items from the first set not in the second |
<= | aset <= otherset | Asks whether all elements of the first set are in the second |
Method Name | Use | Explanation |
---|---|---|
union | aset.union(otherset) | Returns a new set with all elements from both sets |
intersection | aset.intersection(otherset) | Returns a new set with only those elements common to both sets |
difference | aset.difference(otherset) | Returns a new set with all items from the first set not in the second |
issubset | aset.issubset(otherset) | Asks whether all elements of one set are in the other |
add | aset.add(item) | Adds item to the set |
remove | aset.remove(item) | Removes item from the set |
pop | aset.pop() | Removes an arbitrary element from the set |
clear | aset.clear() | Removes all elements from the set |
Operator | Use | Explanation |
---|---|---|
[] | myDict[k] | Returns the value associated with k; otherwise it is an error |
in | key in adict | Returns True if key is in the dictionary, False otherwise |
del | del adict[key] | Removes the entry from the dictionary |
Method Name | Use | Explanation |
---|---|---|
keys | adict.keys() | Returns the keys of the dictionary in a dict_keys object |
values | adict.values() | Returns the values of the dictionary in a dict_values object |
items | adict.items() | Returns the key-value pairs in a dict_items object |
get | adict.get(k) | Returns the value associated with k, None otherwise |
get | adict.get(k,alt) | Returns the value associated with k, alt otherwise |
1.9. Input and Output
- aName = input('Please enter your name: ')
- print("Hello", "World", sep="***")
- print("Hello", "World", end="***")
1.13. Object-Oriented Programming in Python: Defining Classes
- When designing classes, it is very important to distinguish between those that have the IS-A relationship (which requires inheritance) and those that have HAS-A relationships (with no inheritance).
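A minimal sketch of the distinction; the class names below are illustrative, not from the text:

```python
class Animal:                      # base class
    def speak(self):
        return "..."

class Dog(Animal):                 # IS-A: a Dog is an Animal -> inheritance
    def speak(self):
        return "woof"

class Kennel:                      # HAS-A: a Kennel has Animals -> composition,
    def __init__(self):            # no inheritance involved
        self.residents = []

    def admit(self, animal):
        self.residents.append(animal)

kennel = Kennel()
kennel.admit(Dog())
```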
1.14. Summary
- Computer science uses abstraction as a tool for representing both processes and data.
- Abstract data types allow programmers to manage the complexity of a problem domain by hiding the details of the data.
- Python is a powerful, yet easy-to-use, object-oriented language.
- Lists, tuples, and strings are built-in Python sequential collections.
- Dictionaries and sets are nonsequential collections of data.
- Classes allow programmers to implement abstract data types.
- Programmers can override standard methods as well as create new methods.
- Classes can be organized into hierarchies.
- A class constructor should always invoke the constructor of its parent before continuing on with its own data and behavior.
4. Basic Data Structures
4.2. What Are Linear Structures?
- Stacks, queues, deques, and lists are examples of data collections whose items are ordered depending on how they are added or removed. Once an item is added, it stays in that position relative to the other elements that came before and came after it. Collections such as these are often referred to as linear data structures.
4.5. Implementing a Stack in Python
```python
class Stack:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def peek(self):
        return self.items[len(self.items)-1]

    def size(self):
        return len(self.items)
```
4.7. Balanced Symbols (A General Case)
```python
from pythonds.basic import Stack

def parChecker(symbolString):
    s = Stack()
    balanced = True
    index = 0
    while index < len(symbolString) and balanced:
        symbol = symbolString[index]
        if symbol in "([{":
            s.push(symbol)
        else:
            if s.isEmpty():
                balanced = False
            else:
                top = s.pop()
                if not matches(top, symbol):
                    balanced = False
        index = index + 1
    if balanced and s.isEmpty():
        return True
    else:
        return False

def matches(open, close):
    opens = "([{"
    closers = ")]}"
    return opens.index(open) == closers.index(close)


print(parChecker('{({([][])}())}'))
print(parChecker('[{()]'))
```
4.8. Converting Decimal Numbers to Binary Numbers
```python
from pythonds.basic import Stack

def baseConverter(decNumber, base):
    digits = "0123456789ABCDEF"

    remstack = Stack()

    while decNumber > 0:
        rem = decNumber % base
        remstack.push(rem)
        decNumber = decNumber // base

    newString = ""
    while not remstack.isEmpty():
        newString = newString + digits[remstack.pop()]

    return newString

print(baseConverter(25, 2))
print(baseConverter(25, 16))
```
4.9. Infix, Prefix and Postfix Expressions
```python
from pythonds.basic import Stack

def infixToPostfix(infixexpr):
    prec = {}
    prec["*"] = 3
    prec["/"] = 3
    prec["+"] = 2
    prec["-"] = 2
    prec["("] = 1
    opStack = Stack()
    postfixList = []
    tokenList = infixexpr.split()

    for token in tokenList:
        if token in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" or token in "0123456789":
            postfixList.append(token)
        elif token == '(':
            opStack.push(token)
        elif token == ')':
            topToken = opStack.pop()
            while topToken != '(':
                postfixList.append(topToken)
                topToken = opStack.pop()
        else:
            while (not opStack.isEmpty()) and \
                  (prec[opStack.peek()] >= prec[token]):
                postfixList.append(opStack.pop())
            opStack.push(token)

    while not opStack.isEmpty():
        postfixList.append(opStack.pop())
    return " ".join(postfixList)

print(infixToPostfix("A * B + C * D"))
print(infixToPostfix("( A + B ) * C - ( D - E ) * ( F + G )"))
```
```python
from pythonds.basic import Stack

def postfixEval(postfixExpr):
    operandStack = Stack()
    tokenList = postfixExpr.split()

    for token in tokenList:
        if token in "0123456789":
            operandStack.push(int(token))
        else:
            operand2 = operandStack.pop()
            operand1 = operandStack.pop()
            result = doMath(token, operand1, operand2)
            operandStack.push(result)
    return operandStack.pop()

def doMath(op, op1, op2):
    if op == "*":
        return op1 * op2
    elif op == "/":
        return op1 / op2
    elif op == "+":
        return op1 + op2
    else:
        return op1 - op2

print(postfixEval('7 8 + 3 2 + /'))
```
4.12. Implementing a Queue in Python
```python
class Queue:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        return self.items.pop()

    def size(self):
        return len(self.items)
```
4.15. What Is a Deque?
- A deque, also known as a double-ended queue, is an ordered collection of items similar to the queue.
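Python's standard library already ships such a structure, collections.deque, which is worth a quick look before building one by hand:

```python
from collections import deque

d = deque()
d.append('b')        # add at the rear (right end)
d.appendleft('a')    # add at the front (left end)
d.append('c')
print(list(d))       # ['a', 'b', 'c']
print(d.popleft())   # 'a' -- removed from the front
print(d.pop())       # 'c' -- removed from the rear
```

Unlike the list-based implementation below, collections.deque gives O(1) adds and removals at both ends.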
4.17. Implementing a Deque in Python
```python
class Deque:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def addFront(self, item):
        self.items.append(item)

    def addRear(self, item):
        self.items.insert(0, item)

    def removeFront(self):
        return self.items.pop()

    def removeRear(self):
        return self.items.pop(0)

    def size(self):
        return len(self.items)
```
4.21. Implementing an Unordered List: Linked Lists
- In order to implement an unordered list, we will construct what is commonly known as a linked list.
```python
class Node:
    def __init__(self, initdata):
        self.data = initdata
        self.next = None

    def getData(self):
        return self.data

    def getNext(self):
        return self.next

    def setData(self, newdata):
        self.data = newdata

    def setNext(self, newnext):
        self.next = newnext


class UnorderedList:

    def __init__(self):
        self.head = None

    def isEmpty(self):
        return self.head == None

    def add(self, item):
        temp = Node(item)
        temp.setNext(self.head)
        self.head = temp

    def size(self):
        current = self.head
        count = 0
        while current != None:
            count = count + 1
            current = current.getNext()

        return count

    def search(self, item):
        current = self.head
        found = False
        while current != None and not found:
            if current.getData() == item:
                found = True
            else:
                current = current.getNext()

        return found

    def remove(self, item):
        # assumes item is present: the loop never checks for the end of
        # the list, so removing a missing item raises an AttributeError
        current = self.head
        previous = None
        found = False
        while not found:
            if current.getData() == item:
                found = True
            else:
                previous = current
                current = current.getNext()

        if previous == None:
            self.head = current.getNext()
        else:
            previous.setNext(current.getNext())
```
4.22. The Ordered List Abstract Data Type
- The structure of an ordered list is a collection of items where each item holds a relative position that is based upon some underlying characteristic of the item. The ordering is typically either ascending or descending and we assume that list items have a meaningful comparison operation that is already defined.
4.23. Implementing an Ordered List
```python
class OrderedList:
    # uses the Node class from the unordered list implementation
    def __init__(self):
        self.head = None

    def search(self, item):
        current = self.head
        found = False
        stop = False
        while current != None and not found and not stop:
            if current.getData() == item:
                found = True
            else:
                if current.getData() > item:
                    stop = True
                else:
                    current = current.getNext()

        return found

    def add(self, item):
        current = self.head
        previous = None
        stop = False
        while current != None and not stop:
            if current.getData() > item:
                stop = True
            else:
                previous = current
                current = current.getNext()

        temp = Node(item)
        if previous == None:
            temp.setNext(self.head)
            self.head = temp
        else:
            temp.setNext(current)
            previous.setNext(temp)
```
4.24. Summary
- Linear data structures maintain their data in an ordered fashion.
- Stacks are simple data structures that maintain a LIFO, last-in first-out, ordering.
- The fundamental operations for a stack are push, pop, and isEmpty.
- Queues are simple data structures that maintain a FIFO, first-in first-out, ordering.
- The fundamental operations for a queue are enqueue, dequeue, and isEmpty.
- Prefix, infix, and postfix are all ways to write expressions.
- Stacks are very useful for designing algorithms to evaluate and translate expressions.
- Stacks can provide a reversal characteristic.
- Queues can assist in the construction of timing simulations.
- Simulations use random number generators to create a real-life situation and allow us to answer “what if” types of questions.
- Deques are data structures that allow hybrid behavior like that of stacks and queues.
- The fundamental operations for a deque are addFront, addRear, removeFront, removeRear, and isEmpty.
- Lists are collections of items where each item holds a relative position.
- A linked list implementation maintains logical order without requiring contiguous physical storage.
- Modification to the head of the linked list is a special case.
5. Recursion
5.6. Stack Frames: Implementing Recursion
```python
from pythonds.basic import Stack

rStack = Stack()

def toStr(n, base):
    convertString = "0123456789ABCDEF"
    while n > 0:
        if n < base:
            rStack.push(convertString[n])
        else:
            rStack.push(convertString[n % base])
        n = n // base
    res = ""
    while not rStack.isEmpty():
        res = res + str(rStack.pop())
    return res

print(toStr(1453, 16))
```
5.7. Introduction: Visualizing Recursion
```python
import turtle

myTurtle = turtle.Turtle()
myWin = turtle.Screen()

def drawSpiral(myTurtle, lineLen):
    if lineLen > 0:
        myTurtle.forward(lineLen)
        myTurtle.right(90)
        drawSpiral(myTurtle, lineLen - 5)

drawSpiral(myTurtle, 100)
myWin.exitonclick()
```
```python
import turtle

def tree(branchLen, t):
    if branchLen > 5:
        t.forward(branchLen)
        t.right(20)
        tree(branchLen - 15, t)
        t.left(40)
        tree(branchLen - 15, t)
        t.right(20)
        t.backward(branchLen)

def main():
    t = turtle.Turtle()
    myWin = turtle.Screen()
    t.left(90)
    t.up()
    t.backward(100)
    t.down()
    t.color("green")
    tree(75, t)
    myWin.exitonclick()

main()
```
5.8. Sierpinski Triangle
```python
import turtle

def drawTriangle(points, color, myTurtle):
    myTurtle.fillcolor(color)
    myTurtle.up()
    myTurtle.goto(points[0][0], points[0][1])
    myTurtle.down()
    myTurtle.begin_fill()
    myTurtle.goto(points[1][0], points[1][1])
    myTurtle.goto(points[2][0], points[2][1])
    myTurtle.goto(points[0][0], points[0][1])
    myTurtle.end_fill()

def getMid(p1, p2):
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)

def sierpinski(points, degree, myTurtle):
    colormap = ['blue', 'red', 'green', 'white', 'yellow',
                'violet', 'orange']
    drawTriangle(points, colormap[degree], myTurtle)
    if degree > 0:
        sierpinski([points[0],
                    getMid(points[0], points[1]),
                    getMid(points[0], points[2])],
                   degree - 1, myTurtle)
        sierpinski([points[1],
                    getMid(points[0], points[1]),
                    getMid(points[1], points[2])],
                   degree - 1, myTurtle)
        sierpinski([points[2],
                    getMid(points[2], points[1]),
                    getMid(points[0], points[2])],
                   degree - 1, myTurtle)

def main():
    myTurtle = turtle.Turtle()
    myWin = turtle.Screen()
    myPoints = [[-100, -50], [0, 100], [100, -50]]
    sierpinski(myPoints, 3, myTurtle)
    myWin.exitonclick()

main()
```
5.10. Tower of Hanoi
```python
def moveTower(height, fromPole, toPole, withPole):
    if height >= 1:
        moveTower(height - 1, fromPole, withPole, toPole)
        moveDisk(fromPole, toPole)
        moveTower(height - 1, withPole, toPole, fromPole)

def moveDisk(fp, tp):
    print("moving disk from", fp, "to", tp)

moveTower(3, "A", "B", "C")
```
5.11. Exploring a Maze
```python
"""
maze2.txt

++++++++++++++++++++++
+ + ++ ++ +
+ ++++++++++
+ + ++ ++++ +++ ++
+ + + + ++ +++ +
+ ++ ++ + +
+++++ + + ++ + +
+++++ +++ + + ++ +
+ + + S+ + +
+++++ + + + + + +
++++++++++++++++++++++
"""


import turtle

PART_OF_PATH = 'O'
TRIED = '.'
OBSTACLE = '+'
DEAD_END = '-'


class Maze:
    def __init__(self, mazeFileName):
        rowsInMaze = 0
        columnsInMaze = 0
        self.mazelist = []
        mazeFile = open(mazeFileName, 'r')
        rowsInMaze = 0
        for line in mazeFile:
            rowList = []
            col = 0
            for ch in line[:-1]:
                rowList.append(ch)
                if ch == 'S':
                    self.startRow = rowsInMaze
                    self.startCol = col
                col = col + 1
            rowsInMaze = rowsInMaze + 1
            self.mazelist.append(rowList)
            columnsInMaze = len(rowList)

        self.rowsInMaze = rowsInMaze
        self.columnsInMaze = columnsInMaze
        self.xTranslate = -columnsInMaze / 2
        self.yTranslate = rowsInMaze / 2
        self.t = turtle.Turtle()
        self.t.shape('turtle')
        self.wn = turtle.Screen()
        self.wn.setworldcoordinates(-(columnsInMaze - 1) / 2 - .5,
                                    -(rowsInMaze - 1) / 2 - .5,
                                    (columnsInMaze - 1) / 2 + .5,
                                    (rowsInMaze - 1) / 2 + .5)

    def drawMaze(self):
        self.t.speed(10)
        self.wn.tracer(0)
        for y in range(self.rowsInMaze):
            for x in range(self.columnsInMaze):
                if self.mazelist[y][x] == OBSTACLE:
                    self.drawCenteredBox(x + self.xTranslate,
                                         -y + self.yTranslate, 'orange')
        self.t.color('black')
        self.t.fillcolor('blue')
        self.wn.update()
        self.wn.tracer(1)

    def drawCenteredBox(self, x, y, color):
        self.t.up()
        self.t.goto(x - .5, y - .5)
        self.t.color(color)
        self.t.fillcolor(color)
        self.t.setheading(90)
        self.t.down()
        self.t.begin_fill()
        for i in range(4):
            self.t.forward(1)
            self.t.right(90)
        self.t.end_fill()

    def moveTurtle(self, x, y):
        self.t.up()
        self.t.setheading(self.t.towards(x + self.xTranslate,
                                         -y + self.yTranslate))
        self.t.goto(x + self.xTranslate, -y + self.yTranslate)

    def dropBreadcrumb(self, color):
        self.t.dot(10, color)

    def updatePosition(self, row, col, val=None):
        if val:
            self.mazelist[row][col] = val
        self.moveTurtle(col, row)

        if val == PART_OF_PATH:
            color = 'green'
        elif val == OBSTACLE:
            color = 'red'
        elif val == TRIED:
            color = 'black'
        elif val == DEAD_END:
            color = 'red'
        else:
            color = None

        if color:
            self.dropBreadcrumb(color)

    def isExit(self, row, col):
        return (row == 0 or
                row == self.rowsInMaze - 1 or
                col == 0 or
                col == self.columnsInMaze - 1)

    def __getitem__(self, idx):
        return self.mazelist[idx]


def searchFrom(maze, startRow, startColumn):
    # try each of four directions from this point until we find a way out.
    # base Case return values:
    # 1. We have run into an obstacle, return false
    maze.updatePosition(startRow, startColumn)
    if maze[startRow][startColumn] == OBSTACLE:
        return False
    # 2. We have found a square that has already been explored
    if maze[startRow][startColumn] == TRIED or maze[startRow][startColumn] == DEAD_END:
        return False
    # 3. We have found an outside edge not occupied by an obstacle
    if maze.isExit(startRow, startColumn):
        maze.updatePosition(startRow, startColumn, PART_OF_PATH)
        return True
    maze.updatePosition(startRow, startColumn, TRIED)
    # Otherwise, use logical short circuiting to try each direction
    # in turn (if needed)
    found = searchFrom(maze, startRow - 1, startColumn) or \
            searchFrom(maze, startRow + 1, startColumn) or \
            searchFrom(maze, startRow, startColumn - 1) or \
            searchFrom(maze, startRow, startColumn + 1)
    if found:
        maze.updatePosition(startRow, startColumn, PART_OF_PATH)
    else:
        maze.updatePosition(startRow, startColumn, DEAD_END)
    return found


myMaze = Maze('maze2.txt')
myMaze.drawMaze()
myMaze.updatePosition(myMaze.startRow, myMaze.startCol)

searchFrom(myMaze, myMaze.startRow, myMaze.startCol)
```
5.12. Dynamic Programming
```python
def dpMakeChange(coinValueList, change, minCoins, coinsUsed):
    for cents in range(change + 1):
        coinCount = cents
        newCoin = 1
        for j in [c for c in coinValueList if c <= cents]:
            if minCoins[cents - j] + 1 < coinCount:
                coinCount = minCoins[cents - j] + 1
                newCoin = j
        minCoins[cents] = coinCount
        coinsUsed[cents] = newCoin
    return minCoins[change]

def printCoins(coinsUsed, change):
    coin = change
    while coin > 0:
        thisCoin = coinsUsed[coin]
        print(thisCoin)
        coin = coin - thisCoin

def main():
    amnt = 63
    clist = [1, 5, 10, 21, 25]
    coinsUsed = [0] * (amnt + 1)
    coinCount = [0] * (amnt + 1)

    print("Making change for", amnt, "requires")
    print(dpMakeChange(clist, amnt, coinCount, coinsUsed), "coins")
    print("They are:")
    printCoins(coinsUsed, amnt)
    print("The used list is as follows:")
    print(coinsUsed)

main()
```
5.16. Glossary
- immutable data type
- A data type which cannot be modified. Assignments to elements or slices of immutable types cause a runtime error.
- mutable data type
- A data type which can be modified. All mutable types are compound types. Lists and dictionaries (see next chapter) are mutable data types; strings and tuples are not.
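A quick illustration of the two definitions above:

```python
alist = [1, 2, 3]
alist[0] = 99                 # fine: lists are mutable
print(alist)                  # [99, 2, 3]

astring = "abc"
try:
    astring[0] = "z"          # strings are immutable
except TypeError:
    print("assignment to a string element raises TypeError")
```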
6. Sorting and Searching
6.3. The Sequential Search
- When data items are stored in a collection such as a list, we say that they have a linear or sequential relationship.
6.4. The Binary Search
```python
def binarySearch(alist, item):
    first = 0
    last = len(alist) - 1
    found = False

    while first <= last and not found:
        midpoint = (first + last) // 2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1

    return found


testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(binarySearch(testlist, 3))
print(binarySearch(testlist, 13))
```
- Divide and conquer means that we divide the problem into smaller pieces, solve the smaller pieces in some way, and then reassemble the whole problem to get the result.
```python
def binarySearch(alist, item):
    if len(alist) == 0:
        return False
    else:
        midpoint = len(alist) // 2
        if alist[midpoint] == item:
            return True
        else:
            if item < alist[midpoint]:
                return binarySearch(alist[:midpoint], item)
            else:
                return binarySearch(alist[midpoint+1:], item)

testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(binarySearch(testlist, 3))
print(binarySearch(testlist, 13))
```
6.5. Hashing
- A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0.
- The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
- Once the hash values have been computed, we can insert each item into the hash table at the designated position as shown in Figure 5. Note that 6 of the 11 slots are now occupied. This is referred to as the load factor, and is commonly denoted by λ = (number of items) / (table size). For this example, λ = 6/11.
- According to the hash function, two or more items would need to be in the same slot. This is referred to as a collision (it may also be called a “clash”).
6.5.1. Hash Functions
- Given a collection of items, a hash function that maps each item into a unique slot is referred to as a perfect hash function.
- One way to always have a perfect hash function is to increase the size of the hash table so that each possible value in the item range can be accommodated. This guarantees that each item will have a unique slot. Although this is practical for small numbers of items, it is not feasible when the number of possible items is large.
- The folding method for constructing hash functions begins by dividing the item into equal-size pieces (the last piece may not be of equal size). These pieces are then added together to give the resulting hash value.
- Some folding methods go one step further and reverse every other piece before the addition.
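A sketch of the basic folding method (without the reversal step) for a 10-digit key; the function name and two-digit piece size are illustrative choices:

```python
def fold_hash(key_digits, tablesize, piece_size=2):
    # split the digit string into equal-size pieces
    # (the last piece may be shorter)
    pieces = [key_digits[i:i + piece_size]
              for i in range(0, len(key_digits), piece_size)]
    # add the pieces together, then reduce modulo the table size
    return sum(int(p) for p in pieces) % tablesize

# pieces 43 + 65 + 55 + 46 + 01 = 210; 210 % 11 = 1
print(fold_hash("4365554601", 11))
```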
- Another numerical technique for constructing a hash function is called the mid-square method. We first square the item, and then extract some portion of the resulting digits.
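A sketch of the mid-square method; exactly which "portion of the resulting digits" to extract is a design choice, and here the middle two digits of the square are used (e.g. 44 squared is 1936, whose middle digits are 93):

```python
def mid_square_hash(item, tablesize):
    squared = str(item ** 2)               # 44 -> "1936"
    mid = len(squared) // 2
    middle_digits = squared[mid - 1:mid + 1]  # "93"
    return int(middle_digits) % tablesize

print(mid_square_hash(44, 11))   # 93 % 11 = 5
```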
```python
def hash(astring, tablesize):
    sum = 0
    for pos in range(len(astring)):
        sum = sum + ord(astring[pos])

    return sum % tablesize
```
6.5.2. Collision Resolution
- When two items hash to the same slot, we must have a systematic method for placing the second item in the hash table. This process is called collision resolution.
- One method for resolving collisions looks into the hash table and tries to find another open slot to hold the item that caused the collision. A simple way to do this is to start at the original hash value position and then move in a sequential manner through the slots until we encounter the first slot that is empty. Note that we may need to go back to the first slot (circularly) to cover the entire hash table. This collision resolution process is referred to as open addressing in that it tries to find the next open slot or address in the hash table. By systematically visiting each slot one at a time, we are performing an open addressing technique called linear probing.
- Once we have built a hash table using open addressing and linear probing, it is essential that we utilize the same methods to search for items.
- A disadvantage to linear probing is the tendency for clustering; items become clustered in the table. This means that if many collisions occur at the same hash value, a number of surrounding slots will be filled by the linear probing resolution.
- The general name for this process of looking for another slot after a collision is rehashing.
- In general, rehash(pos) = (pos + skip) % sizeoftable.
- A variation of the linear probing idea is called quadratic probing.
- In general, the skip for the i-th probe is i², so rehash(pos) = (h + i²) % sizeoftable, where h is the original hash value. In other words, quadratic probing uses a skip consisting of successive perfect squares.
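A sketch of the probe sequence quadratic probing generates after a collision at slot h (the helper name is mine, not from the text):

```python
def quadratic_probe_sequence(h, tablesize, limit=5):
    # slots visited after a collision at h: h+1, h+4, h+9, h+16, ...
    # each reduced modulo the table size
    return [(h + i * i) % tablesize for i in range(1, limit + 1)]

print(quadratic_probe_sequence(4, 11))   # [5, 8, 2, 9, 7]
```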
- An alternative method for handling the collision problem is to allow each slot to hold a reference to a collection (or chain) of items. Chaining allows many items to exist at the same location in the hash table. When collisions happen, the item is still placed in the proper slot of the hash table. As more and more items hash to the same location, the difficulty of searching for the item in the collection increases.
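A minimal sketch of chaining, assuming the same simple remainder hash function used elsewhere in this chapter (the class name is illustrative):

```python
class ChainedHashTable:
    def __init__(self, size=11):
        self.size = size
        self.buckets = [[] for _ in range(size)]   # one chain per slot

    def put(self, key, data):
        chain = self.buckets[key % self.size]
        for pair in chain:
            if pair[0] == key:       # key already present: replace data
                pair[1] = data
                return
        chain.append([key, data])    # a collision simply grows the chain

    def get(self, key):
        for k, v in self.buckets[key % self.size]:
            if k == key:
                return v
        return None

t = ChainedHashTable()
t.put(54, 'cat')
t.put(65, 'dog')        # 54 % 11 == 65 % 11 == 10: both live in slot 10
print(t.get(54), t.get(65))
```

Note that lookups degrade as chains grow, which is the searching cost the text mentions.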
6.5.3. Implementing the Map Abstract Data Type
- One of the most useful Python collections is the dictionary. Recall that a dictionary is an associative data type where you can store key–data pairs. The key is used to look up the associated data value. We often refer to this idea as a map.
```python
class HashTable:
    def __init__(self):
        self.size = 11
        self.slots = [None] * self.size
        self.data = [None] * self.size

    def put(self, key, data):
        hashvalue = self.hashfunction(key, len(self.slots))

        if self.slots[hashvalue] == None:
            self.slots[hashvalue] = key
            self.data[hashvalue] = data
        else:
            if self.slots[hashvalue] == key:
                self.data[hashvalue] = data  # replace
            else:
                nextslot = self.rehash(hashvalue, len(self.slots))
                while self.slots[nextslot] != None and \
                      self.slots[nextslot] != key:
                    nextslot = self.rehash(nextslot, len(self.slots))

                if self.slots[nextslot] == None:
                    self.slots[nextslot] = key
                    self.data[nextslot] = data
                else:
                    self.data[nextslot] = data  # replace

    def hashfunction(self, key, size):
        return key % size

    def rehash(self, oldhash, size):
        return (oldhash + 1) % size

    def get(self, key):
        startslot = self.hashfunction(key, len(self.slots))

        data = None
        stop = False
        found = False
        position = startslot
        while self.slots[position] != None and \
              not found and not stop:
            if self.slots[position] == key:
                found = True
                data = self.data[position]
            else:
                position = self.rehash(position, len(self.slots))
                if position == startslot:
                    stop = True
        return data

    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, data):
        self.put(key, data)

H = HashTable()
H[54] = "cat"
H[26] = "dog"
H[93] = "lion"
H[17] = "tiger"
H[77] = "bird"
H[31] = "cow"
H[44] = "goat"
H[55] = "pig"
H[20] = "chicken"
print(H.slots)
print(H.data)

print(H[20])

print(H[17])
H[20] = 'duck'
print(H[20])
print(H[99])
```
6.6. Sorting
- Sorting is the process of placing elements from a collection in some kind of order.
6.7. The Bubble Sort
- The bubble sort makes multiple passes through a list. It compares adjacent items and exchanges those that are out of order. Each pass through the list places the next largest value in its proper place. In essence, each item “bubbles” up to the location where it belongs.
- The exchange operation, sometimes called a “swap,” is slightly different in Python than in most other programming languages.
- In Python, it is possible to perform simultaneous assignment. The statement a,b=b,a will result in two assignment statements being done at the same time (see Figure 2). Using simultaneous assignment, the exchange operation can be done in one statement.
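The one-statement exchange described above, both for plain variables and for two list positions:

```python
a, b = 17, 93
a, b = b, a                  # simultaneous assignment swaps the values
print(a, b)                  # 93 17

alist = [54, 26]
alist[0], alist[1] = alist[1], alist[0]   # swap without a temp variable
print(alist)                 # [26, 54]
```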
```python
def bubbleSort(alist):
    for passnum in range(len(alist) - 1, 0, -1):
        for i in range(passnum):
            if alist[i] > alist[i+1]:
                temp = alist[i]
                alist[i] = alist[i+1]
                alist[i+1] = temp

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
bubbleSort(alist)
print(alist)
```
- In particular, if during a pass there are no exchanges, then we know that the list must be sorted. A bubble sort can be modified to stop early if it finds that the list has become sorted. This means that for lists that require just a few passes, a bubble sort may have an advantage in that it will recognize the sorted list and stop. ActiveCode 2 shows this modification, which is often referred to as the short bubble.
```python
def shortBubbleSort(alist):
    exchanges = True
    passnum = len(alist) - 1
    while passnum > 0 and exchanges:
        exchanges = False
        for i in range(passnum):
            if alist[i] > alist[i+1]:
                exchanges = True
                temp = alist[i]
                alist[i] = alist[i+1]
                alist[i+1] = temp
        passnum = passnum - 1

alist = [20, 30, 40, 90, 50, 60, 70, 80, 100, 110]
shortBubbleSort(alist)
print(alist)
```
6.8. The Selection Sort
- The selection sort improves on the bubble sort by making only one exchange for every pass through the list. In order to do this, a selection sort looks for the largest value as it makes a pass and, after completing the pass, places it in the proper location. As with a bubble sort, after the first pass, the largest item is in the correct place. After the second pass, the next largest is in place. This process continues and requires 𝑛−1 passes to sort n items, since the final item must be in place after the (𝑛−1) st pass.
```python
def selectionSort(alist):
    for fillslot in range(len(alist) - 1, 0, -1):
        positionOfMax = 0
        for location in range(1, fillslot + 1):
            if alist[location] > alist[positionOfMax]:
                positionOfMax = location

        temp = alist[fillslot]
        alist[fillslot] = alist[positionOfMax]
        alist[positionOfMax] = temp

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
selectionSort(alist)
print(alist)
```
6.9. The Insertion Sort
- The insertion sort, although still O(n²), works in a slightly different way. It always maintains a sorted sublist in the lower positions of the list. Each new item is then “inserted” back into the previous sublist such that the sorted sublist is one item larger.
```python
def insertionSort(alist):
    for index in range(1, len(alist)):
        currentvalue = alist[index]
        position = index

        while position > 0 and alist[position - 1] > currentvalue:
            alist[position] = alist[position - 1]
            position = position - 1

        alist[position] = currentvalue


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
insertionSort(alist)
print(alist)
```
6.10. The Shell Sort
- The shell sort, sometimes called the “diminishing increment sort,” improves on the insertion sort by breaking the original list into a number of smaller sublists, each of which is sorted using an insertion sort. The unique way that these sublists are chosen is the key to the shell sort. Instead of breaking the list into sublists of contiguous items, the shell sort uses an increment i, sometimes called the gap, to create a sublist by choosing all items that are i items apart.
```python
def shellSort(alist):
    sublistcount = len(alist) // 2
    while sublistcount > 0:

        for startposition in range(sublistcount):
            gapInsertionSort(alist, startposition, sublistcount)

        print("After increments of size", sublistcount,
              "The list is", alist)

        sublistcount = sublistcount // 2


def gapInsertionSort(alist, start, gap):
    for i in range(start + gap, len(alist), gap):
        currentvalue = alist[i]
        position = i

        while position >= gap and alist[position - gap] > currentvalue:
            alist[position] = alist[position - gap]
            position = position - gap

        alist[position] = currentvalue


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
shellSort(alist)
print(alist)

"""
After increments of size 4 The list is [20, 26, 44, 17, 54, 31, 93, 55, 77]
After increments of size 2 The list is [20, 17, 44, 26, 54, 31, 77, 55, 93]
After increments of size 1 The list is [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]
"""
```
6.11. The Merge Sort
- We now turn our attention to using a divide and conquer strategy as a way to improve the performance of sorting algorithms. The first algorithm we will study is the merge sort. Merge sort is a recursive algorithm that continually splits a list in half. If the list is empty or has one item, it is sorted by definition (the base case). If the list has more than one item, we split the list and recursively invoke a merge sort on both halves. Once the two halves are sorted, the fundamental operation, called a merge, is performed. Merging is the process of taking two smaller sorted lists and combining them together into a single, sorted, new list.
```python
def mergeSort(alist):
    print("Splitting ", alist)
    if len(alist) > 1:
        mid = len(alist) // 2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]

        mergeSort(lefthalf)
        mergeSort(righthalf)

        i = 0
        j = 0
        k = 0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] <= righthalf[j]:
                alist[k] = lefthalf[i]
                i = i + 1
            else:
                alist[k] = righthalf[j]
                j = j + 1
            k = k + 1

        while i < len(lefthalf):
            alist[k] = lefthalf[i]
            i = i + 1
            k = k + 1

        while j < len(righthalf):
            alist[k] = righthalf[j]
            j = j + 1
            k = k + 1
    print("Merging ", alist)


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
mergeSort(alist)
print(alist)

"""
Splitting  [54, 26, 93, 17, 77, 31, 44, 55, 20]
Splitting  [54, 26, 93, 17]
Splitting  [54, 26]
Splitting  [54]
Merging  [54]
Splitting  [26]
Merging  [26]
Merging  [26, 54]
Splitting  [93, 17]
Splitting  [93]
Merging  [93]
Splitting  [17]
Merging  [17]
Merging  [17, 93]
Merging  [17, 26, 54, 93]
Splitting  [77, 31, 44, 55, 20]
Splitting  [77, 31]
Splitting  [77]
Merging  [77]
Splitting  [31]
Merging  [31]
Merging  [31, 77]
Splitting  [44, 55, 20]
Splitting  [44]
Merging  [44]
Splitting  [55, 20]
Splitting  [55]
Merging  [55]
Splitting  [20]
Merging  [20]
Merging  [20, 55]
Merging  [20, 44, 55]
Merging  [20, 31, 44, 55, 77]
Merging  [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]
"""
```
- A stable algorithm maintains the order of duplicate items in a list and is preferred in most cases.
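Python's own built-in sort is stable, which makes the idea easy to demonstrate (the names here are invented for illustration). Sorting by last name only, the two "Smith" records keep their original relative order:

```python
# Python's built-in sorted() is a stable sort: items that compare equal
# keep their original relative order.
people = [("Smith", "Alice"), ("Jones", "Bob"), ("Smith", "Carol")]

# Sort by last name only; the key ignores the first name entirely.
by_last = sorted(people, key=lambda p: p[0])
print(by_last)
# Alice still precedes Carol among the Smiths, as in the original list.
```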
- Recall that the slicing operator is O(k), where k is the size of the slice.
6.12. The Quick Sort
- The quick sort uses divide and conquer to gain the same advantages as the merge sort, while not using additional storage. As a trade-off, however, it is possible that the list may not be divided in half. When this happens, we will see that performance is diminished.
- A quick sort first selects a value, which is called the pivot value. Although there are many different ways to choose the pivot value, we will simply use the first item in the list. The role of the pivot value is to assist with splitting the list. The actual position where the pivot value belongs in the final sorted list, commonly called the split point, will be used to divide the list for subsequent calls to the quick sort.
- Figure 12 shows that 54 will serve as our first pivot value. Since we have looked at this example a few times already, we know that 54 will eventually end up in the position currently holding 31. The partition process will happen next. It will find the split point and at the same time move other items to the appropriate side of the list, either less than or greater than the pivot value.
- Partitioning begins by locating two position markers—let’s call them leftmark and rightmark—at the beginning and end of the remaining items in the list (positions 1 and 8 in Figure 13). The goal of the partition process is to move items that are on the wrong side with respect to the pivot value while also converging on the split point.
- We mentioned earlier that there are different ways to choose the pivot value. In particular, we can attempt to alleviate some of the potential for an uneven division by using a technique called median of three. To choose the pivot value, we will consider the first, the middle, and the last element in the list.
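The text describes median of three but gives no code for it; a minimal sketch of such a helper might look like the following (medianOfThree is our name, not the book's — the listing that follows still pivots on the first item):

```python
def medianOfThree(alist, first, last):
    # Return the index of the median of the first, middle, and last items.
    # Swapping that item into position `first` before partitioning reduces
    # the chance of a badly uneven split on nearly sorted data.
    mid = (first + last) // 2
    trio = sorted([(alist[first], first), (alist[mid], mid), (alist[last], last)])
    return trio[1][1]  # index of the median value


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
# first=54, middle=77, last=20 -> the median 54 is already at index 0
print(medianOfThree(alist, 0, len(alist) - 1))
```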
```python
def quickSort(alist):
    quickSortHelper(alist, 0, len(alist) - 1)


def quickSortHelper(alist, first, last):
    if first < last:

        splitpoint = partition(alist, first, last)

        quickSortHelper(alist, first, splitpoint - 1)
        quickSortHelper(alist, splitpoint + 1, last)


def partition(alist, first, last):
    pivotvalue = alist[first]

    leftmark = first + 1
    rightmark = last

    done = False
    while not done:

        while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
            leftmark = leftmark + 1

        while alist[rightmark] >= pivotvalue and rightmark >= leftmark:
            rightmark = rightmark - 1

        if rightmark < leftmark:
            done = True
        else:
            temp = alist[leftmark]
            alist[leftmark] = alist[rightmark]
            alist[rightmark] = temp

    temp = alist[first]
    alist[first] = alist[rightmark]
    alist[rightmark] = temp

    return rightmark


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quickSort(alist)
print(alist)
```
6.13. Summary
- A sequential search is O(n) for ordered and unordered lists.
- A binary search of an ordered list is O(log n) in the worst case.
- Hash tables can provide constant time searching.
- A bubble sort, a selection sort, and an insertion sort are O(n²) algorithms.
- A shell sort improves on the insertion sort by sorting incremental sublists. Its running time falls between O(n) and O(n²); with good increment sequences it can perform at roughly O(n^(3/2)).
- A merge sort is O(n log n), but requires additional space for the merging process.
- A quick sort is O(n log n), but may degrade to O(n²) if the split points are not near the middle of the list. It does not require additional space.
7. Trees and Tree Algorithms
7.4. List of Lists Representation
```python
def BinaryTree(r):
    return [r, [], []]


def insertLeft(root, newBranch):
    t = root.pop(1)
    if len(t) > 1:
        root.insert(1, [newBranch, t, []])
    else:
        root.insert(1, [newBranch, [], []])
    return root


def insertRight(root, newBranch):
    t = root.pop(2)
    if len(t) > 1:
        root.insert(2, [newBranch, [], t])
    else:
        root.insert(2, [newBranch, [], []])
    return root


def getRootVal(root):
    return root[0]


def setRootVal(root, newVal):
    root[0] = newVal


def getLeftChild(root):
    return root[1]


def getRightChild(root):
    return root[2]


r = BinaryTree(3)
insertLeft(r, 4)
insertLeft(r, 5)
insertRight(r, 6)
insertRight(r, 7)
l = getLeftChild(r)
print(l)

setRootVal(l, 9)
print(r)
insertLeft(l, 11)
print(r)
print(getRightChild(getRightChild(r)))

"""
[5, [4, [], []], []]
[3, [9, [4, [], []], []], [7, [], [6, [], []]]]
[3, [9, [11, [4, [], []], []], []], [7, [], [6, [], []]]]
[6, [], []]
"""
```
7.5. Nodes and References
```python
class BinaryTree:
    def __init__(self, rootObj):
        self.key = rootObj
        self.leftChild = None
        self.rightChild = None

    def insertLeft(self, newNode):
        if self.leftChild is None:
            self.leftChild = BinaryTree(newNode)
        else:
            t = BinaryTree(newNode)
            t.leftChild = self.leftChild
            self.leftChild = t

    def insertRight(self, newNode):
        if self.rightChild is None:
            self.rightChild = BinaryTree(newNode)
        else:
            t = BinaryTree(newNode)
            t.rightChild = self.rightChild
            self.rightChild = t

    def getRightChild(self):
        return self.rightChild

    def getLeftChild(self):
        return self.leftChild

    def setRootVal(self, obj):
        self.key = obj

    def getRootVal(self):
        return self.key


r = BinaryTree('a')
print(r.getRootVal())
print(r.getLeftChild())
r.insertLeft('b')
print(r.getLeftChild())
print(r.getLeftChild().getRootVal())
r.insertRight('c')
print(r.getRightChild())
print(r.getRightChild().getRootVal())
r.getRightChild().setRootVal('hello')
print(r.getRightChild().getRootVal())

"""
a
None
<__main__.BinaryTree object>
b
<__main__.BinaryTree object>
c
hello
"""
```
7.6. Parse Tree
```python
from pythonds.basic import Stack
from pythonds.trees import BinaryTree


def buildParseTree(fpexp):
    fplist = fpexp.split()
    pStack = Stack()
    eTree = BinaryTree('')
    pStack.push(eTree)
    currentTree = eTree

    for i in fplist:
        if i == '(':
            currentTree.insertLeft('')
            pStack.push(currentTree)
            currentTree = currentTree.getLeftChild()

        elif i in ['+', '-', '*', '/']:
            currentTree.setRootVal(i)
            currentTree.insertRight('')
            pStack.push(currentTree)
            currentTree = currentTree.getRightChild()

        elif i == ')':
            currentTree = pStack.pop()

        elif i not in ['+', '-', '*', '/', ')']:
            try:
                currentTree.setRootVal(int(i))
                parent = pStack.pop()
                currentTree = parent
            except ValueError:
                raise ValueError("token '{}' is not a valid integer".format(i))

    return eTree


pt = buildParseTree("( ( 10 + 5 ) * 3 )")
pt.postorder()  # defined and explained in the next section

"""
10
5
+
3
*
"""
```
```python
import operator


def evaluate(parseTree):
    opers = {'+': operator.add, '-': operator.sub,
             '*': operator.mul, '/': operator.truediv}

    leftC = parseTree.getLeftChild()
    rightC = parseTree.getRightChild()

    if leftC and rightC:
        fn = opers[parseTree.getRootVal()]
        return fn(evaluate(leftC), evaluate(rightC))
    else:
        return parseTree.getRootVal()
```
7.7. Tree Traversals
```python
import operator


def preorder(tree):
    # external-function version of preorder traversal
    if tree:
        print(tree.getRootVal())
        preorder(tree.getLeftChild())
        preorder(tree.getRightChild())


def preorder(self):
    # method version, defined inside the BinaryTree class
    print(self.key)
    if self.leftChild:
        self.leftChild.preorder()
    if self.rightChild:
        self.rightChild.preorder()


def postorder(tree):
    if tree is not None:
        postorder(tree.getLeftChild())
        postorder(tree.getRightChild())
        print(tree.getRootVal())


def postordereval(tree):
    opers = {'+': operator.add, '-': operator.sub,
             '*': operator.mul, '/': operator.truediv}
    res1 = None
    res2 = None
    if tree:
        res1 = postordereval(tree.getLeftChild())
        res2 = postordereval(tree.getRightChild())
        if res1 and res2:
            return opers[tree.getRootVal()](res1, res2)
        else:
            return tree.getRootVal()


def inorder(tree):
    if tree is not None:
        inorder(tree.getLeftChild())
        print(tree.getRootVal())
        inorder(tree.getRightChild())


def printexp(tree):
    sVal = ""
    if tree:
        sVal = '(' + printexp(tree.getLeftChild())
        sVal = sVal + str(tree.getRootVal())
        sVal = sVal + printexp(tree.getRightChild()) + ')'
    return sVal
```
7.8. Priority Queues with Binary Heaps
- In earlier sections you learned about the first-in first-out data structure called a queue. One important variation of a queue is called a priority queue. A priority queue acts like a queue in that you dequeue an item by removing it from the front. However, in a priority queue the logical order of items inside a queue is determined by their priority. The highest priority items are at the front of the queue and the lowest priority items are at the back. Thus when you enqueue an item on a priority queue, the new item may move all the way to the front.
- The classic way to implement a priority queue is with a data structure called a binary heap. A binary heap allows us both to enqueue and to dequeue items in O(log n).
- The binary heap is interesting to study because when we diagram the heap it looks a lot like a tree, but when we implement it we use only a single list as an internal representation. The binary heap has two common variations: the min heap, in which the smallest key is always at the front, and the max heap, in which the largest key value is always at the front.
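For comparison, Python's standard library module heapq already implements a binary min heap over a plain list, with heappush and heappop both O(log n):

```python
import heapq

# heapq maintains the min-heap property on an ordinary Python list.
h = []
for key in [9, 5, 6, 2, 3]:
    heapq.heappush(h, key)

# Popping repeatedly yields the keys in ascending order,
# just like repeated delMin on the book's BinHeap.
out = [heapq.heappop(h) for _ in range(len(h))]
print(out)
```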
7.10. Binary Heap Implementation
```python
class BinHeap:
    def __init__(self):
        self.heapList = [0]
        self.currentSize = 0

    def percUp(self, i):
        while i // 2 > 0:
            if self.heapList[i] < self.heapList[i // 2]:
                tmp = self.heapList[i // 2]
                self.heapList[i // 2] = self.heapList[i]
                self.heapList[i] = tmp
            i = i // 2

    def insert(self, k):
        self.heapList.append(k)
        self.currentSize = self.currentSize + 1
        self.percUp(self.currentSize)

    def percDown(self, i):
        while (i * 2) <= self.currentSize:
            mc = self.minChild(i)
            if self.heapList[i] > self.heapList[mc]:
                tmp = self.heapList[i]
                self.heapList[i] = self.heapList[mc]
                self.heapList[mc] = tmp
            i = mc

    def minChild(self, i):
        if i * 2 + 1 > self.currentSize:
            return i * 2
        else:
            if self.heapList[i * 2] < self.heapList[i * 2 + 1]:
                return i * 2
            else:
                return i * 2 + 1

    def delMin(self):
        retval = self.heapList[1]
        self.heapList[1] = self.heapList[self.currentSize]
        self.currentSize = self.currentSize - 1
        self.heapList.pop()
        self.percDown(1)
        return retval

    def buildHeap(self, alist):
        i = len(alist) // 2
        self.currentSize = len(alist)
        self.heapList = [0] + alist[:]
        while i > 0:
            self.percDown(i)
            i = i - 1


bh = BinHeap()
bh.buildHeap([9, 5, 6, 2, 3])

print(bh.delMin())
print(bh.delMin())
print(bh.delMin())
print(bh.delMin())
print(bh.delMin())

"""
2
3
5
6
9
"""
```
7.13. Search Tree Implementation
- With the put method defined, we can easily overload the [] operator for assignment by having the __setitem__ method call the put method (see Listing 4).
- By implementing the __getitem__ method we can write a Python statement that looks just like we are accessing a dictionary, when in fact we are using a binary search tree, for example z = myZipTree['Fargo'].
- Using get, we can implement the in operation by writing a __contains__ method for the BinarySearchTree.
- Python provides us with a very powerful mechanism for creating an iterator: the yield keyword. yield is similar to return in that it hands a value back to the caller. However, yield also takes the additional step of freezing the state of the function so that the next time the function is called it continues executing from the exact point it left off earlier. Functions that create objects that can be iterated are called generator functions.
- Remember that __iter__ overrides the for x in ... operation for iteration.
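A minimal sketch of an inorder __iter__ written as a generator — the Node class here is a stand-in with the same attribute names as the book's TreeNode, not the book's class:

```python
class Node:
    # bare-bones stand-in for the book's TreeNode, just enough to iterate
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.leftChild = left
        self.rightChild = right

    def __iter__(self):
        # yield freezes this function's state; each next() resumes
        # exactly where the previous yield left off.
        if self.leftChild:
            for elem in self.leftChild:  # recurse into the left subtree
                yield elem
        yield self.key
        if self.rightChild:
            for elem in self.rightChild:  # recurse into the right subtree
                yield elem


tree = Node(5, Node(3, Node(2), Node(4)), Node(7))
print(list(tree))  # inorder: keys come out in sorted order
```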
```python
class TreeNode:
    def __init__(self, key, val, left=None, right=None, parent=None):
        self.key = key
        self.payload = val
        self.leftChild = left
        self.rightChild = right
        self.parent = parent

    def hasLeftChild(self):
        return self.leftChild

    def hasRightChild(self):
        return self.rightChild

    def isLeftChild(self):
        return self.parent and self.parent.leftChild == self

    def isRightChild(self):
        return self.parent and self.parent.rightChild == self

    def isRoot(self):
        return not self.parent

    def isLeaf(self):
        return not (self.rightChild or self.leftChild)

    def hasAnyChildren(self):
        return self.rightChild or self.leftChild

    def hasBothChildren(self):
        return self.rightChild and self.leftChild

    def spliceOut(self):
        if self.isLeaf():
            if self.isLeftChild():
                self.parent.leftChild = None
            else:
                self.parent.rightChild = None
        elif self.hasAnyChildren():
            if self.hasLeftChild():
                if self.isLeftChild():
                    self.parent.leftChild = self.leftChild
                else:
                    self.parent.rightChild = self.leftChild
                self.leftChild.parent = self.parent
            else:
                if self.isLeftChild():
                    self.parent.leftChild = self.rightChild
                else:
                    self.parent.rightChild = self.rightChild
                self.rightChild.parent = self.parent

    def findSuccessor(self):
        succ = None
        if self.hasRightChild():
            succ = self.rightChild.findMin()
        else:
            if self.parent:
                if self.isLeftChild():
                    succ = self.parent
                else:
                    self.parent.rightChild = None
                    succ = self.parent.findSuccessor()
                    self.parent.rightChild = self
        return succ

    def findMin(self):
        current = self
        while current.hasLeftChild():
            current = current.leftChild
        return current

    def replaceNodeData(self, key, value, lc, rc):
        self.key = key
        self.payload = value
        self.leftChild = lc
        self.rightChild = rc
        if self.hasLeftChild():
            self.leftChild.parent = self
        if self.hasRightChild():
            self.rightChild.parent = self


class BinarySearchTree:

    def __init__(self):
        self.root = None
        self.size = 0

    def length(self):
        return self.size

    def __len__(self):
        return self.size

    def put(self, key, val):
        if self.root:
            self._put(key, val, self.root)
        else:
            self.root = TreeNode(key, val)
        self.size = self.size + 1

    def _put(self, key, val, currentNode):
        if key < currentNode.key:
            if currentNode.hasLeftChild():
                self._put(key, val, currentNode.leftChild)
            else:
                currentNode.leftChild = TreeNode(key, val, parent=currentNode)
        else:
            if currentNode.hasRightChild():
                self._put(key, val, currentNode.rightChild)
            else:
                currentNode.rightChild = TreeNode(key, val, parent=currentNode)

    def __setitem__(self, k, v):
        self.put(k, v)

    def get(self, key):
        if self.root:
            res = self._get(key, self.root)
            if res:
                return res.payload
            else:
                return None
        else:
            return None

    def _get(self, key, currentNode):
        if not currentNode:
            return None
        elif currentNode.key == key:
            return currentNode
        elif key < currentNode.key:
            return self._get(key, currentNode.leftChild)
        else:
            return self._get(key, currentNode.rightChild)

    def __getitem__(self, key):
        return self.get(key)

    def __contains__(self, key):
        if self._get(key, self.root):
            return True
        else:
            return False

    def delete(self, key):
        if self.size > 1:
            nodeToRemove = self._get(key, self.root)
            if nodeToRemove:
                self.remove(nodeToRemove)
                self.size = self.size - 1
            else:
                raise KeyError('Error, key not in tree')
        elif self.size == 1 and self.root.key == key:
            self.root = None
            self.size = self.size - 1
        else:
            raise KeyError('Error, key not in tree')

    def __delitem__(self, key):
        self.delete(key)

    def remove(self, currentNode):
        if currentNode.isLeaf():  # leaf
            if currentNode == currentNode.parent.leftChild:
                currentNode.parent.leftChild = None
            else:
                currentNode.parent.rightChild = None
        elif currentNode.hasBothChildren():  # interior
            succ = currentNode.findSuccessor()
            succ.spliceOut()
            currentNode.key = succ.key
            currentNode.payload = succ.payload
        else:  # this node has one child
            if currentNode.hasLeftChild():
                if currentNode.isLeftChild():
                    currentNode.leftChild.parent = currentNode.parent
                    currentNode.parent.leftChild = currentNode.leftChild
                elif currentNode.isRightChild():
                    currentNode.leftChild.parent = currentNode.parent
                    currentNode.parent.rightChild = currentNode.leftChild
                else:
                    currentNode.replaceNodeData(
                        currentNode.leftChild.key,
                        currentNode.leftChild.payload,
                        currentNode.leftChild.leftChild,
                        currentNode.leftChild.rightChild
                    )
            else:
                if currentNode.isLeftChild():
                    currentNode.rightChild.parent = currentNode.parent
                    currentNode.parent.leftChild = currentNode.rightChild
                elif currentNode.isRightChild():
                    currentNode.rightChild.parent = currentNode.parent
                    currentNode.parent.rightChild = currentNode.rightChild
                else:
                    currentNode.replaceNodeData(
                        currentNode.rightChild.key,
                        currentNode.rightChild.payload,
                        currentNode.rightChild.leftChild,
                        currentNode.rightChild.rightChild
                    )


mytree = BinarySearchTree()
mytree[3] = "red"
mytree[4] = "blue"
mytree[6] = "yellow"
mytree[2] = "at"

print(mytree[6])
print(mytree[2])

"""
yellow
at
"""
```
7.15. Balanced Binary Search Trees
- As we learned, the performance of the binary search tree can degrade to O(n) for operations like get and put when the tree becomes unbalanced. In this section we will look at a special kind of binary search tree that automatically makes sure that the tree remains balanced at all times. This tree is called an AVL tree and is named for its inventors: G.M. Adelson-Velskii and E.M. Landis.
- To implement our AVL tree we need to keep track of a balance factor for each node in the tree.
- balanceFactor = height(leftSubTree) − height(rightSubTree)
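A small sketch of that definition, computing heights recursively (plain dicts stand in for tree nodes here; the book instead stores and incrementally updates the factor on each node, which is far cheaper than recomputing):

```python
def height(node):
    # Conventionally the height of an empty subtree is -1,
    # so a single leaf node has height 0.
    if node is None:
        return -1
    return 1 + max(height(node["left"]), height(node["right"]))


def balanceFactor(node):
    return height(node["left"]) - height(node["right"])


# A left-heavy tree: a chain of two nodes on the left, nothing on the right.
tree = {"left": {"left": {"left": None, "right": None}, "right": None},
        "right": None}
print(balanceFactor(tree))  # 1 - (-1) = 2, outside [-1, 1]: unbalanced
```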
7.17. AVL Tree Implementation
```python
# The following are methods of the AVL tree (a subclass of BinarySearchTree),
# shown here outside the class body as in the original listings.

def _put(self, key, val, currentNode):
    if key < currentNode.key:
        if currentNode.hasLeftChild():
            self._put(key, val, currentNode.leftChild)
        else:
            currentNode.leftChild = TreeNode(key, val, parent=currentNode)
            self.updateBalance(currentNode.leftChild)
    else:
        if currentNode.hasRightChild():
            self._put(key, val, currentNode.rightChild)
        else:
            currentNode.rightChild = TreeNode(key, val, parent=currentNode)
            self.updateBalance(currentNode.rightChild)


def updateBalance(self, node):
    if node.balanceFactor > 1 or node.balanceFactor < -1:
        self.rebalance(node)
        return
    if node.parent is not None:
        if node.isLeftChild():
            node.parent.balanceFactor += 1
        elif node.isRightChild():
            node.parent.balanceFactor -= 1

        if node.parent.balanceFactor != 0:
            self.updateBalance(node.parent)


def rotateLeft(self, rotRoot):
    newRoot = rotRoot.rightChild
    rotRoot.rightChild = newRoot.leftChild
    if newRoot.leftChild is not None:
        newRoot.leftChild.parent = rotRoot
    newRoot.parent = rotRoot.parent
    if rotRoot.isRoot():
        self.root = newRoot
    else:
        if rotRoot.isLeftChild():
            rotRoot.parent.leftChild = newRoot
        else:
            rotRoot.parent.rightChild = newRoot
    newRoot.leftChild = rotRoot
    rotRoot.parent = newRoot
    rotRoot.balanceFactor = rotRoot.balanceFactor + 1 - min(newRoot.balanceFactor, 0)
    newRoot.balanceFactor = newRoot.balanceFactor + 1 + max(rotRoot.balanceFactor, 0)
```
7.18. Summary of Map ADT Implementations
Table 1: Comparing the Performance of Different Map Implementations
| operation | Sorted List | Hash Table | Binary Search Tree | AVL Tree |
|---|---|---|---|---|
| put | O(n) | O(1) | O(n) | O(log₂ n) |
| get | O(log₂ n) | O(1) | O(n) | O(log₂ n) |
| in | O(log₂ n) | O(1) | O(n) | O(log₂ n) |
| del | O(n) | O(1) | O(n) | O(log₂ n) |
7.19. Summary
- In this chapter we have looked at the tree data structure. The tree data structure enables us to write many interesting algorithms. In this chapter we have looked at algorithms that use trees to do the following:
- A binary tree for parsing and evaluating expressions.
- A binary tree for implementing the map ADT.
- A balanced binary tree (AVL tree) for implementing the map ADT.
- A binary tree to implement a min heap.
- A min heap used to implement a priority queue.
8. Graphs and Graph Algorithms
8.6. Implementation
```python
class Vertex:
    def __init__(self, key):
        self.id = key
        self.connectedTo = {}

    def addNeighbor(self, nbr, weight=0):
        self.connectedTo[nbr] = weight

    def __str__(self):
        return str(self.id) + ' connectedTo: ' + str([x.id for x in self.connectedTo])

    def getConnections(self):
        return self.connectedTo.keys()

    def getId(self):
        return self.id

    def getWeight(self, nbr):
        return self.connectedTo[nbr]


class Graph:
    def __init__(self):
        self.vertList = {}
        self.numVertices = 0

    def addVertex(self, key):
        self.numVertices = self.numVertices + 1
        newVertex = Vertex(key)
        self.vertList[key] = newVertex
        return newVertex

    def getVertex(self, n):
        if n in self.vertList:
            return self.vertList[n]
        else:
            return None

    def __contains__(self, n):
        return n in self.vertList

    def addEdge(self, f, t, weight=0):
        if f not in self.vertList:
            nv = self.addVertex(f)
        if t not in self.vertList:
            nv = self.addVertex(t)
        self.vertList[f].addNeighbor(self.vertList[t], weight)

    def getVertices(self):
        return self.vertList.keys()

    def __iter__(self):
        return iter(self.vertList.values())
```
8.8. Building the Word Ladder Graph
```python
from pythonds.graphs import Graph


def buildGraph(wordFile):
    d = {}
    g = Graph()
    wfile = open(wordFile, 'r')
    # create buckets of words that differ by one letter
    for line in wfile:
        word = line[:-1]
        for i in range(len(word)):
            bucket = word[:i] + '_' + word[i + 1:]
            if bucket in d:
                d[bucket].append(word)
            else:
                d[bucket] = [word]
    # add vertices and edges for words in the same bucket
    for bucket in d.keys():
        for word1 in d[bucket]:
            for word2 in d[bucket]:
                if word1 != word2:
                    g.addEdge(word1, word2)
    return g
```
8.9. Implementing Breadth First Search
```python
from pythonds.graphs import Graph, Vertex
from pythonds.basic import Queue


def bfs(g, start):
    start.setDistance(0)
    start.setPred(None)
    vertQueue = Queue()
    vertQueue.enqueue(start)
    while vertQueue.size() > 0:
        currentVert = vertQueue.dequeue()
        for nbr in currentVert.getConnections():
            if nbr.getColor() == 'white':
                nbr.setColor('gray')
                nbr.setDistance(currentVert.getDistance() + 1)
                nbr.setPred(currentVert)
                vertQueue.enqueue(nbr)
        currentVert.setColor('black')


def traverse(y):
    x = y
    while x.getPred():
        print(x.getId())
        x = x.getPred()
    print(x.getId())


# assumes g was built earlier (e.g. with buildGraph) and bfs has been
# run from the start word, so that predecessor links are in place
traverse(g.getVertex('sage'))
```
8.10. Breadth First Search Analysis
- Before we continue with other graph algorithms, let us analyze the run time performance of the breadth first search algorithm. The first thing to observe is that the while loop is executed at most once for each vertex in the graph, |V|. You can see that this is true because a vertex must be white before it can be examined and added to the queue. This gives us O(V) for the while loop. The for loop, which is nested inside the while, is executed at most once for each edge in the graph, |E|. The reason is that every vertex is dequeued at most once, and we examine an edge from node u to node v only when node u is dequeued. This gives us O(E) for the for loop. Combining the two loops gives us O(V + E).
- Of course doing the breadth first search is only part of the task. Following the links from the starting node to the goal node is the other part. The worst case for this would be if the graph were a single long chain; in that case traversing all of the vertices would be O(V). The normal case will be some fraction of |V|, but we would still write O(V).
- Finally, at least for this problem, there is the time required to build the initial graph.
8.12. Building the Knight’s Tour Graph
```python
from pythonds.graphs import Graph


def knightGraph(bdSize):
    ktGraph = Graph()
    for row in range(bdSize):
        for col in range(bdSize):
            nodeId = posToNodeId(row, col, bdSize)
            newPositions = genLegalMoves(row, col, bdSize)
            for e in newPositions:
                nid = posToNodeId(e[0], e[1], bdSize)
                ktGraph.addEdge(nodeId, nid)
    return ktGraph


def posToNodeId(row, column, board_size):
    return (row * board_size) + column


def genLegalMoves(x, y, bdSize):
    newMoves = []
    moveOffsets = [(-1, -2), (-1, 2), (-2, -1), (-2, 1),
                   (1, -2), (1, 2), (2, -1), (2, 1)]
    for i in moveOffsets:
        newX = x + i[0]
        newY = y + i[1]
        if legalCoord(newX, bdSize) and legalCoord(newY, bdSize):
            newMoves.append((newX, newY))
    return newMoves


def legalCoord(x, bdSize):
    if x >= 0 and x < bdSize:
        return True
    else:
        return False
```
8.13. Implementing Knight’s Tour
```python
from pythonds.graphs import Graph, Vertex


def knightTour(n, path, u, limit):
    u.setColor('gray')
    path.append(u)
    if n < limit:
        nbrList = list(u.getConnections())
        i = 0
        done = False
        while i < len(nbrList) and not done:
            if nbrList[i].getColor() == 'white':
                done = knightTour(n + 1, path, nbrList[i], limit)
            i = i + 1
        if not done:  # prepare to backtrack
            path.pop()
            u.setColor('white')
    else:
        done = True
    return done
```
8.14. Knight’s Tour Analysis
- The critical line in the orderByAvail function is line 10. This line ensures that we select the vertex to go next that has the fewest available moves.
- The problem with using the vertex with the most available moves as your next vertex on the path is that it tends to have the knight visit the middle squares early on in the tour. When this happens it is easy for the knight to get stranded on one side of the board where it cannot reach unvisited squares on the other side of the board. On the other hand, visiting the squares with the fewest available moves first pushes the knight to visit the squares around the edges of the board first. This ensures that the knight will visit the hard-to-reach corners early and can use the middle squares to hop across the board only when necessary. Utilizing this kind of knowledge to speed up an algorithm is called a heuristic. Humans use heuristics every day to help make decisions; heuristic searches are often used in the field of artificial intelligence. This particular heuristic is called Warnsdorff's algorithm, named after H. C. Warnsdorff, who published his idea in 1823.
```python
def orderByAvail(n):
    resList = []
    for v in n.getConnections():
        if v.getColor() == 'white':
            c = 0
            for w in v.getConnections():
                if w.getColor() == 'white':
                    c = c + 1
            resList.append((c, v))
    resList.sort(key=lambda x: x[0])  # line 10: order by fewest onward moves
    return [y[1] for y in resList]
```
8.15. General Depth First Search
- When the depth first search algorithm creates a group of trees we call this a depth first forest.
- It is interesting to note that where bfs uses a queue, dfsvisit uses a stack.
```python
from pythonds.graphs import Graph


class DFSGraph(Graph):
    def __init__(self):
        super().__init__()
        self.time = 0

    def dfs(self):
        for aVertex in self:
            aVertex.setColor('white')
            aVertex.setPred(-1)
        for aVertex in self:
            if aVertex.getColor() == 'white':
                self.dfsvisit(aVertex)

    def dfsvisit(self, startVertex):
        startVertex.setColor('gray')
        self.time += 1
        startVertex.setDiscovery(self.time)
        for nextVertex in startVertex.getConnections():
            if nextVertex.getColor() == 'white':
                nextVertex.setPred(startVertex)
                self.dfsvisit(nextVertex)
        startVertex.setColor('black')
        self.time += 1
        startVertex.setFinish(self.time)
```
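In the recursive dfsvisit above, the stack is implicit: it is the call stack of the recursion. The same depth first order can be produced with an explicit stack; a sketch over a plain dict-of-lists adjacency map (an assumption for brevity, not the book's Graph class):

```python
def dfs_iterative(adj, start):
    # adj maps each vertex to a list of its neighbors.
    visited = []
    stack = [start]
    while stack:
        v = stack.pop()          # LIFO discipline is what makes this depth first
        if v not in visited:
            visited.append(v)
            # push neighbors in reverse so they are visited left to right
            for nbr in reversed(adj.get(v, [])):
                stack.append(nbr)
    return visited


adj = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}
print(dfs_iterative(adj, 'A'))  # goes deep along A -> B -> D before trying C
```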
8.16. Depth First Search Analysis
- The general running time for depth first search is as follows. The loops in dfs both run in 𝑂(𝑉), not counting what happens in dfsvisit, since they are executed once for each vertex in the graph. In dfsvisit the loop is executed once for each edge in the adjacency list of the current vertex. Since dfsvisit is only called recursively if the vertex is white, the loop will execute a maximum of once for every edge in the graph or 𝑂(𝐸). So, the total time for depth first search is 𝑂(𝑉+𝐸).
8.17. Topological Sorting
- To help us decide the precise order in which we should do each of the steps required to make our pancakes we turn to a graph algorithm called the topological sort.
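The book derives the topological order by running depth first search and listing the vertices in decreasing order of finish time. A minimal self-contained sketch of that idea, using a plain adjacency dict instead of `DFSGraph` (the pancake steps below are illustrative placeholders, not the book's exact graph):

```python
def topological_sort(graph):
    """graph: dict mapping each vertex to an iterable of its successors."""
    visited = set()
    order = []           # vertices appended as they *finish*

    def dfs(v):
        visited.add(v)
        for w in graph.get(v, ()):
            if w not in visited:
                dfs(w)
        order.append(v)  # finished: all descendants already recorded

    for v in graph:
        if v not in visited:
            dfs(v)
    return order[::-1]   # reverse finish order = topological order

# Edges point from a prerequisite step to a step that depends on it.
steps = {
    'mix batter':   ['pour batter'],
    'heat griddle': ['pour batter'],
    'pour batter':  ['flip'],
    'flip':         ['eat'],
    'eat':          [],
}
```

Every edge (u, v) in `steps` ends up with u before v in `topological_sort(steps)`, which is exactly the guarantee a topological sort provides.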
8.18. Strongly Connected Components
- One graph algorithm that can help find clusters of highly interconnected vertices in a graph is called the strongly connected components algorithm (SCC). We formally define a strongly connected component, 𝐶, of a graph 𝐺, as a maximal subset of vertices 𝐶 ⊆ 𝑉 such that for every pair of vertices 𝑣,𝑤∈𝐶 we have a path from 𝑣 to 𝑤 and a path from 𝑤 to 𝑣.
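The SCC algorithm the book describes (Kosaraju's method) can be sketched without the `pythonds` classes: one DFS over 𝐺 to record finish order, then a DFS over the transposed graph in decreasing finish order; each tree of the second forest is one component. A minimal sketch on plain dicts, assuming every vertex appears as a key:

```python
def strongly_connected_components(graph):
    """graph: dict mapping each vertex to a list of its successors."""
    # Pass 1: record vertices in order of increasing finish time.
    finish_order, visited = [], set()

    def dfs1(v):
        visited.add(v)
        for w in graph.get(v, ()):
            if w not in visited:
                dfs1(w)
        finish_order.append(v)

    for v in graph:
        if v not in visited:
            dfs1(v)

    # Build the transpose: every edge (u, v) becomes (v, u).
    transpose = {v: [] for v in graph}
    for u in graph:
        for v in graph[u]:
            transpose.setdefault(v, []).append(u)

    # Pass 2: DFS the transpose in decreasing finish order; each
    # DFS tree found here is one strongly connected component.
    visited.clear()
    components = []

    def dfs2(v, comp):
        visited.add(v)
        comp.append(v)
        for w in transpose.get(v, ()):
            if w not in visited:
                dfs2(w, comp)

    for v in reversed(finish_order):
        if v not in visited:
            comp = []
            dfs2(v, comp)
            components.append(comp)
    return components
```

For example, in the graph A→B→C→A with C→D and D⇄E, the components come out as {A, B, C} and {D, E}.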
8.20. Dijkstra’s Algorithm
- The algorithm we are going to use to determine the shortest path is called “Dijkstra’s algorithm.” Dijkstra’s algorithm is an iterative algorithm that provides us with the shortest path from one particular starting node to all other nodes in the graph. Again this is similar to the results of a breadth first search.
- Dijkstra’s algorithm uses a priority queue. You may recall that a priority queue is based on the heap that we implemented in the Tree Chapter. There are a couple of differences between that simple implementation and the implementation we use for Dijkstra’s algorithm.
- It is important to note that Dijkstra’s algorithm works only when the weights are all positive.
```python
from pythonds.graphs import PriorityQueue, Graph, Vertex


def dijkstra(aGraph, start):
    pq = PriorityQueue()
    start.setDistance(0)
    pq.buildHeap([(v.getDistance(), v) for v in aGraph])
    while not pq.isEmpty():
        currentVert = pq.delMin()
        for nextVert in currentVert.getConnections():
            newDist = currentVert.getDistance() \
                + currentVert.getWeight(nextVert)
            if newDist < nextVert.getDistance():
                nextVert.setDistance(newDist)
                nextVert.setPred(currentVert)
                pq.decreaseKey(nextVert, newDist)
```
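The same algorithm can be sketched with only the standard library, which is handy if `pythonds` is not installed. Python's `heapq` has no `decreaseKey`, so this version pushes a fresh entry on every improvement and skips stale entries when they are popped ("lazy deletion") — a different priority-queue strategy than the text's, but the computed distances are the same:

```python
import heapq

def dijkstra(graph, start):
    """graph: dict {u: {v: weight}}. Returns dict of shortest distances."""
    dist = {start: 0}
    pq = [(0, start)]                     # (distance, vertex) pairs
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                      # stale entry; skip it
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd              # relax the edge (u, v)
                heapq.heappush(pq, (nd, v))
    return dist
```

Only reachable vertices appear in the result, and each pop either finalizes a vertex or discards a stale entry.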
8.21. Analysis of Dijkstra’s Algorithm
- Finally, let us look at the running time of Dijkstra’s algorithm. We first note that building the priority queue takes 𝑂(𝑉) time since we initially add every vertex in the graph to the priority queue. Once the queue is constructed the while loop is executed once for every vertex since vertices are all added at the beginning and only removed after that. Within that loop each call to delMin takes 𝑂(log𝑉) time. Taken together, that part of the loop and the calls to delMin take 𝑂(𝑉log(𝑉)). The for loop is executed once for each edge in the graph, and within the for loop each call to decreaseKey takes 𝑂(log𝑉) time, contributing 𝑂(𝐸log(𝑉)) in total. So the combined running time is 𝑂((𝑉+𝐸)log(𝑉)).
8.22. Prim’s Spanning Tree Algorithm
- For our last graph algorithm let’s consider a problem that online game designers and Internet radio providers face. The problem is that they want to efficiently transfer a piece of information to anyone and everyone who may be listening.
- A brute force solution is for the broadcast host to send a single copy of the broadcast message and let the routers sort things out. In this case, the easiest solution is a strategy called uncontrolled flooding.
- The solution to this problem lies in the construction of a minimum weight spanning tree. Formally we define the minimum spanning tree 𝑇 for a graph 𝐺=(𝑉,𝐸) as follows. 𝑇 is an acyclic subset of 𝐸 that connects all the vertices in 𝑉. The sum of the weights of the edges in 𝑇 is minimized.
- The algorithm we will use to solve this problem is called Prim’s algorithm. Prim’s algorithm belongs to a family of algorithms called the “greedy algorithms” because at each step we will choose the cheapest next step. In this case the cheapest next step is to follow the edge with the lowest weight. Our last step is to develop Prim’s algorithm.
- The trick is in the step that directs us to “find an edge that is safe.” We define a safe edge as any edge that connects a vertex that is in the spanning tree to a vertex that is not in the spanning tree. This ensures that the tree will always remain a tree and therefore have no cycles.
- Prim’s algorithm is similar to Dijkstra’s algorithm in that they both use a priority queue to select the next vertex to add to the growing graph.
```python
import sys

from pythonds.graphs import PriorityQueue, Graph, Vertex


def prim(G, start):
    pq = PriorityQueue()
    for v in G:
        v.setDistance(sys.maxsize)
        v.setPred(None)
    start.setDistance(0)
    pq.buildHeap([(v.getDistance(), v) for v in G])
    while not pq.isEmpty():
        currentVert = pq.delMin()
        for nextVert in currentVert.getConnections():
            newCost = currentVert.getWeight(nextVert)
            if nextVert in pq and newCost < nextVert.getDistance():
                nextVert.setPred(currentVert)
                nextVert.setDistance(newCost)
                pq.decreaseKey(nextVert, newCost)
```
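As with Dijkstra, Prim's algorithm can be sketched using only `heapq`, skipping stale heap entries instead of calling `decreaseKey`. This is a minimal standalone version, not the book's implementation; it assumes an undirected graph given as a symmetric weight dict:

```python
import heapq

def prim(graph, start):
    """graph: dict {u: {v: weight}} (symmetric). Returns list of MST edges."""
    in_tree = {start}
    mst = []
    # Heap entries are (weight, tree_vertex, candidate_vertex).
    pq = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(pq)
    while pq and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                      # stale: v was attached already
        in_tree.add(v)                    # (u, v) is a safe edge of min weight
        mst.append((u, v, w))
        for x, wx in graph[v].items():
            if x not in in_tree:
                heapq.heappush(pq, (wx, v, x))
    return mst
```

Each pop either discards a stale entry or commits the lowest-weight safe edge, so the result is a spanning tree with |V| − 1 edges of minimum total weight.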
8.23. Summary
- In this chapter we have looked at the graph abstract data type, and some implementations of a graph. A graph enables us to solve many problems provided we can transform the original problem into something that can be represented by a graph. In particular, we have seen that graphs are useful to solve problems in the following general areas.
- Breadth first search for finding the unweighted shortest path.
- Dijkstra’s algorithm for weighted shortest path.
- Depth first search for graph exploration.
- Strongly connected components for simplifying a graph.
- Topological sort for ordering tasks.
- Minimum weight spanning trees for broadcasting messages.