Python Threading Usage Patterns
Further reading: http://www.ibm.com/developerworks/aix/library/au-threadingpython/
A small example:
import threading
import datetime

class ThreadClass(threading.Thread):
    def run(self):
        now = datetime.datetime.now()
        print "%s says Hello World at time: %s" % (self.getName(), now)

for i in range(2):
    t = ThreadClass()
    t.start()
A custom thread class must inherit from threading.Thread and implement the run method.
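Subclassing is not the only option: threading.Thread also accepts a target callable, which is handy when the thread body is a single function. A minimal sketch of the same hello-world example without a subclass:

import threading
import datetime

def greet():
    # same work as ThreadClass.run above, but as a plain function
    now = datetime.datetime.now()
    print "%s says Hello World at time: %s" % (
        threading.currentThread().getName(), now)

for i in range(2):
    t = threading.Thread(target=greet)
    t.start()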
Noah Gift recommends the queue pattern when working with Python threads:
#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print url.read(1024)

            #signals to queue job is done
            self.queue.task_done()

start = time.time()
def main():
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    #wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
This example shows the queue pattern:
1. Create a queue instance with Queue.Queue() and use it to hold the work items.
2. Pass the queue instance into the thread class.
3. Spawn a pool of daemon threads.
4. Inside run, each thread pulls one item at a time from the queue and does its work with that item.
5. When the work is done, call queue.task_done() to signal the queue that the task is complete (a more defensive variant is sketched after this list).
6. Call join() on the queue, which blocks until every item has been processed, before the main program exits.
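One caveat with the loop above: if urllib2.urlopen raises (for example, an unreachable host), task_done() is never called and queue.join() blocks forever. A minimal sketch of a more defensive run method, usable as a drop-in replacement for ThreadUrl.run, pairs every get() with task_done() in a try/finally:

def run(self):
    while True:
        host = self.queue.get()
        try:
            # any exception raised here would otherwise skip task_done()
            url = urllib2.urlopen(host)
            print url.read(1024)
        except Exception, e:
            print "failed to fetch %s: %s" % (host, e)
        finally:
            # always mark the item done so queue.join() can return
            self.queue.task_done()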
The threads are set as daemons so that the main program can exit even while the daemon threads are still running, which simplifies the program's control flow: there is no need to track and shut down each worker explicitly.
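A tiny sketch of daemon semantics, for illustration: a daemon thread does not keep the interpreter alive, which is exactly why queue.join() is needed above to make sure the work actually finishes before the program exits.

import threading
import time

def background():
    while True:
        time.sleep(1)  # pretend to work forever

t = threading.Thread(target=background)
t.setDaemon(True)  # a daemon thread does not keep the interpreter alive
t.start()
print "main exits immediately; the daemon dies with the interpreter"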
Chained processing:
import Queue
import threading
import urllib2
import time
from BeautifulSoup import BeautifulSoup

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()
out_queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and then grabs chunk of webpage
            url = urllib2.urlopen(host)
            chunk = url.read()

            #place chunk into out queue
            self.out_queue.put(chunk)

            #signals to queue job is done
            self.queue.task_done()

class DatamineThread(threading.Thread):
    """Threaded page parsing"""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            #grabs chunk from out queue
            chunk = self.out_queue.get()

            #parse the chunk
            soup = BeautifulSoup(chunk)
            print soup.findAll(['title'])

            #signals to queue job is done
            self.out_queue.task_done()

start = time.time()
def main():
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    for i in range(5):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()

    #wait on the queue until everything has been processed
    queue.join()
    out_queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
As these examples show, driving threads with queues is simple and convenient, and the pattern scales by chaining queues together. The small programs above can be seen as basic building blocks of a search engine or a data-mining pipeline.
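For readers on Python 3, here is a minimal sketch of the same chained-queue pipeline; the standard-library names change (Queue becomes queue, urllib2 becomes urllib.request, print becomes a function), daemon status is a constructor argument, and a plain print stands in for the BeautifulSoup step:

import queue
import threading
import urllib.request

hosts = ["http://yahoo.com", "http://google.com"]

in_queue = queue.Queue()
out_queue = queue.Queue()

def fetch():
    while True:
        host = in_queue.get()
        try:
            # the put happens before task_done, so in_queue.join()
            # only returns once every chunk is in out_queue
            out_queue.put(urllib.request.urlopen(host).read())
        finally:
            in_queue.task_done()

def parse():
    while True:
        chunk = out_queue.get()
        try:
            print(chunk[:80])  # stand-in for the BeautifulSoup step
        finally:
            out_queue.task_done()

for _ in range(2):
    threading.Thread(target=fetch, daemon=True).start()
    threading.Thread(target=parse, daemon=True).start()

for host in hosts:
    in_queue.put(host)

in_queue.join()
out_queue.join()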