Python Threading Usage Patterns

Recommended reading: http://www.ibm.com/developerworks/aix/library/au-threadingpython/

A small example:

   import threading
   import datetime

   class ThreadClass(threading.Thread):
     def run(self):
       # report this thread's name and the current time
       now = datetime.datetime.now()
       print "%s says Hello World at time: %s" % (self.getName(), now)

   for i in range(2):
     t = ThreadClass()
     t.start()
 
A custom thread class must inherit from threading.Thread and implement the run method.
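The example above is Python 2. For reference, a minimal Python 3 sketch of the same idea (assuming print as a function, and `self.name` in place of the older `getName()`):

```python
import threading
import datetime

class ThreadClass(threading.Thread):
    def run(self):
        # each thread reports its own name and the current time
        now = datetime.datetime.now()
        print("%s says Hello World at time: %s" % (self.name, now))

for i in range(2):
    t = ThreadClass()
    t.start()
```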
 

Noah Gift recommends the queue pattern when working with Python threads.

   #!/usr/bin/env python
   import Queue
   import threading
   import urllib2
   import time

   hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
            "http://ibm.com", "http://apple.com"]

   queue = Queue.Queue()

   class ThreadUrl(threading.Thread):
       """Threaded Url Grab"""
       def __init__(self, queue):
           threading.Thread.__init__(self)
           self.queue = queue

       def run(self):
           while True:
               #grabs host from queue
               host = self.queue.get()

               #grabs urls of hosts and prints first 1024 bytes of page
               url = urllib2.urlopen(host)
               print url.read(1024)

               #signals to queue job is done
               self.queue.task_done()

   start = time.time()
   def main():
       #spawn a pool of threads, and pass them queue instance
       for i in range(5):
           t = ThreadUrl(queue)
           t.setDaemon(True)
           t.start()

       #populate queue with data
       for host in hosts:
           queue.put(host)

       #wait on the queue until everything has been processed
       queue.join()

   main()
   print "Elapsed Time: %s" % (time.time() - start)

This example illustrates the queue pattern:

1. Create a queue instance with Queue.Queue() and use it to manage the data

2. Pass the queue instance into the thread class

3. Spawn a pool of daemon threads

4. Each thread pulls one item from the queue at a time and uses it inside the run method to do the work

5. When the work is done, call queue.task_done() to signal the queue that the task is complete

6. Call join() on the queue, which blocks the main program from exiting until the queue is empty

 

The threads are marked as daemons so that the main program can exit even while daemon threads are still running, which simplifies the program's control flow.
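The same pattern carries over to Python 3, where the module is renamed `queue`. The sketch below substitutes a trivial local task (squaring numbers) for the URL fetch so it runs without network access; the structure — daemon worker pool, `get()`, `task_done()`, `join()` — is unchanged:

```python
import queue
import threading

task_queue = queue.Queue()
results = []
results_lock = threading.Lock()

class Worker(threading.Thread):
    def __init__(self, q):
        threading.Thread.__init__(self)
        self.q = q
        self.daemon = True  # daemon threads let the main program exit freely

    def run(self):
        while True:
            # grab one item from the queue
            n = self.q.get()
            # do the "work" -- squaring, standing in for the URL fetch
            with results_lock:
                results.append(n * n)
            # signal the queue that this task is done
            self.q.task_done()

def main():
    # spawn a pool of daemon threads, passing them the queue instance
    for _ in range(5):
        Worker(task_queue).start()
    # populate the queue with data
    for n in range(10):
        task_queue.put(n)
    # block until every item has been marked done
    task_queue.join()

main()
print(sorted(results))
```

Note the lock around `results.append`: the Queue itself is thread-safe, but any shared structure the workers write to still needs its own synchronization.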

 

Chained processing:

 

   import Queue
   import threading
   import urllib2
   import time
   from BeautifulSoup import BeautifulSoup

   hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
           "http://ibm.com", "http://apple.com"]

   queue = Queue.Queue()
   out_queue = Queue.Queue()

   class ThreadUrl(threading.Thread):
       """Threaded Url Grab"""
       def __init__(self, queue, out_queue):
           threading.Thread.__init__(self)
           self.queue = queue
           self.out_queue = out_queue

       def run(self):
           while True:
               #grabs host from queue
               host = self.queue.get()

               #grabs urls of hosts and then grabs chunk of webpage
               url = urllib2.urlopen(host)
               chunk = url.read()

               #place chunk into out queue
               self.out_queue.put(chunk)

               #signals to queue job is done
               self.queue.task_done()

   class DatamineThread(threading.Thread):
       """Threaded page parse"""
       def __init__(self, out_queue):
           threading.Thread.__init__(self)
           self.out_queue = out_queue

       def run(self):
           while True:
               #grabs chunk from out queue
               chunk = self.out_queue.get()

               #parse the chunk
               soup = BeautifulSoup(chunk)
               print soup.findAll(['title'])

               #signals to queue job is done
               self.out_queue.task_done()

   start = time.time()
   def main():

       #spawn a pool of threads, and pass them queue instance
       for i in range(5):
           t = ThreadUrl(queue, out_queue)
           t.setDaemon(True)
           t.start()

       #populate queue with data
       for host in hosts:
           queue.put(host)

       for i in range(5):
           dt = DatamineThread(out_queue)
           dt.setDaemon(True)
           dt.start()

       #wait on the queue until everything has been processed
       queue.join()
       out_queue.join()

   main()
   print "Elapsed Time: %s" % (time.time() - start)

 

As you can see, driving threads through queues is simple and convenient, and the pattern can be extended by chaining queues. The small programs above can be viewed as basic building blocks of a search engine or a data-mining pipeline.
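The chained pattern can also be sketched in Python 3, again with local stand-in tasks in place of urllib2 and BeautifulSoup (an uppercasing "fetch" stage feeding a length-counting "mine" stage — both hypothetical substitutes for the real work):

```python
import queue
import threading

in_q = queue.Queue()
out_q = queue.Queue()
final = []
final_lock = threading.Lock()

class FetchThread(threading.Thread):
    """First stage: stand-in for the URL fetch."""
    def __init__(self, in_q, out_q):
        threading.Thread.__init__(self)
        self.in_q, self.out_q = in_q, out_q
        self.daemon = True

    def run(self):
        while True:
            word = self.in_q.get()
            # "fetch": transform the item and hand it to the next stage
            self.out_q.put(word.upper())
            self.in_q.task_done()

class MineThread(threading.Thread):
    """Second stage: stand-in for the BeautifulSoup parse."""
    def __init__(self, out_q):
        threading.Thread.__init__(self)
        self.out_q = out_q
        self.daemon = True

    def run(self):
        while True:
            word = self.out_q.get()
            with final_lock:
                final.append((word, len(word)))
            self.out_q.task_done()

for _ in range(3):
    FetchThread(in_q, out_q).start()
for _ in range(3):
    MineThread(out_q).start()
for w in ["alpha", "beta", "gamma"]:
    in_q.put(w)
in_q.join()   # wait for stage one to finish
out_q.join()  # then wait for stage two
print(sorted(final))
```

Joining `in_q` first guarantees every item has been handed to `out_q` before the second join starts waiting, since each fetch worker calls `put` before `task_done`.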

posted @ 2012-05-27 23:09  Orcus