1. Features of the Tornado queue
Compared with the Python standard-library queue, Tornado's queues.Queue supports asynchronous use: get() and put() return futures that a coroutine can yield, so waiting on the queue never blocks the IOLoop's thread.
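To make the contrast concrete, here is a minimal sketch of my own (not from the original post; Python 3 module names are assumed): the standard library's get() blocks the whole thread, while Tornado's get() suspends only the coroutine that yields it.

#!/usr/bin/env python
# Minimal contrast sketch; assumes Python 3 module names.

import queue                      # standard library: blocking, thread-based

from tornado import gen, queues   # Tornado: asynchronous, coroutine-based

sync_q = queue.Queue()
# sync_q.get()  # would block the entire thread (and with it, the IOLoop)

async_q = queues.Queue()


@gen.coroutine
def consume_one():
    # Yielding the future suspends only this coroutine; the IOLoop keeps
    # running other handlers while we wait for an item.
    item = yield async_q.get()
    raise gen.Return(item)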
2. Common methods of Queue
Queue.get()
Suspends the calling coroutine until an item is available in the queue.
Queue.put()
On a queue with a maximum size, suspends until the queue has free space.
Queue.task_done()
Call this once for every item retrieved with get() to signal that the work for that item is finished.
Queue.join()
Waits until all work is finished, i.e. until task_done() has been called for every item that was put (the sketch below shows all four methods together).
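Here is a minimal producer/consumer sketch of my own (not from the original post) that exercises all four methods, written in the same gen.coroutine style as the example in section 3:

#!/usr/bin/env python
# Minimal sketch: put()/get()/task_done()/join() working together.

from tornado import gen, ioloop, queues

q = queues.Queue(maxsize=2)  # put() will suspend once 2 items are waiting


@gen.coroutine
def producer():
    for i in range(5):
        yield q.put(i)        # suspends while the queue is full
        print('put %d' % i)


@gen.coroutine
def consumer():
    while True:
        item = yield q.get()  # suspends until an item is available
        try:
            print('got %d' % item)
        finally:
            q.task_done()     # mark this item as processed


@gen.coroutine
def main():
    consumer()                # start the consumer; it runs concurrently
    yield producer()          # wait until every item has been put
    yield q.join()            # wait until task_done() matched every item


ioloop.IOLoop.current().run_sync(main)

Because the queue is bounded at 2, the producer is forced to wait for the consumer to catch up; this back-pressure is exactly the put() behavior described above.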
3. Example
Starting from the URL http://www.tornadoweb.org/en/stable/, parse the page for every link that has this URL as a prefix, then visit and parse each of those pages in turn, until all such URLs have been found.
#!/usr/bin/env python

import time
from datetime import timedelta

try:
    from HTMLParser import HTMLParser
    from urlparse import urljoin, urldefrag
except ImportError:
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urldefrag

from tornado import httpclient, gen, ioloop, queues

base_url = 'http://www.tornadoweb.org/en/stable/'
concurrency = 10


@gen.coroutine
def get_links_from_url(url):
    """Download the page at `url` and parse it for links.

    Returned links have had the fragment after `#` removed, and have been made
    absolute so, e.g. the URL 'gen.html#tornado.gen.coroutine' becomes
    'http://www.tornadoweb.org/en/stable/gen.html'.
    """
    try:
        response = yield httpclient.AsyncHTTPClient().fetch(url)
        print('fetched %s' % url)

        html = response.body if isinstance(response.body, str) \
            else response.body.decode()
        urls = [urljoin(url, remove_fragment(new_url))
                for new_url in get_links(html)]
    except Exception as e:
        print('Exception: %s %s' % (e, url))
        raise gen.Return([])

    raise gen.Return(urls)


def remove_fragment(url):
    pure_url, frag = urldefrag(url)
    return pure_url


def get_links(html):
    class URLSeeker(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.urls = []

        def handle_starttag(self, tag, attrs):
            href = dict(attrs).get('href')
            if href and tag == 'a':
                self.urls.append(href)

    url_seeker = URLSeeker()
    url_seeker.feed(html)
    return url_seeker.urls


@gen.coroutine
def main():
    q = queues.Queue()
    start = time.time()
    fetching, fetched = set(), set()

    @gen.coroutine
    def fetch_url():
        current_url = yield q.get()
        try:
            if current_url in fetching:
                return

            print('fetching %s' % current_url)
            fetching.add(current_url)
            urls = yield get_links_from_url(current_url)
            fetched.add(current_url)

            for new_url in urls:
                # Only follow links beneath the base URL
                if new_url.startswith(base_url):
                    yield q.put(new_url)

        finally:
            q.task_done()

    @gen.coroutine
    def worker():
        while True:
            yield fetch_url()

    q.put(base_url)

    # Start workers, then wait for the work queue to be empty.
    for _ in range(concurrency):
        worker()
    yield q.join(timeout=timedelta(seconds=300))
    assert fetching == fetched
    print('Done in %d seconds, fetched %s URLs.' % (
        time.time() - start, len(fetched)))


if __name__ == '__main__':
    import logging
    logging.basicConfig()
    io_loop = ioloop.IOLoop.current()
    io_loop.run_sync(main)
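A few notes on how the crawler uses the queue: each fetch_url() pairs q.get() with q.task_done() in a try/finally block, so q.join() resolves only after every discovered URL has been processed. The fetching set prevents a URL from being fetched twice, and the closing assert verifies that every URL that started fetching also finished. The timeout passed to q.join() bounds the whole crawl at 300 seconds; if the queue has not drained by then, join raises a timeout error (gen.TimeoutError in the Tornado 4.x era this code targets) instead of returning.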