Web Scraping Basics, Part 3
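Using the timeout parameter
- requests.get(url, headers=headers, timeout=3)
- If the server does not respond within 3 seconds, requests raises a timeout exception (requests.exceptions.Timeout) instead of waiting indefinitely.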
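To see the timeout in action without depending on an external site, here is a minimal sketch. The helper names `fetch_status` and `start_silent_server` are made up for illustration and are not part of requests; the local server simply accepts connections and never answers, so the request should time out.

```python
import socket
import threading

import requests


def fetch_status(url, timeout=3):
    """Return the HTTP status code, or None if the request times out."""
    try:
        response = requests.get(url, timeout=timeout)
        return response.status_code
    except requests.exceptions.Timeout:
        # Raised when connecting or waiting for data exceeds `timeout` seconds.
        return None


def start_silent_server():
    """Hypothetical helper: a local server that accepts but never answers."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(5)
    connections = []  # hold references so accepted sockets stay open

    def accept_forever():
        while True:
            try:
                conn, _addr = server.accept()
            except OSError:
                return  # server socket was closed
            connections.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_silent_server()
    port = server.getsockname()[1]
    # The server never replies, so the call should time out and return None.
    print(fetch_status(f"http://127.0.0.1:{port}/", timeout=0.5))
    server.close()
```

Catching `requests.exceptions.Timeout` covers both connection and read timeouts, which is usually what a scraper wants before deciding whether to retry.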
Learning the retrying module
- pip install retrying
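
    from retrying import retry

    # Try the call up to 3 times; if every attempt raises,
    # the last exception propagates to the caller.
    @retry(stop_max_attempt_number=3)
    def fun():
        print("this is fun")
        raise ValueError("this is test error")
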
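To make the retry behavior concrete, the following is a simplified, standard-library-only sketch of a decorator that behaves roughly like `stop_max_attempt_number`. It is not how retrying is actually implemented; `simple_retry` and `flaky` are hypothetical names used only for this illustration.

```python
import functools


def simple_retry(max_attempts=3):
    """Hypothetical stand-in for retry(stop_max_attempt_number=...)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the last error propagate
        return wrapper
    return decorator


calls = []  # records each attempt so the retries are visible


@simple_retry(max_attempts=3)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("this is test error")
    return "ok"


result = flaky()
print(result, len(calls))  # prints: ok 3
```

The function fails twice and succeeds on the third attempt, so the caller never sees the first two errors. If all attempts fail, the last exception reaches the caller, which is why the combined example wraps the retried function in a try/except.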
Here is an example that uses the two together:

    import requests
    import retrying

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
        "Referer": "https://movie.douban.com/tag/",
    }


    # Retry up to 3 times if the request fails or times out.
    @retrying.retry(stop_max_attempt_number=3)
    def _parse_url(url):
        print("*" * 100)  # marks each attempt
        response = requests.get(url, headers=headers, timeout=5)
        return response.content.decode()


    def parse_url(url):
        """Return the page text, or None if all attempts fail."""
        try:
            html_str = _parse_url(url)
        except Exception:
            html_str = None
        return html_str


    if __name__ == "__main__":
        url = "https://www.baidu.com"
        html = parse_url(url)
        # parse_url returns None on failure, so guard before slicing.
        print(html[:100] if html is not None else None)