Web Scraping Basics, Part 3

Using the timeout parameter

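  • requests.get(url, headers=headers, timeout=3)
    • If the connection is not established, or the server sends no data, within 3 seconds, requests raises an exception (requests.exceptions.Timeout)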
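
To make the error case concrete, here is a minimal sketch of catching the timeout instead of letting it crash the script. The helper name fetch_with_timeout is an assumption made for illustration and is not part of requests; the only library pieces used are requests.get and requests.exceptions.Timeout.

```python
import requests


def fetch_with_timeout(url, timeout=3):
    """Return the response body as text, or None if the request timed out.

    fetch_with_timeout is a hypothetical helper, not part of requests.
    """
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Raised for both connect timeouts and read timeouts
        return None
    return response.text
```

The timeout argument can also be a (connect, read) tuple such as timeout=(3, 10) when the two phases need different limits. Note that it does not cap the total download time; it only limits how long requests waits to connect and how long it waits between chunks of data.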

Learning the retrying module

  • pip install retrying
from retrying import retry

# Call fun() up to 3 times; if every attempt raises, the last exception propagates
@retry(stop_max_attempt_number=3)
def fun():
    print("this is fun")
    raise ValueError("this is test error")
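
To see what stop_max_attempt_number=3 means in practice, the following is a simplified, standard-library-only sketch of a retry decorator. It is not the retrying implementation; simple_retry and max_attempts are hypothetical names that only imitate the "try up to N times, then re-raise the last error" behavior.

```python
import functools


def simple_retry(max_attempts=3):
    """Hypothetical stand-in for retrying.retry(stop_max_attempt_number=...)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Out of attempts: let the last exception propagate
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator


calls = []  # records one entry per attempt


@simple_retry(max_attempts=3)
def fun():
    calls.append(1)
    raise ValueError("this is test error")


try:
    fun()
except ValueError:
    print("fun was called", len(calls), "times before giving up")
```

Because fun always fails, it runs three times and the ValueError from the third attempt reaches the caller. A function that succeeds on an earlier attempt returns immediately without further calls.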

Here is some code that uses the two together

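import requests
import retrying

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
    "Referer": "https://movie.douban.com/tag/",
}


# Retry up to 3 times; each attempt fails if no response arrives within 5 seconds
@retrying.retry(stop_max_attempt_number=3)
def _parse_url(url):
    print("*" * 100)  # one line of asterisks per attempt
    response = requests.get(url, headers=headers, timeout=5)
    return response.content.decode()


def parse_url(url):
    # Return the page as a string, or None if all attempts failed
    try:
        html_str = _parse_url(url)
    except Exception:
        html_str = None
    return html_str


if __name__ == '__main__':
    url = "https://www.baidu.com"
    html_str = parse_url(url)
    # parse_url returns None on failure, so check before slicing
    print(html_str[:100] if html_str is not None else "request failed")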

 

posted @ 2019-08-26 22:32  ctrl_TT豆