Python Web Scraping Practice 3: Douban Movies
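As with the Douban Books exercise, the request headers need to be set: the script sends a browser-style User-Agent with each request.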
import requests
import re

# Fetch two pages of 20 movies each; page_start is the offset of the first item.
for i in range(0, 2):
    j = i * 20
    url = 'https://movie.douban.com/j/search_subjects?type=movie&tag=%E8%B1%86%E7%93%A3%E9%AB%98%E5%88%86&sort=rank&page_limit=20&page_start=' + str(j)
    # Send a browser-style User-Agent with the request.
    ua = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36 Edg/91.0.864.59'}
    r = requests.get(url, headers=ua, timeout=30)
    # print(r.encoding, r.status_code)
    # Capture the text between "title":" and ","url", and between "rate":" and ","cover_x".
    pat = '"title":"(.*?)","url"'
    pat1 = '"rate":"(.*?)","cover_x"'
    til = re.compile(pat, re.S).findall(r.text)
    rat = re.compile(pat1, re.S).findall(r.text)
    print(til, rat)
    print('---------------')
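As for how to pair each movie title with its rating, I am still a beginner and have not worked it out yet. Suggestions in the comments are welcome.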
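One possible way to do the pairing, offered as a rough sketch and not a definitive answer: the regex patterns above suggest the response is JSON, so it can be parsed with the standard json module and each movie's title and rate read from the same object. The helper name pair_titles_and_ratings, the top-level "subjects" key, and the sample string are assumptions made for illustration. The sample is hand-written placeholder data, not real Douban output.

```python
import json


def pair_titles_and_ratings(text):
    """Return a list of (title, rate) tuples from one page of response text."""
    data = json.loads(text)
    # Assumption: the movies sit in a top-level "subjects" list, and each
    # entry carries "title" and "rate" keys (the keys the regexes match).
    return [(m["title"], m["rate"]) for m in data.get("subjects", [])]


# Hand-written sample in the assumed shape (placeholder values only).
sample = (
    '{"subjects": ['
    '{"rate": "9.0", "cover_x": 100, "title": "Movie A", "url": "https://example.com/a"},'
    '{"rate": "8.5", "cover_x": 100, "title": "Movie B", "url": "https://example.com/b"}'
    ']}'
)

for title, rate in pair_titles_and_ratings(sample):
    print(title, rate)
```

Inside the loop this would be called as pair_titles_and_ratings(r.text). A shorter alternative that keeps the two regex lists is list(zip(til, rat)), but it lines items up by position only, so the pairs drift apart if either pattern misses a match.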