0. URL categories
- Top-level sections:
  - https://book.douban.com/
  - https://music.douban.com/
  - https://movie.douban.com/
    https://movie.douban.com/subject/<movie ID>/
- Subpages:
  - Comments:
    https://movie.douban.com/subject/xxx/comments
  - Comments:
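A minimal sketch of building these URLs from a movie ID, assuming only the two patterns listed above. The names MOVIE_BASE, subject_url, and comments_url are made up for illustration and are not part of any Douban API.

```python
# Base URL of the movie section, as listed in the notes above.
MOVIE_BASE = "https://movie.douban.com"


def subject_url(movie_id):
    """Return the page URL for one movie (hypothetical helper)."""
    return f"{MOVIE_BASE}/subject/{movie_id}/"


def comments_url(movie_id):
    """Return the comments page URL for one movie (hypothetical helper)."""
    return f"{MOVIE_BASE}/subject/{movie_id}/comments"


if __name__ == "__main__":
    # 25953429 is the ID used in section 1 of these notes.
    print(subject_url(25953429))
    print(comments_url(25953429))
```

Keeping the patterns in one place means the scraper only has to carry IDs around, not full URLs.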
1. Scraping "People who liked this series also liked"
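import requests
from bs4 import BeautifulSoup

url = "https://movie.douban.com/subject/25953429/"
# Assumption: Douban may reject requests that lack a browser-like User-Agent.
headers = {"User-Agent": "Mozilla/5.0"}
resp = requests.get(url, headers=headers, timeout=10)
soup = BeautifulSoup(resp.text, 'html.parser')

also_likes = set()  # a set, so repeated links are stored only once
# Each recommended title is linked from inside a <dd> element.
for dd in soup.find_all('dd'):
    a = dd.find('a', href=True)  # skip any <dd> without a link
    if a is not None:
        also_likes.add(a['href'])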
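The links collected in also_likes can be reduced to movie IDs, which are easier to deduplicate and to feed back into the /subject/<movie ID>/ pattern from section 0. Below is a rough sketch using only the standard library. It assumes the hrefs follow that pattern and may carry a query string; the helper names and the sample list are invented for illustration (only the ID 25953429 comes from these notes).

```python
import re

# Matches the numeric ID in a ".../subject/<movie ID>/" URL.
SUBJECT_ID = re.compile(r"/subject/(\d+)")


def subject_id(href):
    """Return the numeric subject ID found in href, or None if there is none."""
    match = SUBJECT_ID.search(href)
    return match.group(1) if match else None


def unique_subject_ids(hrefs):
    """Return the sorted, deduplicated subject IDs found in hrefs."""
    ids = {subject_id(h) for h in hrefs}
    ids.discard(None)  # drop links that are not subject pages
    return sorted(ids)


if __name__ == "__main__":
    # Made-up sample standing in for the also_likes set.
    sample = [
        "https://movie.douban.com/subject/25953429/",
        "https://movie.douban.com/subject/25953429/?from=subject-page",
        "https://movie.douban.com/",
    ]
    print(unique_subject_ids(sample))  # ['25953429']
```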
2. Movie comments
https://mp.weixin.qq.com/s/uTIhyNVE7W6mGMneSKQNlw