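Scraping the top ten entries of the Baidu hot search list
1. Find the URL: https://tophub.today/n/Jb0vmloB1G
2. Press F12 and look up the request headers
3. Find the page source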
4. Write the code:
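import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "http://top.baidu.com/buzz.php?p=hotstocks"
# Headers must be a dict (name -> value), not a set holding one string.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
r = requests.get(url, headers=headers)  # pass the headers with the request
r.encoding = r.apparent_encoding  # guess the encoding from the response body
x = r.text
soup = BeautifulSoup(x, 'lxml')
a = []  # titles
b = []  # heat values
for i in soup.find_all(class_="al"):
    a.append(i.get_text().strip())
# "align" is not a Python keyword, so it takes no trailing underscore.
for j in soup.find_all(align="center"):
    b.append(j.get_text().strip())  # use the loop variable j (was an undefined l)
data = [a, b]
print(data)
h = pd.DataFrame(data, index=["标题", "热度"])  # row labels: title, heat
print(h.T)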
Result (both lists came back empty, so neither selector matched anything in the fetched page):
[[], []]
Empty DataFrame
Columns: [标题, 热度]
Index: []
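
When find_all returns nothing, one way to check is to list the class names that the downloaded HTML really contains and compare them with the ones used in the script. The sketch below is only an illustration under stated assumptions: it uses the standard library html.parser instead of BeautifulSoup so it runs without extra packages, ClassCounter and count_classes are hypothetical helper names, and sample_html is made-up markup standing in for r.text, not the real page.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how often each CSS class name appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == "class" and value:
                # One class attribute may hold several space-separated names.
                self.counts.update(value.split())


def count_classes(html):
    """Return a Counter mapping class name -> number of tags using it."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical sample standing in for r.text; not the real page markup.
sample_html = """
<table>
  <tr><td class="keyword"><a class="list-title">Topic A</a></td><td class="last">100</td></tr>
  <tr><td class="keyword"><a class="list-title">Topic B</a></td><td class="last">90</td></tr>
</table>
"""

counts = count_classes(sample_html)
print(counts.most_common())  # class names present, most frequent first
print("al" in counts)        # is the class used by the script present at all?
```

In the real script, calling count_classes(r.text) would show whether a class such as "al" exists in the response at all. If it does not, the page structure differs from what the selectors expect, or the server returned a different page than the one inspected with F12.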