Scraping the Top Ten of the Baidu Hot Search List

1. Find the URL: https://tophub.today/n/Jb0vmloB1G

2. Press F12 and find the request headers

3. Look through the page source
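
A minimal sketch of step 2: the User-Agent string copied from the F12 Network panel goes into a dict, which requests accepts through its headers parameter. The snippet only prepares the request without sending it, so the outgoing headers can be checked offline. The User-Agent value is the one used in this post.

```python
import requests

# URL from step 1 and the User-Agent copied from the browser's F12 Network panel.
url = "https://tophub.today/n/Jb0vmloB1G"
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
    )
}

# Build the request without sending it, so the headers can be inspected offline.
prepared = requests.Request("GET", url, headers=headers).prepare()
print(prepared.headers["User-Agent"])

# To actually fetch the page:
# r = requests.get(url, headers=headers, timeout=10)
```

The headers must be a dict mapping each header name to its value. Writing the name and value as one string inside braces creates a set, which requests cannot use as headers.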

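import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "http://top.baidu.com/buzz.php?p=hotstocks"
# headers must be a dict (name -> value), not a set holding one string
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
r = requests.get(url, headers=headers)  # pass the headers with the request
r.encoding = r.apparent_encoding  # guess the encoding from the response body
soup = BeautifulSoup(r.text, 'lxml')
a = []  # titles
b = []  # heat values
for i in soup.find_all(class_="al"):
    a.append(i.get_text().strip())
# only "class" takes a trailing underscore; other attributes use their plain name
for j in soup.find_all(align="center"):
    b.append(j.get_text().strip())
data = [a, b]
print(data)
# 标题 = title, 热度 = heat; one row per label, transposed into columns below
h = pd.DataFrame(data, index=["标题", "热度"])
print(h.T)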

Result:

[[], []]
Empty DataFrame
Columns: [标题, 热度]
Index: []
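
Two empty lists mean that neither find_all call matched anything in the HTML that came back. A minimal sketch for checking selectors offline follows. The sample markup and the helper name count_matches are made up for illustration and are not the real page structure.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical fragment, made up for illustration; the real page markup may differ.
sample_html = """
<table>
  <tr><td class="al">Title A</td><td align="center">100</td></tr>
  <tr><td class="al">Title B</td><td align="center">90</td></tr>
</table>
"""


def count_matches(html, **filters):
    """Return how many tags in html match the given find_all filters."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(**filters))


# "class" is a Python keyword, so BeautifulSoup spells that filter class_.
print(count_matches(sample_html, class_="al"))
# Other attributes use their plain name.
print(count_matches(sample_html, align="center"))
# align_ looks for an attribute literally named "align_", which no tag here has.
print(count_matches(sample_html, align_="center"))

# Build the table column by column instead of transposing a list of rows.
soup = BeautifulSoup(sample_html, "html.parser")
titles = [t.get_text().strip() for t in soup.find_all(class_="al")]
heat = [t.get_text().strip() for t in soup.find_all(align="center")]
table = pd.DataFrame({"标题": titles, "热度": heat})  # 标题 = title, 热度 = heat
print(table)
```

Running the same counts on r.text from the real request shows whether the selectors or the response itself is the problem. If the counts are zero there too, the page source seen in the browser may differ from what requests receives.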

posted @ 2020-03-20 23:35 锦衣夜行408
