Scraping the top ten of Baidu's hot search list

1. Find the URL: https://tophub.today/n/Jb0vmloB1G

2. Press F12 to open the developer tools and find the request headers (in particular the User-Agent).
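The headers found in step 2 go to `requests` as a dict that maps each header name to its value, passed through the `headers=` argument. Below is a minimal sketch, assuming the User-Agent string used in this post's crawler. It only prepares the request with `requests.Request(...).prepare()` and makes no network access, so you can inspect the outgoing headers before anything is sent.

```python
import requests

# User-Agent copied from the browser's F12 Network panel (the same string this post uses)
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
    )
}


def build_request(url: str) -> requests.PreparedRequest:
    """Prepare (but do not send) a GET request that carries the browser headers."""
    return requests.Request("GET", url, headers=HEADERS).prepare()


prepared = build_request("http://top.baidu.com/buzz.php?p=hotstocks")
print(prepared.headers["User-Agent"])
```

To fetch the page for real, pass the same dict as `requests.get(url, headers=HEADERS, timeout=10)`.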

3. Read the page source to find the tags that hold the titles and their heat values.
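After step 3 shows which tags hold the titles and heat values, BeautifulSoup can select them either by class or by an ordinary attribute. The sketch runs on a made-up HTML fragment, not the real Baidu markup. The class name `al` and the `align="center"` cells are assumptions that mirror the selectors in this post's crawler. The sketch also shows why the keyword must be `align` and not `align_`. Only `class_` takes a trailing underscore, because `class` is a Python keyword. Writing `align_=` searches for an attribute literally named `align_`, so it matches nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the page source; real markup may differ
SAMPLE_HTML = """
<table>
  <tr><td class="al">Item A</td><td align="center">1200</td></tr>
  <tr><td class="al">Item B</td><td align="center">980</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ (with underscore) filters on the HTML class attribute
titles = [td.get_text(strip=True) for td in soup.find_all(class_="al")]

# any other attribute is passed by its plain name
heats = [td.get_text(strip=True) for td in soup.find_all(align="center")]

# align_ looks for an attribute named "align_", which does not exist here
wrong = soup.find_all(align_="center")

print(titles, heats, len(wrong))
```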

4. Write the crawler:

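import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "http://top.baidu.com/buzz.php?p=hotstocks"
# headers must be a dict mapping header name -> value
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
r = requests.get(url, headers=headers, timeout=10)  # send the browser headers along
r.raise_for_status()                                 # stop on HTTP errors
r.encoding = r.apparent_encoding                     # detect the encoding so Chinese text decodes
soup = BeautifulSoup(r.text, 'lxml')                 # the 'lxml' parser requires the lxml package
a = []  # titles
b = []  # heat values
for i in soup.find_all(class_="al"):
    a.append(i.get_text().strip())
# align is a plain attribute: only class_ needs a trailing underscore
for j in soup.find_all(align="center"):
    b.append(j.get_text().strip())
data = [a, b]
print(data)
# one row per item, trimmed to the top ten (标题 = title, 热度 = heat)
h = pd.DataFrame(list(zip(a, b))[:10], columns=["标题", "热度"])
print(h)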

Result (both lists came back empty, so neither selector matched anything on the page):

[[], []]
Empty DataFrame
Columns: [标题, 热度]
Index: []
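An empty table gives no hint about what went wrong. A small guard can report the problem before the DataFrame is built. Below is a sketch with a hypothetical helper, `to_top_ten`, which is not part of the original code. It assumes the two scraped lists line up item by item. It raises a clear error when nothing matched or when the counts differ, and otherwise pairs each title with its heat value and keeps the first ten rows.

```python
import pandas as pd


def to_top_ten(titles: list[str], heats: list[str], n: int = 10) -> pd.DataFrame:
    """Pair each title with its heat value and keep the first n rows.

    Hypothetical helper: raises ValueError when nothing was scraped or the
    two lists differ in length, instead of silently printing an empty table.
    """
    if not titles and not heats:
        raise ValueError("no elements matched: compare the selectors with the page source")
    if len(titles) != len(heats):
        raise ValueError(f"{len(titles)} titles but {len(heats)} heat values")
    # one row per item: (title, heat)
    rows = list(zip(titles, heats))[:n]
    return pd.DataFrame(rows, columns=["标题", "热度"])  # 标题 = title, 热度 = heat


df = to_top_ten(["A", "B", "C"], ["300", "200", "100"], n=2)
print(df)
```

Building one row per item keeps each title next to its own heat value. Trimming to the top ten then becomes a plain slice.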
Posted @ 2020-03-20 23:35 by 锦衣夜行408