Scraping knewone's asynchronously loaded list-page data

The code is as follows:

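from bs4 import BeautifulSoup
import requests
import time

# Base URL of the asynchronously loaded list pages; the page number is appended.
url = 'https://knewone.com/things?page='

def get_page(url, data=None):
    # Fetch one list page and parse it with the lxml parser.
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    imgs = soup.select('a.cover-inner > img')
    titles = soup.select('h4.title > a')

    if data is None:
        # Pair each cover image with its title and print one record per item.
        for img, title in zip(imgs, titles):
            item = {
                'img': img.get('src'),
                'title': title.get('title'),
            }
            print(item)

def get_more_page(start, end):
    # range() excludes end, so get_more_page(1, 30) fetches pages 1 to 29.
    for one in range(start, end):
        get_page(url + str(one))
        time.sleep(2)  # pause between requests to avoid hammering the site

get_more_page(1, 30)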
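The parts that take some thought are finding the URL that the asynchronous loading actually requests and writing a function that controls which pages get scraped. This code also follows a tutorial.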
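To make the page-control idea easier to check, here is a minimal sketch that separates building the page URLs from fetching them. The names `page_urls`, `crawl`, and the `fetch` parameter are my own and do not come from the tutorial; only the base URL and the `page` query parameter are taken from the code above. Because the fetch step is passed in, the paging logic can be tried without touching the network.

```python
import time
from urllib.parse import urlencode

# Base URL taken from the post; everything else here is a hypothetical helper.
BASE_URL = 'https://knewone.com/things'

def page_urls(start, end, base=BASE_URL):
    """Yield list-page URLs for pages start..end-1 (end is exclusive, like range)."""
    for page in range(start, end):
        yield f'{base}?{urlencode({"page": page})}'

def crawl(urls, fetch, delay=2.0):
    """Call fetch(url) for every URL, sleeping `delay` seconds after each call."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(delay)
    return results

if __name__ == '__main__':
    # Dry run with a stand-in fetch function, so no request is sent.
    for line in crawl(page_urls(1, 4), fetch=lambda u: 'would fetch ' + u, delay=0):
        print(line)
```

With the scraper from this post, something like `crawl(page_urls(1, 30), fetch=get_page)` should request pages 1 to 29, the same as `get_more_page(1, 30)`.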
posted @ 2017-07-08 23:45  独善其身412