Blog archive for March 1, 2020 - 方木Fengl

Python Web Scraping (17): 电影天堂 (Dianying Tiantang) Scraper, Part 1

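Summary: The goal is to scrape the information for every movie listed on a 电影天堂 page. Each movie's details live in a separate HTML page, so the first step is to collect the detail-page URLs from the listing page. # 电影天堂 scraper from lxml import etree import requests # a domain prefix BASE_DOMAIN="https://www.dytt8.n Read more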

posted @ 2020-03-01 23:14 方木Fengl Views (635) Comments (0) Recommendations (0)
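The URL-collection step described in the entry above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the post's actual code: the `tbspan` table layout, the sample HTML, the function name `extract_detail_urls`, and the placeholder domain `https://example.com` are all hypothetical, since the archive summary cuts off before the real values appear.

```python
from urllib.parse import urljoin

from lxml import etree

# Hypothetical placeholder: the real domain is truncated in the summary.
BASE_DOMAIN = "https://example.com"

# Hypothetical listing page; the real site's markup may differ.
SAMPLE_LIST_HTML = """
<html><body>
<table class="tbspan"><tr><td><a href="/html/movie/1.html">Movie A</a></td></tr></table>
<table class="tbspan"><tr><td><a href="/html/movie/2.html">Movie B</a></td></tr></table>
</body></html>
"""


def extract_detail_urls(list_html, base=BASE_DOMAIN):
    """Return absolute detail-page URLs found on a listing page."""
    html = etree.HTML(list_html)
    # Assumed layout: every movie sits in a <table class="tbspan"> holding one link.
    hrefs = html.xpath("//table[@class='tbspan']//a/@href")
    # The hrefs are site-relative, so join each one onto the domain prefix.
    return [urljoin(base, href) for href in hrefs]


detail_urls = extract_detail_urls(SAMPLE_LIST_HTML)
```

In a real run the listing HTML would come from `requests.get(...)` rather than an inline string, and each URL in `detail_urls` would then be fetched and parsed in turn.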

Python Web Scraping (16): IndexError: list index out of range

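Summary: While parsing a website with lxml and XPath, I hit the error IndexError: list index out of range. The cause is that some elements in the site's HTML are empty, so the XPath query returns an empty list and indexing it fails. Wrapping the lookup in try ... except so that empty values are skipped fixes it. For example: html=etree.HTML(text) Read more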

posted @ 2020-03-01 22:00 方木Fengl Views (15000) Comments (0) Recommendations (0)
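A minimal sketch of the try/except fix described in the entry above, assuming a hypothetical list of `<li>` items in which one item is missing its score element. The sample HTML, the class names, and the function name `parse_items` are made up for illustration; the original post's code is not visible in this archive summary.

```python
from lxml import etree

# Hypothetical markup: the second item has no "score" span.
SAMPLE_HTML = """
<ul>
  <li><span class="title">Movie A</span><span class="score">8.1</span></li>
  <li><span class="title">Movie B</span></li>
</ul>
"""


def parse_items(text):
    """Parse title/score pairs, skipping items with a missing field."""
    html = etree.HTML(text)
    results = []
    for li in html.xpath("//li"):
        try:
            # xpath() returns a list; [0] raises IndexError when it is empty.
            title = li.xpath("./span[@class='title']/text()")[0]
            score = li.xpath("./span[@class='score']/text()")[0]
        except IndexError:
            # Skip this item instead of letting the whole parse crash.
            continue
        results.append({"title": title, "score": score})
    return results


items = parse_items(SAMPLE_HTML)
```

Skipping the item matches the approach in the summary. If partial records are still useful, an alternative is to store `None` for the missing field rather than dropping the item.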

Python Web Scraping (15): 豆瓣电影 (Douban Movies) Scraper

Summary: from lxml import etree import requests # 1. Fetch the page from the target site headers={ 'User-Agent':"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Read more

posted @ 2020-03-01 21:55 方木Fengl Views (542) Comments (0) Recommendations (0)
