Post category - Python web scraping

Summary: Using BeautifulSoup4. # Beautiful Soup is a Python library for extracting data from HTML or XML files. html_doc = """ The Dormouse's story The Dormouse's story Once upon a time there were three little sisters; and their names were Elsie,...
posted @ 2019-04-19 14:05 呆呆114 Views (151) Comments (0) Recommendations (0)
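The BeautifulSoup4 summary above can be illustrated with a minimal sketch. The tags, class names, `id`, and the `http://example.com/elsie` URL in `html_doc` below are assumptions made for illustration, since the listing only shows the text content of the sample document. The sketch uses the standard-library `html.parser` backend and assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the visible text comes from the post summary;
# the tags, attributes, and URL are assumed for illustration.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a></p>
</body></html>
"""

# Parse the document with Python's built-in HTML parser.
soup = BeautifulSoup(html_doc, "html.parser")

# Text of the <title> tag.
page_title = soup.title.string

# Text of the <b> tag inside the paragraph with class "title".
bold_text = soup.find("p", class_="title").b.string

# First <a> tag with class "sister": its text and its href attribute.
first_link = soup.find("a", class_="sister")
link_text = first_link.get_text()
link_href = first_link["href"]

print(page_title, bold_text, link_text, link_href)
```

`find` returns the first matching tag (or `None` if nothing matches), while `find_all` would return a list of every match.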
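Summary: 1. Scraping proxy IPs with xpath. Page analysis shows that two values are needed per proxy, the IP and the port, to build entries such as ['50.239.245.103:80', '92.222.180.156:8080',]. xpath returns a list, so take the element with [0] and then concatenate the pieces as str. Another approach retrieves the IP but cannot get the port. Scraping the videos on the Pear Video (梨视频) home page with xpath http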
posted @ 2019-04-19 11:11 呆呆114 Views (128) Comments (0) Recommendations (0)
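The xpath step described in the proxy-IP summary above (take `[0]` from the list that xpath returns, then join IP and port as strings) might look roughly like the following sketch. It assumes the `lxml` package and a simple table layout with the IP in the first cell and the port in the second; the real page structure is not shown in the summary, so the markup here is hypothetical.

```python
from lxml import etree

# Hypothetical markup: the actual proxy-list page layout is not given
# in the post summary, so a plain two-column table is assumed.
html = """
<table>
  <tr><td>50.239.245.103</td><td>80</td></tr>
  <tr><td>92.222.180.156</td><td>8080</td></tr>
</table>
"""

tree = etree.HTML(html)

proxies = []
for row in tree.xpath("//tr"):
    # xpath() always returns a list, even for a single match.
    ip = row.xpath("./td[1]/text()")
    port = row.xpath("./td[2]/text()")
    # Skip rows (e.g. header rows) where either cell is missing.
    if ip and port:
        # Take the first element with [0], then concatenate as strings.
        proxies.append(str(ip[0]).strip() + ":" + str(port[0]).strip())

print(proxies)
```

Checking that both lists are non-empty before indexing avoids an `IndexError` on rows that lack one of the two cells.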
This post can only be read by registered users who are logged in.
posted @ 2018-03-28 09:30 呆呆114