Posts in category: Python web scraping
Summary: Using BeautifulSoup4. # Beautiful Soup is a Python library for extracting data from HTML or XML files. html_doc = """ The Dormouse's story The Dormouse's story Once upon a time there were three little sisters; and their names were Elsie,...
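As a rough illustration of what extracting data with Beautiful Soup looks like, here is a minimal sketch. The markup in `sample_html` is a short stand-in written for this example (the post's own `html_doc` is truncated above and is not reproduced here), and the variable names are assumptions. It uses Python's built-in `html.parser` backend so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only; it is NOT the
# original post's html_doc, which is cut off in the summary.
sample_html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Three sisters:
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
</p>
</body></html>
"""

# Parse the document with the parser that ships with Python.
soup = BeautifulSoup(sample_html, "html.parser")

# Text inside the <title> tag.
title_text = soup.title.string

# All <a> tags carrying class="sister".
sister_tags = soup.find_all("a", class_="sister")
sister_names = [tag.get_text() for tag in sister_tags]
sister_links = [tag["href"] for tag in sister_tags]

# find() returns only the first match; attributes are read like dict keys.
first_link_id = soup.find("a")["id"]

print(title_text)
print(sister_names)
print(sister_links)
```

`find_all` returns every matching tag, while `find` returns only the first one, which is convenient when a page is known to contain a single element of interest.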
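Summary: 1. Scraping proxy IPs with XPath. Page analysis: these two parameters are needed: ['50.239.245.103:80', '92.222.180.156:8080',]. An XPath query returns a list, so take the value out with [0] and then join the pieces into a single string. Another approach gets the IP but cannot get the port. Scraping the videos on the Pear Video (梨视频) home page with XPath http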
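The "take `[0]` from the XPath result, then concatenate" step could look something like the sketch below. It assumes the `lxml` library and a made-up table layout in `sample_html`; the real proxy page's structure is not shown in the summary, so the XPath expressions and the helper name `extract_proxies` are hypothetical. Only the two example addresses come from the summary.

```python
from lxml import etree

# Hypothetical page fragment; the real site's markup is an assumption.
sample_html = """
<table id="proxies">
  <tr><th>IP</th><th>Port</th></tr>
  <tr><td>50.239.245.103</td><td>80</td></tr>
  <tr><td>92.222.180.156</td><td>8080</td></tr>
</table>
"""


def extract_proxies(html_text):
    """Return a list of 'ip:port' strings found in the table rows."""
    tree = etree.HTML(html_text)
    proxies = []
    # Select only rows that contain <td> cells, which skips the header row.
    for row in tree.xpath('//table[@id="proxies"]//tr[td]'):
        # xpath() always returns a list, even for a single match.
        ip = row.xpath("./td[1]/text()")
        port = row.xpath("./td[2]/text()")
        if not ip or not port:
            # Skip rows where either value is missing instead of
            # raising IndexError on [0].
            continue
        proxies.append(ip[0].strip() + ":" + port[0].strip())
    return proxies


proxy_list = extract_proxies(sample_html)
print(proxy_list)
```

Checking that both lists are non-empty before indexing is one way to handle the case mentioned above where the IP is found but the port is not.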