Crawlers? - Post category - fly_bk

Using BeautifulSoup

Summary: import re from bs4 import BeautifulSoup html = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>T ...

posted @ 2021-08-03 09:35 fly_bk Views (35) Comments (0) Recommendations (0)

Fetching AJAX data

Summary: Requesting the page URL directly with requests returns content that does not match what the browser shows on the page. import requests headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like ...

posted @ 2021-07-31 15:36 fly_bk Views (90) Comments (0) Recommendations (0)
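A mismatch like the one described in the AJAX summary above usually means the page fills in its data through a background (XHR) request, so the data has to be fetched from that endpoint rather than from the page URL. The following is only a minimal sketch under stated assumptions: it uses the standard library's urllib instead of requests so that it runs without extra packages, and `API_URL`, the shortened User-Agent string, the extra headers, and the sample JSON body are all hypothetical placeholders, not values from the original post.

```python
import json
from urllib import request

# Hypothetical XHR endpoint; the real one is usually visible in the
# browser's developer tools (Network tab, XHR/Fetch filter).
API_URL = "https://example.com/api/list?page=1"

# Illustrative headers only; the User-Agent is a shortened placeholder.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "X-Requested-With": "XMLHttpRequest",
    "Accept": "application/json",
}


def build_ajax_request(url, headers=HEADERS):
    """Build a request object that carries browser-like headers."""
    return request.Request(url, headers=headers)


def parse_ajax_body(body):
    """Decode a JSON response body (bytes) into Python objects."""
    return json.loads(body.decode("utf-8"))


def fetch_ajax_json(url):
    """Call the endpoint and return the parsed JSON (needs network access)."""
    with request.urlopen(build_ajax_request(url), timeout=10) as resp:
        return parse_ajax_body(resp.read())


# Offline demonstration: no network call is made here.
req = build_ajax_request(API_URL)
sample = parse_ajax_body(b'{"items": [{"id": 1}, {"id": 2}]}')

# urllib stores header names via str.capitalize(), hence the lowercase tail.
print(req.get_header("X-requested-with"))
print(len(sample["items"]))
```

With requests installed, the same idea would be a `requests.get(url, headers=...)` call followed by `.json()` on the response.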

Using XPath

Summary: I already summarized the basics of crawling in an earlier post, but XPath is important enough to get a post of its own. """ XPath reference: https://www.w3school.com.cn/xpath/index.asp """ from lxml import etree text = ''' <div> <ul> ...

posted @ 2021-07-31 10:10 fly_bk Views (44) Comments (0) Recommendations (0)
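To give a feel for the kind of XPath queries the post above covers, here is a small sketch. It is an assumption-laden stand-in, not the post's own code: it uses the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath, rather than lxml, and the `SAMPLE` markup is hypothetical because the original snippet is truncated.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not the original post's text).
SAMPLE = """
<div>
  <ul>
    <li class="item-0"><a href="link1.html">first item</a></li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0"><a href="link3.html">third item</a></li>
  </ul>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# .//li/a selects every <a> that is a direct child of any <li> under the root.
hrefs = [a.get("href") for a in root.findall(".//li/a")]

# [@class='item-0'] keeps only the elements whose class attribute matches.
item0_texts = [li.find("a").text for li in root.findall(".//li[@class='item-0']")]

print(hrefs)
print(item0_texts)
```

With lxml, `etree.HTML(text).xpath(...)` accepts full XPath 1.0 expressions such as `//li/a/@href`, which ElementTree cannot evaluate directly.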

Converting gzip-compressed responses

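Summary: When calling an API with httpclient, the returned data starts with b'\x1f\x8b\x08', which means it is gzip-compressed, so the received bytes have to be decompressed before they can be decoded. from urllib import request from io import BytesIO import gzip def xxx ...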

posted @ 2021-03-31 14:31 fly_bk Views (154) Comments (0) Recommendations (0)
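The check described in the gzip summary above can be sketched roughly as follows. This is a minimal illustration, not the post's own function (which is truncated): the helper name `maybe_gunzip` and the sample payload are hypothetical, and it calls `gzip.decompress` directly rather than wrapping the bytes in `BytesIO`.

```python
import gzip

# Every gzip stream starts with these two magic bytes; the third byte,
# 0x08, is the compression method (deflate).
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(body, encoding="utf-8"):
    """Decompress body if it looks like gzip data, then decode it to str."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(encoding)


# Offline demonstration with a locally compressed, made-up payload.
compressed = gzip.compress('{"ok": true}'.encode("utf-8"))

print(compressed[:3])
print(maybe_gunzip(compressed))
print(maybe_gunzip(b"plain text"))
```

Checking the magic bytes first lets the same helper handle both compressed and uncompressed responses.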

Mastering Python crawlers in one day

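Summary: Notes from a day of learning Python crawlers (a little crawler? No, we are big spiders). chromedriver [http://npm.taobao.org/mirrors/chromedriver/] Data fetching: requests: under the hood, requests is built on urllib3. Source repository: https://github. ...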

posted @ 2018-12-18 10:31 fly_bk Views (445) Comments (3) Recommendations (0)
