August 2022 post archive - 冬天不下雨

Multithreaded crawling

Summary: # Import the request modules import json import time from concurrent.futures.thread import ThreadPoolExecutor from urllib.parse import urlencode import requests # Image name num = …
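
The imports in this summary point to a thread pool that downloads several resources concurrently. A minimal sketch of that pattern follows. The `fetch` stub, the `crawl_all` helper, and the example URLs are hypothetical and do not come from the post; a real crawler would presumably call `requests.get` inside `fetch`.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Hypothetical stand-in for a network call such as requests.get(url).content."""
    return f"content of {url}".encode("utf-8")


def crawl_all(urls, fetch_func=fetch, max_workers=4):
    """Fetch every URL on a thread pool and return a {url: body} dict."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each URL
        # with its own result even though the calls run concurrently.
        return dict(zip(urls, pool.map(fetch_func, urls)))


if __name__ == "__main__":
    # Assumed example URLs, used only to show the call shape.
    pages = crawl_all([f"https://example.com/img/{n}.jpg" for n in range(3)])
    for url, body in pages.items():
        print(url, len(body))
```

The sketch imports `ThreadPoolExecutor` from `concurrent.futures`, the documented public path; the post's `concurrent.futures.thread` import refers to the same class.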

posted @ 2022-08-21 23:04 冬天不下雨 | Views (42) | Comments (0) | Recommended (0)

Regex scraping example

Summary: import re import requests url = 'https://b.faloo.com/1183478 1.html' headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537. …
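
The post pairs `requests` with `re` to pull data out of a downloaded page. The extraction step can be sketched without a network call, as below. The HTML fragment, the pattern, and the `extract_links` helper are assumptions for illustration; the real page structure is not shown in the summary.

```python
import re

# Hypothetical HTML fragment standing in for a downloaded page.
HTML = """
<div class="chapter"><a href="/1.html">Chapter 1</a></div>
<div class="chapter"><a href="/2.html">Chapter 2</a></div>
"""

# Non-greedy groups capture the link target and the link text.
LINK_RE = re.compile(r'<a href="(.*?)">(.*?)</a>')


def extract_links(html):
    """Return a list of (href, title) tuples found in html."""
    return LINK_RE.findall(html)


if __name__ == "__main__":
    for href, title in extract_links(HTML):
        print(href, title)
```

Regular expressions work for small, regular fragments like this one; for nested or irregular markup an HTML parser is usually the safer choice.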

posted @ 2022-08-19 14:48 冬天不下雨 | Views (18) | Comments (0) | Recommended (0)

Python connection

Summary: import redis # Connect to Redis: host, port, db # Open the connection con = redis.StrictRedis( host='127.0.0.1', port=6379, db=4, # db 0 is used by default decode_responses=True) # String type # con.s…
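
The summary is cut off just as it reaches the string-type commands. As a rough sketch of what such a round trip might look like, the code below reuses the connection parameters from the summary and stores and reads one string key. The `connect` and `remember` helpers are hypothetical, and the use of `set` and `get` is an assumption about where the post goes next.

```python
def connect(host="127.0.0.1", port=6379, db=4):
    """Open a connection using the parameters shown in the summary."""
    import redis  # third-party package: pip install redis

    # decode_responses=True makes the client return str instead of bytes.
    return redis.StrictRedis(host=host, port=port, db=db, decode_responses=True)


def remember(con, key, value):
    """Store a string value, then read it back with the string-type commands."""
    con.set(key, value)
    return con.get(key)
```

With a Redis server listening on 127.0.0.1:6379, `remember(connect(), "name", "spider")` would return `"spider"`. Passing the client in as an argument keeps `remember` easy to exercise with a stand-in object when no server is available.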

posted @ 2022-08-18 21:42 冬天不下雨 | Views (37) | Comments (0) | Recommended (0)

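Introduction to Scrapy

Summary: ''' Introduction to Scrapy. Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python. It crawls websites and extracts structured data from their pages, and a working crawler needs only a small amount of code. Scrapy is built on the Twisted asynchronous networking framework, which speeds up downloads. Twisted download: https:// …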

posted @ 2022-08-17 15:20 冬天不下雨 | Views (143) | Comments (0) | Recommended (0)

How to use Selenium

Summary: ''' ## **Getting to know Selenium** **Install: pip install selenium** Documentation: https://selenium-python.readthedocs.io/ ### What is Selenium? Selenium is a complete testing system for web applications, including test recording (s…

posted @ 2022-08-17 15:19 冬天不下雨 | Views (111) | Comments (0) | Recommended (0)

parsel methods

Summary: from parsel import Selector ''' parsel is a third-party Python library, roughly equivalent to CSS selectors + XPath + re. Install it with: pip install parsel ''' html = '''<!DOCTYPE html><html lang="en"><head> <meta …

posted @ 2022-08-17 15:18 冬天不下雨 | Views (72) | Comments (0) | Recommended (0)

Basic crawler operations

Summary: # Import the module import requests # Request URL url = 'https://www.baidu.com/?tn=88093251_37_hao_pg' resp = requests.get(url) # Set the character encoding resp.encoding = 'utf-8' print(resp.text)

posted @ 2022-08-17 15:17 冬天不下雨 | Views (133) | Comments (0) | Recommended (0)

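Parsing with lxml

Summary: ''' ### XPath parsing **Install: pip install lxml** **Introduction** XPath is a language for locating information in HTML/XML documents. It can be used to traverse the elements and attributes of an HTML/XML document. Compared with BeautifulSoup, XPath is more efficient at extracting data. **lxml …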
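
Traversing elements and attributes with XPath can be sketched as follows. The HTML fragment and the two expressions are assumptions chosen for illustration; they do not come from the post.

```python
from lxml import etree  # third-party package: pip install lxml

# Hypothetical document used only for illustration.
HTML = """
<html><body>
  <ul id="books">
    <li class="item"><a href="/b/1">Book one</a></li>
    <li class="item"><a href="/b/2">Book two</a></li>
  </ul>
</body></html>
"""

# Parse the string into an element tree using lxml's HTML parser.
tree = etree.HTML(HTML)

# text() selects text nodes; @href selects attribute values.
titles = tree.xpath('//ul[@id="books"]/li/a/text()')
links = tree.xpath('//li[@class="item"]/a/@href')

if __name__ == "__main__":
    print(titles)
    print(links)
```

Each `xpath` call returns a list, so a query that matches nothing yields an empty list instead of raising an error.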

posted @ 2022-08-17 15:17 冬天不下雨 | Views (176) | Comments (0) | Recommended (0)
