Python web scraping - Post category - 0x1e61

Scraping Qingnian Daxuexi (Youth Study)

Summary: import requests from lxml import etree url = 'http://news.cyol.com/gb/channels/vrGlAKDl/index.html' headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT ...

posted @ 2023-04-24 20:14 0x1e61 Views(108) Comments(0) Recommendations(0)

Scraping all of Journey to the West from Baidu Novels with coroutines

Summary: # without coroutines """ import os import asyncio import requests import aiofiles as aiofiles from lxml import etree import aiohttp headers = { 'User-Agent': 'Mozil ...

posted @ 2023-03-04 00:56 0x1e61 Views(74) Comments(0) Recommendations(0)

xpath - scraping service provider names from zbj.com (Zhubajie)

Summary: import requests from lxml import etree url = 'https://changsha.zbj.com/xcxkfzbjzbj/f.html?fr=zbj.sy.zyyw_2nd.lv3&r=2' headers = { 'User-Agent': 'Mozil ...

posted @ 2023-03-03 23:56 0x1e61 Views(26) Comments(0) Recommendations(0)

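Coroutines - application

Summary: # requests.get() is synchronous code -> use aiohttp for asynchronous operation import os # async IO import asyncio # async file access import aiofiles # async HTTP import aiohttp # create the folder for the images if it does not exist if not o ...

posted @ 2023-03-03 23:09 0x1e61 Views(17) Comments(0) Recommendations(0)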

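Coroutines

Summary: import asyncio import time # A coroutine, sometimes called a micro-thread, is a technique for switching context in user space. # In short, a single thread switches execution back and forth between blocks of code. # asyncio is a standard library introduced in Python 3.4 with built-in support for asynchronous IO. # async ...

posted @ 2023-03-03 22:52 0x1e61 Views(16) Comments(0) Recommendations(0)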
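
The summary above describes one thread switching between blocks of code. A minimal sketch of that idea with asyncio follows; the coroutine name `worker` and the delays are assumptions made for illustration, not the post's own code.

```python
import asyncio
import time


async def worker(name: str, delay: float) -> str:
    # await hands control back to the event loop so other coroutines can run
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # gather runs the coroutines concurrently on a single thread
    # and returns their results in the order they were passed in
    return await asyncio.gather(
        worker("a", 0.2),
        worker("b", 0.1),
        worker("c", 0.3),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Because the three sleeps overlap, the elapsed time should be close to the longest delay rather than the sum of all three.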

Thread pool and process pool in practice - Xinfadi vegetable prices

Summary: import csv import time from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor import requests # output file f = open('data.csv', mode='w', en ...

posted @ 2023-03-03 21:50 0x1e61 Views(14) Comments(0) Recommendations(0)
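
As a rough sketch of the thread pool plus CSV pattern the summary above imports, the example below fans page downloads out to worker threads and writes rows from the main thread. `fetch_page` is a hypothetical stand-in that returns placeholder rows instead of calling the real site, and `scrape` is likewise an assumed helper name.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page: int) -> list:
    # Hypothetical stand-in for the real HTTP request; returns placeholder rows
    return [[f"item-{page}-{i}", str(page)] for i in range(2)]


def scrape(pages, out) -> int:
    writer = csv.writer(out)
    count = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map runs fetch_page in worker threads and yields results in input order;
        # rows are written only from the main thread, so the writer is never shared
        for rows in pool.map(fetch_page, pages):
            writer.writerows(rows)
            count += len(rows)
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    print(scrape(range(1, 4), buffer))
    print(buffer.getvalue())
```

Writing from a single thread avoids having to lock the CSV writer. In a real run, `out` would be an open file rather than an in-memory buffer.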

Multiprocessing

Summary: from multiprocessing import Process def task(name): for i in range(10000): print(f"{name}:",i) if __name__ == '__main__': p = Process(target=task,args ...

posted @ 2023-03-03 20:44 0x1e61 Views(6) Comments(0) Recommendations(0)

Multithreading

Summary: # Thread class from threading import Thread def func(): for i in range(1000): print("func", 1) if __name__ == '__main__': t = Thread(target=func) # create a thread and assign it a task ...

posted @ 2023-03-03 20:37 0x1e61 Views(8) Comments(0) Recommendations(0)
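
The excerpt above is cut off just after the thread is created. A minimal sketch of the full start/join cycle might look like this; the `run_threads` helper and the shared `results` list are assumptions added for illustration.

```python
from threading import Thread


def func(name: str, results: list) -> None:
    # each thread appends its own entries to the shared list
    for i in range(3):
        results.append(f"{name} {i}")


def run_threads(n_threads: int = 2) -> list:
    results: list = []
    threads = [Thread(target=func, args=(f"t{n}", results)) for n in range(n_threads)]
    for t in threads:
        t.start()  # start() makes the thread runnable; the scheduler decides when it runs
    for t in threads:
        t.join()   # block until this thread has finished
    return results


if __name__ == "__main__":
    print(len(run_threads()))
```

The order of entries in `results` is not guaranteed, since the threads may interleave.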

Using proxies

Summary: import requests # fetch the Baidu home page through a proxy headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0. ...

posted @ 2023-03-03 19:43 0x1e61 Views(17) Comments(0) Recommendations(0)

Simulating user login - cookies

Summary: import requests url = 'https://www.xread8.com/user/login.json' headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ...

posted @ 2023-03-03 14:11 0x1e61 Views(16) Comments(0) Recommendations(0)

Scraping service provider names from zbj.com (Zhubajie)

Summary: import requests from lxml import etree url = 'https://changsha.zbj.com/xcxkfzbjzbj/f.html?fr=zbj.sy.zyyw_2nd.lv3&r=2' headers = { 'User-Agent': 'Mozil ...

posted @ 2023-03-02 23:28 0x1e61 Views(26) Comments(0) Recommendations(0)

Python web scraping - xpath basics

Summary: # prepare an HTML document doc = ''' <div> <ul> <li class="item-0"><a href="https://ask.hellobi.com/link1.html">first item</a></li> <li class="item-1"><a href="ht ...

posted @ 2023-03-02 21:45 0x1e61 Views(22) Comments(0) Recommendations(0)

bs4 parsing - Umei image gallery (umeituku.com)

Summary: import requests from bs4 import BeautifulSoup url = 'http://www.umeituku.com/bizhitupian/meinvbizhi/' headers = { 'User-Agent': 'Mozilla/5.0 (Windows ...

posted @ 2023-03-01 22:49 0x1e61 Views(45) Comments(0) Recommendations(0)

bs4 parsing - Hunan agricultural product prices

Summary: import requests from bs4 import BeautifulSoup import csv url = 'https://price.21food.cn/market/174-p1.html' headers = { 'User-Agent': 'Mozilla/5.0 (Wi ...

posted @ 2023-02-28 20:17 0x1e61 Views(14) Comments(0) Recommendations(0)

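Python web scraping - bs4 basics

Summary: # The following HTML snippet is used as an example throughout. It is an excerpt from Alice in Wonderland (referred to below as the Alice document): html_doc = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="ti ...

posted @ 2023-02-28 20:16 0x1e61 Views(14) Comments(0) Recommendations(0)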

Scraping movie titles and magnet links from the latest releases on Dianying Tiantang (Movie Heaven)

Summary: import requests import re url = 'xxx/index2.htm' headers = { 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like ...

posted @ 2023-02-28 13:06 0x1e61 Views(463) Comments(0) Recommendations(0)

Scraping Douban Movie Top 250: title, year, rating, number of ratings

Summary: import csv import re import requests headers = { 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrom ...

posted @ 2023-02-28 08:53 0x1e61 Views(199) Comments(0) Recommendations(0)

Python web scraping - the requests module

Summary: 1. Request methods in requests. HTTP request methods: requests.get(url, params=None, **kwargs) # GET request requests.post(url, data=None, json=None, **kwargs) # POST request requests. ...

posted @ 2023-02-27 19:50 0x1e61 Views(104) Comments(0) Recommendations(0)

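Python basics - the re module

Summary: import re # 1. extract all the numbers from a string lst = re.findall(r'\d+', "fajhdsjk123kjfakl123213") print(lst) # returns a list # 2. check whether a sentence contains a number # search: scans the string and returns as soon as it finds the first match; it does not ...

posted @ 2023-02-26 23:12 0x1e61 Views(33) Comments(0) Recommendations(0)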
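
To make the difference between `findall` and `search` in the summary above concrete, here is a small sketch on the same input string. The `finditer` line goes beyond the excerpt and is included only as a related illustration.

```python
import re

text = "fajhdsjk123kjfakl123213"

# findall returns every non-overlapping match as a list of strings
numbers = re.findall(r"\d+", text)

# search stops at the first match and returns a Match object, or None if nothing matches
first = re.search(r"\d+", text)

# finditer yields Match objects lazily, giving access to positions as well as text
spans = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

print(numbers)
print(first.group() if first else None)
print(spans)
```

The raw-string prefix `r` keeps `\d` from being treated as a string escape, which newer Python versions warn about.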

Python basics - the hashlib module

Summary: import hashlib # create an md5 object obj = hashlib.md5() # pass the data to be hashed to obj obj.update("6666".encode('utf-8')) # read the hex digest from obj mi = obj.hexdigest() print(mi) # e9510081a ...

posted @ 2023-02-26 22:28 0x1e61 Views(12) Comments(0) Recommendations(0)
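
The steps in the summary above can be wrapped in a small function. This is a sketch only: `md5_hex` and `salted_md5_hex` are hypothetical helper names, and the salted variant is an assumption about a common next step, not something shown in the excerpt.

```python
import hashlib


def md5_hex(text: str) -> str:
    # MD5 is a one-way hash, not encryption: the digest cannot be decrypted
    obj = hashlib.md5()
    obj.update(text.encode("utf-8"))
    return obj.hexdigest()


def salted_md5_hex(text: str, salt: bytes) -> str:
    # passing initial bytes to md5() is equivalent to calling update() with them first
    obj = hashlib.md5(salt)
    obj.update(text.encode("utf-8"))
    return obj.hexdigest()


if __name__ == "__main__":
    print(md5_hex("6666"))
    print(salted_md5_hex("6666", b"example-salt"))
```

MD5 is fine for demonstrating the API, but it is generally considered too weak for storing passwords.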
