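Python web scraping environment setup

Environment setup

python3 / request libraries / parsing libraries / databases / storage libraries / web libraries / app scraping libraries / scraping framework libraries

python3
- On Windows 11 it can now be installed directly from the Microsoft Store
- On Linux: apt-get install python3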
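
A quick way to confirm which interpreter actually ended up on the path is to ask it for its version. This is only a minimal sketch: the `MIN_VERSION` value and the helper name `python_is_recent_enough` are made up for illustration, since the post only asks for "python3".

```python
import sys

# Assumed minimum, for illustration only; the post just says "python3".
MIN_VERSION = (3, 0)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Show the version of the interpreter that is running this script.
    print("Python", sys.version.split()[0])
    print("OK" if python_is_recent_enough() else "Too old")
```

If the printed version is not the one you just installed, another Python is probably earlier on your PATH.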

Request libraries

requests

pip3 install requests
selenium

pip install selenium
chromeDriver
1. Check your Chrome version on the "About Chrome" page
2. Download the matching version of chromeDriver
3. Add chromeDriver to your PATH environment variable
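
Step 3 is the one that most often goes wrong, so it may help to check that the driver is really visible on PATH before starting selenium. The sketch below uses only the standard library; the helper name `find_on_path` is hypothetical.

```python
import shutil


def find_on_path(executable="chromedriver"):
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_on_path()
    if location:
        print("chromedriver found at", location)
    else:
        print("chromedriver is not on PATH")
```

If this reports that the driver is missing, reopen the terminal after editing the environment variable, since already-open shells keep the old PATH.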

~~phantomJS~~

Recent versions of selenium no longer support phantomJS; chromedriver's headless mode can be used directly instead.

Verify:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome without opening a browser window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://dreaife.icu/")
print(driver.current_url)  # prints the URL of the loaded page
driver.quit()  # close the browser and stop the chromedriver process

aiohttp

pip install aiohttp

Optionally, pip install aiodns to speed up DNS resolution

Parsing libraries
- lxml
  
  pip install lxml
- beautifulsoup4
  
  pip install beautifulsoup4
- pyquery
  
  pip install pyquery
- tesserocr
  - Install tesseract
    
    Windows
  - Install tesserocr
    
    On Windows, install with pip install <name>.whl
  - Verify
```
import tesserocr
from PIL import Image

image = Image.open('G:/codeS/backOnGithub/Jupyter/spider/image.png')
print(tesserocr.image_to_text(image))
```
    Note: if you get the following error, first copy tesseract's tessdata folder into the folder named in the error message:
    File "tesserocr.pyx", line 2580, in tesserocr._tesserocr.image_to_text
    RuntimeError: Failed to init API, possibly an invalid tessdata path
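
The tessdata error above usually means Tesseract could not find any language data files (for example `eng.traineddata`) in the folder it looked in. A rough way to check a folder before retrying is sketched below; the helper name `list_traineddata` and the `./tessdata` path are placeholders, not something from tesserocr itself.

```python
from pathlib import Path


def list_traineddata(tessdata_dir):
    """Return sorted language names found as *.traineddata files in tessdata_dir."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    return sorted(p.stem for p in directory.glob("*.traineddata"))


if __name__ == "__main__":
    # Hypothetical location; replace it with the folder named in the error message.
    candidate = "./tessdata"
    languages = list_traineddata(candidate)
    print(languages if languages else f"No .traineddata files in {candidate}")
```

An empty result for the folder from the error message suggests the language files still need to be copied there.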
Databases
- MySQL
- MongoDB
- Redis
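
Once the three servers are installed, one rough way to see whether they are running locally is to try a TCP connection to each default port. This is a sketch under assumptions: it presumes the servers run on the same machine with their default ports unchanged, and `port_is_open` is a hypothetical helper name. An open port only shows that something is listening, not that it is the expected server.

```python
import socket

# Default ports of the three servers listed above.
DEFAULT_PORTS = {"MySQL": 3306, "MongoDB": 27017, "Redis": 6379}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```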
Storage libraries
- PyMySQL
  
  pip install pymysql
- PyMongo
  
  pip install pymongo
- redis-py
  
  pip install redis
- RedisDump
  
  Install Ruby
  
  gem install redis-dump
Web libraries
- Flask
  
  pip install flask
- Tornado
  
  pip install tornado
App scraping libraries
- charles
- mitmproxy
  
  pip install mitmproxy
- appium
Scraping frameworks
- pyspider
  
  pip install pyspider
  
  If it does not run on Windows 11, see my other post on that
- scrapy
- scrapy-splash
- scrapy-redis
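
After working through the lists above, it can be handy to check in one go which of the pip packages are importable. The sketch below relies only on the standard library's `importlib`. The mapping covers the packages this post installs with pip, and the helper name `missing_packages` is made up for illustration. Note that a few pip names differ from the module name, for example beautifulsoup4 is imported as `bs4`.

```python
from importlib.util import find_spec

# pip package name -> importable module name (they differ for a few packages).
PACKAGES = {
    "requests": "requests",
    "selenium": "selenium",
    "aiohttp": "aiohttp",
    "lxml": "lxml",
    "beautifulsoup4": "bs4",
    "pyquery": "pyquery",
    "tesserocr": "tesserocr",
    "pymysql": "pymysql",
    "pymongo": "pymongo",
    "redis": "redis",
    "flask": "flask",
    "tornado": "tornado",
    "mitmproxy": "mitmproxy",
    "pyspider": "pyspider",
    "scrapy": "scrapy",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose module cannot be found by the import system."""
    return [name for name, module in packages.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages are importable")
```

This only checks that each module can be located. It does not import it, so a broken install could still fail at import time.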

Author: Dreaife

Article link: https://www.cnblogs.com/dreaife/p/17939090

posted @ 2024-01-01 20:28 Dreaife
