蕝戀 - Cnblogs (博客园)

July 17, 2023

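Summary: This tool is only so-so; for OCR you could also try EasyOCR or PaddleOCR (飞桨OCR). ```Python """ Tesseract is an open-source OCR (optical character recognition) engine whose development Google sponsored. It ships with pre-trained data by default, and it can also load data trained by others. Usage: 1. Install the engine for your platform. There is a packaged build from the Mannheim University Library: https://gith … Read more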

posted @ 2023-07-17 11:27 蕝戀 Views(24) Comments(0) Recommended(0)

Selenium - [Example] Scraping Maoyan Movies (猫眼电影)

Summary: ```Python import random import time from selenium import webdriver from selenium.webdriver import ActionChains from selenium.webdriver.chrome.service … Read more

posted @ 2023-07-17 11:27 蕝戀 Views(47) Comments(0) Recommended(0)

Selenium - ActionChains (for mouse, scroll wheel, and similar actions)

Summary: [https://www.selenium.dev/documentation/webdriver/actions_api/](https://www.selenium.dev/documentation/webdriver/actions_api/) Note: scroll wheel actions are supported only in the Chrome browser. Read more

posted @ 2023-07-17 11:26 蕝戀 Views(45) Comments(0) Recommended(0)

Selenium File Upload

Summary: [https://www.selenium.dev/documentation/webdriver/elements/file_upload/](https://www.selenium.dev/documentation/webdriver/elements/file_upload/) The method to use is just … Read more

posted @ 2023-07-17 11:26 蕝戀 Views(7) Comments(0) Recommended(0)

Selenium: Finding Elements, Element Properties and Methods

Summary: # Finding elements. Official documentation: [https://www.selenium.dev/documentation/webdriver/elements/locators/](https://www.selenium.dev/documentation/webdriver/elements/locators/) Read more

posted @ 2023-07-17 11:24 蕝戀 Views(290) Comments(0) Recommended(0)

Selenium: Browser Properties and Data Extraction

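Summary: # Browser properties > When using Selenium, once a driver object has been instantiated, it exposes some commonly used properties and methods. 1. `driver.page_source`: the page source of the current tab after the browser has rendered it. 2. `driver.current_url`: the URL of the current tab. 3. `driver.ti … Read more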
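The properties listed in this summary can be read off any driver object in the same way. The following is a minimal sketch, not the post's own code: `summarize_page`, `PageInfo`, and `FakeDriver` are hypothetical names, and `driver.title` is an assumption about the list item that is cut off in the summary (it is a real WebDriver attribute). A stand-in object is used so the example runs without a browser.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """A few read-only facts about the page currently loaded in a tab."""
    url: str
    title: str
    html_length: int


def summarize_page(driver) -> PageInfo:
    """Read common properties from a WebDriver-like object."""
    return PageInfo(
        url=driver.current_url,               # URL of the current tab
        title=driver.title,                   # assumption: the truncated third item
        html_length=len(driver.page_source),  # HTML after rendering, not the raw response
    )


class FakeDriver:
    """Hypothetical stand-in exposing the same attribute names as a real driver."""
    current_url = "https://example.com/"
    title = "Example Domain"
    page_source = "<html><head><title>Example Domain</title></head></html>"


info = summarize_page(FakeDriver())
print(info.url, info.title, info.html_length)
```

With Selenium installed, the same function should accept a real driver, for example the object returned by `webdriver.Chrome()` after a call to `driver.get(...)`.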

posted @ 2023-07-17 11:24 蕝戀 Views(27) Comments(0) Recommended(0)

Selenium: Basic Usage and Bypassing Detection

Summary: ```Python import time from selenium import webdriver from selenium.webdriver.chrome.service import Service as ChromeService # path to the driver executable DIRVER_PATH = r … Read more

posted @ 2023-07-17 11:15 蕝戀 Views(281) Comments(0) Recommended(0)

[Case Study] Scraping Tieba (贴吧) and Downloading Images

Summary: ```Python import os import random import re import sys import time import urllib.parse import requests from lxml import etree from lxml.etree import _ … Read more

posted @ 2023-07-17 11:11 蕝戀 Views(11) Comments(0) Recommended(0)

[Case Study] Scraping Douban (豆瓣) Movie Information

Summary: ```Python import json import os import requests from lxml import etree from lxml.etree import _Element class DoubanMovieSpider(object): def __init__(s … Read more

posted @ 2023-07-17 11:10 蕝戀 Views(11) Comments(0) Recommended(0)

The lxml Module

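Summary: lxml mainly uses XPath, CSS selectors, and similar tools to extract data from XML documents; HTML can be parsed into the same kind of tree and queried the same way. - The `xpath` method returns a list in one of three forms - An empty list: no element was found - A list of strings: the XPath expression used `@attribute` or a function such as `text()`, which yields `str` values (text content or an attribute's value) - A list of _ … Read more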
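The return forms described in this summary can be checked with a small sketch. The HTML snippet and variable names are made up for illustration. Since the third case is cut off in the summary, the code assumes it refers to lists of element objects, which is what lxml returns when an expression selects elements.

```python
from lxml import etree

# A tiny hypothetical document; etree.HTML wraps it in <html><body> as needed.
html = etree.HTML(
    "<ul>"
    "<li><a href='/a'>first</a></li>"
    "<li><a href='/b'>second</a></li>"
    "</ul>"
)

missing = html.xpath("//table")   # no match: an empty list
hrefs = html.xpath("//a/@href")   # attribute values: a list of strings
texts = html.xpath("//a/text()")  # text nodes: a list of strings
links = html.xpath("//a")         # elements: a list of element objects

print(missing)
print(hrefs, texts)
print([el.tag for el in links])
```

Because `xpath` returns a list in each of these cases, code that expects a single value usually has to index into the result and handle the empty-list case explicitly.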

posted @ 2023-07-17 11:08 蕝戀 Views(15) Comments(0) Recommended(0)
