LeeHua - cnblogs (博客园)

July 25, 2019

Summary: Simple CAPTCHA recognition. import tesserocr from PIL import Image image = Image.open('PFET.jpg') # Pass "L" to the Image object's convert() method to convert the image to grayscale image = image.convert( ...

posted @ 2019-07-25 15:30 LeeHua Views (289) Comments (0) Recommendations (0)

July 23, 2019

Scraping Taobao Products with Selenium in Python

Summary: Version without comments. import pymongo from selenium import webdriver from selenium.common.exceptions import TimeoutException from selenium.webdriver.common.by import By ...

posted @ 2019-07-23 12:02 LeeHua Views (552) Comments (0) Recommendations (0)

July 21, 2019

Calling the Splash API from Python

Summary: render.html. The render.html endpoint returns the HTML of a JavaScript-rendered page. Its address is the address where Splash is running followed by the endpoint name. For example: http://0.0.0.0:8050/render.html?url=https://www.baidu.com&wait ...
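The endpoint address described above can be assembled programmatically. The following is a minimal sketch using only the standard library; the helper name `render_html_url` and the `wait` value of 5 seconds are assumptions made for illustration (the excerpt is cut off before the value), and the target URL is percent-encoded so it survives as a query parameter.

```python
from urllib.parse import urlencode

# Assumption: Splash is reachable at the address used in the excerpt.
SPLASH_BASE = "http://0.0.0.0:8050"


def render_html_url(target_url, wait=None):
    """Build a render.html request URL for the given page (hypothetical helper)."""
    params = {"url": target_url}
    if wait is not None:
        # `wait` is the number of seconds Splash waits after the page loads.
        params["wait"] = wait
    # urlencode percent-encodes the target URL so it is a valid query value.
    return f"{SPLASH_BASE}/render.html?{urlencode(params)}"


url = render_html_url("https://www.baidu.com", wait=5)
print(url)
```

The resulting string can be passed to any HTTP client; the response body would be the rendered HTML.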

posted @ 2019-07-21 19:59 LeeHua Views (648) Comments (0) Recommendations (0)

Basic Usage of Splash

Summary: Splash Lua scripts. Splash is at http://localhost:8050 (port 8050). Entry point and return value: function main(splash, args) splash:go("http://www.baidu.com") splash:wait(0.5) local title = splash:e ...

posted @ 2019-07-21 18:13 LeeHua Views (1925) Comments (0) Recommendations (0)

July 17, 2019

Using Selenium, a Python Automation Library

Summary: Basic usage. from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.common.keys import Keys from selenium.webdrive ...

posted @ 2019-07-17 15:41 LeeHua Views (452) Comments (0) Recommendations (0)

July 15, 2019

Python Web Scraping: Crawling Ajax Data

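Summary: Ajax overview. Ajax is a technique that uses JavaScript to exchange data with the server and update part of a web page without refreshing the page or changing its URL. Basic principles of Ajax: sending the request, parsing the content, rendering the page. Inspecting requests. Extracting Ajax results: scraping the first 10 pages of one user's Weibo. Analysis. Python implementation: from urllib.par ...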
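Paging through Ajax results usually comes down to re-issuing the same request with a different page parameter. Below is a rough sketch of generating the request URLs for the first 10 pages. The endpoint `https://example.com/api/feed` and the parameter names `uid` and `page` are hypothetical placeholders, not the actual Weibo API; the real values would be read from the browser's network panel.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real Ajax URL comes from the browser's network panel.
BASE_URL = "https://example.com/api/feed?"


def build_page_url(page, user_id="1234567890"):
    """Return the request URL for one page of a user's feed."""
    # "uid" and "page" are assumed parameter names for this sketch.
    params = {"uid": user_id, "page": page}
    return BASE_URL + urlencode(params)


# The first 10 pages, numbered from 1.
page_urls = [build_page_url(page) for page in range(1, 11)]
print(page_urls[0])
```

Each URL would then be requested in turn and the JSON response parsed for the fields of interest.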

posted @ 2019-07-15 11:38 LeeHua Views (2131) Comments (0) Recommendations (0)

July 13, 2019

Working with MongoDB in Python

Summary: MongoDB is a document-oriented (non-relational) database. Connecting to MongoDB with pymongo: import pymongo client = pymongo.MongoClient(host='localhost', port=27017) # or pymongo.MongoClient('mon ...

posted @ 2019-07-13 18:43 LeeHua Views (502) Comments (0) Recommendations (0)

July 12, 2019

Working with MySQL in Python

Summary: Connecting to MySQL with PyMySQL. Connecting to the database: import pymysql # Connect to MySQL; MySQL runs locally, username root, password 123456, default port 3306 db = pymysql.connect(host='localhost', user='root', password='1 ...

posted @ 2019-07-12 23:15 LeeHua Views (633) Comments (0) Recommendations (0)

Data Storage: File Storage

Summary: TXT file storage. Scrape the trending topics on Zhihu, collect each topic's question, author, and answer, and save them to a TXT file. import requests from pyquery import PyQuery url = 'https://www.zhihu.com/explore' headers = { 'User-Agen ...
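The storage step on its own can be sketched without any scraping. The example below appends question/author/answer records to a UTF-8 text file; the helper name `save_record`, the separator line, and the sample strings are assumptions for illustration and stand in for whatever the scraper actually extracts.

```python
import os
import tempfile


def save_record(path, question, author, answer):
    """Append one question/author/answer record to a UTF-8 text file."""
    # Mode "a" appends, so earlier records are preserved across calls.
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n".join([question, author, answer]))
        # A separator line keeps consecutive records visually distinct.
        f.write("\n" + "=" * 50 + "\n")


# Hypothetical sample data standing in for scraped content.
out_path = os.path.join(tempfile.mkdtemp(), "explore.txt")
save_record(out_path, "Sample question?", "sample_author", "Sample answer.")
save_record(out_path, "Another question?", "another_author", "Another answer.")

with open(out_path, encoding="utf-8") as f:
    content = f.read()
print(content)
```

Passing `encoding="utf-8"` explicitly matters here because scraped Chinese text may not be representable in a platform's default encoding.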

posted @ 2019-07-12 11:01 LeeHua Views (770) Comments (0) Recommendations (0)

July 11, 2019

Basic Usage of pyquery

Summary: A first look at pyquery (introductory example). A simple example: from pyquery import PyQuery as pq html = ''' <div> <ul> <li class="item-O"><a href="linkl.html">first item</a></li> <li class ...

posted @ 2019-07-11 16:40 LeeHua Views (258) Comments (0) Recommendations (0)

July 10, 2019

Basic Usage of BeautifulSoup

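Summary: A first look at Beautiful Soup. Beautiful Soup is a parsing tool that uses a page's structure and attributes to parse it (put simply, a Python library for parsing HTML or XML). Beautiful Soup supports many parsers: the Python standard library, the lxml HTML parser, the lxml XML parser, html5li ...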

posted @ 2019-07-10 12:35 LeeHua Views (284) Comments (0) Recommendations (0)

June 23, 2019

Regular Expressions and Python's re Module

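Summary: Common regex matching rules: metacharacters, quantifiers, character groups, character sets, escape characters, greedy matching. Using regular expressions with the re module. Example: checking whether a mobile phone number is valid. Without regular expressions: # without regular expressions phone_number = input("Please enter an 11-digit mobile phone number: ") if len(phone_number) == 11 ...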
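For comparison with the length-based check above, here is a minimal sketch of the same validation using the re module. The rule it encodes (11 digits, starting with 1, second digit between 3 and 9) is an assumption chosen for illustration and is not necessarily the rule used in the post.

```python
import re

# Assumed rule for this sketch: 11 digits, first digit 1, second digit 3-9.
PHONE_PATTERN = re.compile(r"1[3-9]\d{9}")


def is_valid_phone(number):
    """Return True if the whole string matches the assumed phone pattern."""
    # fullmatch requires the entire string to match, not just a prefix.
    return PHONE_PATTERN.fullmatch(number) is not None


for candidate in ["13812345678", "1381234567", "23812345678", "1381234567a"]:
    print(candidate, is_valid_phone(candidate))
```

Using `fullmatch` avoids having to anchor the pattern with `^` and `$`, and rejects inputs that merely begin with a valid number.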

posted @ 2019-06-23 12:53 LeeHua Views (482) Comments (0) Recommendations (0)

June 22, 2019

Using Basic Crawler Libraries: requests

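Summary: Using requests. With urllib, handling web authentication and cookies requires writing Openers and Handlers; the more powerful requests library exists to make these operations easier. requests is feature-rich and supports cookies, login authentication, proxy settings, and more. Basic usage of requests: import requests ...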

posted @ 2019-06-22 11:06 LeeHua Views (372) Comments (0) Recommendations (0)

June 19, 2019

Using Basic Crawler Libraries: urllib

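Summary: Basic usage of urllib. urllib is Python's built-in HTTP request package. It contains the request, error, parse, and robotparser modules. Examples. Example 1: send a request to a given URL and get back the server's response as a file-like object: response = urllib.request ...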
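As a rough illustration of how the request, error, and parse modules fit together, the sketch below builds a request object and defines a fetch helper. The helper names `build_request` and `fetch`, the search URL, the query parameter, and the User-Agent value are all placeholders chosen for this example. Nothing is sent over the network until `fetch` is called.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_request(base_url, query):
    """Create a GET request object; nothing is sent until urlopen() is called."""
    url = base_url + "?" + urllib.parse.urlencode(query)
    # The User-Agent value is an arbitrary placeholder for this sketch.
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch(request, timeout=10):
    """Send the request and return the decoded body, or None on failure."""
    try:
        # urlopen returns a file-like response object that supports read().
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so both are handled here.
        print("request failed:", exc.reason)
        return None


req = build_request("https://www.baidu.com/s", {"wd": "python"})
print(req.full_url, req.get_method())
```

Calling `fetch(req)` would perform the actual request; it is left out of the top-level code so the sketch runs without network access.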

posted @ 2019-06-19 10:42 LeeHua Views (609) Comments (0) Recommendations (0)

June 6, 2019

Study Notes: 《鸟哥的Linux私房菜--基础篇》

Summary: Chapter 4. Command for displaying the date and time: date. Input: (base) liyihuadeMacBook-Pro:~ liyihua$ date Output: Thu Jun 6 08:44:02 CST 2019. Command for displaying the calendar: cal. Input: (base) liyihuadeMacBook-Pro:~ liyih ...

posted @ 2019-06-06 15:17 LeeHua Views (3777) Comments (0) Recommendations (0)

Lee Hua's Blog

Loves programming -- writes bugs
