July 2019 Post Archive - LeeHua

Proxy Settings for Python Web Crawlers

Summary: Setting an HTTP/HTTPS proxy with requests: import requests proxy = '120.78.225.5:3128' proxies = { 'http': 'http://' + proxy, 'https': 'https://' + proxy, } try: respons...

posted @ 2019-07-29 18:17 LeeHua
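
The proxy mapping from the summary above can be sketched as a small helper. This is a minimal sketch rather than the post's full code: the function names `build_proxies` and `fetch_via_proxy` and the 10-second timeout are assumptions made for illustration, and the proxy address from the summary may no longer be reachable.

```python
def build_proxies(proxy):
    """Return the scheme-to-proxy mapping in the shape requests expects."""
    return {
        'http': 'http://' + proxy,
        'https': 'https://' + proxy,
    }


def fetch_via_proxy(url, proxy, timeout=10):
    """Fetch url through the proxy; return the body text, or None on failure."""
    # Imported here so build_proxies() can be used without requests installed.
    import requests

    try:
        response = requests.get(url, proxies=build_proxies(proxy), timeout=timeout)
        return response.text
    except requests.exceptions.RequestException as error:
        # Covers connection errors, proxy errors, and timeouts alike.
        print('Request failed:', error)
        return None
```

Calling `fetch_via_proxy(some_url, '120.78.225.5:3128')` would route the request through the proxy. If the proxy only speaks plain HTTP, the 'https' entry will likely need the 'http://' prefix as well.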

Python Geek Captcha Recognition

Summary: Simple captcha recognition: import tesserocr from PIL import Image image = Image.open('PFET.jpg') # Pass "L" to the Image object's convert() method to turn the picture into a grayscale image image = image.convert(...

posted @ 2019-07-25 15:30 LeeHua
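
The grayscale step mentioned in the summary is commonly followed by binarization before OCR. The sketch below shows one way to do that under stated assumptions: the threshold of 127 and the helper names `build_threshold_table` and `recognize_captcha` are illustrative and do not come from the post.

```python
def build_threshold_table(threshold=127):
    """Map each of the 256 gray levels to 0 (below threshold) or 1 (at or above)."""
    return [0 if level < threshold else 1 for level in range(256)]


def recognize_captcha(path, threshold=127):
    """Grayscale, binarize, and OCR the image at path; return the recognized text."""
    # Third-party imports live here so the table helper stays dependency-free.
    import tesserocr
    from PIL import Image

    image = Image.open(path)
    image = image.convert('L')  # "L" converts the picture to grayscale
    image = image.point(build_threshold_table(threshold), '1')  # binarize
    return tesserocr.image_to_text(image).strip()
```

The right threshold depends on the captcha; a noisy image may need a different value or extra cleanup before recognition is reliable.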

Scraping Taobao Products with Selenium in Python

Summary: Version without comments: import pymongo from selenium import webdriver from selenium.common.exceptions import TimeoutException from selenium.webdriver.common.by import By...

posted @ 2019-07-23 12:02 LeeHua

Calling the Splash API from Python

Summary: render.html. The render.html endpoint returns the HTML of a JavaScript-rendered page; its address is the address where Splash is running followed by the endpoint name. For example: http://0.0.0.0:8050/render.html?url=https://www.baidu.com&wait...

posted @ 2019-07-21 19:59 LeeHua
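
The rule "Splash address plus endpoint name" can be sketched as a URL builder. This is a sketch under assumptions: the helper name `render_html_url` is hypothetical, and because the `wait` value is cut off in the summary it is left as an optional parameter rather than guessed.

```python
from urllib.parse import urlencode

SPLASH_URL = 'http://0.0.0.0:8050'  # Splash address used in the summary


def render_html_url(target_url, wait=None):
    """Build a render.html request URL for target_url."""
    params = {'url': target_url}
    if wait is not None:
        # Seconds Splash should wait after the page loads before returning HTML.
        params['wait'] = wait
    # urlencode percent-escapes the target URL so it survives as one parameter.
    return SPLASH_URL + '/render.html?' + urlencode(params)
```

The resulting string can be passed to any HTTP client, for instance `requests.get(render_html_url('https://www.baidu.com', wait=5)).text`, where the wait of 5 seconds is only an example value.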

Basic Usage of Splash

Summary: Splash Lua scripts. Splash runs at http://localhost:8050 (port 8050). Entry point and return value: function main(splash, args) splash:go("http://www.baidu.com") splash:wait(0.5) local title = splash:e...

posted @ 2019-07-21 18:13 LeeHua
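
A Lua script like the one in the summary can also be submitted from Python through Splash's execute endpoint. The sketch below assumes a Splash instance at http://localhost:8050. The Lua script is an illustrative one that returns the page HTML; it is not the truncated script from the post, and the helper name `execute_url` is hypothetical.

```python
from urllib.parse import quote

# Illustrative script: load the page, wait briefly, return the rendered HTML.
LUA_SCRIPT = '''
function main(splash, args)
  splash:go("http://www.baidu.com")
  splash:wait(0.5)
  return splash:html()
end
'''


def execute_url(lua_source, splash_url='http://localhost:8050'):
    """Build an execute request URL carrying the Lua script as lua_source."""
    # quote() percent-escapes spaces, newlines, and punctuation in the script.
    return splash_url + '/execute?lua_source=' + quote(lua_source)
```

Requesting `execute_url(LUA_SCRIPT)` with an HTTP client should return whatever `main` returns, here the rendered HTML.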

Using Selenium, a Python Automation Library

Summary: Basic usage: from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.common.keys import Keys from selenium.webdrive...

posted @ 2019-07-17 15:41 LeeHua

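Python Web Crawling: Scraping Ajax Data

Summary: Ajax overview. Ajax is a technique that uses JavaScript to exchange data with the server and update part of a web page without refreshing the page or changing its URL. Basic principles of Ajax: sending the request, parsing the content, rendering the page. Inspecting requests. Extracting Ajax results: scraping the first 10 pages of one user's Weibo posts. Analysis. Python implementation: from urllib.par...

posted @ 2019-07-15 11:38 LeeHua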
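
The pattern summarized above (request a paged Ajax endpoint, then parse the JSON it returns) can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the endpoint `https://example.com/api/list`, the `page` parameter, and the `items` key are hypothetical and are not the Weibo interface analyzed in the post.

```python
import json
from urllib.parse import urlencode

BASE_URL = 'https://example.com/api/list?'  # hypothetical Ajax endpoint


def page_url(page):
    """Build the Ajax request URL for one page of results."""
    return BASE_URL + urlencode({'page': page})


def extract_items(body):
    """Parse a JSON response body and return its list of items (empty if absent)."""
    data = json.loads(body)
    return data.get('items', [])


def first_pages(count=10):
    """Return the request URLs for pages 1 through count."""
    return [page_url(page) for page in range(1, count + 1)]
```

A crawler would fetch each URL from `first_pages()` and hand the response text to `extract_items()`. The real parameter names have to be read from the requests shown in the browser's developer tools.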

Working with MongoDB in Python

Summary: MongoDB is a document-oriented (non-relational) database. Connecting to MongoDB with pymongo: import pymongo client = pymongo.MongoClient(host='localhost', port=27017) # or pymongo.MongoClient('mon...

posted @ 2019-07-13 18:43 LeeHua
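
Once a client is connected as in the summary, storing documents might look like the sketch below. The helper names `open_collection` and `save_documents` are assumptions for illustration, as are whatever database and collection names are passed in. The empty-list guard is there because pymongo rejects an empty `insert_many` call.

```python
def save_documents(collection, documents):
    """Insert documents into a collection and return how many were stored."""
    if not documents:
        return 0  # insert_many() refuses an empty list, so skip the call
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


def open_collection(db_name, collection_name):
    """Connect to a local MongoDB and return the requested collection."""
    # Imported here so save_documents() can be exercised without pymongo.
    import pymongo

    client = pymongo.MongoClient(host='localhost', port=27017)
    return client[db_name][collection_name]
```

For example, `save_documents(open_collection('test', 'students'), [{'name': 'Jordan'}])` would store one document, assuming a MongoDB server is listening on the default port.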

Working with MySQL in Python

Summary: Connecting to MySQL with PyMySQL. Connecting to the database: import pymysql # Connect to MySQL; MySQL runs locally with user root, password 123456, and the default port 3306 db = pymysql.connect(host='localhost', user='root', password='1...

posted @ 2019-07-12 23:15 LeeHua
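
With a connection opened as in the summary, inserting a row is usually done with a parameterized statement. The sketch below is one way to write that, not the post's code: the helper names `build_insert` and `insert_row` and any table or column names are assumptions. Table and column names are interpolated directly into the SQL, so they must come from trusted code; only the values are passed as parameters.

```python
def build_insert(table, row):
    """Return (sql, values) for a parameterized INSERT of one row."""
    columns = ', '.join(row.keys())
    placeholders = ', '.join(['%s'] * len(row))  # PyMySQL uses %s placeholders
    sql = 'INSERT INTO {} ({}) VALUES ({})'.format(table, columns, placeholders)
    return sql, tuple(row.values())


def insert_row(db, table, row):
    """Insert one row through an open connection; roll back if it fails."""
    sql, values = build_insert(table, row)
    cursor = db.cursor()
    try:
        cursor.execute(sql, values)
        db.commit()
        return True
    except Exception:
        db.rollback()  # leave the table unchanged on any failure
        return False
    finally:
        cursor.close()
```

Here `db` is the object returned by `pymysql.connect(...)`. Committing explicitly matters because PyMySQL does not autocommit by default.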

Data Storage: File Storage

Summary: TXT file storage. Scrape the trending topics on Zhihu, extract each topic's question, author, and answer, and save them to a TXT file: import requests from pyquery import PyQuery url = 'https://www.zhihu.com/explore' headers = { 'User-Agen...

posted @ 2019-07-12 11:01 LeeHua
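
The storage half of the task described above (writing question, author, and answer to a TXT file) can be sketched independently of the scraping. The helper name `save_topics`, the dictionary keys, and the 50-character separator are assumptions for illustration.

```python
def save_topics(path, topics):
    """Append each topic as question, author, answer, then a separator line."""
    # Mode 'a' appends, so repeated runs add to the file instead of replacing it.
    with open(path, 'a', encoding='utf-8') as file:
        for topic in topics:
            file.write('\n'.join([topic['question'], topic['author'], topic['answer']]))
            file.write('\n' + '=' * 50 + '\n')
```

Passing `encoding='utf-8'` explicitly avoids platform-dependent defaults, which matters for Chinese text such as Zhihu answers.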

Basic Usage of pyquery

Summary: A first look at pyquery (introductory example). A simple example: from pyquery import PyQuery as pq html = ''' <div> <ul> <li class="item-O"><a href="linkl.html">first item</a></li> <li class...

posted @ 2019-07-11 16:40 LeeHua

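Basic Usage of BeautifulSoup

Summary: A first look at Beautiful Soup. Beautiful Soup is a parsing tool that uses a page's structure and attributes to parse web pages (put simply, it is a Python library for parsing HTML or XML). Beautiful Soup supports many parsers: the Python standard library, the lxml HTML parser, the lxml XML parser, html5li...

posted @ 2019-07-10 12:35 LeeHua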
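
Choosing a parser, as described in the summary, comes down to the second argument of the `BeautifulSoup` constructor. The sketch below uses the standard-library parser `'html.parser'`, which needs no extra install. The sample HTML and the helper name `parse_title_and_story` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

HTML = '<html><head><title>Demo</title></head><body><p class="story">Hello</p></body></html>'


def parse_title_and_story(html, parser='html.parser'):
    """Return the page title and the text of the first <p class="story">."""
    soup = BeautifulSoup(html, parser)
    title = soup.title.string  # text inside <title>
    story = soup.find('p', class_='story').get_text()
    return title, story
```

Passing `'lxml'` instead of `'html.parser'` switches to the lxml HTML parser, provided lxml is installed.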
