January 2020 post archive - douzujun

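Python crawler notes (6): web crawlers in practice (2), targeted stock data crawler

Summary: 1. Targeted stock data crawler https://gupiao.baidu.com/stock http://quote.eastmoney.com/stock_list.html 2. Writing the example 2.1 Fetching the HTML page def getHTMLText(url): try: r = requests.get ...

posted @ 2020-01-31 23:56 douzujun Views (975) Comments (2) Recommendations (0)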

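Python crawler notes (6): web crawlers in practice (1), targeted Taobao price-comparison crawler (working around Taobao's crawler restrictions with cookies)

Summary: 1. Targeted crawler for Taobao product information. Link: https://www.taobao.com/ 2. Writing the example 2.1 Overall framework # -*- coding: utf-8 -*- import requests import re def getHTMLText(url): print("") # for each page obtained ...

posted @ 2020-01-31 15:53 douzujun Views (3333) Comments (4) Recommendations (0)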
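
The cookie workaround named in the title above can be sketched roughly as follows. This is only an illustration, not the post's code: it uses the standard library's `urllib.request` instead of `requests` so that it runs without extra packages, and the helper name `build_request`, the `User-Agent` string, and the cookie value are all placeholders assumed for the example.

```python
from urllib.request import Request


def build_request(url, cookie):
    """Build a request that carries a browser-like User-Agent and a cookie.

    Hypothetical helper: the real cookie string would be copied from a
    logged-in browser session.
    """
    headers = {
        "User-Agent": "Mozilla/5.0",  # assumed value, pretend to be a browser
        "Cookie": cookie,             # session cookie copied from the browser
    }
    return Request(url, headers=headers)


# Placeholder cookie; nothing is sent over the network here.
req = build_request("https://www.taobao.com/", "sessionid=PLACEHOLDER")
print(req.get_header("Cookie"))

# With the requests library the same dict is passed as
# requests.get(url, headers=headers).
```

The request is only constructed, not sent, so the sketch shows where the cookie goes without depending on the site's current behavior.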

Python crawler notes (5): web crawler extraction, optimized example: Chinese university rankings crawler

Summary: 1. Code # -*- coding: utf-8 -*- """ Created on Thu Jan 30 01:27:38 2020 @author: douzi """ import requests from bs4 import BeautifulSoup import bs4 def ...

posted @ 2020-01-30 18:35 douzujun Views (533) Comments (2) Recommendations (0)

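Python formatted output with format

Summary: 1. format (1) Setting alignment (< left-aligned, the default for strings; > right-aligned; ^ centered; = numeric only, places the padding after the sign and before the digits) print("{:<6} is {}".format('123', 'abcd')) # left-aligned print("{:>6} is {}".format('123', 'ab ...

posted @ 2020-01-30 02:18 douzujun Views (595) Comments (0) Recommendations (0)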
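
The alignment options listed in the summary above can be tried out with a short sketch. The first two calls reuse the format strings from the summary. The centered and numeric examples are additions assumed for illustration.

```python
# Width is 6 in every case; the fill character defaults to a space.
left = "{:<6} is {}".format("123", "abcd")    # '123    is abcd'
right = "{:>6} is {}".format("123", "abcd")   # '   123 is abcd'
center = "{:^6} is {}".format("abcd", "123")  # ' abcd  is 123'

# '=' applies to numbers only: padding goes between the sign and the digits.
signed = "{:=+6}".format(42)    # '+   42'
zeroed = "{:0=6}".format(-42)   # '-00042'

for value in (left, right, center, signed, zeroed):
    print(repr(value))
```

Printing with `repr` makes the padding spaces visible in the output.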

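Python crawler notes (5): web crawler extraction, example: Chinese university rankings crawler

Summary: 1. Target site for the Chinese university rankings crawler: http://www.zuihaodaxue.com/zuihaodaxuepaiming2016.html Viewing the page source shows that the information is written directly into the HTML, so a targeted crawler is feasible 2. Program structure design 2. Writing the example 2.1 Overall code framework # -*- coding: utf ...

posted @ 2020-01-30 01:27 douzujun Views (826) Comments (0) Recommendations (0)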

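Regular expression notes (usage of re.search/re.match/re.split/re.compile)

Summary: 1. Regular expressions https://www.cnblogs.com/douzujun/p/7446448.html Using word boundaries (very handy!): for example, I want to replace only app with qq, without replacing the app inside apple and application re.findall(r'\b\d{3}\b', '11 ...

posted @ 2020-01-29 23:07 douzujun Views (1099) Comments (0) Recommendations (0)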
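
The word-boundary idea from the summary above can be sketched as follows. The sample strings are made up for illustration, since the original example is cut off in the summary.

```python
import re

# \b matches the position between a word character and a non-word character,
# so \bapp\b matches "app" only when it stands alone as a word.
text = "app apple application my-app"
replaced = re.sub(r"\bapp\b", "qq", text)
print(replaced)  # qq apple application my-qq

# The same trick picks out numbers that are exactly three digits long.
three_digit = re.findall(r"\b\d{3}\b", "110 2345 999 a123 45")
print(three_digit)  # ['110', '999']
```

Note that `a123` is skipped because `a` and `1` are both word characters, so there is no boundary between them.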

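Python crawler notes (5): web crawler extraction, organizing and extracting information (3): HTML content search methods based on the bs4 library

Summary: 1. HTML content search methods based on the bs4 library 1.1 <>.find_all() and re (the regular expression library) (1) The argument is a single string (2) The argument is a list (3) The argument is True, which returns all tags (4) Show tags whose names start with b, such as b and body (using re, the regular expression library) import request ...

posted @ 2020-01-29 20:19 douzujun Views (296) Comments (0) Recommendations (0)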

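Python crawler notes (5): web crawler extraction, organizing and extracting information (2): general methods of information extraction

Summary: 1. General methods of information extraction 1.1 Method 1 1.2 Method 2 1.3 Method 3 2. Example import requests from bs4 import BeautifulSoup r = requests.get("http://python123.io/ws/demo.html") demo = r ...

posted @ 2020-01-29 19:48 douzujun Views (251) Comments (0) Recommendations (0)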

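Installing anaconda3 on Ubuntu

Summary: Download anaconda from the official site, then run bash Anaconda3-5.2.0-Linux-x86_64.sh During installation, type yes wherever input is required, then press Enter through the remaining prompts. When installation finishes, open the .bashrc file, add export xxxx as the last line, and save sudo gedit ~/.bashrc expor ...

posted @ 2020-01-29 00:07 douzujun Views (208) Comments (0) Recommendations (0)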

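Python crawler notes (5): web crawler extraction, organizing and extracting information (1): three forms of information markup

Summary: 1. Information markup 2. Types of information markup 2.1 XML Example: 2.2 JSON 2.3 YAML 3. Comparison of the three forms of information markup

posted @ 2020-01-23 22:45 douzujun Views (253) Comments (0) Recommendations (0)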
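
To make the comparison of the three markup forms concrete, here is a minimal sketch that writes one record in each form. The record and its field names are invented for illustration. JSON and XML use the standard library, while the YAML text is assembled by hand because Python ships no YAML module. That shortcut only holds for flat string values like these.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for this comparison.
record = {"name": "douzi", "topic": "crawler"}

# JSON: typed key/value pairs.
json_text = json.dumps(record)

# XML: every piece of information is wrapped in a pair of tags.
root = ET.Element("person")
for key, value in record.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

# YAML: untyped key/value pairs, one per line, nesting shown by indentation.
yaml_text = "\n".join(f"{key}: {value}" for key, value in record.items())

print(json_text)
print(xml_text)
print(yaml_text)
```

For the same record, XML is the most verbose form and YAML the most compact.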

Python crawler notes (4): web crawler extraction, the Beautiful Soup library (3): formatting and encoding with the bs4 library

Summary: 1. prettify() import requests from bs4 import BeautifulSoup r = requests.get("http://python123.io/ws/demo.html") demo = r.text print(demo, "\n") soup ...

posted @ 2020-01-23 20:36 douzujun Views (361) Comments (0) Recommendations (0)

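Python crawler notes (4): web crawler extraction, the Beautiful Soup library (2): HTML content traversal methods based on the bs4 library

Summary: 1. HTML content traversal methods based on the bs4 library 1.1 .contents Example 1.2 A node's parent tag 1.3 Upward traversal of the tag tree (parents) 1.4 Sibling traversal of the tag tree Note: a tag's child nodes may be NavigableString ...

posted @ 2020-01-22 18:06 douzujun Views (234) Comments (0) Recommendations (0)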

Python crawler notes (4): web crawler extraction, the Beautiful Soup library (1)

Summary: 1. Installing Beautiful Soup pip install beautifulsoup4 On Linux use pip3 2. Usage Use this site: https://python123.io/ws/demo.html # -*- coding: utf-8 -*- """ Created on Tue J ...

posted @ 2020-01-21 22:06 douzujun Views (189) Comments (0) Recommendations (0)

Python crawler notes (3): automatic IP location lookup

Summary: http://m.ip138.com/ import requests url = "http://m.ip138.com/ip.asp?ip=" r = requests.get(url + "202.204.80.112") print(r.status_code) print(r.text[- ...

posted @ 2020-01-20 01:03 douzujun Views (670) Comments (0) Recommendations (0)

Python crawler notes (3): the requests module in depth, crawling and saving web images

Summary: 1. Crawling web images import os import requests root = ".//" url = "https://img2018.cnblogs.com/i-beta/817161/202001/817161-20200116224428592-123074215.png" path ...

posted @ 2020-01-20 00:56 douzujun Views (347) Comments (0) Recommendations (0)
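
The storage half of the image crawler above can be sketched as follows. The summary is cut off before the path is built, so this is one plausible approach rather than the post's code: the helper `save_image` is hypothetical, the file name is assumed to be the last segment of the URL, and fake bytes written to a temporary directory stand in for a downloaded image.

```python
import os
import tempfile


def save_image(content, url, root):
    """Write image bytes under root, named after the URL's last path segment.

    Hypothetical helper: skips the write if the file already exists.
    """
    path = os.path.join(root, url.split("/")[-1])
    os.makedirs(root, exist_ok=True)
    if not os.path.exists(path):
        with open(path, "wb") as f:  # binary mode, images are not text
            f.write(content)
    return path


url = "https://img2018.cnblogs.com/i-beta/817161/202001/817161-20200116224428592-123074215.png"
demo_root = tempfile.mkdtemp()  # throwaway directory for the demo
saved = save_image(b"fake image bytes", url, demo_root)
print(saved)
```

In a real crawler the `content` argument would be the `content` attribute (raw bytes) of the response returned by `requests.get(url)`.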

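Python crawler notes (3): the requests module in depth, three ways to simulate login

Summary: 1. Differences between cookies and sessions 2. How crawlers handle cookies and sessions 3. Handling requests that need cookies and sessions 4. Trying to log in to Renren with a session (don't actually try it, this is just for reference) # -*- coding: utf-8 -*- import requests session = reques ...

posted @ 2020-01-19 23:12 douzujun Views (748) Comments (0) Recommendations (0)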

Python crawler notes (3): the requests module in depth, using proxies

Summary:

posted @ 2020-01-19 02:43 douzujun Views (253) Comments (0) Recommendations (0)

Python crawler notes (3): the requests module in depth, sending POST requests

Summary: 1. Sending POST requests with the requests module # -*- coding: utf-8 -*- """ Created on Sun Jan 19 01:26:05 2020 @author: douzi """ # -*- coding: utf-8 -*- import requests impor ...

posted @ 2020-01-19 01:54 douzujun Views (708) Comments (0) Recommendations (0)

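Python crawler notes (2): using the requests library (example: Tieba crawler)

Summary: 1. Installing the requests library: anaconda is recommended, since it ships with requests 2. Using requests import requests r = requests.get("http://www.baidu.com") print(r.status_code) r.encoding = 'utf-8' print( ...

posted @ 2020-01-16 22:46 douzujun Views (590) Comments (0) Recommendations (0)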

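Python crawler notes (1): introduction to crawlers and their workflow

Summary: 1. String basics 2. HTTP and HTTPS 3. URL format 4. HTTP request format 5. Differences between the two basic request methods, GET and POST (1) GET carries its parameters in the URL, while POST passes them in the request body. (2) Parameters sent in a GET URL are subject to length limits in practice, while a POST body is not limited in the same way (suitable for large text).

posted @ 2020-01-16 18:05 douzujun Views (337) Comments (0) Recommendations (0)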
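
The first GET/POST difference in the summary above can be seen without sending anything. The sketch below only builds the two requests, using the standard library's `urllib` rather than the `requests` module the series uses. The URL and parameter names are placeholders assumed for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

base_url = "http://example.com/search"  # placeholder endpoint
params = {"kw": "python", "pn": "0"}    # hypothetical query parameters
encoded = urlencode(params)             # 'kw=python&pn=0'

# GET: the parameters become part of the URL and there is no body.
get_req = Request(base_url + "?" + encoded)

# POST: the URL stays clean and the parameters travel in the request body.
post_req = Request(base_url, data=encoded.encode("utf-8"))

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

`urllib` switches the method to POST as soon as a request body is supplied, which mirrors the distinction the summary draws.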

IDEA 2020 registration code

Summary: 6ZUMD7WWWU-eyJsaWNlbnNlSWQiOiI2WlVNRDdXV1dVIiwibGljZW5zZWVOYW1lIjoiSmV0cyBHcm91cCIsImFzc2lnbmVlTmFtZSI6IiIsImFzc2lnbmVlRW1haWwiOiIiLCJsaWNlbnNlUmVzdHJ ...

posted @ 2020-01-16 16:25 douzujun Views (273) Comments (0) Recommendations (0)

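Git study notes

Summary: 1. Introduction to Git 2. Creating versions 2.1 Installation sudo apt-get install git 2.2 Creating a repository (1) Create a directory git_test and create a repository inside it with the command: git init (2) In the git_test directory, create a file code.txt and edit its contents as follows:

posted @ 2020-01-13 15:09 douzujun Views (259) Comments (0) Recommendations (0)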

ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (10061)

Summary: Using ...

posted @ 2020-01-06 23:09 douzujun Views (755) Comments (0) Recommendations (0)
