Post archive for January 30, 2020 - 10nnn4R

January 30, 2020

Summary: Scraping the links from the school's website. import requests import re url = 'http://www.whut.edu.cn/' r = requests.get(url) r.encoding = 'utf-8' text = r.text links = re.findall('(http

posted @ 2020-01-30 21:29 10nnn4R Views (175) Comments (0) Recommendations (0)
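The summary above cuts the author's code off mid-pattern, so the regular expression below is an assumption, not the original. A minimal sketch that keeps the same steps (fetch the page with `requests`, set `r.encoding = 'utf-8'`, then pull links out with `re.findall`) might look like this. `LINK_PATTERN`, `extract_links` and `fetch_links` are hypothetical names, and the pattern only catches absolute http/https URLs inside double-quoted `href` attributes.

```python
import re
from typing import List

# Hypothetical pattern (the author's own pattern is truncated in the summary):
# capture absolute http/https URLs inside double-quoted href="..." attributes.
LINK_PATTERN = re.compile(r'href="(https?://[^"]+)"')


def extract_links(html: str) -> List[str]:
    """Return every absolute http(s) link found in href attributes of the HTML."""
    return LINK_PATTERN.findall(html)


def fetch_links(url: str) -> List[str]:
    """Download a page and extract its links (needs network access and requests)."""
    import requests  # third-party: pip install requests

    r = requests.get(url, timeout=10)
    r.encoding = "utf-8"  # same explicit encoding as in the post's summary
    return extract_links(r.text)


if __name__ == "__main__":
    # Offline demo on a small hand-written HTML snippet.
    sample = '<a href="http://example.com/a">A</a> <a href="/relative">B</a>'
    print(extract_links(sample))  # ['http://example.com/a']
```

Calling `fetch_links('http://www.whut.edu.cn/')` would perform the network step. Relative links such as `/relative` are skipped by this pattern; resolving them would need something like `urllib.parse.urljoin`.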

Regular Expressions (1)

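Summary: Regular expression symbols and methods. 1. Common symbols (symbol: meaning): . matches any single character except a newline; * matches the preceding character zero or more times; ? matches the preceding character zero times or once; .* greedy matching; .*? non-greedy (lazy) matching; () returns only the text matched inside the parentheses. Common methods. Demos of several symbols: demo1: code = 'huasdakxxIxxbcjkxxlovexxsbs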

posted @ 2020-01-30 17:41 10nnn4R Views (331) Comments (0) Recommendations (0)
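The symbols listed above can be checked directly with Python's built-in `re` module. The sketch below uses a made-up sample string (the author's `demo1` string is truncated in the summary, so this is not the original demo) and shows each symbol, including the greedy vs. non-greedy difference and how `()` changes what `re.findall` returns.

```python
import re

# Hypothetical sample text: the words to extract are wrapped in "xx" delimiters.
code = "abcxxIxxdefxxlovexxghixxPythonxxjkl"

# "." matches any single character except a newline.
dot = re.findall("a.c", "abc a-c ac")          # ['abc', 'a-c']
no_newline = re.findall("a.c", "a\nc")         # []

# "*" matches the preceding item zero or more times.
star = re.findall("ab*", "a ab abb")           # ['a', 'ab', 'abb']

# "?" matches the preceding item zero times or once.
question = re.findall("ab?", "a ab abb")       # ['a', 'ab', 'ab']

# ".*" is greedy: it runs on to the LAST "xx" that still allows a match.
greedy = re.findall("xx(.*)xx", code)          # ['IxxdefxxlovexxghixxPython']

# ".*?" is non-greedy: it stops at the FIRST "xx" that completes a match.
lazy = re.findall("xx(.*?)xx", code)           # ['I', 'love', 'Python']

# Without "()", findall returns the whole match, delimiters included.
whole = re.findall("xx.*?xx", code)            # ['xxIxx', 'xxlovexx', 'xxPythonxx']

for name, value in [("dot", dot), ("no_newline", no_newline), ("star", star),
                    ("question", question), ("greedy", greedy),
                    ("lazy", lazy), ("whole", whole)]:
    print(name, value)
```

The `lazy` line is the usual way to pull out every delimited word at once; the `greedy` line shows why `.*` alone swallows everything between the first and last delimiter.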

Python crawler - spoofing the User-Agent

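Summary: Some websites recognize Python crawlers and block them (by default, requests identifies itself as python-requests/<version> in the User-Agent header). Sending a browser-like User-Agent gets past some of these anti-scraping checks, and the fake-useragent library handles this well. Installation steps in PyCharm. Generating a random User-Agent takes only one line of code: from fake_useragent import UserAgent ua =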

posted @ 2020-01-30 15:56 10nnn4R Views (393) Comments (0) Recommendations (0)
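The post's one-liner relies on fake_useragent, whose `UserAgent().random` returns a random browser User-Agent string. To keep the sketch below free of that dependency, it swaps in a plain `random.choice` over a small, hand-written pool of illustrative User-Agent strings; `USER_AGENTS`, `random_headers` and `get_with_random_ua` are hypothetical names. The chosen string is sent through the `headers` argument of `requests.get`.

```python
import random

# Hypothetical stand-in for fake_useragent: a tiny hand-picked pool of
# browser User-Agent strings (illustrative values, not an authoritative list).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) "
    "Gecko/20100101 Firefox/72.0",
]


def random_headers() -> dict:
    """Build a request-header dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def get_with_random_ua(url: str):
    """Send a GET request with a random User-Agent (needs requests and network)."""
    import requests  # third-party: pip install requests

    return requests.get(url, headers=random_headers(), timeout=10)
```

With fake-useragent installed, `random_headers` could return `{"User-Agent": UserAgent().random}` instead. A spoofed User-Agent only defeats checks that look at that one header; sites that also inspect cookies, request rates or JavaScript execution will not be fooled by it alone.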

Python crawler (1): the requests library

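Summary: One way to install the requests library in PyCharm: open Settings, search for the package, then install it (packages shown in blue are already installed). requests.get corresponds to the HTTP GET method, and, mirroring the HTTP methods, the requests library offers seven main request functions. Fetching a URL: requests.get(url, params=None, **kwargs) r = requests.get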

posted @ 2020-01-30 15:25 10nnn4R Views (149) Comments (0) Recommendations (0)
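A short sketch of the `requests.get(url, params=None, **kwargs)` call described above. `build_get_url` prepares the request locally with `requests.Request(...).prepare()`, so you can see how the `params` dict is encoded into the query string without touching the network; `fetch` sends a real request. Both function names and the `https://example.com/search` URL are hypothetical, and the `timeout`, `raise_for_status()` and `apparent_encoding` steps are common practice rather than something the post itself shows.

```python
import requests  # third-party: pip install requests


def build_get_url(url, params=None):
    """Return the final URL that requests.get(url, params=params) would request.

    Preparing the request locally shows how `params` is encoded into the
    query string without sending anything over the network.
    """
    return requests.Request("GET", url, params=params).prepare().url


def fetch(url, params=None):
    """Send a real GET request (needs network access) and return the response."""
    r = requests.get(url, params=params, timeout=10)
    r.raise_for_status()  # raise requests.HTTPError for 4xx/5xx status codes
    r.encoding = r.apparent_encoding  # guess the encoding from the body bytes
    return r


if __name__ == "__main__":
    print(build_get_url("https://example.com/search", {"q": "python", "page": 2}))
    # https://example.com/search?q=python&page=2
```

Once `fetch` returns, `r.status_code`, `r.text` and `r.content` give the status, decoded body and raw bytes respectively.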

L0NMAR

Know it, then hack it.
