淡水无甜

July 13, 2019

Summary:

    import requests  # import the requests library
    res_music = requests.get('https://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplac

Read more

posted @ 2019-07-13 23:42 淡水无甜 Views (368) Comments (0) Recommended (0)

July 7, 2019

Using the XPath parsing library

Summary:

    import requests  # import the requests HTTP library
    from lxml import etree  # import the lxml parsing library

    def getHTMLtext(url):
        try:
            r = requests.get(url, timeout=20)  # GET request with a 20-second timeout
            r...

Read more

posted @ 2019-07-07 19:13 淡水无甜 Views (395) Comments (0) Recommended (0)
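The excerpt above stops before any XPath query is shown. As a minimal sketch of what such a query looks like, the block below runs a path expression over a small inline document. Note the swap: it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup, rather than the `lxml` package the post imports. The sample markup, the `extract_links` helper, and the path expression are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from the post).
SAMPLE = """<html><body>
<ul class="books">
  <li><a href="/book/1">First title</a></li>
  <li><a href="/book/2">Second title</a></li>
</ul>
</body></html>"""


def extract_links(markup, path=".//ul[@class='books']/li/a"):
    """Return (text, href) pairs for every element matched by `path`."""
    root = ET.fromstring(markup)
    # findall() accepts ElementTree's XPath subset, including
    # attribute predicates such as [@class='books'].
    return [(a.text, a.get("href")) for a in root.findall(path)]


for text, href in extract_links(SAMPLE):
    print(text, href)
```

With `lxml`, the equivalent would go through `etree.HTML(text)` and the element's `.xpath(...)` method, which tolerates real-world HTML and accepts full XPath 1.0 expressions.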

General crawler framework

Summary:

    # General crawler framework
    import requests

    def get_html_text(url):
        try:
            r = requests.get(url, timeout=20)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            ...

Read more

posted @ 2019-07-07 13:24 淡水无甜 Views (262) Comments (0) Recommended (0)

Scraping Douban book review comments with requests

Summary:

    import requests

    url = "https://book.douban.com/subject/1084336/comments/"
    response = requests.get(url)
    r = response.text

    from bs4 import BeautifulSoup
    soup = BeautifulSou...

Read more

posted @ 2019-07-07 12:31 淡水无甜 Views (504) Comments (0) Recommended (0)

Basic usage of urllib and requests

Summary:

    import urllib.request  # import urllib.request

    url = "http://www.baidu.com"  # Baidu URL to visit

    response = urllib.request.urlopen(url)  # send a GET request

    r = response.read(500)  # read the first 500 bytes
    ...

Read more

posted @ 2019-07-07 10:44 淡水无甜 Views (195) Comments (0) Recommended (0)
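One detail of `response.read(500)` in the excerpt above is that it counts bytes, not characters, so a multi-byte character can be cut in half. The sketch below illustrates this with a hypothetical `read_prefix` helper. To stay runnable without network access it reads from a `data:` URL, which `urllib.request.urlopen` also handles; the helper name and the sample URL are assumptions, and an `http://` URL would be passed the same way.

```python
import urllib.request


def read_prefix(url, size=500):
    """Return at most `size` bytes from the start of the response body."""
    # The context manager closes the response even if read() raises.
    with urllib.request.urlopen(url) as response:
        return response.read(size)


# Two Chinese characters, percent-encoded as six UTF-8 bytes.
sample_url = "data:text/plain;charset=utf-8,%E4%B8%AD%E6%96%87"

prefix = read_prefix(sample_url, 3)
print(prefix)                  # raw bytes of the first character only
print(prefix.decode("utf-8"))  # decodes cleanly because 3 bytes is a full character
```

Asking for 4 bytes here would return one complete character plus a fragment of the next, and decoding that as UTF-8 would fail, which is why truncated reads are best treated as bytes.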

April 17, 2019

day3

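Summary:

    import urllib.request

    # Sending requests through a paid proxy
    # 1. Supply the username and password
    # Send the request through a handler that performs authentication

    def money_proxy_use():
        # # Method 1: send the request through a paid proxy
        # # 1. Proxy IP
        # money_proxy ={"http":"username:pwd@192.168.12....

Read more

posted @ 2019-04-17 21:59 淡水无甜 Views (139) Comments (0) Recommended (0)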
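The comments in the excerpt above mention sending the request through a handler that performs authentication, but the code is cut off before that part. A minimal sketch of one way to do this with the standard library follows. The function name `build_paid_proxy_opener`, the proxy address `proxy.example.invalid:8080`, and the credentials are placeholders invented for illustration, and nothing is actually sent over the network.

```python
import urllib.request


def build_paid_proxy_opener(proxy_host, username, password):
    """Build an opener that routes HTTP traffic through an authenticated proxy."""
    # The password manager stores credentials keyed by the proxy address;
    # a realm of None means "use these for any realm".
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxy_host, username, password)

    # ProxyHandler selects the proxy; ProxyBasicAuthHandler answers
    # the proxy's 407 challenge with the stored credentials.
    proxy_handler = urllib.request.ProxyHandler({"http": "http://" + proxy_host})
    auth_handler = urllib.request.ProxyBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(proxy_handler, auth_handler)


# Placeholder values; replace with the real proxy address and account.
opener = build_paid_proxy_opener("proxy.example.invalid:8080", "username", "pwd")
print(type(opener).__name__)
```

A request would then be sent with `opener.open(url)`. Keeping the credentials in a password manager, rather than embedding them in the proxy URL as in the commented-out first method, avoids leaking them when the URL is logged.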

day02

Summary:

    import urllib.request
    import urllib.parse
    import string


    def get_params():
        url = "http://www.baidu.com/s?"

        params = {
            "wd":"中文",
            "key":"zh...

Read more

posted @ 2019-04-17 21:22 淡水无甜 Views (104) Comments (0) Recommended (0)
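The excerpt above builds a query dictionary containing non-ASCII text, which has to be percent-encoded before it can go into a URL. The sketch below shows two standard-library ways to do that. The helper names are hypothetical, only the `wd` parameter is used because the second value is truncated in the excerpt, and the `string.printable` variant is a guess at why the post imports `string`.

```python
import string
import urllib.parse


def build_search_url(base_url, params):
    """Append a percent-encoded query string built from a dict."""
    # urlencode() encodes each value as UTF-8 and joins pairs with "&".
    return base_url + urllib.parse.urlencode(params)


def quote_non_ascii(url):
    """Percent-encode only the non-ASCII characters of an assembled URL."""
    # Marking every printable ASCII character as safe leaves ":", "/",
    # "?" and "=" untouched while still escaping the Chinese text.
    return urllib.parse.quote(url, safe=string.printable)


print(build_search_url("http://www.baidu.com/s?", {"wd": "中文"}))
print(quote_non_ascii("http://www.baidu.com/s?wd=中文"))
```

Both calls produce the same URL here. `urlencode` is the safer default because it also escapes reserved characters such as `&` inside values, which the `quote` variant deliberately leaves alone.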

April 16, 2019

day01 Basic usage of urllib.request

Summary: A simple web crawler example

    import urllib.request

    # Baidu URL to request
    url = "http://www.baidu.com/"
    # GET request
    response = urllib.request.urlopen(url)
    # Decode the fetched content into a string
    data = response.read().decode("utf-8")
    ...

Read more

posted @ 2019-04-16 20:41 淡水无甜 Views (113) Comments (0) Recommended (0)
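The excerpt above decodes the response with a hard-coded `"utf-8"`, which fails on pages served in another encoding. A possible refinement, sketched below, is to use the charset declared in the `Content-Type` header and fall back to UTF-8 only when none is given. The `fetch_text` helper and its parameters are assumptions for illustration, and the demo uses `data:` URLs so it runs without network access.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8", timeout=20):
    """Fetch a URL and decode the body using the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # get_content_charset() returns None when the header has no charset.
        charset = response.headers.get_content_charset() or default_charset
        return response.read().decode(charset)


# The same character served in two different encodings.
print(fetch_text("data:text/plain;charset=utf-8,%E4%B8%AD"))
print(fetch_text("data:text/plain;charset=gbk,%D6%D0"))
```

Pages that declare their encoding only in an HTML `<meta>` tag are not covered by this sketch; those still need the encoding to be detected from the body.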

April 2, 2019

day02

Summary:

    import urllib.request


    def create_proxy_handler():
        url = "https://blog.csdn.net/m0_37499059/article/details/79003731"

        # Add a proxy
        proxy = {

Read more

posted @ 2019-04-02 19:11 淡水无甜 Views (149) Comments (0) Recommended (0)

April 1, 2019

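day16 The re module

Summary: Regular expressions are not specific to Python; they are simply a set of rules for matching string content. If the word "rules" already sounds confusing, let's start with some practical examples. Online testing tool: http://tool.chinaz.com/regex/ Characters: Quantifiers: . ^ $ * + ? { } 李杰 李莲 李二 李杰和李莲英 李二棍 Note... Read more

posted @ 2019-04-01 20:21 淡水无甜 Views (112) Comments (0) Recommended (0)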
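The summary above lists the metacharacters `. ^ $ * + ? { }` without showing them in action. As a small sketch using Python's `re` module, the block below applies several of them to one sample string. The sample string and the `find_all` helper are chosen here for illustration and are not taken from the post.

```python
import re

# Sample string chosen for illustration; it is not taken from the post.
text = "李杰和李莲英和李二棍子"


def find_all(pattern):
    """Return every non-overlapping match of `pattern` in the sample text."""
    return re.findall(pattern, text)


optional = find_all("李.?")     # ?     : zero or one of any character
greedy = find_all("李.*")       # *     : zero or more, as many as possible
lazy = find_all("李.*?")        # *?    : zero or more, as few as possible
bounded = find_all("李.{1,2}")  # {1,2} : one or two characters, preferring two
at_start = find_all("^李.")     # ^     : match only at the start of the string
at_end = find_all("子$")        # $     : match only at the end of the string

print(optional)  # ['李杰', '李莲', '李二']
print(greedy)    # ['李杰和李莲英和李二棍子']
print(lazy)      # ['李', '李', '李']
print(bounded)   # ['李杰和', '李莲英', '李二棍']
```

The contrast between `greedy` and `lazy` is the one that most often surprises: `.*` runs to the end of the string, while `.*?` stops as soon as the rest of the pattern can match, which here means immediately.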
