Post archive for March 11, 2020 - 胡辣汤王子

March 11, 2020

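Error type: TypeError: 'method' object is not subscriptable

Summary: Probably caused by the wrong kind of bracket on a method call: square brackets were used where parentheses were needed. The faulty code: title=li.xpath[".//a/@href"] raises TypeError: 'method' object is not subscriptable. The corrected code: title=li.xpath(".//a/@href") runs correctly. Read more

posted @ 2020-03-11 23:36 胡辣汤王子 Views (6763) Comments (0) Recommendations (0)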
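The error in the entry above can be reproduced without lxml. The sketch below uses a hypothetical `Node` class as a stand-in for an lxml element (it is not part of lxml) to show that square brackets try to index the bound method itself, while parentheses call it.

```python
class Node:
    """Hypothetical stand-in for an lxml element; it only exists to reproduce the error."""

    def xpath(self, expression):
        # A real element would evaluate the expression; this stub just echoes it back.
        return [expression]


li = Node()

try:
    li.xpath[".//a/@href"]  # square brackets try to subscript the method object
except TypeError as exc:
    message = str(exc)

result = li.xpath(".//a/@href")  # parentheses call the method
print(message)
print(result)
```

Any method written in Python fails the same way, so when this message appears the first thing to check is the bracket type on the call.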

Crawler 08 - XPath syntax practice

Summary: from lxml import etree parser=etree.HTMLParser(encoding="utf-8") html=etree.parse("test.html",parser=parser) html2=etree.parse("lagou.html",parser=par Read more

posted @ 2020-03-11 19:42 胡辣汤王子 Views (292) Comments (0) Recommendations (0)
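For readers who want to try XPath-style queries without installing lxml, here is a minimal sketch using the standard library's `xml.etree.ElementTree` instead. This is a swapped-in technique, not the post's code. ElementTree supports only a subset of XPath (no `/@href` step), so the attribute is read with `.get()`. The sample markup and the `links` name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (ElementTree needs valid XML,
# unlike lxml's HTMLParser, which tolerates broken HTML).
SAMPLE = """
<ul>
  <li><a href="/jobs/1">Python developer</a></li>
  <li><a href="/jobs/2">Data engineer</a></li>
  <li>No link here</li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)

links = []
for li in root.findall(".//li"):   # every <li> below the root
    a = li.find(".//a")            # first <a> inside this <li>, or None
    if a is not None:
        links.append((a.get("href"), a.text))

print(links)
```

With lxml the same lookup would be written as `li.xpath(".//a/@href")`, as in the TypeError post on this page.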

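Crawler 08 - reading web page files with lxml

Summary: from lxml import etree text=""" <html> <head> <title>Learning table tags</title> <meta charset="UTF-8"/> <pre> Learning table tags: table: declares a table tr: declares a row and sets the row height, i.e. the height of every cell in that row. th: declares Read more

posted @ 2020-03-11 19:41 胡辣汤王子 Views (185) Comments (0) Recommendations (0)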

Crawler 07 - cookies and sessions in the requests library

Summary: import requests #1. get cookies resp=requests.get("http://www.baidu.com") print(resp.cookies.get_dict()) #2. session dapeng_url="http://www.renren.com/88015124 Read more

posted @ 2020-03-11 19:39 胡辣汤王子 Views (235) Comments (0) Recommendations (0)

Crawler 06 - handling untrusted SSL certificates

Summary: resp=requests.get("http://www.12306.cn/",verify=False) # just add the verify parameter print(resp.content.decode("utf-8")) Read more

posted @ 2020-03-11 15:32 胡辣汤王子 Views (154) Comments (0) Recommendations (0)
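The post above uses the `verify=False` argument of `requests`. As a rough standard-library counterpart (not the post's code), the sketch below builds an `ssl` context that skips certificate checks and passes it to `urllib.request.urlopen`. The function names `make_unverified_context` and `fetch_without_verification` are hypothetical. Skipping verification is only reasonable for testing against hosts you already trust.

```python
import ssl
from urllib import request


def make_unverified_context():
    """Return an SSL context that does not validate server certificates."""
    context = ssl.create_default_context()
    context.check_hostname = False       # must be turned off before verify_mode
    context.verify_mode = ssl.CERT_NONE  # accept any certificate
    return context


def fetch_without_verification(url, timeout=10):
    """Fetch url over HTTPS without certificate validation (testing only)."""
    ctx = make_unverified_context()
    with request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the context needs no network access.
context = make_unverified_context()
print(context.verify_mode, context.check_hostname)
```

The order of the two assignments matters: setting `verify_mode` to `ssl.CERT_NONE` while `check_hostname` is still enabled raises a `ValueError`.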

Crawler 05 - using the requests library

Summary: 1. Common functions import requests response=requests.get("http://www.baidu.com") # print(type(response.text)) # open # print(response.text) print(response.url) print( Read more

posted @ 2020-03-11 15:05 胡辣汤王子 Views (326) Comments (0) Recommendations (0)

Crawler 04 - cookies

Summary: 1. Using cookies from urllib import request dapeng_url="http://www.renren.com/880151247/profile" headers = { "User-Agent":"Mozilla/5.0 (Windows N Read more

posted @ 2020-03-11 15:02 胡辣汤王子 Views (196) Comments (0) Recommendations (0)
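The summary above is cut off before it shows how the cookie is used, so the following is only a sketch of one common urllib approach, and it is an assumption that the post covers anything similar. An `http.cookiejar.CookieJar` is attached to an opener so that cookies set by the server are stored and sent back on later requests. `build_cookie_opener` is a hypothetical helper name.

```python
from http.cookiejar import CookieJar
from urllib import request


def build_cookie_opener():
    """Return (opener, jar); the opener stores response cookies in jar and resends them."""
    jar = CookieJar()
    opener = request.build_opener(request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = build_cookie_opener()
print(len(jar))  # the jar stays empty until a response sets a cookie
```

Calling `opener.open(url)` instead of `request.urlopen(url)` is what routes the traffic through the cookie jar.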

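Web crawler - scraping Lagou failed until I logged in and set the cookie

Summary: After Lagou's server kept detecting my repeated scraping attempts, I logged in to Lagou and put the cookie information in the request headers, and it worked! The code: import requests url="https://www.lagou.com/jobs/positionAjax.json?needAddtionalResul Read more

posted @ 2020-03-11 14:57 胡辣汤王子 Views (1361) Comments (0) Recommendations (0)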

Crawler 03 - basic proxy usage

Summary: from urllib import parse from urllib import request url="http://httpbin.org/ip" # resp=request.urlopen(url) # print(resp.read()) #1. Use ProxyHandler, pass in the prox Read more

posted @ 2020-03-11 09:59 胡辣汤王子 Views (126) Comments (0) Recommendations (0)
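The summary above stops just as it introduces `ProxyHandler`. A minimal sketch of how that class is typically wired up is given below. The proxy address `127.0.0.1:8888` is a placeholder and `build_proxy_opener` is a hypothetical helper, neither taken from the post.

```python
from urllib import request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS traffic through proxy_url."""
    handler = request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return request.build_opener(handler)


# Placeholder address; replace it with a proxy you actually control.
opener = build_proxy_opener("http://127.0.0.1:8888")
proxy_handlers = [h for h in opener.handlers if isinstance(h, request.ProxyHandler)]
print(proxy_handlers[0].proxies)
```

With a working proxy, `opener.open("http://httpbin.org/ip")` should report the proxy's address rather than your own, which makes that URL a convenient check.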

Crawler 02 - simple browser spoofing

Summary: from urllib import parse from urllib import request # url="https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=" url="https:// Read more

posted @ 2020-03-11 09:57 胡辣汤王子 Views (165) Comments (0) Recommendations (0)
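Posing as a browser with urllib usually comes down to sending a browser-like `User-Agent` header. The sketch below shows that step only. The header value and the `build_browser_request` helper are illustrative assumptions, since the post's own code is truncated.

```python
from urllib import request

# Illustrative value; copy the real string from your own browser's developer tools.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_browser_request(url):
    """Return a Request that carries a browser-like User-Agent header."""
    return request.Request(url, headers={"User-Agent": USER_AGENT})


req = build_browser_request("http://www.baidu.com")
# urllib stores header names capitalized, hence "User-agent" in the lookup.
print(req.get_header("User-agent"))
```

Passing `req` to `request.urlopen` sends the header. Without it, urllib identifies itself as `Python-urllib/<version>`, which some sites reject.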

Crawler 01 - common urllib functions

Summary: from urllib import request from urllib import parse #1. Read a web page url="http://www.baidu.com" resp=request.urlopen(url) # print(resp.getcode()) # get the response status code # print Read more

posted @ 2020-03-11 09:56 胡辣汤王子 Views (251) Comments (0) Recommendations (0)
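The summary above imports `urllib.parse` but is cut off before using it. The sketch below shows three commonly used functions from that module. It is an assumption that the post covers these, and the query parameters `wd` and `page` are made up for illustration.

```python
from urllib import parse

params = {"wd": "python", "page": 2}

query = parse.urlencode(params)            # dict -> "wd=python&page=2"
url = "http://www.baidu.com/s?" + query

parts = parse.urlparse(url)                # split the URL into components
decoded = parse.parse_qs(parts.query)      # query string -> dict of lists

print(query)
print(parts.netloc, parts.path)
print(decoded)
```

Note that `parse_qs` returns every value as a list of strings, because a key may appear more than once in a query string.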
