乔儿 - 博客园 (cnblogs)

May 2, 2019

Summary:
url_rl = "https://www.yijiupi.com/v31/Product/ListProduct"
payload = '{"currentPage":1,"data":{"sonCategoryId":"%s","categoryIds":["%s"],"saleModel":-1,"sort":0,"currentPage":1,"pageS...

posted @ 2019-05-02 17:08 乔儿

Python division

Summary:
#encoding:utf-8
import math
# round up (ceiling)
print("math.ceil---")
print("math.ceil(2.3) => ", math.ceil(2.3))
print("math.ceil(2.6) => ", math.ceil(2.6))
# round down (floor)
print("\nmath.floor---")
print("math.fl...
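The excerpt above is cut off, so here is a small self-contained sketch of the same rounding helpers. The division operators and `math.trunc` are additions for comparison and may not appear in the original post.

```python
import math

# Rounding up and down, as in the excerpt above.
ceil_values = (math.ceil(2.3), math.ceil(2.6))
floor_values = (math.floor(2.3), math.floor(2.6))

# Added for comparison (not taken from the original post):
true_quotient = 7 / 2          # true division always returns a float
floor_quotient = 7 // 2        # floor division rounds toward negative infinity
negative_floor = -7 // 2       # so a negative result moves away from zero
truncated = math.trunc(-3.5)   # truncation rounds toward zero instead

print(ceil_values, floor_values)
print(true_quotient, floor_quotient, negative_floor, truncated)
```

Note that `//` and `math.trunc` differ only for negative results, which is an easy source of off-by-one errors when computing page counts.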

posted @ 2019-05-02 15:46 乔儿

Summary of Scrapy errors

Summary: This error occurs when a request is sent without headers, for example:

posted @ 2019-05-02 09:58 乔儿

April 27, 2019

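Scraping JSON data from a POST request with a Request Payload in Python

Summary: Pay special attention here: remove the entries in the payload whose value is 'null'. (This caveat was only observed when changing the URL; other cases have not been tested yet.) The URL is url = "https://www.xxxxxxxxxxxxx"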
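One way to strip the null entries before sending is sketched below. The helper name `drop_nulls` and the sample field names are hypothetical, not taken from the original post, and the helper only removes dictionary keys (null items inside lists are left alone).

```python
import json


def drop_nulls(value):
    """Recursively remove dict keys whose value is None (JSON null)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # List items are cleaned recursively, but None items themselves are kept.
        return [drop_nulls(v) for v in value]
    return value


# Hypothetical payload as copied from the browser's Request Payload panel.
raw = '{"currentPage": 1, "keyword": null, "data": {"categoryId": "123", "brandId": null}}'

payload = drop_nulls(json.loads(raw))
body = json.dumps(payload)
print(body)
```

The resulting `body` string can then be sent as the request body, for example with `requests.post(url, data=body, headers={"Content-Type": "application/json"})`.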

posted @ 2019-04-27 10:46 乔儿

April 19, 2019

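Changing the settings of a single spider in Scrapy, including the download delay and download timeout

Summary: DOWNLOAD_DELAY is the download delay, that is, the interval between downloading pages (HTML). DOWNLOAD_TIMEOUT is the timeout limit: if a page (HTML) has still not been downloaded after 60 s, the page is abandoned. For example, PyCharm shows this message when running the spider: "(failed 1 times):User timeout c...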
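A minimal sketch of a per-spider override, assuming Scrapy's `custom_settings` class attribute is the mechanism the post has in mind. The spider name, start URL, and the 2-second delay are placeholders; 60 s mirrors the timeout mentioned above.

```python
import scrapy


class ExampleSpider(scrapy.Spider):
    # Hypothetical spider; the name and URL are placeholders.
    name = "example"
    start_urls = ["https://example.com/"]

    # These values override the project-wide settings.py for this spider only.
    custom_settings = {
        "DOWNLOAD_DELAY": 2,     # seconds to wait between consecutive requests to the same site
        "DOWNLOAD_TIMEOUT": 60,  # seconds before a download is abandoned
    }

    def parse(self, response):
        # Yield the page title just to show the spider runs with these settings.
        yield {"title": response.css("title::text").get()}
```

Other spiders in the same project keep the values from `settings.py`, so a slow site can be given a longer timeout without slowing down the rest.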

posted @ 2019-04-19 15:35 乔儿

April 16, 2019

Open problem 1

Summary: url = 'https://t.360jinhuo.com/goods-3451.html?user_id=2207' The image URL cannot be retrieved normally when scraping this page; XPath Helper returns "Loading zoom.." instead.

posted @ 2019-04-16 12:00 乔儿

April 15, 2019

XPath | operator: combining two node sets

Summary: url = li.xpath("./div/div[2]/a/@href | ./div/div[2]/div[2]/a/@href").extract_first()
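The union expression above can be tried on a small document. This sketch uses lxml rather than a Scrapy selector, so `xpath()` returns a plain list and taking the first item stands in for `extract_first()`. The HTML is invented to show two list items whose link sits at different depths.

```python
from lxml import etree

# Invented markup: the link is one level deeper in the second <li>.
HTML = """
<ul>
  <li><div><div>img</div><div><a href="/item/1">one</a></div></div></li>
  <li><div><div>img</div><div><div>tag</div><div><a href="/item/2">two</a></div></div></div></li>
</ul>
"""

root = etree.HTML(HTML)


def first_href(li):
    """Return the first href matched by either path, or None if neither matches."""
    hrefs = li.xpath("./div/div[2]/a/@href | ./div/div[2]/div[2]/a/@href")
    return hrefs[0] if hrefs else None


links = [first_href(li) for li in root.xpath("//li")]
print(links)
```

Because `|` returns the union of both node sets, a single expression covers both layouts and no per-item branching is needed.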

posted @ 2019-04-15 19:11 乔儿

April 12, 2019

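Matching numbers in a string with re regular expressions

Summary: \d+ matches one or more digits. Do not write * here, because even a decimal number needs at least one digit before the decimal point. \.? matches the decimal point, which may or may not be present. \d* matches the digits after the decimal point, so zero or more are allowed. For example: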
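The pattern described above can be tried with `re.findall`. The sample strings below are made up for illustration, and the stricter variant is an addition rather than part of the original post.

```python
import re

# The pattern from the summary: digits, optional decimal point, optional digits.
NUMBER = re.compile(r"\d+\.?\d*")

# Stricter variant (an addition): a decimal point must be followed by digits.
STRICT_NUMBER = re.compile(r"\d+(?:\.\d+)?")

text = "price 12.5 yuan, qty 3, weight 0.75kg"
numbers = NUMBER.findall(text)

# A sentence-ending period right after a digit is swallowed by the looser pattern.
loose = NUMBER.findall("total 3.")
strict = STRICT_NUMBER.findall("total 3.")

print(numbers, loose, strict)
```

The looser pattern is fine for prices embedded in markup; the stricter one is safer when numbers can end a sentence.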

posted @ 2019-04-12 15:17 乔儿

March 29, 2019

The extract() function takes the data value out of the selector objects; the extract_first() function takes the 0th selector object in the list and returns its data value.


posted @ 2019-03-29 16:59 乔儿

What to do when printed text is garbled (another possible cause is Accept-Encoding: gzip, deflate, br, or 'Content-Encoding: gzip')

Summary: For example:
response = requests.get(url=url,headers=headers)
print(response.encoding)
text = response.text
html = etree.HTML(text)
title = html.xpath("//div[c...
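The charset side of this problem can be reproduced without a network request. The sketch below decodes UTF-8 bytes with the wrong charset and then reverses the damage; the sample string is illustrative, and the truncated post may describe a different fix.

```python
# Bytes as a server might send them: UTF-8 encoded Chinese text.
original = "博客园"
raw = original.encode("utf-8")

# Decoding with the wrong charset produces garbled text (mojibake).
garbled = raw.decode("latin-1")

# latin-1 maps every byte to exactly one character, so the damage is reversible:
# re-encode with the wrong charset, then decode with the right one.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repaired)
```

With requests, the analogous step is to set `response.encoding` (for example to `response.apparent_encoding` or `"utf-8"`) before reading `response.text`, since `response.text` is decoded using whatever `response.encoding` holds.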

posted @ 2019-03-29 10:47 乔儿
