Post archive for October 12, 2017 - 蓝空

Scrapy proxy

Summary: There are plenty of proxies available online. The middleware below implements a demo Scrapy crawler that uses a proxy (GitHub link). class ProxyMiddleware(): def __init__(self, proxy_url): self.logger = logging.getLogge...

posted @ 2017-10-12 10:46 蓝空
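
The summary above cuts off partway through the middleware, so here is a minimal sketch of what a proxy downloader middleware can look like. It is not the author's code. It assumes `proxy_url` is the address of a single proxy server (in the original post it may well be something else, such as a proxy-pool endpoint), and the `PROXY_URL` setting name is hypothetical. Scrapy picks up the proxy for a request from `request.meta["proxy"]`, which is all this sketch sets.

```python
import logging
from types import SimpleNamespace


class ProxyMiddleware:
    """Sketch of a downloader middleware that sends requests through one proxy."""

    def __init__(self, proxy_url):
        self.logger = logging.getLogger(__name__)
        # Assumption: proxy_url is the proxy address itself,
        # e.g. "http://127.0.0.1:8888".
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL is a hypothetical setting name chosen for this sketch.
        return cls(proxy_url=crawler.settings.get("PROXY_URL"))

    def process_request(self, request, spider):
        # Leave requests alone if they already carry a proxy.
        if self.proxy_url and "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy_url
            self.logger.debug("Using proxy %s", self.proxy_url)
        # Returning None tells Scrapy to keep processing the request.
        return None


# Stand-in for a scrapy.Request so the sketch runs without a crawler:
# the middleware only touches the request's `meta` dict.
fake_request = SimpleNamespace(meta={})
ProxyMiddleware("http://127.0.0.1:8888").process_request(fake_request, spider=None)
print(fake_request.meta["proxy"])
```

To try it in a real project, the class would be registered under DOWNLOADER_MIDDLEWARES in settings.py, using whatever module path and priority fit that project.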

Scrapy overview

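Summary: I have been using Scrapy for a while now, and it really is much easier to work with than calling Requests or urllib directly; the framework is solid. I happened across this post, which matches my own experience closely, so I am copying it here as a record. In the world of programming languages, Python seems to have been labeled the language for writing crawlers, and a powerful one at that. Scrapy, as another long-standing open-source project, is indispensable for large-scale crawling...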

posted @ 2017-10-12 10:37 蓝空

Extracting text that is not inside a tag with Scrapy

Summary: response.xpath(u'//span[./text()="出版社:"]/following::text()[1]') If text() contains whitespace (thanks to @董成良 for pointing this out), you may also need to write it like this: response.xpath(u'//span[contains(./text(),...

posted @ 2017-10-12 10:09 蓝空

MongoDB official documentation (essential for beginners)

Summary: MongoDB 3.4 Manual, W3School MongoDB tutorial, MongoDB tutorial

posted @ 2017-10-12 10:03 蓝空

Python official documentation (essential for beginners)

Summary: The Python Standard Library. This library reference manual describes the standard library that is distributed with Python. It also describes s...

posted @ 2017-10-12 10:02 蓝空

Scrapy official documentation (essential for beginners)

Summary: Scrapy (official), Scrapy (Chinese)

posted @ 2017-10-12 10:01 蓝空
