Post archive for April 18, 2018 - 骑者赶路

April 18, 2018

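Summary: The site to scrape is http://quotes.toscrape.com/. I stumbled my way through writing the spiders and the Pipeline. Takeaways: 1. These are all classes, so you can define __init__ on them, and you can attach data to their class-level (static) attributes. Read more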
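The takeaway above can be sketched as follows. This is a minimal illustration, not the author's pipeline: the class name `QuotesPipeline`, the attribute names, and the `author` field are assumptions made for the example. It relies only on the fact that a Scrapy item pipeline is an ordinary Python class with a `process_item(self, item, spider)` method.

```python
class QuotesPipeline:
    """Hypothetical pipeline showing __init__ and a class-level attribute."""

    # Class ("static") attribute: one set shared by every instance.
    seen_authors = set()

    def __init__(self):
        # Per-instance state, created when the pipeline object is built.
        self.items_processed = 0

    def process_item(self, item, spider):
        # Count the item on this instance and record the author on the class.
        self.items_processed += 1
        QuotesPipeline.seen_authors.add(item["author"])
        return item  # hand the item on to the next pipeline stage


# Usage with a plain dict standing in for a scraped item.
pipeline = QuotesPipeline()
result = pipeline.process_item({"text": "a quote", "author": "Someone"}, spider=None)
```

Keeping counters on the instance and shared lookups on the class is one possible split; which data belongs where depends on the project.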

posted @ 2018-04-18 23:58 骑者赶路 Views (111) Comments (0) Recommendations (0)

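Summary: Official documentation: https://docs.scrapy.org/en/latest/topics/item-pipeline.html To activate a pipeline you have to configure it in settings, but a pipeline configured there applies to every spider. Suppose the project has many spiders running. item pipelin Read more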
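One common way to deal with a globally enabled pipeline is to check the spider inside `process_item`. The sketch below assumes that approach; the excerpt is cut off before it says what the post itself does. The module path `myproject.pipelines`, the class `QuotesOnlyPipeline`, and the spider name `quotes` are hypothetical, and `SimpleNamespace` only stands in for a real spider object.

```python
from types import SimpleNamespace

# settings.py fragment: the pipeline is enabled for the whole project,
# so every spider's items pass through it. The path is a made-up example.
ITEM_PIPELINES = {"myproject.pipelines.QuotesOnlyPipeline": 300}


class QuotesOnlyPipeline:
    """Hypothetical pipeline that only touches items from selected spiders."""

    target_spiders = {"quotes"}

    def process_item(self, item, spider):
        if spider.name not in self.target_spiders:
            return item  # pass other spiders' items through unchanged
        item["text"] = item["text"].strip()
        return item


# Stand-in spiders: only the `name` attribute matters for this check.
pipeline = QuotesOnlyPipeline()
cleaned = pipeline.process_item({"text": "  hi  "}, SimpleNamespace(name="quotes"))
untouched = pipeline.process_item({"text": "  hi  "}, SimpleNamespace(name="other"))
```

Scrapy also lets a spider override project settings through its `custom_settings` class attribute, which is an alternative when a pipeline should be enabled for one spider only.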

posted @ 2018-04-18 19:27 骑者赶路 Views (338) Comments (0) Recommendations (0)

Scrapy: Spiders

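Summary: Official documentation: https://docs.scrapy.org/en/latest/topics/spiders.html# In one sentence: a spider is where you define the crawling behavior (whether to follow new links) and parse the page structure (extract data and return items). I. scrapy.Spider 1 name 2 allowed_do Read more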

posted @ 2018-04-18 15:39 骑者赶路 Views (130) Comments (0) Recommendations (0)

Scrapy: Selectors

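Summary: Practice URL: https://doc.scrapy.org/en/latest/_static/selectors-sample1.html I. Getting text values: xpath, css. Note: this can be shortened to response.xpath(). II. Getting attribute values: xpath, css. Note: this can be shortened to response Read more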

posted @ 2018-04-18 14:18 骑者赶路 Views (130) Comments (0) Recommendations (0)

Scrapy: introduction and source code analysis

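Summary: I. Introduction. Scrapy is built on Twisted, an event-driven networking framework, so for concurrency it is implemented with non-blocking (i.e. asynchronous) code. Official documentation: https://docs.scrapy.org/en/latest/topics/architecture.html The most important thing is to understand the Data flow. Someone else's Read more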

posted @ 2018-04-18 11:38 骑者赶路 Views (190) Comments (0) Recommendations (0)
