Scrapy Study Notes 4: The Items and Pipelines Classes
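Using the Items class
Purpose
Items make it very convenient to work with field names: the fields are declared once in a class and then read and written by key.
Define our own class in items.py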
import scrapy


class ArticleItem(scrapy.Item):
    # One Field() per value the spider extracts from an article page
    title = scrapy.Field()
    create_time = scrapy.Field()
    url = scrapy.Field()
    url_id = scrapy.Field()            # MD5 hex digest of url
    front_image_url = scrapy.Field()
    front_image_path = scrapy.Field()
    praise_nums = scrapy.Field()
    comment_nums = scrapy.Field()
    fav_nums = scrapy.Field()
    tags = scrapy.Field()
    content = scrapy.Field()
Import ArticleItem in the spider and fill it in
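import hashlib


def parse(self, response):
    # title, create_time, url, etc. are assumed to have been extracted
    # from response earlier in this method (extraction omitted here)
    article_item = ArticleItem()
    article_item['title'] = title
    article_item['create_time'] = create_time
    article_item['url'] = url

    # md5.update() needs bytes in Python 3, so encode the URL first
    m = hashlib.md5()
    m.update(url.encode('utf-8'))
    article_item['url_id'] = m.hexdigest()

    article_item['praise_nums'] = praise_nums
    article_item['comment_nums'] = comment_nums
    article_item['fav_nums'] = fav_nums
    article_item['tags'] = tags
    article_item['front_image_url'] = front_image_url
    article_item['content'] = content

    # Yielding the item hands it over to the item pipelines
    yield article_item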
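The MD5 step in parse can be pulled out into a small helper so that it is reusable across spiders. This is only a sketch: the function name get_md5 is hypothetical and not part of Scrapy, and it assumes URLs arrive either as str or as bytes.

```python
import hashlib


def get_md5(url):
    """Return the 32-character hex MD5 digest of a URL (hypothetical helper)."""
    # hashlib works on bytes, so encode str input first
    if isinstance(url, str):
        url = url.encode("utf-8")
    return hashlib.md5(url).hexdigest()


url_id = get_md5("http://example.com/article/1")
print(len(url_id))  # a fixed-length id, regardless of URL length
```

With such a helper, the three hashlib lines in parse could shrink to a single assignment to article_item['url_id'].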
The Pipelines class
Steps
In parse, fill the item with values and yield it; Scrapy then passes the item to the pipelines for data processing
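A pipeline only receives items once it is enabled through ITEM_PIPELINES in settings.py. The fragment below is a sketch: the dotted path assumes the project package is named ArticleSpider, which the notes do not state, so adjust it to the real package name.

```python
# settings.py (sketch): "ArticleSpider" is an assumed package name
ITEM_PIPELINES = {
    # The number sets the order; pipelines with lower values run first
    "ArticleSpider.pipelines.ArticlespiderPipeline": 300,
}
```

The values are conventionally kept in the 0 to 1000 range.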
Default class
class ArticlespiderPipeline(object):
    def process_item(self, item, spider):
        # Called for every item the spider yields; return it to pass it on
        return item
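The default class returns the item unchanged. As an illustration of what "data processing" might look like, here is a minimal sketch of a pipeline that normalizes the counter fields to integers. The class name CleanArticlePipeline and the fallback to 0 are assumptions made for this example, and a plain dict stands in for ArticleItem so that the snippet runs without Scrapy.

```python
class CleanArticlePipeline:
    """Hypothetical pipeline that converts counter fields to int."""

    COUNT_FIELDS = ("praise_nums", "comment_nums", "fav_nums")

    def process_item(self, item, spider):
        for field in self.COUNT_FIELDS:
            raw = item.get(field)
            try:
                item[field] = int(raw)
            except (TypeError, ValueError):
                # Missing or non-numeric values fall back to 0 (an assumption)
                item[field] = 0
        # Always return the item so that later pipelines receive it
        return item


# A plain dict stands in for ArticleItem in this standalone demo
demo_item = {"title": "demo", "praise_nums": "12", "comment_nums": "", "fav_nums": None}
cleaned = CleanArticlePipeline().process_item(demo_item, spider=None)
print(cleaned)
```

Like any pipeline, this one would also need an entry in ITEM_PIPELINES before Scrapy calls it.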