Scrapy存入MySQL
一、概述
之前利用Scrapy爬取的数据,都是写入在json文件中,现在需要写入到mysql中。
在items.py中,主要有2个字段:
class CityItem(scrapy.Item): name = scrapy.Field() url = scrapy.Field()
环境说明
mysql服务器ip:192.168.0.3
用户名:root
密码:abcd@1234
创建数据库
CREATE DATABASE qunar CHARACTER SET utf8 COLLATE utf8_general_ci;
创建表test
CREATE TABLE `test` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(64) DEFAULT NULL, `url` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
二、pipelines写入数据库
修改settings.py
MYSQL_HOST = "192.168.0.3" MYSQL_PORT = 3306 MYSQL_DBNAME = "qunar" MYSQL_USER = "root" MYSQL_PASSWORD = "abcd@1234"
修改pipelines.py,内容如下:
# Define your item pipelines here # # Don't forget to add your pipeline to the ITEM_PIPELINES setting # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html # useful for handling different item types with a single interface # from itemadapter import ItemAdapter import pymysql from twisted.enterprise import adbapi # 异步更新操作 class LvyouPipeline(object): def __init__(self, dbpool): self.dbpool = dbpool @classmethod def from_settings(cls, settings): # 函数名固定,会被scrapy调用,直接可用settings的值 """ 数据库建立连接 :param settings: 配置参数 :return: 实例化参数 """ adbparams = dict( host=settings['MYSQL_HOST'], port=settings['MYSQL_PORT'], db=settings['MYSQL_DBNAME'], user=settings['MYSQL_USER'], password=settings['MYSQL_PASSWORD'], cursorclass=pymysql.cursors.DictCursor # 指定cursor类型 ) # 连接数据池ConnectionPool,使用pymysql或者Mysqldb连接 dbpool = adbapi.ConnectionPool('pymysql', **adbparams) # 返回实例化参数 return cls(dbpool) def process_item(self, item, spider): """ 使用twisted将MySQL插入变成异步执行。通过连接池执行具体的sql操作,返回一个对象 """ query = self.dbpool.runInteraction(self.do_insert, item) # 指定操作方法和操作数据 # 添加异常处理 query.addCallback(self.handle_error) # 处理异常 def do_insert(self, cursor, item): # 对数据库进行插入操作,并不需要commit,twisted会自动commit insert_sql = """ insert into test(name, url) VALUES (%s,%s) """ cursor.execute(insert_sql, (item['name'], item['url'])) def handle_error(self, failure): if failure: # 打印错误信息 print(failure)
注意:insert语句,请根据实际情况修改
最后执行爬虫程序,就可以写入数据库了。
本文参考链接: