Scrapy raises pymysql.err.InterfaceError: (0, '') after running for a while
Error message:
Traceback (most recent call last):
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args, **kw)
--- <exception caught here> ---
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/enterprise/adbapi.py", line 474, in _runInteraction
    conn.rollback()
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/twisted/enterprise/adbapi.py", line 52, in rollback
    self._connection.rollback()
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/pymysql/connections.py", line 431, in rollback
    self._execute_command(COMMAND.COM_QUERY, "ROLLBACK")
  File "/home/anaconda3/envs/python36/lib/python3.6/site-packages/pymysql/connections.py", line 745, in _execute_command
    raise err.InterfaceError("(0, '')")
pymysql.err.InterfaceError: (0, '')
Looking through the logs, I noticed a pattern: the entries just before the error were almost all listing-page fetches, with no items returned and no item-insertion log lines. So my suspicion is that the connections in the adbapi pool had been idle for so long that MySQL dropped them. The next insert then failed, the transaction tried to roll back, and because the pooled connection was already dead, the rollback failed as well, producing this error.
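This matches how MySQL behaves: the server silently closes any connection that has been idle longer than its wait_timeout (28800 seconds by default). A quick way to confirm the server's setting, as a minimal sketch; the connection parameters below are placeholders for whatever the spider actually uses:

import pymysql

# Placeholder credentials; substitute the ones from your Scrapy settings.
conn = pymysql.connect(host='localhost', user='root',
                       password='secret', db='test')
try:
    with conn.cursor() as cursor:
        # How many seconds MySQL keeps an idle connection before killing it.
        cursor.execute("SHOW VARIABLES LIKE 'wait_timeout'")
        print(cursor.fetchone())  # e.g. ('wait_timeout', '28800')
finally:
    conn.close()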
The database access in my item pipeline looked like this:
# Called by Scrapy for every item
def process_item(self, item, spider):
    query = self.dbpool.runInteraction(self._conditional_insert, item)
    query.addErrback(self._handle_error, item, spider)
    return item
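Here self.dbpool is a twisted.enterprise.adbapi.ConnectionPool. Its runInteraction checks a connection out of the pool, calls the given function in a worker thread with a cursor-like Transaction object, commits on success, and rolls back on failure. A simplified model of that flow (not Twisted's actual code) shows why a dead connection blows up inside the rollback, exactly as in the traceback above:

def run_interaction_model(connection, func, *args, **kw):
    # Simplified model of adbapi's _runInteraction: func receives a
    # cursor, success commits, any exception triggers a rollback.
    cursor = connection.cursor()
    try:
        result = func(cursor, *args, **kw)
        connection.commit()
        return result
    except Exception:
        # If MySQL already closed the socket, this rollback itself
        # raises pymysql.err.InterfaceError: (0, '').
        connection.rollback()
        raise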
In the end I changed the pipeline to ping the connection before every insert; if the ping (which by default also attempts to reconnect) fails, the whole connection pool is reinitialized. The full code:
class MyPipeline(object):
    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbpool = MysqlConnectionPool().dbpool()
        return cls(dbpool)

    # Called by Scrapy for every item
    def process_item(self, item, spider):
        query = self.dbpool.runInteraction(self._conditional_insert, item)
        query.addErrback(self._handle_error, item, spider)
        return item

    def _handle_error(self, failure, item, spider):
        print(failure)

    def _conditional_insert(self, transaction, item):
        # Reach through adbapi's wrappers to the raw pymysql connection.
        raw_conn = transaction._connection._connection
        try:
            # pymysql's ping() reconnects by default (reconnect=True).
            raw_conn.ping()
        except Exception:
            # Reconnecting failed: throw the whole pool away and rebuild it.
            self.dbpool.close()
            self.dbpool = MysqlConnectionPool().dbpool()
        sql = """INSERT INTO `DOC_BASEINFO` (doc_type, author_org)
                 VALUES (%s, %s)"""
        params = (item['doc_type'], item['author_org'])
        transaction.execute(sql, params)
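MysqlConnectionPool is the post's own helper and its definition isn't shown; a hypothetical reconstruction of what it presumably wraps, with placeholder connection parameters, might look like this. cp_reconnect=True is a real ConnectionPool option that tells the pool itself to discard and reopen dead connections, which can reduce how often the ping fallback is needed:

from twisted.enterprise import adbapi

class MysqlConnectionPool(object):
    # Hypothetical reconstruction; every parameter below is a placeholder.
    def dbpool(self):
        return adbapi.ConnectionPool(
            'pymysql',          # DB-API module Twisted should load
            host='localhost',
            user='root',
            password='secret',
            db='test',
            charset='utf8mb4',
            cp_reconnect=True,  # let the pool reopen dead connections
        )

One caveat about the fix: when the ping's reconnect fails and the pool is rebuilt, the current transaction still holds the old dead connection, so that one item will still land in _handle_error; the rebuild only protects the items that follow.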
Source: http://www.shanhubei.com/archives/3416.html