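Converting item fields to Simplified or Traditional Chinese in Scrapy
1. Install hanziconv
Install a package that converts between Simplified and Traditional Chinese: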
pip install hanziconv
2. Define a custom item pipeline
Open the pipelines.py file in your project
and add the custom pipeline:
from hanziconv import HanziConv


class HanziconvPipeline(object):
    def process_item(self, item, spider):
        project_info = item['project_info']
        for key, value in project_info.items():
            if value is None:
                # A None value is initialized to an empty string
                project_info[key] = ""
            elif isinstance(value, str):
                # Text value (str on Python 3; the Python 2 check was `unicode`):
                # convert it to Traditional Chinese
                value = HanziConv.toTraditional(value)
                spider.logger.debug('%s: %s', key, value)
                project_info[key] = value
            # Non-text values are left unchanged
        return item
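This code comes from my own project: when value is a text string (unicode in Python 2, str in Python 3), it is converted to Traditional Chinese.
To convert Traditional Chinese to Simplified instead, change toTraditional to toSimplified.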
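The per-field logic can also be pulled out of the pipeline so that the conversion direction is a parameter rather than a hard-coded call. The following is only a minimal sketch under that assumption: convert_text_fields is a hypothetical helper name, not part of hanziconv or Scrapy, and str.upper stands in for the real converter so the snippet runs without hanziconv installed.

```python
from typing import Any, Callable, Dict


def convert_text_fields(fields: Dict[str, Any],
                        convert: Callable[[str], str]) -> Dict[str, Any]:
    """Return a copy of `fields` with text values passed through `convert`.

    None becomes an empty string; non-text values are copied unchanged.
    (Hypothetical helper, named here for illustration only.)
    """
    result = {}
    for key, value in fields.items():
        if value is None:
            result[key] = ""
        elif isinstance(value, str):
            result[key] = convert(value)
        else:
            result[key] = value
    return result


if __name__ == "__main__":
    # str.upper is a stand-in converter; in the pipeline you would pass
    # HanziConv.toTraditional or HanziConv.toSimplified instead.
    sample = {"name": "abc", "floors": 12, "note": None}
    print(convert_text_fields(sample, str.upper))
```

Inside process_item, this would be used as item['project_info'] = convert_text_fields(item['project_info'], HanziConv.toSimplified), so switching direction means changing only the function that is passed in.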
3. Configure the project pipelines
Find ITEM_PIPELINES in settings.py
and add the custom pipeline:
ITEM_PIPELINES = {
    'scrapy_redis.pipelines.RedisPipeline': 400,
    '<project_name>.pipelines.HanziconvPipeline': 300,
}
Warning: <project_name> must be replaced by hand with your own project's name!
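The numbers in ITEM_PIPELINES set the order of execution: Scrapy runs pipelines from the lowest value to the highest, so HanziconvPipeline (300) processes each item before RedisPipeline (400) stores it. The sketch below only illustrates that ordering; pipeline_order is a hypothetical helper written for this example, not a Scrapy function.

```python
ITEM_PIPELINES = {
    'scrapy_redis.pipelines.RedisPipeline': 400,
    '<project_name>.pipelines.HanziconvPipeline': 300,
}


def pipeline_order(pipelines):
    """Return pipeline paths sorted by ascending priority value.

    Hypothetical helper: it mirrors the documented rule that lower
    numbers run first, but it is not how Scrapy is implemented.
    """
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


if __name__ == "__main__":
    for path in pipeline_order(ITEM_PIPELINES):
        print(path)
```

If the conversion pipeline were given a number above 400, the unconverted item would reach Redis first, so keep its value below that of any pipeline that stores the item.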
Reposted from https://blog.csdn.net/weixin_34082854/article/details/87429754