http://scrapy-chs.readthedocs.org/zh_CN/latest/intro/overview.html
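The link above is a very good resource for learning Scrapy. Thanks to marchtea for the translation.
While learning, I ran into a thorny problem: displaying and storing Chinese text. (In the console, Chinese showed up as escape sequences such as \u77e5\u540d..., and the same escapes ended up in the saved file.)
After a long search online, the link below turned out to be the most relevant.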
http://stackoverflow.com/questions/9181214/scrapy-text-encoding
The relevant excerpt:
pipelines.py:
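import json
import codecs


class JsonWithEncodingPipeline(object):

    def __init__(self):
        # Open the output file as UTF-8 so non-ASCII text is written as-is.
        self.file = codecs.open('scraped_data_utf8.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese characters instead of \uXXXX escapes.
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(line)
        return item

    def close_spider(self, spider):
        # Scrapy calls close_spider() on item pipelines when the spider finishes.
        # The original answer named this method spider_closed, which is not
        # invoked automatically on a pipeline.
        self.file.close()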
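One step the excerpt does not show: Scrapy only runs a pipeline that is listed in the ITEM_PIPELINES setting. A minimal sketch of that entry in settings.py, assuming a hypothetical project package named myproject (substitute your own); the value 300 is just an arbitrary priority, and lower numbers run first.

```python
# settings.py (sketch; "myproject" is a hypothetical package name)
ITEM_PIPELINES = {
    # dotted path to the pipeline class: priority (lower runs earlier)
    'myproject.pipelines.JsonWithEncodingPipeline': 300,
}
```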
With the approach above, the output file contains readable Chinese.
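The reason this works comes down to the ensure_ascii argument of json.dumps, which defaults to True and escapes every non-ASCII character. A small standalone sketch of the difference (the item dict here is made up for illustration and does not need Scrapy):

```python
import json

# Hypothetical scraped item containing Chinese text.
item = {"title": "知名"}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(item)
print(escaped)   # {"title": "\u77e5\u540d"}

# With ensure_ascii=False the characters are kept as-is.
readable = json.dumps(item, ensure_ascii=False)
print(readable)  # {"title": "知名"}

# Both strings decode to the same data; only the on-disk form differs.
same_data = json.loads(escaped) == json.loads(readable)
```

Note that the escaped form is still valid JSON, so the data was never corrupted; the change only makes the file readable by a person. The file must then be opened with a UTF-8 encoding, as the pipeline does.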
Search keywords and links:
JsonItemExporter ensure_ascii=False
JsonItemExporter uxxx
How to convert \uxxx in a JSON file written by Python into Chinese
Decode and Encode in Python [ http://yangpengg.github.io/blog/2012/12/13/decode-and-encode-in-python/ ] -- Python print shows Chinese, but the output written to a file is \uxxx
http://wklken.me/posts/2013/08/31/python-extra-coding-intro.html
Scrapy : storing the data http://stackoverflow.com/questions/14073442/scrapy-storing-the-data
Scrapy: exporting Chinese to a JSON file with an item exporter produces Unicode escapes; how can it output Chinese instead? http://www.lefern.com/question/15837/scrapy-shi-yong-item-exportshu-chu-zhong-wen-dao-jsonwen-jian-nei-rong-wei-unicodema-ru-he-shu-chu-wei-zhong-wen/
how to put in json utf-8 symbols, not their codes? https://groups.google.com/forum/#!msg/scrapy-users/rJcfSFVZ3O4/ZYsD7CMoCKMJ
scrapy text encoding http://stackoverflow.com/questions/9181214/scrapy-text-encoding