Essay archive for October 24, 2013 - Alex-Zeng

October 24, 2013

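Summary: For work I needed to pick out Chinese characters for word segmentation. In a Python regular expression, \W+ matches every Chinese character and even the punctuation marks, so the unwanted matches have to be filtered out somehow. Reference blog post: http://log.medcl.net/item/2011/03/the-chinese-deal-is-the-python/

1. When matching Chinese, the regular expression pattern and the target string must use the same encoding.

    import sys
    print sys.getdefaultencoding()
    text = u"#who#helloworld#a中文x#"
    print isinstance(text, unicode)
    print text
    UnicodeDecodeError: ' ...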
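The filtering idea can be sketched as follows. This is only an illustration, not the code from the full post: it assumes Python 3 (where `str` is already Unicode, unlike the Python 2 snippet in the summary), it assumes that "Chinese character" means the CJK Unified Ideographs range U+4E00 to U+9FFF, and the helper name `extract_chinese` is hypothetical. The `re.ASCII` flag is used only to roughly reproduce the ASCII-only meaning of `\w` and `\W` described in the summary.

```python
import re

# Assumption: a "Chinese character" is anything in the CJK Unified
# Ideographs block (U+4E00 to U+9FFF). Other CJK blocks are ignored.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def extract_chinese(text):
    """Return every run of consecutive Chinese characters in text."""
    return HAN_RUN.findall(text)


text = "#who#helloworld#a中文x#"

# With ASCII-only word characters, \W+ picks up the Chinese run
# together with the punctuation, which is the problem described above.
print(re.findall(r"\W+", text, flags=re.ASCII))  # ['#', '#', '#', '中文', '#']

# An explicit character range keeps only the Chinese text.
print(extract_chinese(text))  # ['中文']
```

Matching an explicit code point range avoids depending on what `\w` means under a given flag or Python version, at the cost of missing characters outside the chosen range.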

posted @ 2013-10-24 14:54 Alex-Zeng