Handling JavaScript escape()-encoded data in Python

  Reference: http://www.cnblogs.com/suwings/p/6360395.html

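  Writing a crawler really is full of twists and turns. Today the response from a site I was scraping turned out to be encoded with JavaScript's escape(), so it looked like complete gibberish. urllib.unquote() on its own did not work, and neither did decoding and then re-encoding.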
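To see why percent-decoding alone is not enough, here is a minimal Python 3 sketch. The code in this post is Python 2, where the function is `urllib.unquote`; `urllib.parse.unquote` is its Python 3 counterpart. The sample string is a shortened, illustrative fragment in the same style as the response shown below. `%uXXXX` is a non-standard form produced by JavaScript's `escape()`, so the standard decoder passes it through untouched:

```python
from urllib.parse import unquote

# Shortened, illustrative fragment of an escape()-encoded response
sample = "%3Ctd%3E%u6E56%u5357%3C/td%3E"

decoded = unquote(sample)
# The standard %XX escapes are decoded; the %uXXXX sequences are left as-is
print(decoded)
```

The angle brackets come back, but the Chinese characters stay escaped, which is why an extra step is needed.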

After searching online, I found that the following works:

# Python 2 code (urllib.unquote and the print statement are Python 2 only)
import demjson  # third-party, lenient JSON parser
import urllib
test = """{isSuccess:\'1\',pager:\'<i class="icon icon-arrow-left-mute disabled"></i><a class="pager active" data-page="1" onclick="ser(1,15)">1</a><i class="icon icon-arrow-right-active disabled" ></i>\',recordCount:\'1\',hrecordCount:\'1\',content:\'%3Ctr%20class%3D%22even%22%20onclick%3D%22locationUrl%28178303%2C0%29%3B%22%3E%3Ctd%3E1%3C/td%3E%3Ctd%20class%3D%22text-left%22%20title%3D%22%u6E56%u5357%u5929%u79CD%u5174%u519C%u517B%u6B96%u6709%u9650%u516C%u53F8%22%3E%u6E56%u5357%u5929%u79CD%u5174%u519C%u517B%u6B96%u6709%u9650%u516C%u53F8%3C/td%3E%3Ctd%3E%u5CB3%u9633%20/%20%3Cspan%20class%3D%22text-prov%22%3E%u6E56%u5357%3C/span%3E%3C/td%3E%3Ctd%3E2016%3C/td%3E%3Ctd%3E5%3C/td%3E%3C/tr%3E\'}"""
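# Turn the JS-style %uXXXX escapes into \uXXXX so the parser can expand them
value = test.replace('%u', '\\u')
# Decode the remaining standard %XX escapes (%3C -> <, %20 -> space, ...)
byts = urllib.unquote(value)
# The string is pure ASCII at this point, so this encode leaves it unchanged
byts = byts.encode('utf-8')
# demjson accepts the unquoted keys and single-quoted strings, and expands \uXXXX
test_dem = demjson.decode(byts)
print test_dem
for k, v in test_dem.items():
    print k, v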

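Running it prints the decoded dictionary, followed by each key and value, with the HTML and Chinese text in content now readable.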
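If demjson is not available, the escape() encoding itself can be undone with the standard library alone. The following is a minimal Python 3 sketch, not the method used above. `js_unescape` is a hypothetical helper name, and the sketch assumes the input only contains characters from the Basic Multilingual Plane (it does not recombine surrogate pairs). It treats `%uXXXX` and `%XX` as code points, which is how JavaScript's `unescape()` interprets them:

```python
import re

# Matches %uXXXX (four hex digits) or %XX (two hex digits)
_ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def js_unescape(text):
    """Decode a string produced by JavaScript's escape() (BMP characters only)."""
    def _replace(match):
        # Exactly one of the two groups is set, depending on which form matched
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE_RE.sub(_replace, text)


# Shortened, illustrative fragment in the style of the content field
fragment = "%3Ctd%20title%3D%22%u6E56%u5357%22%3E%u5CB3%u9633%3C/td%3E"
print(js_unescape(fragment))
```

One possible design choice is to parse the object literal first and apply a helper like this to each field value afterwards, so that a decoded quote character (for example from `%27`) cannot interfere with the parsing step.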

 

posted @ 2017-10-17 20:37  lplucky