Fetching part of a web page with Python
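- import sys, urllib2  # Python 2 only; in Python 3 urllib2 was split into urllib.request and urllib.error
- headers = {'User-Agent':'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}  # set the User-Agent header (not a proxy) so the request looks like it comes from a browser
- req = urllib2.Request("http://blog.csdn.net/nevasun", headers=headers)
- content = urllib2.urlopen(req).read()  # raw bytes of the whole page
- encoding = sys.getfilesystemencoding()  # encoding used by the local system
- print content.decode("UTF-8").encode(encoding)  # decode the page as UTF-8, then re-encode it for the local system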
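
The snippet above downloads the whole page. As a rough sketch of the "part of a page" step, the Python 3 version below sends the same User-Agent and then pulls out only the text of the page's `<title>` element. The names `TitleParser`, `extract_title` and `fetch` are made up for this illustration, and picking the title as the part to extract is an assumption, not something the original code does.

```python
import urllib.request
from html.parser import HTMLParser

# Same User-Agent string as in the original snippet.
USER_AGENT = ("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) "
              "Gecko/20091201 Firefox/3.5.6")


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while we are between <title> and </title>
        self.done = False       # True once the first title has been read
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the stripped text of the first <title> in an HTML string."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def fetch(url):
    """Download a page with a browser-like User-Agent and return it as text."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        # Use the charset from the response headers; assume UTF-8 if none is given.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

With network access, `print(extract_title(fetch("http://blog.csdn.net/nevasun")))` would print just the page title instead of the full HTML. In Python 3, `print` encodes the text for the console itself, so the manual `decode`/`encode` step is normally not needed.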