Fetching a Douban User's Movie Collection Data with Python

This script fetches a user's movie ratings through the Douban API and stores them in a dictionary of the form {movie title: rating}.

# -*- coding: utf-8 -*-
'''
Created on May 19, 2012

@author: Edison
'''

import urllib2
import json
import pickle


def getData(name, start_index, movie_dict):
    '''Fetch the user's watched movies and store {title: rating} in movie_dict.'''
    MAX_RESULTS = 50  # the Douban API currently returns at most 50 records per request
    API_KEY = 'xxx'

    req = urllib2.Request('http://api.douban.com/people/' + name +
                          '/collection?cat=movie&status=watched&alt=json' +
                          '&apikey=' + API_KEY +
                          '&max-results=' + str(MAX_RESULTS) +
                          '&start-index=' + str(start_index))
    response = urllib2.urlopen(req).read()

    # Keep a copy of the raw response. It is already a byte string, so it is
    # written as-is: calling encode('utf-8') on a byte string first decodes it
    # as ASCII and fails if the response contains non-ASCII bytes.
    mfile = open('movies.json', 'wb')
    mfile.write(response)
    mfile.close()

    json_data = json.loads(response)

    # An empty page means all of the user's data has been fetched
    if 0 == len(json_data['entry']):
        return

    count = 0
    for entry in json_data['entry']:
        movie_name = entry["db:subject"]["title"]["$t"]
        if "gd:rating" in entry:  # movies without a rating are not stored in the dictionary
            movie_rate = int(entry["gd:rating"]["@value"])
            movie_dict[movie_name] = movie_rate
        print count
        count += 1

    # Recurse to fetch the next page until all data has been retrieved
    start_index += MAX_RESULTS
    print start_index
    getData(name, start_index, movie_dict)


if __name__ == "__main__":
    start_index = 0
    name_a = 'xxx'
    movie_dict_a = {}
    getData(name_a, start_index, movie_dict_a)

    outf = open('xxx_aquar25_movie_dict', 'wb')  # {movie_name: rate, movie_name: rate}
    pickle.dump(movie_dict_a, outf)  # serialize the dictionary to a file
    outf.close()
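The script pages through the collection by calling itself once per page of 50 records, so a very large collection adds one stack frame per page. The same paging logic can be sketched as a loop. This is only an illustration under stated assumptions: `collect_ratings`, `fetch_page` and `fake_fetch_page` are hypothetical names that do not appear in the original script, and `fetch_page(start_index, max_results)` is assumed to return the already-parsed JSON dictionary for one page, in the same shape the script reads (an `entry` list whose items have `db:subject` and an optional `gd:rating`). The sample data is made up so the loop can be exercised without network access.

```python
def collect_ratings(fetch_page, max_results=50):
    """Collect {title: rating} across all pages with a loop instead of recursion.

    fetch_page is a hypothetical callable: fetch_page(start_index, max_results)
    must return the parsed JSON dict for one page of the collection feed.
    """
    movie_dict = {}
    start_index = 0
    while True:
        page = fetch_page(start_index, max_results)
        entries = page.get('entry', [])
        if not entries:  # an empty page means everything has been fetched
            break
        for entry in entries:
            title = entry['db:subject']['title']['$t']
            if 'gd:rating' in entry:  # unrated movies are skipped, as in the script
                movie_dict[title] = int(entry['gd:rating']['@value'])
        start_index += max_results
    return movie_dict


# Offline stand-in for the API, used only to exercise the loop (made-up data).
def fake_fetch_page(start_index, max_results):
    sample = [
        {'db:subject': {'title': {'$t': u'Movie A'}}, 'gd:rating': {'@value': '4'}},
        {'db:subject': {'title': {'$t': u'Movie B'}}},  # no rating given
        {'db:subject': {'title': {'$t': u'Movie C'}}, 'gd:rating': {'@value': '5'}},
    ]
    return {'entry': sample[start_index:start_index + max_results]}


ratings = collect_ratings(fake_fetch_page, max_results=2)
```

Passing the page-fetching function in as an argument keeps the paging and parsing logic separate from the HTTP call, which makes it possible to test the loop with canned data before pointing it at the real API.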

About Chinese character encoding

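A character is an abstract representation; in memory it is always stored as 8-bit bytes. In Python 2, a str is a sequence of 8-bit bytes, while a unicode object is a sequence of Unicode code points (held internally as 16-bit or 32-bit units, depending on how Python was built), and these too are ultimately stored as bytes. This is why a UTF-8 encoded string, when inspected (for example inside a printed dictionary or list), shows up as a series of byte values. Unicode strings support the same operations as ordinary strings and can even be used as dictionary keys. For example, the dictionary {u'\u4e54\u6cbb\u514b\u9c81\u5c3c': u'\u7537'} is really {'乔治克鲁尼': '男'} (George Clooney: male), and the UTF-8 byte sequence for 乔治克鲁尼 is '\xe4\xb9\x94\xe6\xb2\xbb\xe5\x85\x8b\xe9\xb2\x81\xe5\xb0\xbc'.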
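The relationship between the unicode string and its UTF-8 bytes in this example can be checked directly. The snippet below is a small sketch that uses only `u''` and `b''` literals with escape sequences, so it should run unchanged on Python 2.7 and on Python 3.3 or later; the variable names are illustrative only.

```python
name = u'\u4e54\u6cbb\u514b\u9c81\u5c3c'  # the five characters of the name above
utf8_bytes = name.encode('utf-8')

# Five characters become fifteen bytes in UTF-8 (three bytes per character here).
char_count = len(name)
byte_count = len(utf8_bytes)

# The encoded bytes are the byte sequence quoted in the text.
matches = utf8_bytes == b'\xe4\xb9\x94\xe6\xb2\xbb\xe5\x85\x8b\xe9\xb2\x81\xe5\xb0\xbc'

# A unicode string works as a dictionary key like any other string.
info = {name: u'\u7537'}
gender = info[u'\u4e54\u6cbb\u514b\u9c81\u5c3c']
```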

byte sequence str ===> str.decode('encoding of the byte sequence') ===> unicode string (e.g. u = u'\u4e54') ===> u.encode("gb2312")
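The conversion chain above can be walked through step by step. The sketch below uses U+4E54 (乔, the first character of the name in the earlier example) because it is a common character that GB2312 can represent; the variable names are illustrative, and the code is written to run on both Python 2.7 and Python 3.

```python
utf8_bytes = b'\xe4\xb9\x94'        # UTF-8 bytes of U+4E54
text = utf8_bytes.decode('utf-8')   # bytes -> unicode, using the bytes' own encoding
gb_bytes = text.encode('gb2312')    # unicode -> bytes in the target encoding

# GB2312 stores a Chinese character in two bytes, UTF-8 in three here,
# so the two byte sequences differ even though they denote the same character.
utf8_length = len(utf8_bytes)
gb_length = len(gb_bytes)

# Decoding with the matching codec recovers the same unicode string.
round_trip = gb_bytes.decode('gb2312')
```

The unicode string in the middle is the common ground: bytes in one encoding are never converted to another encoding directly, only by decoding to unicode and encoding again.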

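As a rule, handle strings as unicode objects (the u-prefixed literals) throughout the program, and call encode() with the required encoding only at output time. When testing with data, do not limit the test to characters with code points below 128; also use characters with code points above 255, such as Chinese characters, to make sure the program handles them correctly.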
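To illustrate why test data should include characters above 255, the sketch below tries three sample strings against three codecs. A single-byte codec handles the first one or two samples and fails only on the Chinese text, which is exactly the kind of bug that ASCII-only test data would miss. `can_encode` and the sample strings are hypothetical helpers introduced for this illustration.

```python
ascii_text = u'abc'             # code points below 128
latin_text = u'caf\xe9'         # contains U+00E9, which is above 127 but below 256
chinese_text = u'\u7535\u5f71'  # two Chinese characters, code points above 255


def can_encode(text, codec):
    """Return True if text can be encoded with codec without raising an error."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


samples = (ascii_text, latin_text, chinese_text)
results = {
    'ascii': [can_encode(t, 'ascii') for t in samples],
    'latin-1': [can_encode(t, 'latin-1') for t in samples],
    'utf-8': [can_encode(t, 'utf-8') for t in samples],
}
# results['ascii']   is [True, False, False]
# results['latin-1'] is [True, True, False]
# results['utf-8']   is [True, True, True]
```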

posted @ 2012-05-22 07:05 by 莫忆往西

Time Goes By

如花美眷似水流年
