File handling
File handling: csv, table, json, txt, db
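There's nothing clever in here, it's all rote. In these examples the most common formats are .csv, .dat and .txt files; the fancier move is using json to turn text into objects (Python dicts and lists).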
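Here is a minimal sketch of what "turning json into objects" looks like. It assumes a JSON-lines layout (one JSON record per line), like the bitly example file. The sample records and their values are made up for illustration, and `io.StringIO` stands in for the real file.

```python
import io
import json

import pandas as pd

# Hypothetical sample data in JSON-lines form: one JSON object per line.
# It stands in for a file such as datasets/bitly_usagov/example.txt.
sample = io.StringIO(
    '{"tz": "America/New_York", "a": "Mozilla/5.0"}\n'
    '{"tz": "Europe/Berlin", "a": "Opera/9.80"}\n'
    '{"a": "curl/7.1"}\n'
)

# json.loads parses each line of text into a Python dict.
records = [json.loads(line) for line in sample]
print(type(records[0]))  # <class 'dict'>
print(records[0]["tz"])  # America/New_York

# A list of dicts can go straight into a DataFrame; missing keys become NaN.
frame = pd.DataFrame(records)
print(frame["tz"].value_counts())
```

For JSON-lines files, `pd.read_json(path, lines=True)` is a shortcut that does the parse-and-collect step in one call.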
import json
import pandas as pd

# bitly data: one JSON object per line
path = 'datasets/bitly_usagov/example.txt'
with open(path) as f:
    print(f.readline())                           # peek at the first raw line
with open(path) as f:
    records = [json.loads(line) for line in f]    # each line -> dict
records[0]
#----------------------------------#
# MovieLens 1M: '::'-delimited .dat files with no header row
pd.options.display.max_rows = 10
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
mnames = ['movie_id', 'title', 'genres']
# a multi-character sep needs the python parsing engine
users = pd.read_table('datasets/movielens/users.dat', sep='::', header=None, names=unames, engine='python')
ratings = pd.read_table('datasets/movielens/ratings.dat', sep='::', header=None, names=rnames, engine='python')
movies = pd.read_table('datasets/movielens/movies.dat', sep='::', header=None, names=mnames, engine='python')
users[:5]
#----------------------------------#
# babynames: comma-separated .txt, column names supplied via names=
names1880 = pd.read_csv('datasets/babynames/yob1880.txt', names=['name', 'sex', 'births'])
names1880.head()
#----------------------------------#
# USDA food database: one big JSON document (32 MB of data)
with open('datasets/usda_food/database.json') as f:
    db = json.load(f)
len(db)
#----------------------------------#
# FEC contributions: a regular CSV with a header row
fec = pd.read_csv('datasets/fec/P00000001-ALL.csv')
fec.info()
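Below is a self-contained sketch of the two delimited-text cases above, a `'::'`-separated .dat file and a headerless comma .txt file. The sample rows are hypothetical and only mimic the MovieLens and babynames layouts; `io.StringIO` replaces the real files. It uses `pd.read_csv`. `pd.read_table` with the same arguments behaves the same, since it only differs in defaulting to a tab separator.

```python
import io

import pandas as pd

# Hypothetical rows in the MovieLens users.dat layout:
# user_id::gender::age::occupation::zip
users_text = io.StringIO("1::F::25::10::12345\n2::M::35::16::67890\n")

unames = ["user_id", "gender", "age", "occupation", "zip"]
# A separator longer than one character is treated as a regex, which only
# the python engine supports; naming it explicitly avoids a ParserWarning.
users = pd.read_csv(users_text, sep="::", header=None, names=unames,
                    engine="python")

# Hypothetical rows in the babynames layout: name,sex,births with no header.
# Passing names= makes pandas treat the first line as data, not a header.
names_text = io.StringIO("Mary,F,100\nAnna,F,50\nJohn,M,80\n")
names = pd.read_csv(names_text, names=["name", "sex", "births"])

print(users)
print(names.groupby("sex")["births"].sum())
```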
Detailed reference (in Chinese): https://www.jianshu.com/p/047d8c1c7e14
Continuously updated...