A few handy pandas methods for working with dictionaries and JSON data

1. Converting dictionaries to DataFrames
2. Converting DataFrames to dictionaries
3. Converting between JSON and DataFrames
4. Converting multi-level (nested) dictionaries to DataFrames

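A dictionary can be converted directly with pd.DataFrame. For an ordinary dictionary, the keys become the column labels and the row index defaults to range(n), where n is the length of the data. You can also set the row labels during the conversion by passing the index parameter.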

In [1]: import pandas as pd

In [2]: d = {'one': [1., 2., 3., 4.],
   ...:      'two': [4., 3., 2., 1.]}

In [3]: pd.DataFrame(d)
Out[3]:
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

In [4]: pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
Out[4]:
   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0
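A minimal sketch of the rules above, assuming a recent pandas release. The uneven-length dictionary is a made-up input: index= supplies one row label per list element, columns= keeps and orders the listed keys, and lists of different lengths cannot be aligned into a table.

```python
import pandas as pd

d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}

# Keys become column labels; index= supplies one row label per list element.
df = pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

# columns= keeps only (and orders) the listed keys.
only_two = pd.DataFrame(d, columns=['two'])

# Hypothetical input: lists of different lengths cannot form a rectangle,
# so pandas raises ValueError instead of padding them.
try:
    pd.DataFrame({'one': [1, 2], 'two': [1, 2, 3]})
    uneven_raises = False
except ValueError:
    uneven_raises = True

print(df.loc['b', 'two'], list(only_two.columns), uneven_raises)
```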

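For the simplest kind of dictionary, where every value is a single scalar, calling pd.DataFrame directly raises "ValueError: If using all scalar values, you must pass an index". Scalars alone do not tell pandas how many rows to build, so in this case you have to supply an index.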

In [5]: data = {'a': 1, 'b': 2}

In [6]: pd.DataFrame(data)
Traceback (most recent call last):
  ...
ValueError: If using all scalar values, you must pass an index

There are several ways to handle this:

In [7]: data
Out[7]: {'a': 1, 'b': 2}

In [8]: pd.DataFrame(data, index=[0])  # set the index explicitly
Out[8]:
   a  b
0  1  2

In [9]: pd.DataFrame.from_dict(data, orient='index').T  # use pd.DataFrame.from_dict, then transpose
Out[9]:
   a  b
0  1  2
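The sketch below checks the two fixes above side by side and adds a third, commonly used alternative that is not in the original: wrapping the dictionary in a list, so pandas reads it as one record (one row).

```python
import pandas as pd

data = {'a': 1, 'b': 2}

# Fix from the text: give an explicit one-row index.
df_index = pd.DataFrame(data, index=[0])

# Fix from the text: one row per key via from_dict, then transpose.
df_from_dict = pd.DataFrame.from_dict(data, orient='index').T

# Alternative (not in the original): a list containing one dict is one record.
df_list = pd.DataFrame([data])

for frame in (df_index, df_from_dict, df_list):
    print(frame.values.tolist(), list(frame.columns))
```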

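A list of dictionaries can likewise be converted with a plain pd.DataFrame call. Each dictionary becomes one row, keys missing from a dictionary are filled with NaN, and columns= restricts the result to the listed keys.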

In [10]: data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

In [11]: pd.DataFrame(data)
Out[11]:
   a   b     c
0  1   2   NaN
1  5  10  20.0

In [12]: pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
Out[12]:
        a   b
first   1   2
second  5  10
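A small sketch of the two behaviours visible in Out[11] and Out[12], using the same list of dictionaries: a key missing from one record turns into NaN (which forces that column to float), and columns= both selects and orders keys.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# Each dict becomes a row; the column set is the union of all keys.
df = pd.DataFrame(data)

# The first dict has no 'c', so that cell is NaN and the column becomes float.
missing_c = df.loc[0, 'c']

# columns= keeps only the listed keys, in the order given.
picked = pd.DataFrame(data, columns=['c', 'a'])

print(df.dtypes.to_dict())
print(picked)
```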

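A dictionary whose keys are tuples produces multi-level (MultiIndex) columns: the first element of each tuple becomes the first level, the second element becomes the second level, and so on.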

In [13]: td = {('a', 'b'): 2,
    ...:       ('a', 'a'): 4,
    ...:       ('a', 'c'): 6,
    ...:       ('b', 'a'): 8,
    ...:       ('b', 'b'): 10}

In [14]: pd.DataFrame(td, index=[0])
Out[14]:
   a        b
   b  a  c  a   b
0  2  4  6  8  10
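To see how the tuple keys are used afterwards, here is a minimal sketch on the same td dictionary: a first-level label selects the whole group beneath it, while a full tuple selects a single column.

```python
import pandas as pd

td = {('a', 'b'): 2, ('a', 'a'): 4, ('a', 'c'): 6,
      ('b', 'a'): 8, ('b', 'b'): 10}

# Tuple keys become a two-level column MultiIndex.
df = pd.DataFrame(td, index=[0])

# A first-level label returns the sub-frame under it (second level becomes the columns).
under_a = df['a']

# A full tuple selects one column as a Series.
bb = df[('b', 'b')]

print(df.columns.nlevels, sorted(under_a.columns), bb.iloc[0])
```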

For a simple nested dictionary, pd.DataFrame uses the outer keys as the column labels and the inner keys as the row index.

In [15]: data = {
    ...:     'key1': {'a': -2, 'b': 100},
    ...:     'key2': {'a': 11, 'b': 1000},
    ...:     'key3': {'a': -34, 'b': 800},
    ...:     'key4': {'a': 8, 'b': 1100},
    ...:     'key5': {'a': 46, 'b': 400}
    ...: }

In [16]: pd.DataFrame(data)
Out[16]:
   key1  key2  key3  key4  key5
a    -2    11   -34     8    46
b   100  1000   800  1100   400
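The inner dictionaries do not need identical keys. In this sketch (the ragged input is hypothetical), the row index is the union of all inner keys, and any combination that is missing comes out as NaN.

```python
import pandas as pd

# Hypothetical data: key2 has no 'b' entry.
ragged = {'key1': {'a': -2, 'b': 100},
          'key2': {'a': 11}}

# Outer keys -> columns, inner keys -> row index (union of all inner keys).
df = pd.DataFrame(ragged)

# The missing inner key shows up as NaN, which also makes that column float.
print(df)
```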

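pd.DataFrame.from_dict lets you choose the orientation with the orient parameter: orient='index' turns the outer keys into the row index, while orient='columns' (the default) turns them into column labels, exactly as pd.DataFrame does.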

In [17]: pd.DataFrame.from_dict(data, orient='index')
Out[17]:
       a     b
key1  -2   100
key2  11  1000
key3 -34   800
key4   8  1100
key5  46   400

In [18]: pd.DataFrame.from_dict(data, orient='columns')
Out[18]:
   key1  key2  key3  key4  key5
a    -2    11   -34     8    46
b   100  1000   800  1100   400
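One way to read orient='index' is as a transpose of the default orientation. The sketch below, using a shortened copy of the same nested dictionary, compares the two routes.

```python
import pandas as pd

# Shortened version of the nested dictionary from In [15].
data = {'key1': {'a': -2, 'b': 100},
        'key2': {'a': 11, 'b': 1000},
        'key3': {'a': -34, 'b': 800}}

# orient='index': outer keys become row labels, inner keys become columns.
by_row = pd.DataFrame.from_dict(data, orient='index')

# Building with the default orientation and transposing gives the same layout.
by_col_transposed = pd.DataFrame(data).T

print(by_row)
print(by_col_transposed)
```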

Method: DataFrame.to_dict(orient='dict', into=<class 'dict'>)

The orient parameter accepts 'dict', 'list', 'series', 'split', 'records' and 'index'.

The examples below show what each one produces:

In [19]: df = pd.DataFrame({'col1': [1, 2],
    ...:                    'col2': [0.5, 0.75]},
    ...:                   index=['row1', 'row2'])

In [20]: df
Out[20]:
      col1  col2
row1     1  0.50
row2     2  0.75

In [21]: df.to_dict('dict')
Out[21]: {'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}

In [22]: df.to_dict('list')
Out[22]: {'col1': [1, 2], 'col2': [0.5, 0.75]}

In [23]: df.to_dict('series')
Out[23]:
{'col1': row1    1
 row2    2
 Name: col1, dtype: int64,
 'col2': row1    0.50
 row2    0.75
 Name: col2, dtype: float64}

In [24]: df.to_dict('split')
Out[24]:
{'index': ['row1', 'row2'],
 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]]}

In [25]: df.to_dict('records')
Out[25]: [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]

In [26]: df.to_dict('index')
Out[26]: {'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
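Several of these shapes can be turned back into a DataFrame. A sketch, assuming a recent pandas: the 'split' keys (index, columns, data) match the DataFrame constructor's keyword arguments, 'index' round-trips through from_dict(orient='index'), and 'records' keeps only the values, so the row labels are lost.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]},
                  index=['row1', 'row2'])

# 'split' keys match DataFrame(data=..., index=..., columns=...), so unpack them.
from_split = pd.DataFrame(**df.to_dict('split'))

# 'index' orientation round-trips through from_dict(orient='index').
from_index = pd.DataFrame.from_dict(df.to_dict('index'), orient='index')

# 'records' stores no row labels, so a default 0..n-1 index comes back.
from_records = pd.DataFrame(df.to_dict('records'))

print(from_split, from_index, from_records, sep='\n\n')
```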

Methods: pandas.read_json(*args, **kwargs) and DataFrame.to_json(orient=None). In most cases you pass two things: the data and the orient argument.

The orient parameter accepts the following values:

'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
'records' : list like [{column -> value}, ... , {column -> value}]
'index' : dict like {index -> {column -> value}}
'columns' : dict like {column -> {index -> value}}
'values' : just the values array

The results look like this (each JSON string is read back with the same orient it was written with):

In [27]: df
Out[27]:
      col1  col2
row1     1  0.50
row2     2  0.75

In [28]: df.to_json(orient='split')
Out[28]: '{"columns":["col1","col2"],"index":["row1","row2"],"data":[[1,0.5],[2,0.75]]}'

In [29]: pd.read_json(_, orient='split')
Out[29]:
      col1  col2
row1     1  0.50
row2     2  0.75

In [30]: df.to_json(orient='records')
Out[30]: '[{"col1":1,"col2":0.5},{"col1":2,"col2":0.75}]'

In [31]: pd.read_json(_, orient='records')
Out[31]:
   col1  col2
0     1  0.50
1     2  0.75

In [32]: df.to_json(orient='index')
Out[32]: '{"row1":{"col1":1,"col2":0.5},"row2":{"col1":2,"col2":0.75}}'

In [33]: pd.read_json(_, orient='index')
Out[33]:
      col1  col2
row1     1  0.50
row2     2  0.75

In [34]: df.to_json(orient='columns')
Out[34]: '{"col1":{"row1":1,"row2":2},"col2":{"row1":0.5,"row2":0.75}}'

In [35]: pd.read_json(_, orient='columns')
Out[35]:
      col1  col2
row1     1  0.50
row2     2  0.75

In [36]: df.to_json(orient='values')
Out[36]: '[[1,0.5],[2,0.75]]'

In [37]: pd.read_json(_, orient='values')
Out[37]:
   0     1
0  1  0.50
1  2  0.75
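The session above passes JSON strings straight to read_json. Since pandas 2.1, passing a literal JSON string is deprecated and emits a FutureWarning; wrapping the text in io.StringIO avoids that and also works on older versions. A minimal round-trip sketch under that assumption:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]},
                  index=['row1', 'row2'])

# Write with one orient and read back with the same orient; StringIO makes
# the string look like a file, which read_json accepts without a warning.
restored = {}
for orient in ['split', 'index', 'columns']:
    text = df.to_json(orient=orient)
    restored[orient] = pd.read_json(StringIO(text), orient=orient)

# 'records' (like 'values') does not store row labels, so a default index comes back.
records = pd.read_json(StringIO(df.to_json(orient='records')), orient='records')

print(restored['split'], records, sep='\n\n')
```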

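Method: pandas.json_normalize. Take an ordinary multi-level dictionary: pd.DataFrame spreads the inner dictionary over the row index and repeats the scalar values on every row, whereas pd.json_normalize flattens the nested keys into dotted column names, and max_level limits how many levels are flattened: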

In [38]: d = {'id': 1,
    ...:      'name': '马云',
    ...:      'rank': 1,
    ...:      'score': {'数学': 120,
    ...:                '语文': 116,
    ...:                '英语': 120}}

In [39]: pd.DataFrame(d)
Out[39]:
    id name  rank  score
数学   1   马云     1    120
英语   1   马云     1    120
语文   1   马云     1    116

In [40]: pd.json_normalize(d)
Out[40]:
   id name  rank  score.数学  score.语文  score.英语
0   1   马云     1       120       116       120

In [41]: pd.json_normalize(d, max_level=0)
Out[41]:
   id name  rank                                  score
0   1   马云     1  {'数学': 120, '语文': 116, '英语': 120}

In [42]: pd.json_normalize(d, max_level=1)
Out[42]:
   id name  rank  score.数学  score.语文  score.英语
0   1   马云     1       120       116       120
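json_normalize also takes a sep argument that changes the separator used to join nested keys (the default is '.'). A small sketch on the same dictionary d, alongside max_level=0 for comparison:

```python
import pandas as pd

d = {'id': 1, 'name': '马云', 'rank': 1,
     'score': {'数学': 120, '语文': 116, '英语': 120}}

# sep= joins nested keys with '_' instead of the default '.'.
flat = pd.json_normalize(d, sep='_')

# max_level=0 stops flattening at the top level, leaving 'score' as a dict.
shallow = pd.json_normalize(d, max_level=0)

print(list(flat.columns))
print(type(shallow.loc[0, 'score']))
```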

For a combination of dictionaries and lists:

In [43]: data = {'id': 1,
    ...:         'name': '马云',
    ...:         'rank': 1,
    ...:         'score': [{'数学': 120,
    ...:                    '语文': 116,
    ...:                    '英语': 120}]}

In [44]: pd.DataFrame(data)
Out[44]:
   id name  rank                                  score
0   1   马云     1  {'数学': 120, '语文': 116, '英语': 120}

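In [45]: pd.json_normalize(data)
Out[45]:
   id name  rank                                    score
0   1   马云     1  [{'数学': 120, '语文': 116, '英语': 120}]

Here score is left as a list. max_level only controls how deeply nested dictionaries are flattened, so pd.json_normalize(data, max_level=0) and pd.json_normalize(data, max_level=1) return the same frame; passing the earlier dictionary d instead gives the flattened score.数学, score.语文 and score.英语 columns shown above. To expand a list of records, name it with the record_path argument, as in the more complex case below.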
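When a value is a list of dictionaries, as score is in data above, a sketch of expanding it: record_path names the list to turn into rows, and meta names the outer fields to repeat on each of those rows.

```python
import pandas as pd

data = {'id': 1, 'name': '马云', 'rank': 1,
        'score': [{'数学': 120, '语文': 116, '英语': 120}]}

# record_path: the list to expand (one row per element).
# meta: outer fields copied onto every expanded row.
flat = pd.json_normalize(data, record_path='score', meta=['id', 'name', 'rank'])

print(flat)
```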

More complex cases can be handled as follows (the last example is the desired result):

In [48]: data = [{'id': 101,
    ...:          'info': {'name': '马云', '班级': '2班'},
    ...:          'rank': 1,
    ...:          'score': [{'数学': 120,
    ...:                     '语文': 116,
    ...:                     '英语': 120}]},
    ...:         {'id': 201,
    ...:          'info': {'name': '马华腾', '班级': '1班'},
    ...:          'rank': 2,
    ...:          'score': [{'数学': 119,
    ...:                     '语文': 116,
    ...:                     '英语': 120}]}]

In [49]: pd.DataFrame(data)
Out[49]:
    id                          info  rank                                    score
0  101   {'name': '马云', '班级': '2班'}     1  [{'数学': 120, '语文': 116, '英语': 120}]
1  201  {'name': '马华腾', '班级': '1班'}     2  [{'数学': 119, '语文': 116, '英语': 120}]

In [50]: pd.json_normalize(data)
Out[50]:
    id  rank                                    score info.name info.班级
0  101     1  [{'数学': 120, '语文': 116, '英语': 120}]        马云      2班
1  201     2  [{'数学': 119, '语文': 116, '英语': 120}]       马华腾      1班

In [51]: pd.json_normalize(data, max_level=1)
Out[51]:
    id  rank                                    score info.name info.班级
0  101     1  [{'数学': 120, '语文': 116, '英语': 120}]        马云      2班
1  201     2  [{'数学': 119, '语文': 116, '英语': 120}]       马华腾      1班

In [52]: pd.json_normalize(data, max_level=0)
Out[52]:
    id                          info  rank                                    score
0  101   {'name': '马云', '班级': '2班'}     1  [{'数学': 120, '语文': 116, '英语': 120}]
1  201  {'name': '马华腾', '班级': '1班'}     2  [{'数学': 119, '语文': 116, '英语': 120}]

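In [53]: pd.json_normalize(data, 'score', ['id', 'rank', ['info', 'name'], ['info', '班级']])
Out[53]:
    数学   语文   英语   id  rank info.name info.班级
0  120  116  120  101     1        马云      2班
1  119  116  120  201     2       马华腾      1班

The second positional argument ('score') is record_path, the list expanded into rows; the third is meta, the fields copied onto each row, with nested fields given as key paths such as ['info', 'name'].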
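The In [53] call can also be written with keyword arguments, which makes the roles of the positional arguments explicit. This sketch additionally uses record_prefix, an option not shown in the original, to prefix the expanded score columns so their origin stays visible.

```python
import pandas as pd

data = [{'id': 101, 'info': {'name': '马云', '班级': '2班'}, 'rank': 1,
         'score': [{'数学': 120, '语文': 116, '英语': 120}]},
        {'id': 201, 'info': {'name': '马华腾', '班级': '1班'}, 'rank': 2,
         'score': [{'数学': 119, '语文': 116, '英语': 120}]}]

# Same call as In [53], with keywords; record_prefix renames the record columns.
flat = pd.json_normalize(
    data,
    record_path='score',
    meta=['id', 'rank', ['info', 'name'], ['info', '班级']],
    record_prefix='score.',
)

print(flat)
```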


Posted @ 2022-02-09 19:59 by ebuybay
