Getting Started with pandas (Part 5)
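1. How to split one attribute (column) into multiple columns with a pivot, so that each value forms a vector whose correlations can be computed?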
Example:
   class_           timestamp  count
0       1 2019-01-20 13:23:00      1
1       2 2019-01-20 13:24:00      2
2       3 2019-01-20 13:25:00      2
3       4 2019-01-20 13:26:00      1
4      10 2019-01-20 13:27:00      2
Desired output:
class_                 1    2    3    4    10
timestamp
2019-01-20 13:23:00  1.0  NaN  NaN  NaN  NaN
2019-01-20 13:24:00  NaN  2.0  NaN  NaN  NaN
2019-01-20 13:25:00  NaN  NaN  2.0  NaN  NaN
2019-01-20 13:26:00  NaN  NaN  NaN  1.0  NaN
2019-01-20 13:27:00  NaN  NaN  NaN  NaN  2.0
Solution:
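import pandas as pd
from pandas import Timestamp

info = {'class_': {0: 1, 1: 2, 2: 3, 3: 4, 4: 10},
        'timestamp': {0: Timestamp('2019-01-20 13:23:00'),
                      1: Timestamp('2019-01-20 13:24:00'),
                      2: Timestamp('2019-01-20 13:25:00'),
                      3: Timestamp('2019-01-20 13:26:00'),
                      4: Timestamp('2019-01-20 13:27:00')},
        'count': {0: 1, 1: 2, 2: 2, 3: 1, 4: 2}}
df = pd.DataFrame(info)

# Each distinct class_ value becomes its own column and the cells hold count.
# (timestamp, class_) pairs with no matching row become NaN.
# pivot raises ValueError if the same (timestamp, class_) pair occurs twice.
# Append .fillna(0) to replace the missing cells with 0:
# df.pivot(index='timestamp', columns="class_", values="count").fillna(0)
df.pivot(index='timestamp', columns="class_", values="count")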
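The question also asks about correlations, which the solution stops short of. A minimal sketch of that last step, assuming a missing cell means a count of 0 (no events for that class in that minute): each class_ column of the filled pivot is treated as a vector over the timestamps, and DataFrame.corr() computes the pairwise Pearson correlation between those columns. The names `wide` and `corr` are illustrative.

```python
import pandas as pd

# Same long-format data as above: one row per (class_, timestamp) observation.
df = pd.DataFrame({
    "class_": [1, 2, 3, 4, 10],
    "timestamp": pd.to_datetime([
        "2019-01-20 13:23:00", "2019-01-20 13:24:00", "2019-01-20 13:25:00",
        "2019-01-20 13:26:00", "2019-01-20 13:27:00",
    ]),
    "count": [1, 2, 2, 1, 2],
})

# Wide table: one column per class_, one row per timestamp.
# Assumption: a missing (timestamp, class_) cell means a count of 0.
wide = df.pivot(index="timestamp", columns="class_", values="count").fillna(0)

# Each column is now a vector of counts over time; corr() returns the
# pairwise Pearson correlation matrix between those column vectors.
corr = wide.corr()
print(corr)
```

With this toy data every class appears in exactly one minute, so every off-diagonal entry is -0.25; the matrix only becomes informative on real data with many timestamps per class. corr() also accepts method="spearman" or method="kendall".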
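2. How to merge the multiple columns of one attribute back into a single column (each row holds exactly one non-empty value), i.e. undo the pivot from question 1?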
Example:
class_                 1    2    3    4    10
timestamp
2019-01-20 13:23:00  1.0  NaN  NaN  NaN  NaN
2019-01-20 13:24:00  NaN  2.0  NaN  NaN  NaN
2019-01-20 13:25:00  NaN  NaN  2.0  NaN  NaN
2019-01-20 13:26:00  NaN  NaN  NaN  1.0  NaN
2019-01-20 13:27:00  NaN  NaN  NaN  NaN  2.0
Desired output:
   class_           timestamp  count
0       1 2019-01-20 13:23:00      1
1       2 2019-01-20 13:24:00      2
2       3 2019-01-20 13:25:00      2
3       4 2019-01-20 13:26:00      1
4      10 2019-01-20 13:27:00      2
Solution:
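import pandas as pd
from pandas import Timestamp

info = {'class_': {0: 1, 1: 2, 2: 3, 3: 4, 4: 10},
        'timestamp': {0: Timestamp('2019-01-20 13:23:00'),
                      1: Timestamp('2019-01-20 13:24:00'),
                      2: Timestamp('2019-01-20 13:25:00'),
                      3: Timestamp('2019-01-20 13:26:00'),
                      4: Timestamp('2019-01-20 13:27:00')},
        'count': {0: 1, 1: 2, 2: 2, 3: 1, 4: 2}}
df = pd.DataFrame(info)

# Start from the wide table produced in question 1.
# (Calling .dropna() on it would drop every row, since each row contains NaN.)
df1 = df.pivot(index='timestamp', columns="class_", values="count")
# stack() moves the class_ columns into the row index, giving a Series indexed
# by (timestamp, class_). Older pandas drops the NaN cells here, newer versions
# may keep them, so dropna() makes the result the same either way.
df1 = df1.stack().dropna().reset_index()
# reset_index() yields three columns: timestamp, class_ and the values (named 0).
df1.columns = ["timestamp", "class_", "count"]
# The NaNs turned count into float; restore ints and the original column order.
df1["count"] = df1["count"].astype(int)
df1 = df1[["class_", "timestamp", "count"]]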
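An alternative sketch, not the approach used above: DataFrame.melt unpivots the wide table without going through stack(), whose NaN handling differs between pandas versions. It assumes the wide table has timestamp as its index, as produced by pivot in question 1; the names `wide` and `long` are illustrative.

```python
import pandas as pd

# Rebuild the wide table from question 1 (timestamp index, one column per class_).
df = pd.DataFrame({
    "class_": [1, 2, 3, 4, 10],
    "timestamp": pd.to_datetime([
        "2019-01-20 13:23:00", "2019-01-20 13:24:00", "2019-01-20 13:25:00",
        "2019-01-20 13:26:00", "2019-01-20 13:27:00",
    ]),
    "count": [1, 2, 2, 1, 2],
})
wide = df.pivot(index="timestamp", columns="class_", values="count")

# Move timestamp back to a regular column, then melt every class_ column
# into (class_, count) pairs: 5 timestamps x 5 classes = 25 rows.
long = wide.reset_index().melt(id_vars="timestamp", var_name="class_",
                               value_name="count")

# melt keeps the empty cells, so drop the NaN rows (pairs that never occurred),
# restore integer counts, and match the column order of the desired output.
long = (
    long.dropna(subset=["count"])
        .astype({"count": int})
        .sort_values("timestamp")
        .reset_index(drop=True)[["class_", "timestamp", "count"]]
)
print(long)
```

Because melt never drops anything on its own, the explicit dropna is what turns the 25 melted rows back into the 5 original observations.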