pandas tips

 

Find the index labels of the 3 largest values in each column:

 

df1.apply(lambda s: pd.Series(s.nlargest(3).index))
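A small worked example of the line above; the frame `df1` and its values are made up for illustration. Each column of the result lists the row labels of that column's three largest values, largest first.

```python
import pandas as pd

# Hypothetical data: two numeric columns with string row labels.
df1 = pd.DataFrame(
    {"a": [5, 1, 9, 3], "b": [2, 8, 4, 6]},
    index=["w", "x", "y", "z"],
)

# For each column, take the 3 largest values and keep only their index labels.
top3 = df1.apply(lambda s: pd.Series(s.nlargest(3).index))
print(top3)
#    a  b
# 0  y  x
# 1  w  z
# 2  z  y
```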

 

Map a DataFrame column through a dictionary (missing keys become NaN):

df["ItemIdx"] = df["question"].map(lambda x: itemMap.get(x, np.nan))
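As a sketch, `Series.map` also accepts the dictionary directly and fills keys that are not found with NaN, so the lambda is optional. The sample `itemMap` and `df` below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table and data.
itemMap = {"q1": 0, "q2": 1}
df = pd.DataFrame({"question": ["q1", "q3", "q2"]})

# Passing the dict itself: keys not in itemMap ("q3") map to NaN.
df["ItemIdx"] = df["question"].map(itemMap)

# Same result as the explicit lambda form.
via_lambda = df["question"].map(lambda x: itemMap.get(x, np.nan))
```

Because of the NaN, the resulting column has a float dtype even though the dictionary values are integers.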

 

Load a dictionary from a saved pickle file:

import pickle

with open("l.pkl", "rb") as f:
    itemMap = pickle.load(f)
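For completeness, a minimal round trip showing how such a file could be written and read back. The dictionary contents and the temporary path are assumptions made for the example.

```python
import os
import pickle
import tempfile

# Hypothetical dictionary to persist.
itemMap = {"q1": 0, "q2": 1}

# Write to a temporary location so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "l.pkl")
with open(path, "wb") as f:
    pickle.dump(itemMap, f)

# Read it back; "rb" is required because pickle is a binary format.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.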

 

Find the start offset of each session (the frame must already be sorted by session):

offset = np.zeros(df["sessinId"].nunique()+1,dtype=np.int32)
offset[1:] = df.groupby('sessinId').size().cumsum()
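A worked sketch of the offsets on made-up data (column name `sessinId` kept as in the snippet above). `offset[i]` is the first row of session `i`, and `offset[i+1]` is one past its last row.

```python
import numpy as np
import pandas as pd

# Hypothetical sessions of length 3, 1 and 2.
df = pd.DataFrame({"sessinId": [10, 10, 10, 20, 30, 30],
                   "item": list("abcdef")})
# The offsets are only meaningful if rows are grouped by session.
df = df.sort_values("sessinId", kind="stable").reset_index(drop=True)

offset = np.zeros(df["sessinId"].nunique() + 1, dtype=np.int32)
offset[1:] = df.groupby("sessinId").size().cumsum()
print(offset)  # [0 3 4 6]

# First row of every session, and the rows of the second session.
first_rows = df.iloc[offset[:-1]]
second_session = df.iloc[offset[1]:offset[2]]
```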

Create a dictionary from two pandas DataFrame columns:

In [9]: pd.Series(df.Letter.values,index=df.Position).to_dict()
Out[9]: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
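An equivalent sketch using `dict(zip(...))`, which avoids building the intermediate Series. The frame below is reconstructed from the output shown above and is otherwise an assumption.

```python
import pandas as pd

# Frame assumed to match the output above.
df = pd.DataFrame({"Position": [1, 2, 3, 4, 5],
                   "Letter": list("abcde")})

via_series = pd.Series(df.Letter.values, index=df.Position).to_dict()
via_zip = dict(zip(df["Position"], df["Letter"]))
```

In both forms, duplicate keys keep the last value seen.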

 

Remap values in a pandas column with a dict:

>>> df = pd.DataFrame({'col2': {0: 'a', 1: 2, 2: np.nan}, 'col1': {0: 'w', 1: 1, 2: 2}})
>>> di = {1: "A", 2: "B"}
>>> df
  col1 col2
0    w    a
1    1    2
2    2  NaN
>>> df.replace({"col1": di})
  col1 col2
0    w    a
1    A    2
2    B  NaN
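A short sketch of how `replace` differs from `map` here, on a hypothetical one-column frame: `replace` leaves values that are not in the dict untouched, while `map` turns them into NaN.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["w", 1, 2]})
di = {1: "A", 2: "B"}

replaced = df["col1"].replace(di)  # "w" is kept as-is
mapped = df["col1"].map(di)        # "w" is not a key, so it becomes NaN
```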

Remove parenthesized groups that contain only letters and digits:

config.loc[:, 'cc'] = config.insurance.apply(lambda x: re.sub(r"\([a-zA-Z0-9]+\)", "", x))

 

Remove anything in parentheses (including the parentheses):

config.loc[:, 'cc'] = config.insurance.apply(lambda x: re.sub(r"\(.*?\)", "", x))
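A sketch comparing the two patterns on made-up data, assuming the first one is meant to match one or more alphanumeric characters. It also shows the vectorized `Series.str.replace` form, which is an alternative to `apply` and not necessarily what the original code used.

```python
import re
import pandas as pd

# Hypothetical values for the "insurance" column.
config = pd.DataFrame({"insurance": ["PlanA(12)", "PlanB(gold plan)", "PlanC"]})

# Only parentheses whose content is purely letters/digits are removed.
alnum_only = config["insurance"].apply(
    lambda x: re.sub(r"\([a-zA-Z0-9]+\)", "", x))

# Non-greedy ".*?" removes every parenthesized group, whatever it contains.
anything = config["insurance"].str.replace(r"\(.*?\)", "", regex=True)
```

Raw strings (`r"..."`) keep the backslashes intact and avoid invalid-escape warnings in recent Python versions.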

 

Reset the index (drop=True discards the old index instead of adding it as a column):

dfff = dfff.reset_index(drop=True)
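A typical case, sketched on hypothetical data: after filtering rows the index has gaps, and `reset_index(drop=True)` renumbers it from 0. The call returns a new frame, so the result has to be assigned.

```python
import pandas as pd

dfff = pd.DataFrame({"v": [1, 2, 3, 4]})
dfff = dfff[dfff["v"] % 2 == 0]     # surviving rows keep index labels 1 and 3
dfff = dfff.reset_index(drop=True)  # index is now 0, 1; no extra "index" column
```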

 

Translate every element in numpy array according to key

>>> a = np.array([[1,2,3],
              [3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
       [36, 34, 45]])
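If the array may contain values that are not keys of the dictionary, `my_dict.get` returns None for them, which generally does not fit an integer result array. A minimal sketch with an explicit default; the value -1 and the sample array are assumptions for the example.

```python
import numpy as np

a = np.array([[1, 2, 9],
              [3, 2, 4]])          # 9 is deliberately not a key
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}

# Fall back to -1 for unknown keys so every result is an int.
translate = np.vectorize(lambda k: my_dict.get(k, -1))
result = translate(a)
print(result)
# [[23 34 -1]
#  [36 34 45]]
```

Note that `np.vectorize` is a convenience wrapper around a Python loop, not a performance optimization.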

 

 

Convert a pandas DataFrame to nested JSON after groupby

# Application: writing a DataFrame to MongoDB

test_dict = {'id':[1,2,3,1,2,1],
"name":[...],
"math":[...],
"English":[...]}

df = pd.DataFrame(data=test_dict)

e = df.groupby(["name","id"]).apply(lambda x: x[["math","English"]].to_dict("records"))

sss = e.reset_index().rename(columns={0:"questions"})

result_dict = sss.to_dict("records")
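An alternative sketch that builds the same kind of nested records with an explicit loop over the groups, which sidesteps version-dependent `groupby.apply` behavior. The name and score values are invented, since the original lists are elided.

```python
import pandas as pd

# Hypothetical data; only the "id" list comes from the snippet above.
test_dict = {
    "id": [1, 2, 3, 1, 2, 1],
    "name": ["a", "b", "c", "a", "b", "a"],
    "math": [90, 80, 70, 60, 50, 40],
    "English": [85, 75, 65, 55, 45, 35],
}
df = pd.DataFrame(data=test_dict)

# One document per (name, id) group, with its rows nested under "questions".
result_dict = [
    {
        "name": name,
        "id": int(id_),
        "questions": g[["math", "English"]].to_dict("records"),
    }
    for (name, id_), g in df.groupby(["name", "id"])
]
```

Each element of `result_dict` is a plain dictionary with a nested list, the shape a document store such as MongoDB expects for one document.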

 

 

 

 

posted @ 2019-11-01 10:57  SENTIMENT_SONNE