pandas_series02

How to get the quartiles of a numeric series

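    import numpy as np
    import pandas as pd

    # Set the random seed so the result is reproducible
    state = np.random.RandomState(100)
    # Draw 25 points from a normal distribution with mean 10 and standard deviation 5
    ser = pd.Series(state.normal(10, 5, 25))
    # Compute the min, 25th, 50th and 75th percentiles, and the max of ser
    np.percentile(ser, q=[0, 25, 50, 75, 100])

    #> array([ 1.25117263,  7.70986507, 10.92259345, 13.36360403, 18.0949083 ])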
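As an alternative to `np.percentile`, pandas provides `Series.quantile`, which takes fractions (0 to 1) instead of percentages and by default uses the same linear interpolation. A minimal sketch comparing the two on the same seeded series:

```python
import numpy as np
import pandas as pd

state = np.random.RandomState(100)
ser = pd.Series(state.normal(10, 5, 25))

# Series.quantile takes fractions (0-1) rather than percentages (0-100)
quartiles = ser.quantile([0, 0.25, 0.5, 0.75, 1])
print(quartiles)

# Both use linear interpolation by default, so the values should match np.percentile
np_quartiles = np.percentile(ser, q=[0, 25, 50, 75, 100])
print(np.allclose(quartiles.to_numpy(), np_quartiles))  # True
```

The pandas version keeps the requested fractions as the index, which makes the result easier to label than a bare array.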

How to get the frequency count of each unique item in a series

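    # Draw 30 random indices from 0-7 and take the matching letters of 'abcdefgh' to build a series
    ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))
    # Count how many times each value occurs (sorted by frequency, most frequent first)
    ser.value_counts()

    #> d    8
    #> g    6
    #> b    6
    #> a    5
    #> e    2
    #> h    2
    #> f    1
    #> dtype: int64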
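`value_counts` has a couple of options that are easy to miss. The sketch below uses a small fixed series (hypothetical data, chosen only so the result is predictable) to show relative frequencies via `normalize=True`, and how `reindex` with `fill_value=0` restores letters that never occurred, such as `'c'` in the output above:

```python
import pandas as pd

ser = pd.Series(list("abacabad"))  # hypothetical fixed data: a=4, b=2, c=1, d=1

# normalize=True returns relative frequencies instead of raw counts
freqs = ser.value_counts(normalize=True)
print(freqs)

# value_counts omits values that never appear; reindex adds them back with a count of 0
counts = ser.value_counts().reindex(list("abcde"), fill_value=0)
print(counts)
```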

How to keep the two most frequent values in a series and replace all other values with 'Other'

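    # Set the random seed
    state = np.random.RandomState(100)
    # Sample 12 integers uniformly from 1-4 to build a series
    ser = pd.Series(state.randint(1, 5, [12]))
    # Keep values that are among the two most frequent; replace everything else with 'Other'
    top2 = ser.value_counts().index[:2]
    ser = ser.where(ser.isin(top2), 'Other')
    ser

    # Example output (exact values depend on the random draw)
    #> 0     Other
    #> 1         4
    #> 2         2
    #> 3         2
    #> 4         4
    #> 5     Other
    #> 6     Other
    #> 7     Other
    #> 8         4
    #> 9         4
    #> 10        4
    #> 11        2
    #> dtype: object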
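The same idea can be wrapped in a small reusable function for the top n values. `keep_top_n` is a hypothetical helper name, and the input is fixed data chosen to reproduce the pattern shown above:

```python
import pandas as pd

def keep_top_n(ser: pd.Series, n: int = 2, other: str = "Other") -> pd.Series:
    """Hypothetical helper: keep the n most frequent values, replace the rest with `other`."""
    top = ser.value_counts().index[:n]
    # where() keeps values where the condition is True and substitutes `other` elsewhere;
    # it returns a new (object-dtype) series instead of modifying `ser` in place
    return ser.where(ser.isin(top), other)

# Fixed data: 4 occurs 5 times, 2 occurs 3 times, 1 and 3 occur twice each
ser = pd.Series([4, 2, 1, 2, 4, 3, 1, 3, 4, 4, 4, 2])
print(keep_top_n(ser).tolist())
```

Returning a new series avoids assigning strings into an integer series in place, which recent pandas versions warn about.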

How to bin a numeric series into 10 groups of equal size
Put another way: discretize a numeric series into 10 categorical values (deciles).

    ser = pd.Series(np.random.random(20))

    # Discretize into 10 quantile-based categories and show only the first 5 rows
    pd.qcut(ser, q=[0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1],
            labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']).head()

    #> 0    3rd
    #> 1    1st
    #> 2    6th
    #> 3    6th
    #> 4    9th
    #> dtype: category
    #> Categories (10, object): [1st < 2nd < 3rd < 4th ... 7th < 8th < 9th < 10th]
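Since the cut points above are evenly spaced, `pd.qcut` also accepts the integer shorthand `q=10`. A minimal sketch, seeded only for reproducibility, that checks both forms produce the same bins and that, for these 20 distinct values, each of the 10 bins receives exactly 2:

```python
import numpy as np
import pandas as pd

state = np.random.RandomState(100)  # seeded only so the example is reproducible
ser = pd.Series(state.random_sample(20))

labels = ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']

# q=10 is shorthand for the evenly spaced quantiles 0, 0.1, ..., 1
deciles = pd.qcut(ser, q=10, labels=labels)
explicit = pd.qcut(ser, q=[0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1], labels=labels)

# Count how many values fall into each decile, in category order
counts = deciles.value_counts(sort=False)
print(counts)
```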

How to reshape a series (via its NumPy array) into a DataFrame of a given shape

    ser = pd.Series(np.random.randint(1, 10, 35))
    # Convert the series to a NumPy array, then reshape it into 7 rows x 5 columns
    df = pd.DataFrame(ser.to_numpy().reshape(7, 5))
    print(df)

    #>    0  1  2  3  4
    #> 0  1  2  1  2  5
    #> 1  1  2  4  5  2
    #> 2  1  3  3  2  8
    #> 3  8  6  4  9  6
    #> 4  2  1  1  8  5
    #> 5  3  2  8  5  6
    #> 6  1  5  5  4  6
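`reshape` can also infer one dimension: passing `-1` for the row count lets NumPy compute it from the total size, while a shape whose product does not match the number of elements raises `ValueError`. A sketch on fixed data (`np.arange`, for predictability):

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(1, 36))  # 35 values, fixed for a reproducible example

# -1 lets NumPy infer the number of rows from the total size (35 / 5 = 7)
df = pd.DataFrame(ser.to_numpy().reshape(-1, 5))
print(df.shape)  # (7, 5)

# Row-major (C) order by default: the first 5 values fill row 0
print(df.iloc[0].tolist())  # [1, 2, 3, 4, 5]

# A shape whose product differs from the series length raises ValueError
try:
    ser.to_numpy().reshape(6, 5)
except ValueError as exc:
    print("cannot reshape:", exc)
```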

How to find the positions of the values in a series that are multiples of 3

    ser = pd.Series(np.random.randint(1, 10, 7))
    print(ser)

    #> 0    6
    #> 1    8
    #> 2    6
    #> 3    7
    #> 4    6
    #> 5    2
    #> 6    4
    #> dtype: int64

    # Get the positions of the values that are multiples of 3
    np.argwhere((ser % 3 == 0).to_numpy())

    #> array([[0],
    #>        [2],
    #>        [4]])
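`np.argwhere` returns a 2-D array with one row per match. If a flat array of positions, or the index labels instead of positions, is more convenient, the following sketch shows both; the values match the printed series above, while the letter index is a hypothetical addition so that labels and positions differ:

```python
import numpy as np
import pandas as pd

# Values from the example above, with a hypothetical non-default index
ser = pd.Series([6, 8, 6, 7, 6, 2, 4], index=list("abcdefg"))
mask = ser % 3 == 0

# Integer positions as a flat 1-D array (np.argwhere returns a 2-D column instead)
positions = np.flatnonzero(mask.to_numpy())
print(positions)  # [0 2 4]

# Index labels of the matching rows
labels = ser[mask].index
print(list(labels))  # ['a', 'c', 'e']

# The matching values themselves
print(ser[mask].tolist())  # [6, 6, 6]
```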

Posted @ 2021-12-14 11:19 青竹之下

huaobin
