pandas_series02

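  1. How to get the quartiles of a numeric series

        import numpy as np
        import pandas as pd

        # Seed the random number generator
        state = np.random.RandomState(100)
        # Draw 25 points from a normal distribution with mean 10 and standard deviation 5
        ser = pd.Series(state.normal(10, 5, 25))
        # Compute the minimum, the three quartiles, and the maximum of ser
        np.percentile(ser, q=[0, 25, 50, 75, 100])

        #> array([ 1.25117263,  7.70986507, 10.92259345, 13.36360403, 18.0949083 ])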
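
The same values can also be obtained without leaving pandas. The following is a minimal sketch, assuming the same seed and distribution parameters as above; the name `quartiles` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

state = np.random.RandomState(100)
ser = pd.Series(state.normal(10, 5, 25))

# Series.quantile expects fractions in [0, 1], not percentages.
quartiles = ser.quantile([0, 0.25, 0.5, 0.75, 1])
print(quartiles)
```

The result is a Series indexed by the requested fractions, which can be easier to read than a bare array.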
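  2. How to get the frequency count of each unique item in a series

        # Draw 30 random indices from 0 to 7 and use them to pick letters from 'abcdefgh'
        ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))
        # Count how often each value occurs
        ser.value_counts()

        #> d    8
        #> g    6
        #> b    6
        #> a    5
        #> e    2
        #> h    2
        #> f    1
        #> dtype: int64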
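
If relative frequencies are more useful than raw counts, `value_counts` accepts `normalize=True`. A small sketch follows; the seed `0` and the names `rng`, `counts`, and `shares` are assumptions made for this example, not part of the original exercise.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # arbitrary seed, chosen only for this sketch
ser = pd.Series(np.take(list('abcdefgh'), rng.randint(8, size=30)))

counts = ser.value_counts()                # absolute counts, most frequent first
shares = ser.value_counts(normalize=True)  # fractions that sum to 1
print(counts)
print(shares)
```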
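  3. How to keep the two most frequent items in a series and replace all other items with 'Other'

        # Note: this line creates a generator but does not seed np.random,
        # so the values drawn below differ from run to run
        np.random.RandomState(100)
        # Draw 12 integers uniformly from 1 to 4
        ser = pd.Series(np.random.randint(1, 5, [12]))
        # Keep the two most frequent values and replace every other value with 'Other'
        ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
        ser

        #> 0     Other
        #> 1         4
        #> 2         2
        #> 3         2
        #> 4         4
        #> 5     Other
        #> 6     Other
        #> 7     Other
        #> 8         4
        #> 9         4
        #> 10        4
        #> 11        2
        #> dtype: object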
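
A reproducible variant can draw from the seeded generator directly. The sketch below also casts to `object` before assigning the string, since recent pandas releases may warn when a string is written into an integer series. The names `top2` and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

state = np.random.RandomState(100)
ser = pd.Series(state.randint(1, 5, 12))  # integers from 1 to 4, drawn from the seeded generator

# The two most frequent values (value_counts sorts by frequency, descending).
top2 = ser.value_counts().index[:2]

# Cast to object so the series can hold both integers and the string 'Other'.
result = ser.astype(object)
result[~ser.isin(top2)] = 'Other'
print(result)
```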
  4. How to bin a numeric series into 10 groups of equal size

    Put another way: discretize the numeric series into 10 categorical values.

        ser = pd.Series(np.random.random(20))
        # Discretize into 10 categories and show only the first 5 rows
        pd.qcut(ser, q=[0, .10, .20, .3, .4, .5, .6, .7, .8, .9, 1],
                labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']).head()

        #> 0    3rd
        #> 1    1st
        #> 2    6th
        #> 3    6th
        #> 4    9th
        #> dtype: category
        #> Categories (10, object): [1st < 2nd < 3rd < 4th ... 7th < 8th < 9th < 10th]
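
To check that the groups really have the same size, the bins can be counted. This is a minimal sketch, assuming an arbitrary seed of `0`; it passes `q=10` as shorthand for the ten explicit quantile edges and `labels=False` to get integer bin codes. The name `deciles` is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # arbitrary seed for the sketch
ser = pd.Series(rng.random_sample(20))

# q=10 asks for ten quantile-based bins; labels=False returns codes 0..9.
deciles = pd.qcut(ser, q=10, labels=False)

# With 20 distinct values, each of the 10 bins should hold 2 of them.
print(deciles.value_counts().sort_index())
```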
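  5. How to convert a series (through its numpy array) into a dataframe of a given shape

        ser = pd.Series(np.random.randint(1, 10, 35))
        # Convert the series to a numpy array, then reshape it to 7 rows by 5 columns
        df = pd.DataFrame(ser.values.reshape(7,5))
        print(df)

        #>    0  1  2  3  4
        #> 0  1  2  1  2  5
        #> 1  1  2  4  5  2
        #> 2  1  3  3  2  8
        #> 3  8  6  4  9  6
        #> 4  2  1  1  8  5
        #> 5  3  2  8  5  6
        #> 6  1  5  5  4  6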
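
When only the number of columns is fixed, numpy can infer the number of rows. A short sketch, assuming an arbitrary seed of `0` and using `to_numpy()` in place of `.values`:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # arbitrary seed for the sketch
ser = pd.Series(rng.randint(1, 10, 35))

# -1 lets numpy infer the row count from the total length (35 / 5 = 7).
df = pd.DataFrame(ser.to_numpy().reshape(-1, 5))
print(df.shape)
```

The values are filled row by row, so the first row of `df` holds the first five values of `ser`.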
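  6. How to find the positions of the values in a series that are multiples of 3

        ser = pd.Series(np.random.randint(1, 10, 7))
        print(ser)

        #> 0    6
        #> 1    8
        #> 2    6
        #> 3    7
        #> 4    6
        #> 5    2
        #> 6    4
        #> dtype: int64

        # Get the positions whose values are multiples of 3
        # (.to_numpy() hands argwhere a plain boolean array rather than a Series)
        np.argwhere((ser % 3 == 0).to_numpy())

        #> array([[0],
        #>        [2],
        #>        [4]])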
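
Two alternatives return a flat result instead of a column of one-element rows. The sketch below reuses the seven values printed above as a fixed series; `mask`, `positions`, and `labels` are names introduced for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series([6, 8, 6, 7, 6, 2, 4])  # the values printed in the example above

mask = ser % 3 == 0

# 0-based positions of the matching values, as a flat array.
positions = np.flatnonzero(mask.to_numpy())

# Index labels of the matching values; these equal the positions
# only because this series has the default 0..n-1 index.
labels = ser.index[mask]

print(positions)
print(list(labels))
```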

             

posted @ 2021-12-14 11:19  青竹之下