Transforming Data Using a Function or Mapping (map)

 

Let's start with some sample data:

from pandas import DataFrame

df = DataFrame({"food": ["bacon", "pulled pork", "bacon", "Pastrami", "corned beef",
                         "Bacon", "pastrami", "honey ham", "nova lox"],
                "ounces": [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

print(df)

 

Suppose you want to add a column indicating the type of animal each meat comes from. First, write a mapping from meat to animal:

meat_to_animal = {
    "bacon": "pig",
    "pulled pork": "pig",
    "pastrami": "cow",
    "corned beef": "cow",
    "honey ham": "pig",
    "nova lox": "salmon"
}

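The map method of a Series accepts either a function or a dict-like object containing a mapping. There is one small problem here, though: some of the meats are capitalized and others are not, so we also need to convert every value to lowercase before looking it up.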
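To make the case problem concrete, here is a minimal sketch that maps the raw column without lowercasing it first. The data and dictionary are the ones defined above, repeated so the block runs on its own; the names raw and unmatched are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "food": ["bacon", "pulled pork", "bacon", "Pastrami", "corned beef",
             "Bacon", "pastrami", "honey ham", "nova lox"],
    "ounces": [4, 3, 12, 6, 7.5, 8, 3, 5, 6],
})
meat_to_animal = {
    "bacon": "pig", "pulled pork": "pig", "pastrami": "cow",
    "corned beef": "cow", "honey ham": "pig", "nova lox": "salmon",
}

# Dictionary keys are matched exactly, so the capitalized names find no entry
# and map fills those positions with NaN.
raw = df["food"].map(meat_to_animal)
unmatched = df.loc[raw.isna(), "food"].tolist()
print(unmatched)  # ['Pastrami', 'Bacon']
```

The two capitalized entries are exactly the ones left without an animal, which is why the lowercasing step is needed.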

Several ways to do this:

df = DataFrame({"food": ["bacon", "pulled pork", "bacon", "Pastrami", "corned beef",
                         "Bacon", "pastrami", "honey ham", "nova lox"],
                "ounces": [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

print(df)

meat_to_animal = {
    "bacon": "pig",
    "pulled pork": "pig",
    "pastrami": "cow",
    "corned beef": "cow",
    "honey ham": "pig",
    "nova lox": "salmon"
}
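# To store the result as a new column instead of printing it:
# df['animal'] = df['food'].map(str.lower).map(meat_to_animal)
# print(df)
# Without lowercasing, 'Pastrami' and 'Bacon' find no match and become NaN:
# df['animal'] = df['food'].map(meat_to_animal)
# print(df)

# Approach 1: lowercase each name with str.lower, then look it up in the dict.
df1 = df['food'].map(str.lower).map(meat_to_animal)
print(df1)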

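print("-----------------------")
# Approach 2: do the lowercasing and the dict lookup in a single function.
# Note that this raises KeyError for a food that is missing from the dict.
df3 = df["food"].map(lambda x: meat_to_animal[x.lower()])
print(df3)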

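print('---------------------')
# Caution: the second parameter of map is na_action, not a mapping. Passing
# meat_to_animal there does not apply it: older pandas versions silently ignored
# it, so the result held the lower-cased names (the dict keys), not the animals
# (the values). Recent pandas versions reject the argument with a ValueError.
# The call below is what actually ran: lowercasing only.
df2 = df["food"].map(lambda x: x.lower())
print(df2)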
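As a further variant that is not in the original post, the lowercasing can also be done with the vectorized .str accessor before the dictionary lookup. This is a sketch under assumptions: the shortened data and the "tofu" entry are made up for the illustration, to show what happens to a food with no mapping.

```python
import pandas as pd

meat_to_animal = {"bacon": "pig", "pastrami": "cow", "nova lox": "salmon"}

# "tofu" is a hypothetical entry with no key in the dictionary.
food = pd.Series(["bacon", "Pastrami", "Bacon", "tofu"])

# Lowercase through the .str accessor, then look each name up in the dict.
animal = food.str.lower().map(meat_to_animal)
print(animal)
```

Passing the dictionary to map leaves NaN for the unmapped "tofu", whereas the meat_to_animal[x.lower()] lambda would raise KeyError on it. Which behavior is preferable depends on whether missing keys should be tolerated.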

There is one more approach: replacing the values of an existing column

 

df = DataFrame({"food": ["bacon", "pulled pork", "bacon", "Pastrami", "corned beef",
                         "Bacon", "pastrami", "honey ham", "nova lox"],
                "ounces": [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})

print(df)

meat_to_animal = {
    "bacon": "pig",
    "pulled pork": "pig",
    "pastrami": "cow",
    "corned beef": "cow",
    "honey ham": "pig",
    "nova lox": "salmon"
}
# Overwrite the existing 'ounces' column with the mapped animal names.
df['ounces'] = df['food'].map(str.lower).map(meat_to_animal)
print(df)
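Overwriting a column this way discards the original numbers. If they are still needed, one alternative is to attach the mapped values under a new name. The sketch below uses DataFrame.assign on a shortened, made-up subset of the data; the names animal and with_animal are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"food": ["bacon", "Pastrami", "nova lox"],
                   "ounces": [4, 6, 6]})
meat_to_animal = {"bacon": "pig", "pastrami": "cow", "nova lox": "salmon"}

animal = df["food"].map(str.lower).map(meat_to_animal)

# assign returns a new DataFrame; df itself keeps its numeric "ounces" column.
with_animal = df.assign(animal=animal)
print(with_animal)
```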

 

Here is the example from the source code:

        >>> x
        one   1
        two   2
        three 3

        >>> y
        1  foo
        2  bar
        3  baz

        >>> x.map(y)
        one   foo
        two   bar
        three baz
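The example above does not show how x and y were built. A runnable reconstruction, assuming both are Series with the index labels shown, could look like this:

```python
import pandas as pd

# Assumed construction: the values of x (1, 2, 3) match the index of y.
x = pd.Series([1, 2, 3], index=["one", "two", "three"])
y = pd.Series(["foo", "bar", "baz"], index=[1, 2, 3])

# Each value of x is looked up in the index of y; the index of x is kept.
mapped = x.map(y)
print(mapped)
```

When a Series is passed to map, it acts like a dictionary whose keys are its index labels.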

 

There is also an na_action parameter; see the source code for details if needed:

        >>> s = pd.Series([1, 2, 3, np.nan])

        >>> s2 = s.map(lambda x: 'this is a string {}'.format(x),
                       na_action=None)
        0    this is a string 1.0
        1    this is a string 2.0
        2    this is a string 3.0
        3    this is a string nan
        dtype: object

        >>> s3 = s.map(lambda x: 'this is a string {}'.format(x),
                       na_action='ignore')
        0    this is a string 1.0
        1    this is a string 2.0
        2    this is a string 3.0
        3                     NaN
        dtype: object
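One practical use of na_action='ignore' ties back to the food example: a function such as lowercasing cannot handle NaN, so skipping missing values keeps the call from failing. A minimal sketch, assuming a small made-up Series with one missing entry:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one food name is missing.
food = pd.Series(["Bacon", np.nan, "Pastrami"], dtype=object)

# With na_action='ignore' the NaN is passed through untouched,
# so the lambda is only ever called on the real strings.
lowered = food.map(lambda x: x.lower(), na_action="ignore")
print(lowered)
```

Without na_action='ignore', the lambda would be called on the float NaN and fail with AttributeError, since floats have no lower method.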

 

posted @ 2017-03-08 15:46  我当道士那儿些年