alex_bn_lee

[990] Functions of pandas

Ref: pandas-cookbook


Series.isxxxx()

Series.isin(): Whether elements in Series are contained in values.

top_oceania_wines = reviews[
    (reviews.country.isin(['Australia', 'New Zealand']))
    & (reviews.points >= 95)
]
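The snippet above depends on a `reviews` table that this post does not define. A minimal runnable sketch, assuming a small hypothetical `reviews` DataFrame with `country` and `points` columns (the data is invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the `reviews` table; the rows are made up.
reviews = pd.DataFrame(
    {
        "country": ["Australia", "Italy", "New Zealand", "Australia"],
        "points": [96, 97, 95, 90],
    }
)

# isin() returns a boolean mask: True where the value is in the given list.
in_oceania = reviews["country"].isin(["Australia", "New Zealand"])

# Each comparison needs its own parentheses because & binds tighter than >=.
top_oceania_wines = reviews[in_oceania & (reviews["points"] >= 95)]
print(top_oceania_wines)
```

Naming the mask first (`in_oceania`) is only a readability choice; the one-expression form works the same way.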

Series.str.isalpha(): Check whether all characters are alphabetic.

Series.str.isnumeric(): Check whether all characters are numeric.

Series.str.isalnum(): Check whether all characters are alphanumeric.

Series.str.isdigit(): Check whether all characters are digits.

Series.str.isdecimal(): Check whether all characters are decimal.

Series.str.isspace(): Check whether all characters are whitespace.

Series.str.islower(): Check whether all characters are lowercase.

Series.str.isupper(): Check whether all characters are uppercase.

Series.str.istitle(): Check whether each string is titlecased (every word starts with an uppercase character and the rest are lowercase).
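The numeric checks in this list are easy to confuse. A small sketch, using sample strings chosen for illustration, that puts several of the checks side by side:

```python
import pandas as pd

# Sample values picked to separate the checks: a superscript two and a
# vulgar fraction are "numeric" but not "decimal".
s = pd.Series(["123", "²", "½", "abc", "Hello World", "  "])

checks = pd.DataFrame(
    {
        "value": s,
        "isdecimal": s.str.isdecimal(),
        "isdigit": s.str.isdigit(),
        "isnumeric": s.str.isnumeric(),
        "isalpha": s.str.isalpha(),
        "istitle": s.str.istitle(),
        "isspace": s.str.isspace(),
    }
)
print(checks)
```

Each method applies the matching Python `str` method to every element, so `isdecimal` is the strictest of the three numeric checks and `isnumeric` the most permissive.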


Series.str.xxxx()

Series.str.contains(): Test if pattern or regex is contained within a string of a Series or Index.

data[data.Department.str.contains("HR")]

Series.str.capitalize(): Converts the first character of each string to uppercase and the remaining characters to lowercase.

Series.str.lower(): Converts all characters to lowercase.

Series.str.upper(): Converts all characters to uppercase.

Series.str.title(): Converts first character of each word to uppercase and remaining to lowercase.
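A short sketch of the methods above on made-up data. The `data` frame and its `Name` column are hypothetical; passing `na=False` is an addition here, since a mask containing missing values can fail when used for boolean indexing:

```python
import pandas as pd

# Hypothetical table; one Department value is missing on purpose.
data = pd.DataFrame(
    {"Name": ["ann", "BOB", "cy"], "Department": ["HR", "Sales", None]}
)

# na=False maps missing values to False so the mask is purely boolean.
hr_rows = data[data["Department"].str.contains("HR", na=False)]

# The case-conversion methods return new Series; the original is unchanged.
text = pd.Series(["hELLO wORLD"])
converted = {
    "capitalize": text.str.capitalize()[0],
    "lower": text.str.lower()[0],
    "upper": text.str.upper()[0],
    "title": text.str.title()[0],
}
print(hr_rows)
print(converted)
```

`str.contains` treats its pattern as a regular expression by default; pass `regex=False` when the pattern should be matched literally.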


if-then

An if-then on one column

In [1]: df = pd.DataFrame(
   ...:     {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
   ...: )
In [2]: df
Out[2]:
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
In [3]: df.loc[df.AAA >= 5, "BBB"] = -1
In [4]: df
Out[4]:
   AAA  BBB  CCC
0    4   10  100
1    5   -1   50
2    6   -1  -30
3    7   -1  -50

An if-then with assignment to 2 columns:

In [5]: df.loc[df.AAA >= 5, ["BBB", "CCC"]] = 555
In [6]: df
Out[6]:
   AAA  BBB  CCC
0    4   10  100
1    5  555  555
2    6  555  555
3    7  555  555

Add another line with the opposite condition to handle the else case:

In [7]: df.loc[df.AAA < 5, ["BBB", "CCC"]] = 2000
In [8]: df
Out[8]:
   AAA   BBB   CCC
0    4  2000  2000
1    5   555   555
2    6   555   555
3    7   555   555
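The if and else assignments can also be written in one pass. A minimal sketch, assuming NumPy is available; `np.where` is an alternative swapped in here and is not what the session above uses:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
)

# One condition drives both branches: 555 where AAA >= 5, otherwise 2000.
cond = df["AAA"] >= 5
for column in ["BBB", "CCC"]:
    df[column] = np.where(cond, 555, 2000)

print(df)
```

The two `.loc` lines are easier to extend with further conditions; `np.where` is compact when there are exactly two branches.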

Building criteria

Select with multi-column criteria

In [19]: df = pd.DataFrame(
   ....:     {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
   ....: )
In [20]: df
Out[20]:
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

…and (without assignment returns a Series)

In [21]: df.loc[(df["BBB"] < 25) & (df["CCC"] >= -40), "AAA"]
Out[21]:
0    4
1    5
Name: AAA, dtype: int64

…or (without assignment returns a Series)

In [22]: df.loc[(df["BBB"] > 25) | (df["CCC"] >= -40), "AAA"]
Out[22]:
0    4
1    5
2    6
3    7
Name: AAA, dtype: int64

…or (with assignment modifies the DataFrame.)

In [23]: df.loc[(df["BBB"] > 25) | (df["CCC"] >= 75), "AAA"] = 999
In [24]: df
Out[24]:
   AAA  BBB  CCC
0  999   10  100
1    5   20   50
2  999   30  -30
3  999   40  -50
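Criteria like these are easier to read and reuse when each condition is stored as a named mask. A sketch on the same data; the mask names are chosen here for illustration, and the `~` (not) case is an addition to the and/or cases shown above:

```python
import pandas as pd

df = pd.DataFrame(
    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
)

# Each comparison yields a boolean Series aligned with df's index.
small_bbb = df["BBB"] < 25
ccc_floor = df["CCC"] >= -40

both = df.loc[small_bbb & ccc_floor, "AAA"]        # and
either = df.loc[small_bbb | ccc_floor, "AAA"]      # or
neither = df.loc[~(small_bbb | ccc_floor), "AAA"]  # not

print(both.tolist(), either.tolist(), neither.tolist())
```

Use `&`, `|`, and `~` rather than the Python keywords `and`, `or`, and `not`, which do not operate element-wise on a Series.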

Selection

DataFrames

Ambiguity arises when an index consists of integers with a non-zero start or non-unit increment.

In [46]: data = {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
In [47]: df2 = pd.DataFrame(data=data, index=[1, 2, 3, 4])  # Note index starts at 1.
In [48]: df2.iloc[1:3]  # Position-oriented
Out[48]:
   AAA  BBB  CCC
2    5   20   50
3    6   30  -30
In [49]: df2.loc[1:3]  # Label-oriented
Out[49]:
   AAA  BBB  CCC
1    4   10  100
2    5   20   50
3    6   30  -30
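The same ambiguity shows up with a non-unit increment. A small sketch, assuming a hypothetical frame `df3` whose index steps by 10:

```python
import pandas as pd

# Hypothetical frame: integer labels 10, 20, 30, 40 at positions 0..3.
df3 = pd.DataFrame({"AAA": [4, 5, 6, 7]}, index=[10, 20, 30, 40])

# iloc slices by position and excludes the end position.
by_position = df3.iloc[1:3]

# loc slices by label and includes the end label.
by_label = df3.loc[10:30]

# No label falls between 1 and 3, so a label slice over that range is empty.
no_match = df3.loc[1:3]

print(by_position.index.tolist(), by_label.index.tolist(), len(no_match))
```

Spelling out `.iloc` or `.loc` on every access keeps the intent clear whenever the index is made of integers.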

Column selection, addition, deletion

You can treat a DataFrame semantically like a dict of like-indexed Series objects. Getting, setting, and deleting columns works with the same syntax as the analogous dict operations:
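The session that follows operates on a `df` that this post never constructs. A sketch of a frame consistent with the printed output, assuming it is built from two Series of different lengths (this construction is an assumption inferred from the output):

```python
import pandas as pd

# Assumed construction: "one" has no entry for label "d", so pandas
# fills that cell with NaN when the two Series are aligned.
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)
print(df)
```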

In [72]: df["one"]
Out[72]:
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64
In [73]: df["three"] = df["one"] * df["two"]
In [74]: df["flag"] = df["one"] > 2
In [75]: df
Out[75]:
   one  two  three   flag
a  1.0  1.0    1.0  False
b  2.0  2.0    4.0  False
c  3.0  3.0    9.0   True
d  NaN  4.0    NaN  False

Columns can be deleted or popped like with a dict:

In [76]: del df["two"]
In [77]: three = df.pop("three")
In [78]: df
Out[78]:
   one   flag
a  1.0  False
b  2.0  False
c  3.0   True
d  NaN  False

When inserting a scalar value, it will naturally be propagated to fill the column:

In [79]: df["foo"] = "bar"
In [80]: df
Out[80]:
   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  NaN  False  bar

When inserting a Series that does not have the same index as the DataFrame, it will be conformed to the DataFrame’s index:

In [81]: df["one_trunc"] = df["one"][:2]
In [82]: df
Out[82]:
   one   flag  foo  one_trunc
a  1.0  False  bar        1.0
b  2.0  False  bar        2.0
c  3.0   True  bar        NaN
d  NaN  False  bar        NaN

You can insert raw ndarrays but their length must match the length of the DataFrame’s index.
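A quick sketch of the length rule, on a small hypothetical frame; the column names `ok` and `too_short` are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0]}, index=list("abcd"))

# An array with one value per row is assigned positionally.
df["ok"] = np.arange(4)

# An array of the wrong length is rejected with a ValueError.
try:
    df["too_short"] = np.arange(2)
    error = None
except ValueError as exc:
    error = exc

print(df)
print(type(error).__name__)
```

Unlike a Series, a raw array carries no index, so pandas cannot align it by label and has nothing to fall back on when the lengths differ.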

By default, columns get inserted at the end. DataFrame.insert() inserts at a particular location in the columns:

In [83]: df.insert(1, "bar", df["one"])
In [84]: df
Out[84]:
   one  bar   flag  foo  one_trunc
a  1.0  1.0  False  bar        1.0
b  2.0  2.0  False  bar        2.0
c  3.0  3.0   True  bar        NaN
d  NaN  NaN  False  bar        NaN
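Two details of `DataFrame.insert()` are worth seeing in isolation: it modifies the frame in place, and by default it refuses to add a column name that already exists. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0], "flag": [False, True]})

# insert() works in place and returns None; position 1 is the second column.
result = df.insert(1, "bar", df["one"])
columns_after = list(df.columns)

# Inserting the same name again raises unless allow_duplicates=True is passed.
try:
    df.insert(0, "bar", 0)
    error = None
except ValueError as exc:
    error = exc

print(columns_after)
print(type(error).__name__)
```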

 

Posted by McDelfino
