[990] Functions of pandas
Ref: pandas-cookbook
Series.isxxxx()
Series.isin(): Check whether each element in the Series is contained in the given values.
top_oceania_wines = reviews[(reviews.country.isin(['Australia', 'New Zealand'])) & (reviews.points >= 95)]
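The reviews table is not defined in these notes. A minimal sketch of the same filter, assuming a small made-up frame with country and points columns:

```python
import pandas as pd

# Hypothetical stand-in for the `reviews` table; the rows are made up.
reviews = pd.DataFrame(
    {
        "country": ["Australia", "New Zealand", "France", "Australia"],
        "points": [96, 95, 97, 90],
    }
)

# isin() builds a boolean mask: True where country is one of the listed values.
in_oceania = reviews.country.isin(["Australia", "New Zealand"])

# Combine with a second mask; each comparison needs its own parentheses.
top_oceania_wines = reviews[in_oceania & (reviews.points >= 95)]
print(top_oceania_wines)
```

Only the rows that satisfy both conditions survive, and they keep their original index labels.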
Series.str.islower(): Check whether all characters in each string are lowercase.
Series.str.isalpha(): Check whether all characters are alphabetic.
Series.str.isnumeric(): Check whether all characters are numeric.
Series.str.isalnum(): Check whether all characters are alphanumeric.
Series.str.isdigit(): Check whether all characters are digits.
Series.str.isdecimal(): Check whether all characters are decimal.
Series.str.isspace(): Check whether all characters are whitespace.
Series.str.isupper(): Check whether all characters are uppercase.
Series.str.istitle(): Check whether each string is titlecased (every word starts with an uppercase character followed by lowercase characters).
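isdecimal(), isdigit() and isnumeric() sound alike but accept progressively larger sets of characters. A small sketch of the difference; the sample strings are chosen for illustration, and dtype=object is set here on the assumption that the checks should follow Python's own str methods:

```python
import pandas as pd

# "23" is plain digits, "³" is a superscript digit, "⅕" is a vulgar fraction.
s = pd.Series(["23", "³", "⅕", "abc"], dtype=object)

checks = pd.DataFrame(
    {
        "value": s,
        "isdecimal": s.str.isdecimal(),  # only characters usable in base-10 numbers
        "isdigit": s.str.isdigit(),      # also superscript and subscript digits
        "isnumeric": s.str.isnumeric(),  # also fractions and other numeric characters
    }
)
print(checks)
```

Each check in this order accepts everything the previous one accepts, so isdecimal() is the strictest of the three.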
Series.str.xxxx()
Series.str.contains(): Test if pattern or regex is contained within a string of a Series or Index.
data[data.Department.str.contains("HR")]
Series.str.capitalize(): Converts the first character of each string to uppercase and the remaining characters to lowercase.
Series.str.lower(): Converts all characters to lowercase.
Series.str.upper(): Converts all characters to uppercase.
Series.str.title(): Converts first character of each word to uppercase and remaining to lowercase.
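str.contains() treats its pattern as a regular expression by default and propagates missing values. A hedged sketch of the options that usually matter when the result is used as a filter; the data frame and its department names are made up:

```python
import pandas as pd

# Hypothetical stand-in for `data`; the department names are made up.
data = pd.DataFrame({"Department": ["HR", "hr-ops", "R&D (core)", None]})

# na=False treats missing values as non-matches, so the mask is purely boolean.
has_hr = data.Department.str.contains("HR", na=False)

# case=False ignores case.
has_hr_any_case = data.Department.str.contains("HR", case=False, na=False)

# regex=False matches the text literally, which matters for characters
# such as "(" that are special in a regular expression.
has_paren = data.Department.str.contains("(", regex=False, na=False)

print(data[has_hr])
```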
if-then
An if-then on one column
In [1]: df = pd.DataFrame(
   ...:     {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
   ...: )

In [2]: df
Out[2]:
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

In [3]: df.loc[df.AAA >= 5, "BBB"] = -1

In [4]: df
Out[4]:
   AAA  BBB  CCC
0    4   10  100
1    5   -1   50
2    6   -1  -30
3    7   -1  -50
An if-then with assignment to 2 columns:
In [5]: df.loc[df.AAA >= 5, ["BBB", "CCC"]] = 555

In [6]: df
Out[6]:
   AAA  BBB  CCC
0    4   10  100
1    5  555  555
2    6  555  555
3    7  555  555
Add another line with different logic to handle the else case:
In [7]: df.loc[df.AAA < 5, ["BBB", "CCC"]] = 2000

In [8]: df
Out[8]:
   AAA   BBB   CCC
0    4  2000  2000
1    5   555   555
2    6   555   555
3    7   555   555
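The two .loc assignments can also be collapsed into a single if-then-else. This is a sketch using numpy.where, a different technique from the .loc steps above, applied to one column of a freshly built frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
)

# np.where(condition, value_if_true, value_if_false) is evaluated element-wise:
# rows with AAA >= 5 get 555, all other rows get 2000.
df["BBB"] = np.where(df["AAA"] >= 5, 555, 2000)
print(df)
```

The condition is written once, so the if and else branches cannot drift apart.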
Building criteria
Select with multi-column criteria
In [19]: df = pd.DataFrame(
   ....:     {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
   ....: )
   ....:

In [20]: df
Out[20]:
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
…and (without assignment returns a Series)
In [21]: df.loc[(df["BBB"] < 25) & (df["CCC"] >= -40), "AAA"]
Out[21]:
0    4
1    5
Name: AAA, dtype: int64
…or (without assignment returns a Series)
In [22]: df.loc[(df["BBB"] > 25) | (df["CCC"] >= -40), "AAA"]
Out[22]:
0    4
1    5
2    6
3    7
Name: AAA, dtype: int64
…or (with assignment modifies the DataFrame.)
In [23]: df.loc[(df["BBB"] > 25) | (df["CCC"] >= 75), "AAA"] = 999

In [24]: df
Out[24]:
   AAA  BBB  CCC
0  999   10  100
1    5   20   50
2  999   30  -30
3  999   40  -50
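Criteria get easier to read when each condition is stored as a named boolean mask and combined afterwards. A minimal sketch on the same starting data; the mask names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
)

# Each comparison yields a boolean Series aligned with df's index.
low_bbb = df["BBB"] < 25
high_ccc = df["CCC"] >= -40

# Use & (and), | (or) and ~ (not). Python's `and`/`or` keywords do not work
# element-wise and raise ValueError on a Series.
both = df.loc[low_bbb & high_ccc, "AAA"]
either = df.loc[low_bbb | high_ccc, "AAA"]
neither = df.loc[~(low_bbb | high_ccc), "AAA"]

print(both, either, neither, sep="\n\n")
```

When the comparisons are written inline instead, each one needs its own parentheses because & and | bind more tightly than < and >=.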
Selection
DataFrames
Ambiguity arises when an index consists of integers with a non-zero start or non-unit increment.
In [46]: data = {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}

In [47]: df2 = pd.DataFrame(data=data, index=[1, 2, 3, 4])  # Note index starts at 1.

In [48]: df2.iloc[1:3]  # Position-oriented
Out[48]:
   AAA  BBB  CCC
2    5   20   50
3    6   30  -30

In [49]: df2.loc[1:3]  # Label-oriented
Out[49]:
   AAA  BBB  CCC
1    4   10  100
2    5   20   50
3    6   30  -30
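The session above covers a non-zero start. The non-unit increment case can be sketched the same way, assuming a hypothetical frame df3 whose labels step by 2:

```python
import pandas as pd

data = {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}

# Hypothetical frame with even labels only.
df3 = pd.DataFrame(data=data, index=[0, 2, 4, 6])

# Position-oriented: rows at positions 1 and 2, end excluded.
by_position = df3.iloc[1:3]

# Label-oriented: labels from 1 through 3, end included; only label 2 exists.
by_label = df3.loc[1:3]

print(by_position)
print(by_label)
```

The same slice 1:3 returns two rows by position and one row by label, which is why spelling out .iloc or .loc matters on integer indexes.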
Column selection, addition, deletion
You can treat a DataFrame semantically like a dict of like-indexed Series objects. Getting, setting, and deleting columns works with the same syntax as the analogous dict operations:
In [72]: df["one"]
Out[72]:
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

In [73]: df["three"] = df["one"] * df["two"]

In [74]: df["flag"] = df["one"] > 2

In [75]: df
Out[75]:
   one  two  three   flag
a  1.0  1.0    1.0  False
b  2.0  2.0    4.0  False
c  3.0  3.0    9.0   True
d  NaN  4.0    NaN  False
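The session above starts from a df that these notes never define. A minimal sketch that reproduces it, assuming a frame built from two Series of different lengths (the construction is an assumption inferred from the output shown):

```python
import pandas as pd

# "one" has no entry for label "d", so that cell becomes NaN after alignment.
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)

# New columns are assigned like dict entries; NaN propagates through arithmetic,
# while the comparison NaN > 2 evaluates to False.
df["three"] = df["one"] * df["two"]
df["flag"] = df["one"] > 2
print(df)
```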
Columns can be deleted or popped like with a dict:
In [76]: del df["two"]

In [77]: three = df.pop("three")

In [78]: df
Out[78]:
   one   flag
a  1.0  False
b  2.0  False
c  3.0   True
d  NaN  False
When inserting a scalar value, it will naturally be propagated to fill the column:
In [79]: df["foo"] = "bar"

In [80]: df
Out[80]:
   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  NaN  False  bar
When inserting a Series that does not have the same index as the DataFrame, it will be conformed to the DataFrame’s index:
In [81]: df["one_trunc"] = df["one"][:2]

In [82]: df
Out[82]:
   one   flag  foo  one_trunc
a  1.0  False  bar        1.0
b  2.0  False  bar        2.0
c  3.0   True  bar        NaN
d  NaN  False  bar        NaN
You can insert raw ndarrays but their length must match the length of the DataFrame’s index.
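A short sketch of the length rule, assuming a small made-up frame. Unlike a Series, a raw ndarray carries no index, so nothing can be aligned and the lengths must simply match:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0]}, index=["a", "b", "c", "d"])

# Same length as the index: values are assigned by position, with no alignment.
df["arr"] = np.array([10, 20, 30, 40])

# Wrong length: pandas raises ValueError instead of padding or truncating.
try:
    df["short"] = np.array([1, 2])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(df)
print("error:", mismatch_error)
```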
By default, columns get inserted at the end. DataFrame.insert() inserts at a particular location in the columns:
In [83]: df.insert(1, "bar", df["one"])

In [84]: df
Out[84]:
   one  bar   flag  foo  one_trunc
a  1.0  1.0  False  bar        1.0
b  2.0  2.0  False  bar        2.0
c  3.0  3.0   True  bar        NaN
d  NaN  NaN  False  bar        NaN