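Pandas 2.2 Chinese Documentation (Part 56)

Original: pandas.pydata.org/docs/

Version 0.17.1 (November 21, 2015)

Original: pandas.pydata.org/docs/whatsnew/v0.17.1.html

Note

We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of the development of pandas as a world-class open-source project.

This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.

Highlights include:

Support for conditional HTML formatting; see the "Conditional HTML formatting" section
Releasing the GIL on the csv reader and other operations; see the "Performance improvements" section
Fixed a regression in DataFrame.drop_duplicates from 0.16.2 that caused incorrect results on integer values (GH 11376)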

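What's new in v0.17.1

New features
- Conditional HTML formatting
Enhancements
API changes
- Deprecations
Performance improvements
Bug fixes
Contributors

New features

Conditional HTML formatting

Warning

This is a new feature and is under active development. We will be adding features and possibly making breaking changes in future releases. Feedback is welcome in GH 11610.

We have added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on the data. The styling is accomplished with HTML and CSS. Access the styler class through the pandas.DataFrame.style attribute, which returns an instance of Styler with your data attached.

Here is a quick example: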

In [1]: np.random.seed(123)

In [2]: df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))

In [3]: html = df.style.background_gradient(cmap="viridis", low=0.5)

We can render the HTML to get the following table.

	a	b	c	d	e
0	-1.085631	0.997345	0.282978	-1.506295	-0.5786
1	1.651437	-2.426679	-0.428913	1.265936	-0.86674
2	-0.678886	-0.094709	1.49139	-0.638902	-0.443982
3	-0.434351	2.20593	2.186786	1.004054	0.386186
4	0.737369	1.490732	-0.935834	1.175829	-1.253881
5	-0.637752	0.907105	-1.428681	-0.140069	-0.861755
6	-0.255619	-2.798589	-1.771533	-0.699877	0.927462
7	-0.173636	0.002846	0.688223	-0.879536	0.283627
8	-0.805367	-1.727669	-0.3909	0.573806	0.338589
9	-0.01183	2.392365	0.412912	0.978736	2.238143

Styler interacts well with Jupyter Notebooks. See the documentation for more information.

Enhancements

DatetimeIndex now supports conversion to strings with astype(str) (GH 10442)
Support for compression (gzip/bz2) in pandas.DataFrame.to_csv() (GH 7615)
pd.read_* functions can now also accept pathlib.Path or py._path.local.LocalPath objects for the filepath_or_buffer argument. (GH 11033)
The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with a tilde (e.g. ~/Documents/) (GH 11438)
DataFrame now uses the fields of a namedtuple as columns, if columns are not supplied (GH 11181)
DataFrame.itertuples() now returns namedtuple objects, when possible. (GH 11269, GH 11625)
Added axvlines_kwds to the parallel coordinates plot (GH 10709)
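Several of the items above can be tried together. The following is a minimal sketch, assuming a recent pandas release; the `Point` record type and the `points.csv.gz` file name are hypothetical and exist only for this illustration.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

import pandas as pd

# Hypothetical record type used only for this illustration.
Point = namedtuple("Point", ["x", "y"])

# With no explicit columns, the namedtuple fields become the column names.
df = pd.DataFrame([Point(1, 2), Point(3, 4)])
print(list(df.columns))

# itertuples() yields namedtuple rows, so fields are reachable by name.
rows = list(df.itertuples(index=False))
print(rows[0].x, rows[0].y)

# Write a gzip-compressed CSV to a pathlib.Path and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "points.csv.gz"  # hypothetical file name
    df.to_csv(path, index=False, compression="gzip")
    restored = pd.read_csv(path, compression="gzip")

print(restored.equals(df))
```

Passing `compression="gzip"` explicitly on both sides keeps the round trip independent of any inference from the file extension.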

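Added options to .info() and .memory_usage() to provide deep introspection of memory consumption. Note that this can be expensive to compute, so it is an optional parameter. (GH 11595)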

In [4]: df = pd.DataFrame({"A": ["foo"] * 1000})  # noqa: F821

In [5]: df["B"] = df["A"].astype("category")

# shows the '+' as we have object dtypes
In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       1000 non-null   object 
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 9.0+ KB

# we have an accurate memory assessment (but can be expensive to compute this)
In [7]: df.info(memory_usage="deep")
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       1000 non-null   object 
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 59.9 KB
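The transcript above covers `.info()`; the same option on `.memory_usage()` is spelled `deep=True`. The sketch below assumes a recent pandas release and forces the `object` dtype explicitly so that the comparison does not depend on the default string dtype of the installed version.

```python
import pandas as pd

# Force object dtype so the strings are stored as Python objects.
df = pd.DataFrame({"A": pd.Series(["foo"] * 1000, dtype=object)})
df["B"] = df["A"].astype("category")

shallow = df.memory_usage()        # object column counts only the pointers
deep = df.memory_usage(deep=True)  # also measures the Python string objects

print(shallow)
print(deep)
```

The exact byte counts vary by platform and pandas version, so none are quoted here; the point is only that the deep figure for the object column is larger than the shallow one.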

Index now has a fillna method (GH 10089)

In [8]: pd.Index([1, np.nan, 3]).fillna(2)
Out[8]: Index([1.0, 2.0, 3.0], dtype='float64')

Series of type category now make the .str.<...> and .dt.<...> accessor methods/properties available, if the categories are of that type. (GH 10661)

In [9]: s = pd.Series(list("aabb")).astype("category")

In [10]: s
Out[10]: 
0    a
1    a
2    b
3    b
Length: 4, dtype: category
Categories (2, object): ['a', 'b']

In [11]: s.str.contains("a")
Out[11]: 
0     True
1     True
2    False
3    False
Length: 4, dtype: bool

In [12]: date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")

In [13]: date
Out[13]: 
0   2015-01-01
1   2015-01-02
2   2015-01-03
3   2015-01-04
4   2015-01-05
Length: 5, dtype: category
Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05]

In [14]: date.dt.day
Out[14]: 
0    1
1    2
2    3
3    4
4    5
Length: 5, dtype: int32

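pivot_table now has a margins_name argument so you can use something other than the default of 'All' (GH 3335)
Implemented export of datetime64[ns, tz] dtypes with a fixed HDF5 store (GH 11411)
Pretty printing of sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of the legacy Python syntax (set([x, y])) (GH 11215)
Improved the error messages in pandas.io.gbq.to_gbq() when a streaming insert fails (GH 11285) and when the DataFrame does not match the schema of the destination table (GH 11359)

API changes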
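Raise NotImplementedError in Index.shift for unsupported index types (GH 8038)
min and max reductions on datetime64 and timedelta64 dtyped series now result in NaT and not nan (GH 11245).
Indexing with a null key will raise a TypeError instead of a ValueError (GH 11356)
Series.ptp will now ignore missing values by default (GH 11163)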
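The NaT behavior of `min` and `max` from the list above can be checked with a short sketch. It assumes a recent pandas release; the variable names are illustrative only.

```python
import pandas as pd

# An empty datetime series and an all-missing timedelta series.
empty_dates = pd.Series([], dtype="datetime64[ns]")
all_missing = pd.Series([pd.NaT, pd.NaT], dtype="timedelta64[ns]")

# Both reductions have nothing valid to return, so they yield a missing value
# that stays in the datetime-like family instead of a float nan.
earliest = empty_dates.min()
longest = all_missing.max()

print(repr(earliest), repr(longest))
```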

Deprecations

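The pandas.io.ga module, which implements google-analytics support, is deprecated and will be removed in a future version (GH 11308)
The engine keyword in .to_csv() is deprecated and will be removed in a future version (GH 11274)

Performance improvements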
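Checking monotonic-ness before sorting on an index (GH 11080)
Series.dropna performance improvement when its dtype can't contain NaN (GH 11159)
Release the GIL (Global Interpreter Lock) on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (GH 11263)
Release the GIL on some rolling algorithms: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (GH 11450)
Release the GIL when reading and parsing text files in read_csv, read_table (GH 11272)
Improved performance of rolling_median (GH 11450)
Improved performance of to_excel (GH 11352)
Performance bug in the repr of Categorical categories, which was rendering the strings before chopping them for display (GH 11305)
Performance improvement in Categorical.remove_unused_categories (GH 11643).
Improved performance of the Series constructor with no data and a DatetimeIndex (GH 11433)
Improved performance of shift, cumprod, and cumsum with groupby (GH 4095)

Bug fixes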
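SparseArray.__iter__() now does not cause a PendingDeprecationWarning in Python 3.5 (GH 11622)
Fixed a regression from 0.16.2 in the output formatting of long floats/NaN (GH 11302)
Series.sort_index() now correctly handles the inplace option (GH 11402)
Incorrectly distributed .c file in the build on PyPi; reading a csv of floats and passing na_values=<a scalar> would show an exception (GH 11374)
Bug in .to_latex() output being broken when the index has a name (GH 10660)
Bug in HDFStore.append with strings whose encoded length exceeded the max unencoded length (GH 11234)
Bug in merging datetime64[ns, tz] dtypes (GH 11405)
Bug in HDFStore.select when comparing with a numpy scalar in a where clause (GH 11283)
Bug in using DataFrame.ix with a MultiIndex indexer (GH 11372)
Bug in date_range with ambiguous endpoints (GH 11626)
Prevent adding new attributes to the accessors .str, .dt and .cat. Retrieving such a value was not possible, so setting it now raises an error. (GH 10673)
Bug in tz-conversions with an ambiguous time and .dt accessors (GH 11295)
Bug in output formatting when using an index of ambiguous times (GH 11619)
Bug in comparisons of Series vs list-likes (GH 11339)
Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compatible to_replace (GH 11326, GH 11153)
Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null (GH 11206)
Bug in list-like indexing with a mixed-integer Index (GH 11320)
Bug in pivot_table with margins=True when indexes are of Categorical dtype (GH 10993)
Bug in DataFrame.plot where hex string colors could not be used (GH 10299)
Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH 11376)
Bug in pd.eval where unary ops in a list raised an error (GH 11235)
Bug in squeeze() with zero-length arrays (GH 11230, GH 8999)
Bug in describe() dropping column names for hierarchical indexes (GH 11517)
Bug in DataFrame.pct_change() not propagating the axis keyword on the .fillna method (GH 11150)
Bug in .to_csv() when a mix of integer and string column names is passed as the columns parameter (GH 11637)
Bug in indexing with a range (GH 11652)
Bug in inference of numpy scalars and preserving dtype when setting columns (GH 11638)
Bug in to_sql using unicode column names giving a UnicodeEncodeError (GH 11431).
Fixed a regression in setting of xticks in plot (GH 11529).
Bug in holiday.dates where observance rules could not be applied to holidays, plus a doc enhancement (GH 11477, GH 11533)
Fixed plotting issues when having plain Axes instances instead of SubplotAxes (GH 11520, GH 11556).
Bug in DataFrame.to_latex() producing an extra rule when header=False (GH 7124)
Bug in df.groupby(...).apply(func) when func returns a Series containing a new datetimelike column (GH 11324)
Bug in pandas.json when the file to load is big (GH 11344)
Bugs in to_excel with duplicate columns (GH 11007, GH 10982, GH 10970)
Fixed a bug that prevented the construction of an empty series of dtype datetime64[ns, tz] (GH 11245).
Bug in read_excel with a MultiIndex containing integers (GH 11317)
Bug in to_excel with openpyxl 2.2+ and merging (GH 11408)
Bug in DataFrame.to_dict() producing a np.datetime64 object instead of a Timestamp when only datetimes are present in the data (GH 11327)
Bug in DataFrame.corr() raising an exception when computing the Kendall correlation for DataFrames with boolean and non-boolean columns (GH 11560)
Bug in the link-time error caused by C inline functions on FreeBSD 10+ (with clang) (GH 10510)
Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format (GH 7791)
Bug in DataFrame.join() with how='right' producing a TypeError (GH 11519)
Bug in Series.quantile where an empty list result has an Index with object dtype (GH 11588)
Bug in pd.merge resulting in an empty Int64Index rather than Index(dtype=object) when the merge result is empty (GH 11588)
Bug in Categorical.remove_unused_categories when having NaN values (GH 11599)
Bug in DataFrame.to_sparse() losing column names for MultiIndexes (GH 11600)
Bug in DataFrame.round() with a non-unique column index producing a fatal Python error (GH 11611)
Bug in DataFrame.round() with decimals being a non-unique indexed Series producing extra columns (GH 11618)

Contributors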

A total of 63 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Aleksandr Drozd +
Alex Chase +
Anthonios Partheniou
BrenBarn +
Brian J. McGuirk +
Chris
Christian Berendt +
Christian Perez +
Cody Piersall +
Data & Code Expert Experimenting with Code on Data
DrIrv +
Evan Wright
Guillaume Gay
Hamed Saljooghinejad +
Iblis Lin +
Jake VanderPlas
Jan Schulz
Jean-Mathieu Deschenes +
Jeff Reback
Jimmy Callin +
Joris Van den Bossche
K.-Michael Aye
Ka Wo Chen
Loïc Séguin-C +
Luo Yicheng +
Magnus Jöud +
Manuel Leonhardt +
Matthew Gilbert
Maximilian Roos
Michael +
Nicholas Stahl +
Nicolas Bonnotte +
Pastafarianist +
Petra Chong +
Phil Schaf +
Philipp A +
Rob deCarvalho +
Roman Khomenko +
Rémy Léone +
Sebastian Bank +
Sinhrks
Stephan Hoyer
Thierry Moisan
Tom Augspurger
Tux1 +
Varun +
Wieland Hoffmann +
Winterflower
Yoav Ram +
Younggun Kim
Zeke +
ajcr
azuranski +
behzad nouri
cel4
emilydolson +
hironow +
lexual
llllllllll +
rockg
silentquasar +
sinhrks
taeold +

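Version 0.17.0 (October 9, 2015)

Original: pandas.pydata.org/docs/whatsnew/v0.17.0.html

This is a major release from 0.16.2 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Warning

pandas >= 0.17.0 will no longer support compatibility with Python version 3.2 (GH 9118)

Warning

The pandas.io.data package is deprecated and will be replaced by the pandas-datareader package. This will allow the data modules to be updated independently of your pandas installation. The API for pandas-datareader v0.1.1 is exactly the same as in pandas v0.17.0 (GH 8961, GH 10861).

After installing pandas-datareader, you can easily change your imports: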

from pandas.io import data, wb

becomes

from pandas_datareader import data, wb

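Highlights include:

Release the Global Interpreter Lock (GIL) on some cython operations; see the "Releasing the GIL" section
Plotting methods are now available as attributes of the .plot accessor; see the "Plot submethods" section
The sorting API has been revamped to remove some long-time inconsistencies; see the "Changes to sorting API" section
Support for a datetime64[ns] with timezones as a first-class dtype; see the "Datetime with TZ" section
The default for to_datetime will now be to raise when presented with unparsable formats; previously this would return the original input. Also, date parse functions now return consistent results. See the "Changes to to_datetime and to_timedelta" section
The default for dropna in HDFStore has changed to False, to store by default all rows even if they are all NaN; see the "HDFStore dropna behavior" section
The datetime accessor (dt) now supports Series.dt.strftime to generate formatted strings for datetime-likes, and Series.dt.total_seconds to generate each duration of the timedelta in seconds. See the "Additional methods for dt accessor" section
Period and PeriodIndex can handle a multiplied freq like 3D, which corresponds to a 3-day span. See the "Period frequency enhancement" section
Installed versions of pandas will now have PEP440-compliant version strings (GH 9518)
Development support for benchmarking with the Air Speed Velocity library (GH 8361)
Support for reading SAS xport files; see the "Support for SAS XPORT files" section
Documentation comparing SAS to pandas is available in the docs
Removal of the automatic TimeSeries broadcasting, deprecated since 0.8.0; see the "Removal of prior version deprecations/changes" section
Display format can optionally align with Unicode East Asian width; see the "Display alignment with Unicode East Asian width" section
Compatibility with Python 3.5 (GH 11097)
Compatibility with matplotlib 1.5.0 (GH 11111)

Check the API changes and deprecations before updating.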

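What's new in v0.17.0

New features
- Datetime with TZ
- Releasing the GIL
- Plot submethods
- Additional methods for dt accessor
  - Series.dt.strftime
  - Series.dt.total_seconds
- Period frequency enhancement
- Support for SAS XPORT files
- Support for math functions in .eval()
- Changes to Excel with MultiIndex
- Google BigQuery enhancements
- Display alignment with Unicode East Asian width
- Other enhancements
Backwards incompatible API changes
- Changes to sorting API
- Changes to to_datetime and to_timedelta
  - Error handling
  - Consistent parsing
- Changes to Index comparisons
- Changes to boolean comparisons vs. None
- HDFStore dropna behavior
- Changes to display.precision option
- Changes to Categorical.unique
- Changes to bool passed as header in parsers
- Other API changes
- Deprecations
- Removal of prior version deprecations/changes
Performance improvements
Bug fixes
Contributors

New features

Datetime with TZ

We are adding an implementation that natively supports datetime with timezones. Previously, a Series or a DataFrame column could be assigned a datetime with timezones, and it would work as an object dtype. This caused performance issues with a large number of rows. See the docs for more details. (GH 8260, GH 10763, GH 11034).

The new implementation allows for a single timezone across all rows, with operations performed efficiently.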

In [1]: df = pd.DataFrame(
 ...:    {
 ...:        "A": pd.date_range("20130101", periods=3),
 ...:        "B": pd.date_range("20130101", periods=3, tz="US/Eastern"),
 ...:        "C": pd.date_range("20130101", periods=3, tz="CET"),
 ...:    }
 ...: )
 ...: 

In [2]: df
Out[2]: 
 A                         B                         C
0 2013-01-01 2013-01-01 00:00:00-05:00 2013-01-01 00:00:00+01:00
1 2013-01-02 2013-01-02 00:00:00-05:00 2013-01-02 00:00:00+01:00
2 2013-01-03 2013-01-03 00:00:00-05:00 2013-01-03 00:00:00+01:00

[3 rows x 3 columns]

In [3]: df.dtypes
Out[3]: 
A                datetime64[ns]
B    datetime64[ns, US/Eastern]
C           datetime64[ns, CET]
Length: 3, dtype: object

In [4]: df.B
Out[4]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
Name: B, Length: 3, dtype: datetime64[ns, US/Eastern]

In [5]: df.B.dt.tz_localize(None)
Out[5]: 
0   2013-01-01
1   2013-01-02
2   2013-01-03
Name: B, Length: 3, dtype: datetime64[ns]

This also uses a new dtype representation that is very similar in look and feel to its numpy cousin datetime64[ns].

In [6]: df["B"].dtype
Out[6]: datetime64[ns, US/Eastern]

In [7]: type(df["B"].dtype)
Out[7]: pandas.core.dtypes.dtypes.DatetimeTZDtype

Note

The underlying DatetimeIndex has a different string representation because of the dtype change, but functionally the two are the same.

Previous behavior:

In [1]: pd.date_range('20130101', periods=3, tz='US/Eastern')
Out[1]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
 '2013-01-03 00:00:00-05:00'],
 dtype='datetime64[ns]', freq='D', tz='US/Eastern')

In [2]: pd.date_range('20130101', periods=3, tz='US/Eastern').dtype
Out[2]: dtype('<M8[ns]')

New behavior:

In [8]: pd.date_range("20130101", periods=3, tz="US/Eastern")
Out[8]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
 '2013-01-03 00:00:00-05:00'],
 dtype='datetime64[ns, US/Eastern]', freq='D')

In [9]: pd.date_range("20130101", periods=3, tz="US/Eastern").dtype
Out[9]: datetime64[ns, US/Eastern] 
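
Releasing the GIL

We are releasing the Global Interpreter Lock (GIL) on some Cython operations. This allows other threads to run simultaneously during computation, potentially yielding performance improvements from multithreading. Notably `groupby`, `nsmallest`, `value_counts` and some indexing operations benefit from this. ([GH 8882](https://github.com/pandas-dev/pandas/issues/8882))

For example, the groupby expression in the following code will release the GIL during the factorization step, e.g. `df.groupby('key')`, as well as during the `.sum()` operation.

import numpy as np
import pandas as pd

N = 1000000
ngroups = 10
df = pd.DataFrame(
    {"key": np.random.randint(0, ngroups, size=N), "data": np.random.randn(N)}
)
df.groupby("key")["data"].sum()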
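
To illustrate how a threaded application might use this, here is a minimal sketch that runs the same kind of groupby on several frames from a thread pool. The helper `group_sum`, the frame sizes, and the worker count are assumptions made for this example; whether any speedup appears depends on the pandas version, the data, and the machine, so no timings are claimed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

N = 200_000
ngroups = 10
rng = np.random.default_rng(0)

# Four independent frames, one per worker thread.
frames = [
    pd.DataFrame(
        {"key": rng.integers(0, ngroups, size=N), "data": rng.standard_normal(N)}
    )
    for _ in range(4)
]


def group_sum(frame):
    """Hypothetical helper: grouped sum of the 'data' column."""
    return frame.groupby("key")["data"].sum()


# Threads can overlap only while the GIL is released inside pandas.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(group_sum, frames))

# The serial run gives the reference results.
serial = [group_sum(frame) for frame in frames]
print(all(np.allclose(t, s) for t, s in zip(threaded, serial)))
```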

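Releasing the GIL can benefit an application that uses threads for user interactions (e.g. QT), or that performs multi-threaded computations. A good example of a library that can handle these kinds of parallel computation is the dask library.

Plot submethods

The Series and DataFrame .plot() method allows for customizing plot types by supplying the kind keyword argument. Unfortunately, many of these kinds of plots use different required and optional keyword arguments, which makes it difficult to discover what any given plot kind uses out of the dozens of possible arguments.

To alleviate this issue, we have added a new, optional plotting interface, which exposes each kind of plot as a method of the .plot attribute. Instead of writing series.plot(kind=<kind>, ...), you can now also use series.plot.<kind>(...):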

In [10]: df = pd.DataFrame(np.random.rand(10, 2), columns=['a', 'b'])

In [11]: df.plot.bar()

../_images/whatsnew_plot_submethods.png

As a result of this change, these methods are now all discoverable via tab-completion:

In [12]: df.plot.<TAB>  # noqa: E225, E999
df.plot.area     df.plot.barh     df.plot.density  df.plot.hist     df.plot.line     df.plot.scatter
df.plot.bar      df.plot.box      df.plot.hexbin   df.plot.kde      df.plot.pie

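Each method signature only includes relevant arguments. Currently, these are limited to required arguments, but in the future they will include optional arguments as well. For an overview, see the new Plotting API documentation.

Additional methods for dt accessor

Series.dt.strftime

We now support a Series.dt.strftime method for datetime-likes to generate a formatted string (GH 10110). Examples: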

# DatetimeIndex
In [13]: s = pd.Series(pd.date_range("20130101", periods=4))

In [14]: s
Out[14]: 
0   2013-01-01
1   2013-01-02
2   2013-01-03
3   2013-01-04
Length: 4, dtype: datetime64[ns]

In [15]: s.dt.strftime("%Y/%m/%d")
Out[15]: 
0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
Length: 4, dtype: object

# PeriodIndex
In [16]: s = pd.Series(pd.period_range("20130101", periods=4))

In [17]: s
Out[17]: 
0    2013-01-01
1    2013-01-02
2    2013-01-03
3    2013-01-04
Length: 4, dtype: period[D]

In [18]: s.dt.strftime("%Y/%m/%d")
Out[18]: 
0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
Length: 4, dtype: object

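The string format follows the Python standard library; see the standard library's strftime documentation for details.

Series.dt.total_seconds

A pd.Series of type timedelta64 has a new method .dt.total_seconds() returning the duration of the timedelta in seconds (GH 10817)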

# TimedeltaIndex
In [19]: s = pd.Series(pd.timedelta_range("1 minutes", periods=4))

In [20]: s
Out[20]: 
0   0 days 00:01:00
1   1 days 00:01:00
2   2 days 00:01:00
3   3 days 00:01:00
Length: 4, dtype: timedelta64[ns]

In [21]: s.dt.total_seconds()
Out[21]: 
0        60.0
1     86460.0
2    172860.0
3    259260.0
Length: 4, dtype: float64 
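
Period frequency enhancement

`Period`, `PeriodIndex` and `period_range` can now accept a multiplied freq. Also, `Period.freq` and `PeriodIndex.freq` are now stored as a `DateOffset` instance like `DatetimeIndex`, and not as `str` ([GH 7811](https://github.com/pandas-dev/pandas/issues/7811))

A multiplied freq represents a span of corresponding length. The example below creates a period of 3 days. Addition and subtraction shift the period by its span.
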
In [22]: p = pd.Period("2015-08-01", freq="3D")

In [23]: p
Out[23]: Period('2015-08-01', '3D')

In [24]: p + 1
Out[24]: Period('2015-08-04', '3D')

In [25]: p - 2
Out[25]: Period('2015-07-26', '3D')

In [26]: p.to_timestamp()
Out[26]: Timestamp('2015-08-01 00:00:00')

In [27]: p.to_timestamp(how="E")
Out[27]: Timestamp('2015-08-03 23:59:59.999999999')

You can use a multiplied freq in PeriodIndex and period_range.

In [28]: idx = pd.period_range("2015-08-01", periods=4, freq="2D")

In [29]: idx
Out[29]: PeriodIndex(['2015-08-01', '2015-08-03', '2015-08-05', '2015-08-07'], dtype='period[2D]')

In [30]: idx + 1
Out[30]: PeriodIndex(['2015-08-03', '2015-08-05', '2015-08-07', '2015-08-09'], dtype='period[2D]') 

Support for SAS XPORT files

`read_sas()` provides support for reading *SAS XPORT* format files. ([GH 4052](https://github.com/pandas-dev/pandas/issues/4052)).

df = pd.read_sas("sas_xport.xpt")

It is also possible to obtain an iterator and read an XPORT file incrementally.

for df in pd.read_sas("sas_xport.xpt", chunksize=10000):
    do_something(df)

See the docs for more details.

Support for math functions in .eval()

eval() now supports calling math functions (GH 4893)

df = pd.DataFrame({"a": np.random.randn(10)})
df.eval("b = sin(a)")

The supported math functions are sin, cos, exp, log, expm1, log1p, sqrt, sinh, cosh, tanh, arcsin, arccos, arctan, arccosh, arcsinh, arctanh, abs and arctan2.

These functions map to the intrinsics of the NumExpr engine. For the Python engine, they are mapped to NumPy calls.
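
As a quick check of that mapping, the sketch below evaluates an expression with two of the supported functions and compares it with the equivalent NumPy calls. It assumes a recent pandas release, where `DataFrame.eval` with an assignment returns a new frame by default; the input values are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.linspace(0.0, 1.0, 5)})

# Assignment inside eval returns a new frame with column "b" added.
result = df.eval("b = sin(a) + sqrt(a)")

# The same computation written directly with NumPy.
expected = np.sin(df["a"]) + np.sqrt(df["a"])

print(np.allclose(result["b"], expected))
```

Whichever engine is selected at run time, the two results should agree to floating-point tolerance.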

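Changes to Excel with `MultiIndex`

In version 0.16.2 a DataFrame with MultiIndex columns could not be written to Excel via to_excel. That functionality has been added (GH 10564), along with an update to read_excel so that the data can be read back with no loss of information by specifying which columns/rows make up the MultiIndex in the header and index_col parameters (GH 4679)

See the documentation for more details.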

In [31]: df = pd.DataFrame(
 ....:    [[1, 2, 3, 4], [5, 6, 7, 8]],
 ....:    columns=pd.MultiIndex.from_product(
 ....:        [["foo", "bar"], ["a", "b"]], names=["col1", "col2"]
 ....:    ),
 ....:    index=pd.MultiIndex.from_product([["j"], ["l", "k"]], names=["i1", "i2"]),
 ....: )
 ....: 

In [32]: df
Out[32]: 
col1  foo    bar 
col2    a  b   a  b
i1 i2 
j  l    1  2   3  4
 k    5  6   7  8

[2 rows x 4 columns]

In [33]: df.to_excel("test.xlsx")

In [34]: df = pd.read_excel("test.xlsx", header=[0, 1], index_col=[0, 1])

In [35]: df
Out[35]: 
col1  foo    bar 
col2    a  b   a  b
i1 i2 
j  l    1  2   3  4
 k    5  6   7  8

[2 rows x 4 columns]

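Previously, it was necessary to specify the has_index_names argument in read_excel if the serialized data had index names. For version 0.17.0 the output format of to_excel has been changed to make this keyword unnecessary; the change is shown below.

Old

../_images/old-excel-index.png

New

../_images/new-excel-index.png

Warning

Excel files saved in version 0.16.2 or earlier that have index names can still be read in, but the has_index_names argument must be specified as True.

Google BigQuery enhancements

Added the ability to automatically create a table/dataset using the pandas.io.gbq.to_gbq() function if the destination table/dataset does not exist. (GH 8325, GH 11121).
Added the ability to replace an existing table and schema when calling the pandas.io.gbq.to_gbq() function via the if_exists argument. See the docs for more details (GH 8325).
InvalidColumnOrder and InvalidPageToken in the gbq module will raise ValueError instead of IOError.
The generate_bq_schema() function is now deprecated and will be removed in a future version (GH 11121)
The gbq module will now support Python 3 (GH 11094).

Display alignment with Unicode East Asian width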

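Warning

Enabling this option will affect the performance of printing DataFrame and Series (about 2 times slower). Use it only when it is actually required.

Some East Asian countries use Unicode characters whose width corresponds to 2 Latin letters. If a DataFrame or Series contains these characters, the default output cannot be aligned properly. The following options are added to enable precise handling of these characters.

display.unicode.east_asian_width: Whether to use the Unicode East Asian Width to calculate the display text width. (GH 2612)
display.unicode.ambiguous_as_wide: Whether to treat Unicode characters belonging to the Ambiguous category as Wide. (GH 11102)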

In [36]: df = pd.DataFrame({u"国籍": ["UK", u"日本"], u"名前": ["Alice", u"しのぶ"]})

In [37]: df
Out[37]: 
 国籍     名前
0  UK  Alice
1  日本    しのぶ

[2 rows x 2 columns]

In [38]: pd.set_option("display.unicode.east_asian_width", True)

In [39]: df
Out[39]: 
 国籍    名前
0    UK   Alice
1  日本  しのぶ

[2 rows x 2 columns]

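For further details, see the documentation.

Other enhancements

Support for openpyxl >= 2.2. The API for style support is now stable (GH 10125)

merge now accepts the argument indicator, which adds a Categorical-type column (by default called _merge) to the output object that takes on the following values (GH 8790)

Observation origin	`_merge` value
Merge key only in `'left'` frame	`left_only`
Merge key only in `'right'` frame	`right_only`
Merge key in both frames	`both`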

In [40]: df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})

In [41]: df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})

In [42]: pd.merge(df1, df2, on="col1", how="outer", indicator=True)
Out[42]: 
 col1 col_left  col_right      _merge
0     0        a        NaN   left_only
1     1        b        2.0        both
2     2      NaN        2.0  right_only
3     2      NaN        2.0  right_only

[4 rows x 4 columns]

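For more, see the updated docs.

pd.to_numeric is a new function to coerce strings to numbers (possibly with coercion) (GH 11133)
pd.merge will now allow duplicate column names if they are not merged upon (GH 10639).
pd.pivot will now allow passing index as None (GH 3962).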
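
A small sketch of `pd.to_numeric` from the list above, assuming a recent pandas release. The input strings are made up for the illustration; the third one is deliberately unparsable.

```python
import pandas as pd

raw = pd.Series(["1", "2.5", "abc"])

# With errors="coerce", unparsable entries become NaN instead of raising.
numbers = pd.to_numeric(raw, errors="coerce")
print(numbers)

# The default (errors="raise") refuses the same input.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True
print(raised)
```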

pd.concat will now use existing Series names if provided (GH 10698).

In [43]: foo = pd.Series([1, 2], name="foo")

In [44]: bar = pd.Series([1, 2])

In [45]: baz = pd.Series([4, 5])

Previous behavior:

In [1]: pd.concat([foo, bar, baz], axis=1)
Out[1]:
 0  1  2
 0  1  1  4
 1  2  2  5

New behavior:

In [46]: pd.concat([foo, bar, baz], axis=1)
Out[46]: 
 foo  0  1
0    1  1  4
1    2  2  5

[2 rows x 3 columns]

DataFrame has gained the nlargest and nsmallest methods (GH 10393)

Added a limit_direction keyword argument that works with limit to enable interpolate to fill NaN values forward, backward, or both (GH 9218, GH 10420, GH 11115)
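
For instance, the two methods select rows by the ordering of a chosen column. This is a minimal sketch with made-up data, assuming a recent pandas release.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 10, 8, 11, -1], "b": list("abdce")})

# Rows holding the two largest and the two smallest values of column "a".
top = df.nlargest(2, "a")
bottom = df.nsmallest(2, "a")

print(top)
print(bottom)
```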

In [47]: ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13])

In [48]: ser.interpolate(limit=1, limit_direction="both")
Out[48]: 
0     NaN
1     5.0
2     5.0
3     7.0
4     NaN
5    11.0
6    13.0
Length: 7, dtype: float64

Added a DataFrame.round method to round the values to a variable number of decimal places (GH 10568).

In [49]: df = pd.DataFrame(
 ....:    np.random.random([3, 3]),
 ....:    columns=["A", "B", "C"],
 ....:    index=["first", "second", "third"],
 ....: )
 ....: 

In [50]: df
Out[50]: 
 A         B         C
first   0.126970  0.966718  0.260476
second  0.897237  0.376750  0.336222
third   0.451376  0.840255  0.123102

[3 rows x 3 columns]

In [51]: df.round(2)
Out[51]: 
 A     B     C
first   0.13  0.97  0.26
second  0.90  0.38  0.34
third   0.45  0.84  0.12

[3 rows x 3 columns]

In [52]: df.round({"A": 0, "C": 2})
Out[52]: 
 A         B     C
first   0.0  0.966718  0.26
second  1.0  0.376750  0.34
third   0.0  0.840255  0.12

[3 rows x 3 columns]

drop_duplicates and duplicated now accept a keep keyword to target first, last, and all duplicates. The take_last keyword is deprecated; see the "Deprecations" section (GH 6511, GH 8505)

In [53]: s = pd.Series(["A", "B", "C", "A", "B", "D"])

In [54]: s.drop_duplicates()
Out[54]: 
0    A
1    B
2    C
5    D
Length: 4, dtype: object

In [55]: s.drop_duplicates(keep="last")
Out[55]: 
2    C
3    A
4    B
5    D
Length: 4, dtype: object

In [56]: s.drop_duplicates(keep=False)
Out[56]: 
2    C
5    D
Length: 2, dtype: object

Reindex now has a tolerance argument that allows for finer control of the limits on filling while reindexing (GH 10411):

In [57]: df = pd.DataFrame({"x": range(5), "t": pd.date_range("2000-01-01", periods=5)})

In [58]: df.reindex([0.1, 1.9, 3.5], method="nearest", tolerance=0.2)
Out[58]: 
 x          t
0.1  0.0 2000-01-01
1.9  2.0 2000-01-03
3.5  NaN        NaT

[3 rows x 2 columns]

When used on a DatetimeIndex, TimedeltaIndex or PeriodIndex, tolerance will be coerced into a Timedelta if possible. This allows you to specify the tolerance with a string:

In [59]: df = df.set_index("t")

In [60]: df.reindex(pd.to_datetime(["1999-12-31"]), method="nearest", tolerance="1 day")
Out[60]: 
 x
1999-12-31  0

[1 rows x 1 columns]

tolerance is also exposed by the lower-level Index.get_indexer and Index.get_loc methods.
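
The sketch below shows the lower-level form with `Index.get_indexer`, using the same targets as the reindex example above. It assumes a recent pandas release; `Index.get_loc` is left out because its `method` and `tolerance` keywords are not available in all versions.

```python
import pandas as pd

idx = pd.Index([0.0, 1.0, 2.0, 3.0, 4.0])

# Positions of the nearest labels; -1 marks targets with no label
# within the tolerance.
positions = idx.get_indexer([0.1, 1.9, 3.5], method="nearest", tolerance=0.2)
print(positions)
```

The target 3.5 is 0.5 away from both neighbors, which exceeds the tolerance, so it gets no match, mirroring the NaN row in the reindex output.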

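Added functionality to use the base argument when resampling a TimeDeltaIndex (GH 10530)
DatetimeIndex can be instantiated using strings that contain NaT (GH 7599)
to_datetime can now accept the yearfirst keyword (GH 7599)
pandas.tseries.offsets larger than the Day offset can now be used with a Series for addition/subtraction (GH 10699). See the docs for more details.
pd.Timedelta.total_seconds() now returns the Timedelta duration to nanosecond precision (previously microsecond precision) (GH 10939)
PeriodIndex now supports arithmetic with np.ndarray (GH 10638)
Support pickling of Period objects (GH 10439)
.as_blocks will now take an optional copy argument to return a copy of the data; the default is to copy (no change in behavior from prior versions) (GH 9607)
The regex argument to DataFrame.filter now handles numeric column names instead of raising ValueError (GH 10384).
Enable reading gzip-compressed files via URL, either by explicitly setting the compression parameter or by inferring it from the HTTP Content-Encoding header in the response (GH 8685)
Enable writing Excel files in memory using StringIO/BytesIO (GH 7074)
Enable serialization of lists and dicts to strings in ExcelWriter (GH 8188)
SQL io functions now accept a SQLAlchemy connectable. (GH 7877)
pd.read_sql and to_sql can accept a database URI as the con parameter (GH 10214)
read_sql_table will now allow reading from views (GH 10750).
Enable writing complex values to HDFStores when using the table format (GH 10447)
Enable pd.read_hdf to be used without specifying a key when the HDF file contains a single dataset (GH 10443)
pd.read_stata will now read Stata 118 type files. (GH 9882)
The msgpack submodule has been updated to 0.4.6 with backward compatibility (GH 10581)
DataFrame.to_dict now accepts the orient='index' keyword argument (GH 10844).
DataFrame.apply will return a Series of dicts if the passed function returns a dict and reduce=True (GH 8735).
Allow passing kwargs to the interpolation methods (GH 10378).
Improved the error message when concatenating an empty iterable of Dataframe objects (GH 9157)
pd.read_csv can now read bz2-compressed files incrementally, and the C parser can read bz2-compressed files from AWS S3 (GH 11070, GH 11072).
In pd.read_csv, recognize s3n:// and s3a:// URLs as designating S3 file storage (GH 11070, GH 11071).
Read CSV files from AWS S3 incrementally, instead of first downloading the entire file. (A full file download is still required for compressed files in Python 2.) (GH 11070, GH 11073)
pd.read_csv is now able to infer the compression type for files read from AWS S3 storage (GH 11070, GH 11074).

Backwards incompatible API changes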

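Changes to sorting API

The sorting API has had some longtime inconsistencies. (GH 9816, GH 8239).

Here is a summary of the API prior to 0.17.0:

Series.sort is in place, while DataFrame.sort returns a new object.
Series.order returns a new object.
It was possible to use Series/DataFrame.sort_index to sort by values by passing the by keyword.
Series/DataFrame.sortlevel worked only on a MultiIndex for sorting by index.

To address these issues, we have revamped the API:

We have introduced a new method, DataFrame.sort_values(), which is the merger of DataFrame.sort(), Series.sort(), and Series.order(), to handle sorting of values.
The existing methods Series.sort(), Series.order(), and DataFrame.sort() have been deprecated and will be removed in a future version.
The by argument of DataFrame.sort_index() has been deprecated and will be removed in a future version.
The existing method .sort_index() gains the level keyword to enable level sorting.

We now have two distinct and non-overlapping methods of sorting. Items marked with a * will show a FutureWarning.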

To sort by the values:

Previous	Replacement
* `Series.order()`	`Series.sort_values()`
* `Series.sort()`	`Series.sort_values(inplace=True)`
* `DataFrame.sort(columns=...)`	`DataFrame.sort_values(by=...)`

To sort by the index:

Previous	Replacement
`Series.sort_index()`	`Series.sort_index()`
`Series.sortlevel(level=...)`	`Series.sort_index(level=...)`
`DataFrame.sort_index()`	`DataFrame.sort_index()`
`DataFrame.sortlevel(level=...)`	`DataFrame.sort_index(level=...)`
* `DataFrame.sort()`	`DataFrame.sort_index()`
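
The replacements listed in the two tables above can be sketched as follows. This is an illustration on made-up data, assuming a current pandas install; the deprecated spellings appear only in comments because they no longer exist:

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=["c", "a", "b"])
df = pd.DataFrame({"x": [2, 1, 3], "y": ["b", "a", "c"]}, index=[2, 0, 1])

# Sorting by values.
by_value = s.sort_values()            # replaces Series.order()
s_copy = s.copy()
s_copy.sort_values(inplace=True)      # replaces the in-place Series.sort()
df_by_x = df.sort_values(by="x")      # replaces DataFrame.sort(columns="x")

# Sorting by index labels.
df_by_index = df.sort_index()

print(by_value)
print(df_by_x)
print(df_by_index)
```

Keeping value sorting in sort_values and label sorting in sort_index means the method name alone tells a reader which axis of meaning is being ordered.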

We have also deprecated and changed similar methods in two Series-like classes, Index and Categorical.

Previous	Replacement
* `Index.order()`	`Index.sort_values()`
* `Categorical.order()`	`Categorical.sort_values()`

### Changes to to_datetime and to_timedelta

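Error handling

The default for pd.to_datetime error handling has changed to errors='raise'. In prior versions it was errors='ignore'. Furthermore, the coerce argument has been deprecated in favor of errors='coerce'. This means that invalid parsing will raise an exception rather than return the original input as in previous versions. (GH 10636)

Previous behavior: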

In [2]: pd.to_datetime(['2009-07-31', 'asd'])
Out[2]: array(['2009-07-31', 'asd'], dtype=object)

New behavior:

In [3]: pd.to_datetime(['2009-07-31', 'asd'])
ValueError: Unknown string format

Of course you can coerce invalid values to NaT as well.

In [61]: pd.to_datetime(["2009-07-31", "asd"], errors="coerce")
Out[61]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)

To keep the previous behavior, you can use errors='ignore':

In [4]: pd.to_datetime(["2009-07-31", "asd"], errors="ignore")
Out[4]: Index(['2009-07-31', 'asd'], dtype='object')

Furthermore, pd.to_timedelta has gained a similar API of errors='raise'|'ignore'|'coerce', and the coerce keyword has been deprecated in favor of errors='coerce'.
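
A minimal sketch of the to_timedelta side of this change, assuming a current pandas install. Only 'raise' and 'coerce' are shown, since those two modes are the ones whose outcome is easy to check:

```python
import pandas as pd

values = ["1 day", "asd"]

# errors="coerce" turns entries that cannot be parsed into NaT.
coerced = pd.to_timedelta(values, errors="coerce")

# errors="raise" (the default) surfaces the bad entry as an exception.
try:
    pd.to_timedelta(values, errors="raise")
    raised = False
except ValueError:
    raised = True

print(coerced)
print(raised)
```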

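Consistent parsing

The string parsing of to_datetime, Timestamp and DatetimeIndex has been made consistent. (GH 7599)

Prior to v0.17.0, Timestamp and to_datetime could parse year-only datetime strings incorrectly by using today's date, whereas DatetimeIndex used the beginning of the year. Timestamp and to_datetime could also raise ValueError for some datetime strings that DatetimeIndex could parse, such as a quarterly string.

Previous behavior: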

In [1]: pd.Timestamp('2012Q2')
Traceback
 ...
ValueError: Unable to parse 2012Q2

# Results in today's date.
In [2]: pd.Timestamp('2014')
Out [2]: 2014-08-12 00:00:00

v0.17.0 can parse them as below. This also works on DatetimeIndex.

New behavior:

In [62]: pd.Timestamp("2012Q2")
Out[62]: Timestamp('2012-04-01 00:00:00')

In [63]: pd.Timestamp("2014")
Out[63]: Timestamp('2014-01-01 00:00:00')

In [64]: pd.DatetimeIndex(["2012Q2", "2014"])
Out[64]: DatetimeIndex(['2012-04-01', '2014-01-01'], dtype='datetime64[ns]', freq=None)

Note

If you want to perform calculations based on today's date, use Timestamp.now() and pandas.tseries.offsets.

In [65]: import pandas.tseries.offsets as offsets

In [66]: pd.Timestamp.now()
Out[66]: Timestamp('2024-04-10 17:55:56.541543')

In [67]: pd.Timestamp.now() + offsets.DateOffset(years=1)
Out[67]: Timestamp('2025-04-10 17:55:56.542277')

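Changes to Index comparisons

The equality operator on Index should behave similarly to Series (GH 9947, GH 10637)

Starting in v0.17.0, comparing Index objects of different lengths will raise a ValueError. This is to be consistent with the behavior of Series.

Previous behavior: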

In [2]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
Out[2]: array([ True, False, False], dtype=bool)

In [3]: pd.Index([1, 2, 3]) == pd.Index([2])
Out[3]: array([False,  True, False], dtype=bool)

In [4]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
Out[4]: False

New behavior:

In [8]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
Out[8]: array([ True, False, False], dtype=bool)

In [9]: pd.Index([1, 2, 3]) == pd.Index([2])
ValueError: Lengths must match to compare

In [10]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
ValueError: Lengths must match to compare

Note that this is different from the numpy behavior, where a comparison can be broadcast:

In [68]: np.array([1, 2, 3]) == np.array([1])
Out[68]: array([ True, False, False])

or it can return False if broadcasting cannot be done:

In [11]: np.array([1, 2, 3]) == np.array([1, 2])
Out[11]: False
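
On the pandas side, code that may compare Index objects of different lengths has two options, sketched below under the assumption of a current pandas install: ask for whole-object equality with Index.equals, which does not raise, or catch the ValueError from the elementwise operator.

```python
import pandas as pd

left = pd.Index([1, 2, 3])
right = pd.Index([1, 2])

# Index.equals reports whether both indexes hold the same elements
# and does not raise on a length mismatch.
same = left.equals(right)

# The elementwise operator requires matching lengths.
try:
    left == right
    raised = False
except ValueError:
    raised = True

print(same, raised)
```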

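Changes to boolean comparisons vs. None

Boolean comparisons of a Series vs. None will now be equivalent to comparing with np.nan, rather than raising TypeError. (GH 1079).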

In [69]: s = pd.Series(range(3), dtype="float")

In [70]: s.iloc[1] = None

In [71]: s
Out[71]: 
0    0.0
1    NaN
2    2.0
Length: 3, dtype: float64

先前的行为：

In [5]: s == None
TypeError: Could not compare <type 'NoneType'> type with Series

新的行为：

In [72]: s == None
Out[72]: 
0    False
1    False
2    False
Length: 3, dtype: bool

Usually you simply want to know which values are null.

In [73]: s.isnull()
Out[73]: 
0    False
1     True
2    False
Length: 3, dtype: bool

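Warning

You generally will want to use isnull/notnull for these types of comparisons, as isnull/notnull tells you which elements are null. Keep in mind that nan values do not compare equal to each other, but None values do. Note that pandas/numpy relies on the fact that np.nan != np.nan, and treats None like np.nan.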

In [74]: None == None
Out[74]: True

In [75]: np.nan == np.nan
Out[75]: False

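HDFStore dropna behavior

The default behavior for HDFStore write functions with format='table' is now to keep rows that are all missing. Previously, the behavior was to drop rows in which every value, apart from the index, was missing. The previous behavior can be replicated using the dropna=True option. (GH 9382)

Previous behavior: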

In [76]: df_with_missing = pd.DataFrame(
 ....:    {"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]}
 ....: )
 ....: 

In [77]: df_with_missing
Out[77]: 
 col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

[3 rows x 2 columns]

In [27]:
df_with_missing.to_hdf('file.h5',
 key='df_with_missing',
 format='table',
 mode='w')

In [28]: pd.read_hdf('file.h5', 'df_with_missing')

Out [28]:
 col1  col2
 0     0     1
 2     2   NaN

New behavior:

In [78]: df_with_missing.to_hdf("file.h5", key="df_with_missing", format="table", mode="w")

In [79]: pd.read_hdf("file.h5", "df_with_missing")
Out[79]: 
 col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

[3 rows x 2 columns]

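See the docs for more details.

### Changes to display.precision option

The display.precision option has been clarified to refer to decimal places (GH 10451).

Earlier versions of pandas would format floating point numbers to have one less decimal place than the value in display.precision.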

In [1]: pd.set_option('display.precision', 2)

In [2]: pd.DataFrame({'x': [123.456789]})
Out[2]:
 x
0  123.5

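If precision is interpreted as "significant figures", this did work for scientific notation but not for values with standard formatting. It was also out of step with how numpy handles formatting.

Going forward, the value of display.precision will directly control the number of places after the decimal, for regular formatting as well as scientific notation, similar to how numpy's precision print option works.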

In [80]: pd.set_option("display.precision", 2)

In [81]: pd.DataFrame({"x": [123.456789]})
Out[81]: 
 x
0  123.46

[1 rows x 1 columns]
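
A minimal sketch of the decimal-place rule, assuming a current pandas install. It uses pd.option_context rather than pd.set_option so that the global setting is restored afterwards; that choice is for the example only.

```python
import pandas as pd

df = pd.DataFrame({"x": [123.456789]})

# option_context restores the previous precision when the block exits.
with pd.option_context("display.precision", 2):
    two_places = repr(df)

with pd.option_context("display.precision", 4):
    four_places = repr(df)

print(two_places)
print(four_places)
```

The printed frames should show 123.46 and 123.4568, that is, exactly as many digits after the decimal point as the option value.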

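To preserve output behavior with prior versions, the default value of display.precision has been reduced from 7 to 6.

### Changes to Categorical.unique

Categorical.unique now returns new Categoricals with categories and codes that are unique, rather than returning np.array (GH 10508)

Unordered category: values and categories are sorted by appearance order.
Ordered category: values are sorted by appearance order, categories keep their existing order.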

In [82]: cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"], ordered=True)

In [83]: cat
Out[83]: 
['C', 'A', 'B', 'C']
Categories (3, object): ['A' < 'B' < 'C']

In [84]: cat.unique()
Out[84]: 
['C', 'A', 'B']
Categories (3, object): ['A' < 'B' < 'C']

In [85]: cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"])

In [86]: cat
Out[86]: 
['C', 'A', 'B', 'C']
Categories (3, object): ['A', 'B', 'C']

In [87]: cat.unique()
Out[87]: 
['C', 'A', 'B']
Categories (3, object): ['A', 'B', 'C']

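Changes to bool passed as header in parsers

In earlier versions of pandas, if a bool was passed to the header argument of read_csv, read_excel, or read_html, it was implicitly converted to an integer, resulting in header=0 for False and header=1 for True (GH 6113)

A bool input to header will now raise a TypeError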

In [29]: df = pd.read_csv('data.csv', header=False)
TypeError: Passing a bool to header is invalid. Use header=None for no header or
header=int or list-like of ints to specify the row(s) making up the column names
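
The two spellings that the error message recommends can be sketched as follows, assuming a current pandas install. The CSV text is made up for the illustration and is read from an in-memory buffer instead of data.csv:

```python
import io

import pandas as pd

raw = "a,b\n1,2\n3,4\n"

# header=None: the file has no header row, so columns get integer labels
# and the first line is kept as data.
no_header = pd.read_csv(io.StringIO(raw), header=None)

# header=0: the first row supplies the column names.
with_header = pd.read_csv(io.StringIO(raw), header=0)

print(no_header)
print(with_header)
```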

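Other API changes

Line and kde plots with subplots=True now use default colors, not all black. Specify color='k' to draw all lines in black (GH 9894)
Calling the .value_counts() method on a Series with a categorical dtype now returns a Series with a CategoricalIndex (GH 10704)
The metadata properties of subclasses will now be serialized (GH 10553).
groupby using Categorical follows the same rule as Categorical.unique described above (GH 10508)
Constructing a DataFrame with an array of complex64 dtype previously meant the corresponding column was automatically promoted to the complex128 dtype. pandas will now preserve the itemsize of the input for complex data (GH 10952)
Some numeric reduction operators would return ValueError, rather than TypeError, on object types that include strings and numbers (GH 11131)
Passing the currently unsupported chunksize argument to read_excel or ExcelFile.parse will now raise NotImplementedError (GH 8011)
Allow an ExcelFile object to be passed into read_excel (GH 11198)
DatetimeIndex.union does not infer freq if self and the input both have None as freq (GH 11086)
NaT's methods now either raise ValueError, or return np.nan or NaT (GH 9513)

Behavior	Methods
returns `np.nan`	`weekday`, `isoweekday`
returns `NaT`	`date`, `now`, `replace`, `to_datetime`, `today`
returns `np.datetime64('NaT')`	`to_datetime64` (unchanged)
raises `ValueError`	All other public methods (names not beginning with underscores)

### Deprecations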
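For Series the following indexing functions are deprecated (GH 10177).

Deprecated function	Replacement
`.irow(i)`	`.iloc[i]` or `.iat[i]`
`.iget(i)`	`.iloc[i]` or `.iat[i]`
`.iget_value(i)`	`.iloc[i]` or `.iat[i]`

For DataFrame the following indexing functions are deprecated (GH 10177).

Deprecated function	Replacement
`.irow(i)`	`.iloc[i]`
`.iget_value(i, j)`	`.iloc[i, j]` or `.iat[i, j]`
`.icol(j)`	`.iloc[:, j]`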

Note

These indexing functions have been deprecated in the documentation since 0.11.0.
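
The positional replacements for the deprecated indexing functions can be sketched as below, on made-up data and assuming a current pandas install. The old spellings appear only in comments because they have since been removed:

```python
import pandas as pd

s = pd.Series([10, 20, 30])
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

second_value = s.iloc[1]      # was s.irow(1), s.iget(1) or s.iget_value(1)
same_value = s.iat[1]         # .iat is the scalar-only accessor
first_row = df.iloc[0]        # was df.irow(0)
cell = df.iat[0, 1]           # was df.iget_value(0, 1)
second_col = df.iloc[:, 1]    # was df.icol(1)

print(second_value, same_value, cell)
print(first_row.tolist(), second_col.tolist())
```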

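Categorical.name was deprecated to make Categorical more numpy.ndarray like. Use Series(cat, name="whatever") instead (GH 10482).
Setting missing values (NaN) in a Categorical's categories will issue a warning (GH 10748). You can still have missing values in the values.
The take_last keyword of drop_duplicates and duplicated was deprecated in favor of keep. (GH 6511, GH 8505)
The take_last keyword of Series.nsmallest and nlargest was deprecated in favor of keep. (GH 10792)
DataFrame.combineAdd and DataFrame.combineMult are deprecated. They can easily be replaced by using the add and mul methods: DataFrame.add(other, fill_value=0) and DataFrame.mul(other, fill_value=1.) (GH 10735).
TimeSeries is deprecated in favor of Series (note that this has been an alias since 0.13.0), (GH 10890)
SparsePanel is deprecated and will be removed in a future version (GH 11157).
Series.is_time_series is deprecated in favor of Series.index.is_all_dates (GH 11135)
Legacy offsets (like 'A@JAN') are deprecated (note that these have been aliases since 0.8.0) (GH 10878)
WidePanel is deprecated in favor of Panel, and LongPanel in favor of DataFrame (note these have been aliases since < 0.11.0), (GH 10892)
DataFrame.convert_objects has been deprecated in favor of the type-specific functions pd.to_datetime, pd.to_timedelta and pd.to_numeric (new in 0.17.0) (GH 11133).

### Removal of prior version deprecations/changes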
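Removal of the na_last parameter from Series.order() and Series.sort(), in favor of na_position. (GH 5231)
Removal of percentile_width from .describe(), in favor of percentiles. (GH 7088)
Removal of the colSpace parameter from DataFrame.to_string(), in favor of col_space, deprecated circa version 0.8.0.

Removal of automatic time-series broadcasting (GH 2304)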

In [88]: np.random.seed(1234)

In [89]: df = pd.DataFrame(
 ....:    np.random.randn(5, 2),
 ....:    columns=list("AB"),
 ....:    index=pd.date_range("2013-01-01", periods=5),
 ....: )
 ....: 

In [90]: df
Out[90]: 
 A         B
2013-01-01  0.471435 -1.190976
2013-01-02  1.432707 -0.312652
2013-01-03 -0.720589  0.887163
2013-01-04  0.859588 -0.636524
2013-01-05  0.015696 -2.242685

[5 rows x 2 columns]

Previously

In [3]: df + df.A
FutureWarning: TimeSeries broadcasting along DataFrame index by default is deprecated.
Please use DataFrame.<op> to explicitly broadcast arithmetic operations along the index

Out[3]:
 A         B
2013-01-01  0.942870 -0.719541
2013-01-02  2.865414  1.120055
2013-01-03 -1.441177  0.166574
2013-01-04  1.719177  0.223065
2013-01-05  0.031393 -2.226989

Current

In [91]: df.add(df.A, axis="index")
Out[91]: 
 A         B
2013-01-01  0.942870 -0.719541
2013-01-02  2.865414  1.120055
2013-01-03 -1.441177  0.166574
2013-01-04  1.719177  0.223065
2013-01-05  0.031393 -2.226989

[5 rows x 2 columns]

Removal of the table keyword in HDFStore.put/append, in favor of using format= (GH 4645)
Removal of kind in read_excel/ExcelFile as it is unused (GH 4712)
Removal of the infer_type keyword from pd.read_html as it is unused (GH 4770, GH 7032)
Removal of the offset and timeRule keywords from Series.tshift/shift, in favor of freq (GH 4853, GH 4864)
Removal of the pd.load/pd.save aliases in favor of pd.to_pickle/pd.read_pickle (GH 3787)

## Performance improvements
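Development support for benchmarking with the Air Speed Velocity library (GH 8361)
Added vbench benchmarks for alternative ExcelWriter engines and reading Excel files (GH 7171)
Performance improvements in Categorical.value_counts (GH 10804)
Performance improvements in SeriesGroupBy.nunique, SeriesGroupBy.value_counts and SeriesGroupby.transform (GH 10820, GH 11077)
Performance improvements in DataFrame.drop_duplicates with integer dtypes (GH 10917)
Performance improvements in DataFrame.duplicated with wide frames. (GH 10161, GH 11180)
4x improvement in timedelta string parsing (GH 6755, GH 10426)
8x improvement in timedelta64 and datetime64 ops (GH 6755)
Significantly improved performance of indexing a MultiIndex with slicers (GH 10287)
8x improvement in iloc using list-like input (GH 10791)
Improved performance of Series.isin for datetimelike/integer Series (GH 10287)
20x improvement in concat of Categoricals when categories are identical (GH 10587)
Improved performance of to_datetime when the specified format string is ISO8601 (GH 10178)
2x improvement of Series.value_counts for float dtype (GH 10821)
Enable infer_datetime_format in to_datetime when date components do not have 0 padding (GH 11142)
Regression in constructing a DataFrame from a nested dictionary (GH 11084)
Performance improvements in addition/subtraction operations for DateOffset with Series or DatetimeIndex (GH 10744, GH 11205)

## Bug fixes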
因为溢出而导致 timedelta64[ns] 上 .mean() 计算不正确的错误（GH 9442）
在旧版本的 numpy 上的 .isin 中的错误（GH 11232）
DataFrame.to_html(index=False) 中的错误，渲染了不必要的 name 行（GH 10344）
DataFrame.to_latex() 中 column_format 参数无法传递的错误（GH 9402）
在使用 NaT 进行本地化时的 DatetimeIndex 中的错误（GH 10477）
在保留元数据方面，Series.dt 操作中的错误（GH 10477）
在传递给 to_datetime 的无效构造中保留 NaT 的错误（GH 10477）
当函数返回分类系列时，在 DataFrame.apply 中的错误（GH 9573）
通过提供无效日期和格式给 to_datetime 导致的错误（GH 10154）
在删除名称时，在 Index.drop_duplicates 中的错误（GH 10115）
删除名称时，在 Series.quantile 中的错误（GH 10881）
在空 Series 上设置值时，pd.Series 中的错误，其索引具有频率。（GH 10193）
在无效的 order 关键字值下的 pd.Series.interpolate 中的错误（GH 10633）
当颜色名称由多个字符指定时，在 DataFrame.plot 中引发 ValueError 的错误（GH 10387）
使用元组混合列表进行Index构造时的 bug (GH 10697)
DataFrame.reset_index中的 bug，当索引包含NaT时。 (GH 10388)
当工作表为空时，ExcelReader中的 bug (GH 6403)
BinGrouper.group_info中的 bug，返回的值与基类不兼容 (GH 10914)
清除DataFrame.pop缓存时的错误和随后的原地操作中的一个 bug (GH 10912)
使用混合整数Index进行索引时导致ImportError的 bug (GH 10610)
Series.count中的 bug，当索引有空值时 (GH 10946)
在非常规频率DatetimeIndex的 pickling 中的 bug (GH 11002)
导致DataFrame.where在框架具有对称形状时不遵守axis参数的 bug。 (GH 9736)
Table.select_column中的 bug，名称未被保留 (GH 10392)
offsets.generate_range中的 bug，其中start和end的精度比offset更精细 (GH 9907)
pd.rolling_*中的 bug，导致输出中Series.name丢失 (GH 10565)
stack中的 bug，当索引或列不唯一时。 (GH 10417)
设置Panel时出现 bug，当一个轴具有 MultiIndex 时 (GH 10360)
USFederalHolidayCalendar中的 bug，导致USMemorialDay和USMartinLutherKingJr不正确 (GH 10278 和 GH 9760)
.sample()中的 bug，如果设置了返回对象，则会出现不必要的SettingWithCopyWarning (GH 10738)
.sample()中的 bug，导致作为Series传递的权重在被处理之前未沿轴对齐，如果权重索引与抽样对象不对齐，可能会导致问题。 (GH 10738)
修复的回归问题 (GH 9311, GH 6620, GH 9345)，其中带有日期时间的 groupby 转换为带有某些聚合器的浮点数 (GH 10979)
DataFrame.interpolate中的 bug，带有axis=1和inplace=True (GH 10395)
在指定多列作为主键时，io.sql.get_schema 中存在 Bug (GH 10385).
在 datetime-like Categorical 中使用 groupby(sort=False) 会引发 ValueError (GH 10505)
在 groupby(axis=1) 中使用 filter() 会抛出 IndexError (GH 11041)
在大端序构建上的 test_categorical 存在 Bug (GH 10425)
在不支持分类数据的情况下，Series.shift 和 DataFrame.shift 中存在 Bug (GH 9416)
使用分类 Series 的 Series.map 会引发 AttributeError (GH 10324)
包括 Categorical 的 MultiIndex.get_level_values 会引发 AttributeError (GH 10460)
在 pd.get_dummies 中，sparse=True 不会返回 SparseDataFrame (GH 10531)
在 Index 子类型（如 PeriodIndex）的 .drop 和 .insert 方法中未返回其自身类型的 Bug (GH 10620)
在 algos.outer_join_indexer 中，当 right 数组为空时存在 Bug (GH 10618)
在多个键分组时，filter（从 0.16.0 开始的回归）和 transform 存在 Bug，其中一个键是类似于日期时间的键 (GH 10114)
to_datetime 和 to_timedelta 中的 Bug 导致 Index 名称丢失 (GH 10875)
在 len(DataFrame.groupby) 中存在 Bug，当存在一个仅包含 NaN 的列时，会引发 IndexError (GH 11016)
当对空 Series 进行重新采样时导致 segfault 的 Bug (GH 10228)
在 DatetimeIndex 和 PeriodIndex.value_counts 中存在 Bug，会重置结果的名称，但在结果的 Index 中保留。 (GH 10150)
在使用 numexpr 引擎的 pd.eval 中，将 1 个元素的 numpy 数组强制转换为标量 (GH 10546)
在 axis=0 时，pd.concat 中存在 Bug，当列的 dtype 为 category 时 (GH 10177)
在 read_msgpack 中存在 Bug，输入类型并非始终被检查 (GH 10369, GH 10630)
在使用 kwargs index_col=False、index_col=['a', 'b'] 或 dtype 时，pd.read_csv 中存在 Bug (GH 10413, GH 10467, GH 10577)
使用header关键字参数的Series.from_csv中的错误未设置Series.name或Series.index.name（GH 10483）
对于小浮点值，导致方差不准确的groupby.var中的错误（GH 10448）
Series.plot(kind='hist')中的 Y 标签不具有信息性的错误（GH 10485）
使用转换器生成uint8类型时的read_csv中的错误（GH 9266）
时间序列线图和区域图中存在内存泄漏的错误（GH 9003）
当右侧为DataFrame时设置沿主轴或次要轴切片的Panel中的错误（GH 11014）
当未实现Panel的操作函数（例如.add）时，返回None并且不引发NotImplementedError的错误（GH 7692）
当subplots=True时，line和kde图中无法接受多个颜色的错误（GH 9894）
当颜色名称由多个字符指定时，DataFrame.plot引发ValueError的错误（GH 10387）
具有MultiIndex的左对齐和右对齐的Series中的错误可能被倒置（GH 10665）
具有MultiIndex的左连接和右连接可能被倒置的错误（GH 10741）
在设置了columns不同顺序的文件时，read_stata中读取文件的错误（GH 10757）
当类别包含tz或Period时，Categorical中可能不正确地表示的错误（GH 10713）
Categorical.__iter__中可能不会返回正确的datetime和Period的错误（GH 10713）
在具有PeriodIndex的对象上进行索引时的错误（GH 4125）
使用engine='c'的read_csv中的错误：EOF 之前有注释、空行等时没有正确处理（GH 10728，GH 10548）
通过DataReader读取“famafrench”数据导致 HTTP 404 错误，因为网站 URL 已更改（GH 10591）。
read_msgpack中解码的 DataFrame 具有重复列名的错误（GH 9618）
io.common.get_filepath_or_buffer中的错误导致读取有效的 S3 文件失败，如果存储桶还包含用户没有读取权限的键（GH 10604）
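Bug in vectorised setting of timestamp columns with python datetime.date and numpy datetime64 (GH 10408, GH 10412)
Bug in Index.take may add an unnecessary freq attribute (GH 10791)
Bug in merge with an empty DataFrame may raise IndexError (GH 10824)
Bug in to_latex raising on unexpected keyword arguments for some documented arguments (GH 10888)
Bug in indexing of a large DataFrame where IndexError is uncaught (GH 10645 and GH 10692)
Bug in read_csv when using the nrows or chunksize parameters if the file contains only a header line (GH 9535)
Bug in serialization of category types in HDF5 in the presence of alternate encodings (GH 10366)
Bug in pd.DataFrame when constructing an empty DataFrame with a string dtype (GH 9428)
Bug in pd.DataFrame.diff when the DataFrame is not consolidated (GH 10907)
Bug in pd.unique for arrays with the datetime64 or timedelta64 dtype, where an array with object dtype was returned instead of the original dtype (GH 9431)
Bug in Timedelta raising an error when slicing from 0s (GH 10583)
Bug in DatetimeIndex.take and TimedeltaIndex.take may not raise IndexError against an invalid index (GH 10295)
Bug in Series([np.nan]).astype('M8[ms]'), which now returns Series([pd.NaT]) (GH 10747)
Bug in PeriodIndex.order resetting the freq (GH 10295)
Bug in date_range when freq divides end as nanos (GH 10885)
Bug in iloc allowing memory outside the bounds of a Series to be accessed with negative integers (GH 10779)
Bug in read_msgpack where encoding is not respected (GH 10581)
Bug preventing access to the first index when using iloc with a list containing the appropriate negative integer (GH 10547, GH 10779)
Bug in the TimedeltaIndex formatter causing an error while trying to save a DataFrame with a TimedeltaIndex using to_csv (GH 10833)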
当处理 Series 切片时，DataFrame.where 方法存在 bug（GH 10218，GH 9558）
当 Bigquery 返回零行时，pd.read_gbq 抛出 ValueError 的错误（GH 10273）
在序列化 0 级 ndarray 时，to_json 存在导致段错误的 bug（GH 9576）
在 GridSpec 上绘制时，绘图函数可能会引发 IndexError 的 bug（GH 10819）
在绘图结果中，可能会显示不必要的次要刻度标签的 bug（GH 10657）
在带有 NaT 的 DataFrame 上进行聚合（例如 first、last、min）时，groupby 存在错误的计算。（GH 10590，GH 11010）
当传递一个仅包含标量值的字典并指定列时，构建 DataFrame 时可能不会引发错误（GH 10856）
当高度相似的值进行 .var() 计算时，可能会出现舍入错误（GH 10242）
在 DataFrame.plot(subplots=True) 中存在重复列时，输出结果错误（GH 10962）
当进行索引运算时，Index 类可能会导致错误的类别（GH 10638）
当频率为负数时，date_range 方法在年度、季度和月度情况下结果为空（GH 11018）
DatetimeIndex 无法推断负频率的 bug（GH 11018）
移除了一些已弃用的 numpy 比较操作，主要在测试中。（GH 10569）
当 Index 数据类型未正确应用时存在 bug（GH 11017）
在测试最小 Google API 客户端版本时，io.gbq 存在 bug（GH 10652）
当从嵌套的 dict 构建 DataFrame 时，可能会存在 bug，其中包含 timedelta 键（GH 11129）
当数据包含 datetime 数据类型时，.fillna 方法可能会引发 TypeError 的错误（GH 7095，GH 11153）
当要分组的键数与索引长度相同时，在 .groupby 中存在 bug（GH 11185）
Bug in convert_objects where converted values might not be returned if all null and coerce (GH 9589)
Bug in convert_objects where the copy keyword was not respected (GH 9589)

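## Contributors

A total of 112 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.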

Alex Rothberg
Andrea Bedini +
Andrew Rosenfeld
Andy Hayden
Andy Li +
Anthonios Partheniou +
Artemy Kolchinsky
Bernard Willers
Charlie Clark +
Chris +
Chris Whelan
Christoph Gohlke +
Christopher Whelan
Clark Fitzgerald
Clearfield Christopher +
Dan Ringwalt +
Daniel Ni +
Data & Code Expert Experimenting with Code on Data +
David Cottrell
David John Gagne +
David Kelly +
ETF +
Eduardo Schettino +
Egor +
Egor Panfilov +
Evan Wright
Frank Pinter +
Gabriel Araujo +
Garrett-R
Gianluca Rossi +
Guillaume Gay
Guillaume Poulin
Harsh Nisar +
Ian Henriksen +
Ian Hoegen +
Jaidev Deshpande +
Jan Rudolph +
Jan Schulz
Jason Swails +
Jeff Reback
Jonas Buyl +
Joris Van den Bossche
Joris Vankerschaver +
Josh Levy-Kramer +
Julien Danjou
Ka Wo Chen
Karrie Kehoe +
Kelsey Jordahl
Kerby Shedden
Kevin Sheppard
Lars Buitinck
Leif Johnson +
Luis Ortiz +
Mac +
Matt Gambogi +
Matt Savoie +
Matthew Gilbert +
Maximilian Roos +
Michelangelo D’Agostino +
Mortada Mehyar
Nick Eubank
Nipun Batra
Ondřej Čertík
Phillip Cloud
Pratap Vardhan +
Rafal Skolasinski +
Richard Lewis +
Rinoc Johnson +
Rob Levy
Robert Gieseke
Safia Abdalla +
Samuel Denny +
Saumitra Shahapure +
Sebastian Pölsterl +
Sebastian Rubbert +
Sheppard, Kevin +
Sinhrks
Siu Kwan Lam +
Skipper Seabold
Spencer Carrucciu +
Stephan Hoyer
Stephen Hoover +
Stephen Pascoe +
Terry Santegoeds +
Thomas Grainger
Tjerk Santegoeds +
Tom Augspurger
Vincent Davis +
Winterflower +
Yaroslav Halchenko
Yuan Tang (Terry) +
agijsberts
ajcr +
behzad nouri
cel4
chris-b1 +
cyrusmaher +
davidovitch +
ganego +
jreback
juricast +
larvian +
maximilianr +
msund +
rekcahpassyla
robertzk +
scls19fr
seth-p
sinhrks
springcoil +
terrytangyuan +
tzinckgraf +

## New features

带时区的 Datetime

我们正在添加一个本地支持带时区的 datetime 的实现。 Series或DataFrame列以前可以被分配具有时区的 datetime，并且将作为objectdtype 工作。这对于大量行存在性能问题。有关更多详细信息，请参见文档。（GH 8260，GH 10763，GH 11034）。

新实现允许在所有行中具有单一时区，并以高效的方式进行操作。

In [1]: df = pd.DataFrame(
 ...:    {
 ...:        "A": pd.date_range("20130101", periods=3),
 ...:        "B": pd.date_range("20130101", periods=3, tz="US/Eastern"),
 ...:        "C": pd.date_range("20130101", periods=3, tz="CET"),
 ...:    }
 ...: )
 ...: 

In [2]: df
Out[2]: 
 A                         B                         C
0 2013-01-01 2013-01-01 00:00:00-05:00 2013-01-01 00:00:00+01:00
1 2013-01-02 2013-01-02 00:00:00-05:00 2013-01-02 00:00:00+01:00
2 2013-01-03 2013-01-03 00:00:00-05:00 2013-01-03 00:00:00+01:00

[3 rows x 3 columns]

In [3]: df.dtypes
Out[3]: 
A                datetime64[ns]
B    datetime64[ns, US/Eastern]
C           datetime64[ns, CET]
Length: 3, dtype: object

In [4]: df.B
Out[4]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
Name: B, Length: 3, dtype: datetime64[ns, US/Eastern]

In [5]: df.B.dt.tz_localize(None)
Out[5]: 
0   2013-01-01
1   2013-01-02
2   2013-01-03
Name: B, Length: 3, dtype: datetime64[ns]

这还使用了一种新的 dtype 表示法，与其 numpy 近亲datetime64[ns]非常相似。

In [6]: df["B"].dtype
Out[6]: datetime64[ns, US/Eastern]

In [7]: type(df["B"].dtype)
Out[7]: pandas.core.dtypes.dtypes.DatetimeTZDtype

注意

由于 dtype 的更改，底层的DatetimeIndex有一个略有不同的字符串表示形式，但从功能上讲，它们是相同的。

以前的行为：

In [1]: pd.date_range('20130101', periods=3, tz='US/Eastern')
Out[1]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
 '2013-01-03 00:00:00-05:00'],
 dtype='datetime64[ns]', freq='D', tz='US/Eastern')

In [2]: pd.date_range('20130101', periods=3, tz='US/Eastern').dtype
Out[2]: dtype('<M8[ns]')

新行为：

In [8]: pd.date_range("20130101", periods=3, tz="US/Eastern")
Out[8]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
 '2013-01-03 00:00:00-05:00'],
 dtype='datetime64[ns, US/Eastern]', freq='D')

In [9]: pd.date_range("20130101", periods=3, tz="US/Eastern").dtype
Out[9]: datetime64[ns, US/Eastern] 
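```

### Releasing the GIL

We are releasing the global interpreter lock (GIL) on some Cython operations. This will allow other threads to run simultaneously during computation, potentially allowing performance improvements from multi-threading. Notably `groupby`, `nsmallest`, `value_counts` and some indexing operations benefit from this. ([GH 8882](https://github.com/pandas-dev/pandas/issues/8882))

For example, the groupby expression in the following code will have the GIL released during the factorization step, e.g. `df.groupby('key')`, as well as during the `.sum()` operation.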

```py
import numpy as np
import pandas as pd

N = 1000000
ngroups = 10
df = pd.DataFrame(
    {"key": np.random.randint(0, ngroups, size=N), "data": np.random.randn(N)}
)
df.groupby("key")["data"].sum()

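Releasing the GIL could benefit an application that uses threads for user interactions (e.g. QT), or that performs multi-threaded computations. A good example of a library that can handle these types of computation in parallel is the dask library.

### Plot submethods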

Series 和 DataFrame 的 .plot() 方法允许通过提供 kind 关键字参数来自定义绘图类型。不幸的是，许多这些类型的绘图使用不同的必需和可选关键字参数，这使得难以发现在几十种可能的参数中任何给定绘图类型使用了什么。

为了缓解这个问题，我们添加了一个新的可选绘图接口，它将每种绘图类型都作为 .plot 属性的一个方法暴露出来。现在，你可以使用 series.plot.<kind>(...)，而不仅仅是写 series.plot(kind=<kind>, ...)：

In [10]: df = pd.DataFrame(np.random.rand(10, 2), columns=['a', 'b'])

In [11]: df.plot.bar()

../_images/whatsnew_plot_submethods.png

由于这个改变，这些方法现在都可以通过制表符补全发现：

In [12]: df.plot.<TAB>  # noqa: E225, E999
df.plot.area     df.plot.barh     df.plot.density  df.plot.hist     df.plot.line     df.plot.scatter
df.plot.bar      df.plot.box      df.plot.hexbin   df.plot.kde      df.plot.pie

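Each method signature only includes relevant arguments. Currently, these are limited to required arguments, but in the future these will include optional arguments as well. For an overview, see the new plotting API documentation.

### Additional methods for dt accessor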

`Series.dt.strftime`

现在我们支持 Series.dt.strftime 方法来为类似日期时间的对象生成格式化的字符串 (GH 10110)。示例：

# DatetimeIndex
In [13]: s = pd.Series(pd.date_range("20130101", periods=4))

In [14]: s
Out[14]: 
0   2013-01-01
1   2013-01-02
2   2013-01-03
3   2013-01-04
Length: 4, dtype: datetime64[ns]

In [15]: s.dt.strftime("%Y/%m/%d")
Out[15]: 
0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
Length: 4, dtype: object

# PeriodIndex
In [16]: s = pd.Series(pd.period_range("20130101", periods=4))

In [17]: s
Out[17]: 
0    2013-01-01
1    2013-01-02
2    2013-01-03
3    2013-01-04
Length: 4, dtype: period[D]

In [18]: s.dt.strftime("%Y/%m/%d")
Out[18]: 
0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
Length: 4, dtype: object

字符串格式与 Python 标准库相同，详细信息可在此处找到

`Series.dt.total_seconds`

类型为 timedelta64 的 pd.Series 现在有一个新方法 .dt.total_seconds()，返回时间差的持续时间（以秒为单位） (GH 10817)

# TimedeltaIndex
In [19]: s = pd.Series(pd.timedelta_range("1 minutes", periods=4))

In [20]: s
Out[20]: 
0   0 days 00:01:00
1   1 days 00:01:00
2   2 days 00:01:00
3   3 days 00:01:00
Length: 4, dtype: timedelta64[ns]

In [21]: s.dt.total_seconds()
Out[21]: 
0        60.0
1     86460.0
2    172860.0
3    259260.0
Length: 4, dtype: float64 
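```

### Period frequency enhancement

`Period`, `PeriodIndex` and `period_range` can now accept a multiplied freq. Also, `Period.freq` and `PeriodIndex.freq` are now stored as a `DateOffset` instance like `DatetimeIndex`, and not as `str` ([GH 7811](https://github.com/pandas-dev/pandas/issues/7811))

A multiplied freq represents a span of corresponding length. The example below creates a period of 3 days. Addition and subtraction shift the period by its span.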

```py
In [22]: p = pd.Period("2015-08-01", freq="3D")

In [23]: p
Out[23]: Period('2015-08-01', '3D')

In [24]: p + 1
Out[24]: Period('2015-08-04', '3D')

In [25]: p - 2
Out[25]: Period('2015-07-26', '3D')

In [26]: p.to_timestamp()
Out[26]: Timestamp('2015-08-01 00:00:00')

In [27]: p.to_timestamp(how="E")
Out[27]: Timestamp('2015-08-03 23:59:59.999999999')

You can use the multiplied freq in PeriodIndex and period_range.

In [28]: idx = pd.period_range("2015-08-01", periods=4, freq="2D")

In [29]: idx
Out[29]: PeriodIndex(['2015-08-01', '2015-08-03', '2015-08-05', '2015-08-07'], dtype='period[2D]')

In [30]: idx + 1
Out[30]: PeriodIndex(['2015-08-03', '2015-08-05', '2015-08-07', '2015-08-09'], dtype='period[2D]') 
```

### Support for SAS XPORT files

`read_sas()` provides support for reading *SAS XPORT* format files. ([GH 4052](https://github.com/pandas-dev/pandas/issues/4052)).

```py
df = pd.read_sas("sas_xport.xpt")

还可以获取迭代器并逐步读取 XPORT 文件。

for df in pd.read_sas("sas_xport.xpt", chunksize=10000):
    do_something(df)

See the docs for more details.

### Support for math functions in .eval()

eval() now supports calling math functions (GH 4893).

df = pd.DataFrame({"a": np.random.randn(10)})
df.eval("b = sin(a)")

支持的数学函数有sin、cos、exp、log、expm1、log1p、sqrt、sinh、cosh、tanh、arcsin、arccos、arctan、arccosh、arcsinh、arctanh、abs 和 arctan2。

这些函数映射到NumExpr引擎的内在函数。对于 Python 引擎，它们被映射到NumPy调用。

对 Excel 的更改与`MultiIndex`

在版本 0.16.2 中，带有MultiIndex列的DataFrame无法通过to_excel写入到 Excel。这个功能已经被添加（GH 10564），同时更新了read_excel，以便通过在header和index_col参数中指定哪些列/行组成MultiIndex来读取数据，以保证信息不丢失（GH 4679）。

详细信息请参阅文档。

In [31]: df = pd.DataFrame(
 ....:    [[1, 2, 3, 4], [5, 6, 7, 8]],
 ....:    columns=pd.MultiIndex.from_product(
 ....:        [["foo", "bar"], ["a", "b"]], names=["col1", "col2"]
 ....:    ),
 ....:    index=pd.MultiIndex.from_product([["j"], ["l", "k"]], names=["i1", "i2"]),
 ....: )
 ....: 

In [32]: df
Out[32]: 
col1  foo    bar 
col2    a  b   a  b
i1 i2 
j  l    1  2   3  4
 k    5  6   7  8

[2 rows x 4 columns]

In [33]: df.to_excel("test.xlsx")

In [34]: df = pd.read_excel("test.xlsx", header=[0, 1], index_col=[0, 1])

In [35]: df
Out[35]: 
col1  foo    bar 
col2    a  b   a  b
i1 i2 
j  l    1  2   3  4
 k    5  6   7  8

[2 rows x 4 columns]

以前，在read_excel中必须指定has_index_names参数，如果序列化数据有索引名称的话。对于版本 0.17.0，to_excel的输出格式已更改，使得这个关键字不再必要 - 更改如下所示。

旧

../_images/old-excel-index.png

新

../_images/new-excel-index.png

警告

版本 0.16.2 或之前保存的带有索引名称的 Excel 文件仍然可以被读取，但必须将has_index_names参数指定为True。

Google BigQuery 增强

添加了自动创建表格/数据集的能力，使用pandas.io.gbq.to_gbq()函数，如果目标表格/数据集不存在的话。(GH 8325, GH 11121)。
添加了在调用pandas.io.gbq.to_gbq()函数时替换现有表格和模式的能力，通过if_exists参数。更多详细信息请参阅文档（GH 8325）。
在 gbq 模块中，InvalidColumnOrder 和 InvalidPageToken 将引发ValueError而不是IOError。
generate_bq_schema() 函数现在已被弃用，并将在未来版本中移除（GH 11121）。
The gbq module now supports Python 3 (GH 11094).

### Display alignment with Unicode East Asian width

警告

启用此选项将影响打印 DataFrame 和 Series 的性能（大约慢 2 倍）。仅在实际需要时使用。

一些东亚国家使用 Unicode 字符，其宽度对应于 2 个字母。如果 DataFrame 或 Series 包含这些字符，则默认输出无法正确对齐。添加以下选项以实现对这些字符的精确处理。

display.unicode.east_asian_width：是否使用 Unicode 东亚宽度来计算显示文本宽度（GH 2612）。
display.unicode.ambiguous_as_wide：是否处理属于 Ambiguous as Wide 的 Unicode 字符（GH 11102）。

In [36]: df = pd.DataFrame({u"国籍": ["UK", u"日本"], u"名前": ["Alice", u"しのぶ"]})

In [37]: df
Out[37]: 
 国籍     名前
0  UK  Alice
1  日本    しのぶ

[2 rows x 2 columns]

In [38]: pd.set_option("display.unicode.east_asian_width", True)

In [39]: df
Out[39]: 
 国籍    名前
0    UK   Alice
1  日本  しのぶ

[2 rows x 2 columns]

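See the docs for more details.

### Other enhancements

Support for openpyxl >= 2.2. The API for style support is now stable (GH 10125).

merge now accepts the argument indicator, which adds a Categorical-type column (by default called _merge) to the output object that takes on the following values (GH 8790):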

观察来源	`_merge` 值
仅在 `'left'` 框架中的合并键	`left_only`
仅在 `'right'` 框架中的合并键	`right_only`
在两个框架中的合并键	`both`

In [40]: df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})

In [41]: df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})

In [42]: pd.merge(df1, df2, on="col1", how="outer", indicator=True)
Out[42]: 
 col1 col_left  col_right      _merge
0     0        a        NaN   left_only
1     1        b        2.0        both
2     2      NaN        2.0  right_only
3     2      NaN        2.0  right_only

[4 rows x 4 columns]

更多信息，请查看更新的文档。

pd.to_numeric is a new function to coerce strings to numbers, optionally turning values that cannot be parsed into NaN (GH 11133).
pd.merge 现在允许重复的列名，如果它们没有被合并（GH 10639）。
pd.pivot 现在允许将索引传递为 None（GH 3962）。

pd.concat 现在会使用提供的现有 Series 名称（GH 10698）。

In [43]: foo = pd.Series([1, 2], name="foo")

In [44]: bar = pd.Series([1, 2])

In [45]: baz = pd.Series([4, 5])

先前的行为：

In [1]: pd.concat([foo, bar, baz], axis=1)
Out[1]:
 0  1  2
 0  1  1  4
 1  2  2  5

新行为：

In [46]: pd.concat([foo, bar, baz], axis=1)
Out[46]: 
 foo  0  1
0    1  1  4
1    2  2  5

[2 rows x 3 columns]

DataFrame 现在具有 nlargest 和 nsmallest 方法（GH 10393）。

添加一个 limit_direction 关键字参数，与 limit 一起使用，以使 interpolate 前向、后向或两者填充 NaN 值（GH 9218，GH 10420，GH 11115）。

In [47]: ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13])

In [48]: ser.interpolate(limit=1, limit_direction="both")
Out[48]: 
0     NaN
1     5.0
2     5.0
3     7.0
4     NaN
5    11.0
6    13.0
Length: 7, dtype: float64

添加了一个 DataFrame.round 方法来将值四舍五入到可变的小数位数（GH 10568）。

In [49]: df = pd.DataFrame(
 ....:    np.random.random([3, 3]),
 ....:    columns=["A", "B", "C"],
 ....:    index=["first", "second", "third"],
 ....: )
 ....: 

In [50]: df
Out[50]: 
 A         B         C
first   0.126970  0.966718  0.260476
second  0.897237  0.376750  0.336222
third   0.451376  0.840255  0.123102

[3 rows x 3 columns]

In [51]: df.round(2)
Out[51]: 
 A     B     C
first   0.13  0.97  0.26
second  0.90  0.38  0.34
third   0.45  0.84  0.12

[3 rows x 3 columns]

In [52]: df.round({"A": 0, "C": 2})
Out[52]: 
 A         B     C
first   0.0  0.966718  0.26
second  1.0  0.376750  0.34
third   0.0  0.840255  0.12

[3 rows x 3 columns]

drop_duplicates 和 duplicated 现在接受一个 keep 关键字以针对第一个、最后一个和所有重复项。take_last 关键字已弃用，请参阅这里（GH 6511, GH 8505）。

In [53]: s = pd.Series(["A", "B", "C", "A", "B", "D"])

In [54]: s.drop_duplicates()
Out[54]: 
0    A
1    B
2    C
5    D
Length: 4, dtype: object

In [55]: s.drop_duplicates(keep="last")
Out[55]: 
2    C
3    A
4    B
5    D
Length: 4, dtype: object

In [56]: s.drop_duplicates(keep=False)
Out[56]: 
2    C
5    D
Length: 2, dtype: object

现在 reindex 有一个 tolerance 参数，允许对重新索引填充限制进行更精细的控制（GH 10411）：

In [57]: df = pd.DataFrame({"x": range(5), "t": pd.date_range("2000-01-01", periods=5)})

In [58]: df.reindex([0.1, 1.9, 3.5], method="nearest", tolerance=0.2)
Out[58]: 
 x          t
0.1  0.0 2000-01-01
1.9  2.0 2000-01-03
3.5  NaN        NaT

[3 rows x 2 columns]

当在 DatetimeIndex、TimedeltaIndex 或 PeriodIndex 上使用时，如果可能的话，tolerance 将被转换为 Timedelta。这允许您使用字符串指定容差：

In [59]: df = df.set_index("t")

In [60]: df.reindex(pd.to_datetime(["1999-12-31"]), method="nearest", tolerance="1 day")
Out[60]: 
 x
1999-12-31  0

[1 rows x 1 columns]

tolerance 也由较低级别的 Index.get_indexer 和 Index.get_loc 方法公开。

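Added the ability to use the base argument when resampling a TimeDeltaIndex (GH 10530).
DatetimeIndex can be instantiated using strings that contain NaT (GH 7599).
to_datetime can now accept the yearfirst keyword (GH 7599).
pandas.tseries.offsets larger than the Day offset can now be used with a Series for addition/subtraction (GH 10699). See the docs for more details.
pd.Timedelta.total_seconds() now returns the Timedelta duration to nanosecond precision (previously microsecond precision) (GH 10939).
PeriodIndex now supports arithmetic with np.ndarray (GH 10638).
Support pickling of Period objects (GH 10439).
.as_blocks will now take an optional copy argument to return a copy of the data; the default is to copy (no change in behavior from prior versions) (GH 9607).
The regex argument to DataFrame.filter now handles numeric column names instead of raising ValueError (GH 10384).
Enable reading gzip-compressed files via URL, either by explicitly setting the compression parameter or by inferring it from the presence of the HTTP Content-Encoding header in the response (GH 8685).
Enable writing Excel files in memory using StringIO/BytesIO (GH 7074).
Enable serialization of lists and dicts to strings in ExcelWriter (GH 8188).
SQL io functions now accept a SQLAlchemy connectable (GH 7877).
pd.read_sql and to_sql can accept a database URI as the con parameter (GH 10214).
read_sql_table will now allow reading from views (GH 10750).
Enable writing complex values to HDFStores when using the table format (GH 10447).
Enable pd.read_hdf to be used without specifying a key when the HDF file contains a single dataset (GH 10443).
pd.read_stata will now read Stata 118 type files (GH 9882).
The msgpack submodule has been updated to 0.4.6 with backward compatibility (GH 10581).
DataFrame.to_dict now accepts the orient='index' keyword argument (GH 10844).
DataFrame.apply will return a Series of dicts if the passed function returns a dict and reduce=True (GH 8735).
Allow passing kwargs to the interpolation methods (GH 10378).
Improved the error message when concatenating an empty iterable of DataFrame objects (GH 9157).
pd.read_csv can now read bz2-compressed files incrementally, and the C parser can read bz2-compressed files from AWS S3 (GH 11070, GH 11072).
In pd.read_csv, recognize s3n:// and s3a:// URLs as designating S3 file storage (GH 11070, GH 11071).
Read CSV files from AWS S3 incrementally, instead of first downloading the entire file. (A full file download is still required for compressed files in Python 2.) (GH 11070, GH 11073)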
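pd.read_csv is now able to infer the compression type for files read from AWS S3 storage (GH 11070, GH 11074).

Datetime with TZ

We are adding an implementation that natively supports datetime with timezones. Previously, a Series or a DataFrame column could be assigned a datetime with timezones, but it would be stored as object dtype, which caused performance problems with a large number of rows. See the docs for more details. (GH 8260, GH 10763, GH 11034).

The new implementation allows a single timezone across all rows, with operations performed efficiently.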

In [1]: df = pd.DataFrame(
 ...:    {
 ...:        "A": pd.date_range("20130101", periods=3),
 ...:        "B": pd.date_range("20130101", periods=3, tz="US/Eastern"),
 ...:        "C": pd.date_range("20130101", periods=3, tz="CET"),
 ...:    }
 ...: )
 ...: 

In [2]: df
Out[2]: 
 A                         B                         C
0 2013-01-01 2013-01-01 00:00:00-05:00 2013-01-01 00:00:00+01:00
1 2013-01-02 2013-01-02 00:00:00-05:00 2013-01-02 00:00:00+01:00
2 2013-01-03 2013-01-03 00:00:00-05:00 2013-01-03 00:00:00+01:00

[3 rows x 3 columns]

In [3]: df.dtypes
Out[3]: 
A                datetime64[ns]
B    datetime64[ns, US/Eastern]
C           datetime64[ns, CET]
Length: 3, dtype: object

In [4]: df.B
Out[4]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
Name: B, Length: 3, dtype: datetime64[ns, US/Eastern]

In [5]: df.B.dt.tz_localize(None)
Out[5]: 
0   2013-01-01
1   2013-01-02
2   2013-01-03
Name: B, Length: 3, dtype: datetime64[ns]

This also uses a new dtype representation that is very similar in look and feel to its numpy cousin datetime64[ns]:

In [6]: df["B"].dtype
Out[6]: datetime64[ns, US/Eastern]

In [7]: type(df["B"].dtype)
Out[7]: pandas.core.dtypes.dtypes.DatetimeTZDtype

Note

As a result of the dtype changes, the underlying DatetimeIndex has a slightly different string repr, but functionally these are the same.

Previous behavior:

In [1]: pd.date_range('20130101', periods=3, tz='US/Eastern')
Out[1]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
 '2013-01-03 00:00:00-05:00'],
 dtype='datetime64[ns]', freq='D', tz='US/Eastern')

In [2]: pd.date_range('20130101', periods=3, tz='US/Eastern').dtype
Out[2]: dtype('<M8[ns]')

New behavior:

In [8]: pd.date_range("20130101", periods=3, tz="US/Eastern")
Out[8]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
 '2013-01-03 00:00:00-05:00'],
 dtype='datetime64[ns, US/Eastern]', freq='D')

In [9]: pd.date_range("20130101", periods=3, tz="US/Eastern").dtype
Out[9]: datetime64[ns, US/Eastern]
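To illustrate what the single-timezone column allows, here is a minimal sketch that converts a tz-aware Series to another zone and strips the zone, all through the dt accessor. It assumes timezone data for "US/Eastern" is available on the system; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

# Same instants, expressed in UTC; the dtype changes, the points in time do not.
as_utc = s.dt.tz_convert("UTC")

# Drop the timezone and keep the local wall-clock times.
naive = s.dt.tz_localize(None)

print(s.dtype, as_utc.dtype, naive.dtype)
```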

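Releasing the GIL

We are releasing the global interpreter lock (GIL) on some cython operations. This allows other threads to run simultaneously during computation, potentially allowing performance improvements from multi-threading. Notably groupby, nsmallest, value_counts and some indexing operations benefit from this. (GH 8882)

For example, the groupby expression in the following code will have the GIL released during the factorization step (df.groupby('key')) as well as during the .sum() operation.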

N = 1000000
ngroups = 10
df = pd.DataFrame(
    {"key": np.random.randint(0, ngroups, size=N), "data": np.random.randn(N)}
)
df.groupby("key")["data"].sum()

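Releasing the GIL can benefit an application that uses threads for user interaction (e.g. QT) or that performs multi-threaded computations. A good example of a library that can handle these kinds of parallel computation is the dask library.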
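The following is a rough sketch of running several independent groupby reductions on a thread pool. It only shows that the threaded results match the serial ones; whether the threads actually overlap, and by how much, depends on the pandas version, the operation, and the machine, so no speedup is claimed. The helper name `group_sum` and the sizes are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

N = 100_000
ngroups = 10
rng = np.random.default_rng(0)

# Four independent frames, one per worker thread.
frames = [
    pd.DataFrame(
        {"key": rng.integers(0, ngroups, size=N), "data": rng.standard_normal(N)}
    )
    for _ in range(4)
]


def group_sum(frame):
    """Hypothetical helper: the groupby reduction described above."""
    return frame.groupby("key")["data"].sum()


# Threaded execution: the reductions may run concurrently where the GIL is released.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(group_sum, frames))

# Serial reference for comparison.
serial = [group_sum(frame) for frame in frames]
```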

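Plot submethods

The Series and DataFrame .plot() method allows customizing the plot type by supplying the kind keyword argument. Unfortunately, many of these kinds of plots use different required and optional keyword arguments, which makes it difficult to discover which of the dozens of possible arguments any given plot kind uses.

To alleviate this issue, we have added a new, optional plotting interface, which exposes each kind of plot as a method of the .plot attribute. Instead of writing series.plot(kind=<kind>, ...), you can now also use series.plot.<kind>(...):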

In [10]: df = pd.DataFrame(np.random.rand(10, 2), columns=['a', 'b'])

In [11]: df.plot.bar()

../_images/whatsnew_plot_submethods.png

As a result of this change, these methods are now all discoverable via tab completion:

In [12]: df.plot.<TAB>  # noqa: E225, E999
df.plot.area     df.plot.barh     df.plot.density  df.plot.hist     df.plot.line     df.plot.scatter
df.plot.bar      df.plot.box      df.plot.hexbin   df.plot.kde      df.plot.pie

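Each method signature only includes relevant arguments. Currently, these are limited to required arguments, but in the future they will include optional arguments as well. For an overview, see the new Plotting API documentation.

Additional methods for `dt` accessor

Series.dt.strftime

Series.dt.strftime is now supported for datetime-likes to generate a formatted string (GH 10110). Examples: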

# DatetimeIndex
In [13]: s = pd.Series(pd.date_range("20130101", periods=4))

In [14]: s
Out[14]: 
0   2013-01-01
1   2013-01-02
2   2013-01-03
3   2013-01-04
Length: 4, dtype: datetime64[ns]

In [15]: s.dt.strftime("%Y/%m/%d")
Out[15]: 
0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
Length: 4, dtype: object

# PeriodIndex
In [16]: s = pd.Series(pd.period_range("20130101", periods=4))

In [17]: s
Out[17]: 
0    2013-01-01
1    2013-01-02
2    2013-01-03
3    2013-01-04
Length: 4, dtype: period[D]

In [18]: s.dt.strftime("%Y/%m/%d")
Out[18]: 
0    2013/01/01
1    2013/01/02
2    2013/01/03
3    2013/01/04
Length: 4, dtype: object

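The string format follows the strftime conventions of the Python standard library; see the standard library documentation for details.

Series.dt.total_seconds

A pd.Series of type timedelta64 has a new method .dt.total_seconds() that returns the duration of the timedelta in seconds (GH 10817)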

# TimedeltaIndex
In [19]: s = pd.Series(pd.timedelta_range("1 minutes", periods=4))

In [20]: s
Out[20]: 
0   0 days 00:01:00
1   1 days 00:01:00
2   2 days 00:01:00
3   3 days 00:01:00
Length: 4, dtype: timedelta64[ns]

In [21]: s.dt.total_seconds()
Out[21]: 
0        60.0
1     86460.0
2    172860.0
3    259260.0
Length: 4, dtype: float64

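Period frequency enhancement

Period, PeriodIndex and period_range can now accept a multiplied freq. In addition, Period.freq and PeriodIndex.freq are now stored as a DateOffset instance, like DatetimeIndex, rather than as a str (GH 7811)

A multiplied freq represents a span of the corresponding length. The example below creates a period of 3 days. Addition and subtraction shift the period by its span.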

In [22]: p = pd.Period("2015-08-01", freq="3D")

In [23]: p
Out[23]: Period('2015-08-01', '3D')

In [24]: p + 1
Out[24]: Period('2015-08-04', '3D')

In [25]: p - 2
Out[25]: Period('2015-07-26', '3D')

In [26]: p.to_timestamp()
Out[26]: Timestamp('2015-08-01 00:00:00')

In [27]: p.to_timestamp(how="E")
Out[27]: Timestamp('2015-08-03 23:59:59.999999999')

You can use the multiplied freq in PeriodIndex and period_range.

In [28]: idx = pd.period_range("2015-08-01", periods=4, freq="2D")

In [29]: idx
Out[29]: PeriodIndex(['2015-08-01', '2015-08-03', '2015-08-05', '2015-08-07'], dtype='period[2D]')

In [30]: idx + 1
Out[30]: PeriodIndex(['2015-08-03', '2015-08-05', '2015-08-07', '2015-08-09'], dtype='period[2D]')
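As a sketch of what "span" means here, the boundaries of a 3-day period can be read from start_time and end_time, and the multiplier from the freq offset. The variable names are illustrative.

```python
import pandas as pd

p = pd.Period("2015-08-01", freq="3D")

# freq is an offset object; its multiplier is available as .n
multiplier = p.freq.n

# The period covers three calendar days, starting at start_time.
start = p.start_time
end = p.end_time

# Adding 1 moves the period forward by one full span (3 days).
next_start = (p + 1).start_time

print(multiplier, start, end, next_start)
```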

Support for SAS XPORT files

read_sas() provides support for reading SAS XPORT format files. (GH 4052).

df = pd.read_sas("sas_xport.xpt")

It is also possible to obtain an iterator and read an XPORT file incrementally.

for df in pd.read_sas("sas_xport.xpt", chunksize=10000):
    do_something(df)

See the docs for more details.

Support for math functions in .eval()

eval() now supports calling math functions (GH 4893)

df = pd.DataFrame({"a": np.random.randn(10)})
df.eval("b = sin(a)")

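The supported math functions are sin, cos, exp, log, expm1, log1p, sqrt, sinh, cosh, tanh, arcsin, arccos, arctan, arccosh, arcsinh, arctanh, abs and arctan2.

These functions map to the intrinsics of the NumExpr engine. For the Python engine, they are mapped to NumPy calls.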
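A minimal sketch that combines two of the listed functions in one expression and checks the result against the equivalent NumPy calls. In current pandas versions an assignment inside eval returns a new DataFrame unless inplace=True is passed; the input values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.linspace(0.0, 1.0, 5)})

# sin and sqrt are resolved by the eval engine, not looked up as Python names.
result = df.eval("b = sin(a) + sqrt(a)")

# Reference computed directly with NumPy.
expected = np.sin(df["a"]) + np.sqrt(df["a"])
print(np.allclose(result["b"], expected))
```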

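Changes to Excel with `MultiIndex`

In version 0.16.2 a DataFrame with MultiIndex columns could not be written to Excel via to_excel. That functionality has been added (GH 10564), and read_excel has been updated so that the data can be read back with no loss of information by specifying which columns/rows make up the MultiIndex in the header and index_col parameters (GH 4679).

See the documentation for more details.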

In [31]: df = pd.DataFrame(
 ....:    [[1, 2, 3, 4], [5, 6, 7, 8]],
 ....:    columns=pd.MultiIndex.from_product(
 ....:        [["foo", "bar"], ["a", "b"]], names=["col1", "col2"]
 ....:    ),
 ....:    index=pd.MultiIndex.from_product([["j"], ["l", "k"]], names=["i1", "i2"]),
 ....: )
 ....: 

In [32]: df
Out[32]: 
col1  foo    bar 
col2    a  b   a  b
i1 i2 
j  l    1  2   3  4
 k    5  6   7  8

[2 rows x 4 columns]

In [33]: df.to_excel("test.xlsx")

In [34]: df = pd.read_excel("test.xlsx", header=[0, 1], index_col=[0, 1])

In [35]: df
Out[35]: 
col1  foo    bar 
col2    a  b   a  b
i1 i2 
j  l    1  2   3  4
 k    5  6   7  8

[2 rows x 4 columns]

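Previously, it was necessary to specify the has_index_names argument in read_excel if the serialized data had index names. For version 0.17.0 the output format of to_excel has been changed to make this keyword unnecessary; the change is shown below.

Old

../_images/old-excel-index.png

New

../_images/new-excel-index.png

Warning

Excel files saved in version 0.16.2 or earlier that have index names can still be read in, but the has_index_names argument must be set to True.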

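Google BigQuery enhancements

Added the ability to automatically create a table/dataset with the pandas.io.gbq.to_gbq() function if the destination table/dataset does not exist. (GH 8325, GH 11121).
Added the ability to replace an existing table and schema when calling the pandas.io.gbq.to_gbq() function via the if_exists argument. See the docs for more details (GH 8325).
InvalidColumnOrder and InvalidPageToken in the gbq module will raise ValueError instead of IOError.
The generate_bq_schema() function is now deprecated and will be removed in a future version (GH 11121).
The gbq module will now support Python 3 (GH 11094).

Display alignment with Unicode East Asian width

Warning

Enabling this option affects the performance of printing DataFrame and Series (about 2 times slower). Use it only when it is actually required.

Some East Asian countries use Unicode characters whose display width corresponds to 2 Latin letters. If a DataFrame or Series contains these characters, the default output cannot be aligned properly. The following options were added to handle these characters precisely.

display.unicode.east_asian_width: Whether to use the Unicode East Asian Width to calculate the display text width (GH 2612).
display.unicode.ambiguous_as_wide: Whether to treat Unicode characters in the Ambiguous class as Wide (GH 11102).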

In [36]: df = pd.DataFrame({u"国籍": ["UK", u"日本"], u"名前": ["Alice", u"しのぶ"]})

In [37]: df
Out[37]: 
 国籍     名前
0  UK  Alice
1  日本    しのぶ

[2 rows x 2 columns]

In [38]: pd.set_option("display.unicode.east_asian_width", True)

In [39]: df
Out[39]: 
 国籍    名前
0    UK   Alice
1  日本  しのぶ

[2 rows x 2 columns]

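See the display options documentation for more details.

Other enhancements

Support for openpyxl >= 2.2. The API for style support is now stable (GH 10125)

merge now accepts the argument indicator, which adds a Categorical-type column (by default called _merge) to the output object that takes on the following values (GH 8790)

Observation origin	`_merge` value
Merge key only in `'left'` frame	`left_only`
Merge key only in `'right'` frame	`right_only`
Merge key in both frames	`both`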

In [40]: df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})

In [41]: df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})

In [42]: pd.merge(df1, df2, on="col1", how="outer", indicator=True)
Out[42]: 
 col1 col_left  col_right      _merge
0     0        a        NaN   left_only
1     1        b        2.0        both
2     2      NaN        2.0  right_only
3     2      NaN        2.0  right_only

[4 rows x 4 columns]
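One common use of the indicator column is to pick out rows that matched on only one side. The sketch below reuses the two frames from the example and passes a string to indicator to name the column; the name "source" is an arbitrary choice for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})
df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})

# A string value names the indicator column instead of the default "_merge".
merged = pd.merge(df1, df2, on="col1", how="outer", indicator="source")

# Rows whose key appears only in the left frame.
only_left = merged[merged["source"] == "left_only"]

# How many rows came from each origin.
counts = merged["source"].value_counts()
print(counts)
```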

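Backwards incompatible API changes

Changes to sorting API

The sorting API has had some longtime inconsistencies. (GH 9816, GH 8239).

Here is a summary of the API prior to 0.17.0:

Series.sort is in place, while DataFrame.sort returns a new object.
Series.order returns a new object.
It was possible to use Series/DataFrame.sort_index to sort by values by passing the by keyword.
Series/DataFrame.sortlevel worked only on a MultiIndex, for sorting by index.

To address these issues, we have revamped the API:

We have introduced a new method, DataFrame.sort_values(), which merges DataFrame.sort(), Series.sort(), and Series.order() to handle sorting of values.
The existing methods Series.sort(), Series.order(), and DataFrame.sort() have been deprecated and will be removed in a future version.
The by argument of DataFrame.sort_index() has been deprecated and will be removed in a future version.
The existing method .sort_index() gains the level keyword to enable level sorting.

We now have two distinct and non-overlapping methods of sorting. A * marks items that will show a FutureWarning.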

To sort by the values:

Previous	Replacement
* `Series.order()`	`Series.sort_values()`
* `Series.sort()`	`Series.sort_values(inplace=True)`
* `DataFrame.sort(columns=...)`	`DataFrame.sort_values(by=...)`

To sort by the index:

Previous	Replacement
`Series.sort_index()`	`Series.sort_index()`
`Series.sortlevel(level=...)`	`Series.sort_index(level=...)`
`DataFrame.sort_index()`	`DataFrame.sort_index()`
`DataFrame.sortlevel(level=...)`	`DataFrame.sort_index(level=...)`
* `DataFrame.sort()`	`DataFrame.sort_index()`
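The split between the two methods can be sketched with a small Series and DataFrame. Both calls return new objects by default, so the original is left unchanged; the data is illustrative.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=["a", "c", "b"])

# Sort by the values: the index labels travel with their values.
by_value = s.sort_values()

# Sort by the index labels.
by_index = s.sort_index()

# For a DataFrame, sort_values takes the column(s) to sort on via `by`.
df = pd.DataFrame({"x": [2, 1, 2], "y": [9, 8, 7]})
df_sorted = df.sort_values(by=["x", "y"])

print(by_value, by_index, df_sorted, sep="\n\n")
```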

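We have also deprecated and changed similar methods in two Series-like classes, Index and Categorical.

Previous	Replacement
* `Index.order()`	`Index.sort_values()`
* `Categorical.order()`	`Categorical.sort_values()`

Changes to `to_datetime` and `to_timedelta`

Error handling

The default for pd.to_datetime error handling has changed to errors='raise'. In prior versions it was errors='ignore'. Furthermore, the coerce argument has been deprecated in favor of errors='coerce'. This means that invalid parsing will raise rather than return the original input as in previous versions. (GH 10636)

Previous behavior: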

In [2]: pd.to_datetime(['2009-07-31', 'asd'])
Out[2]: array(['2009-07-31', 'asd'], dtype=object)

New behavior:

In [3]: pd.to_datetime(['2009-07-31', 'asd'])
ValueError: Unknown string format

Of course you can coerce invalid values to NaT as well.

In [61]: pd.to_datetime(["2009-07-31", "asd"], errors="coerce")
Out[61]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)

To keep the previous behavior, you can use errors='ignore':

In [4]: pd.to_datetime(["2009-07-31", "asd"], errors="ignore")
Out[4]: Index(['2009-07-31', 'asd'], dtype='object')

Furthermore, pd.to_timedelta has gained a similar API, errors='raise'|'ignore'|'coerce', and the coerce keyword has been deprecated in favor of errors='coerce'.
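A brief sketch of the same error handling applied to pd.to_timedelta: the default raises on an unparseable string, while errors='coerce' turns it into NaT. The input strings are illustrative.

```python
import pandas as pd

values = ["1 days", "00:30:00", "asd"]

# Default (errors="raise"): the invalid entry raises a ValueError.
try:
    pd.to_timedelta(values)
    raised = False
except ValueError:
    raised = True

# errors="coerce": the invalid entry becomes NaT, valid entries are parsed.
td = pd.to_timedelta(values, errors="coerce")
print(raised, td)
```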

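Consistent parsing

The string parsing of to_datetime, Timestamp and DatetimeIndex has been made consistent. (GH 7599)

Prior to v0.17.0, Timestamp and to_datetime could parse a year-only datetime string incorrectly using today's date, whereas DatetimeIndex used the beginning of the year. Timestamp and to_datetime could also raise ValueError for some kinds of datetime string that DatetimeIndex can parse, such as a quarterly string.

Previous behavior: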

In [1]: pd.Timestamp('2012Q2')
Traceback
 ...
ValueError: Unable to parse 2012Q2

# Results in today's date.
In [2]: pd.Timestamp('2014')
Out [2]: 2014-08-12 00:00:00

v0.17.0 can parse them as shown below. This works on DatetimeIndex as well.

New behavior:

In [62]: pd.Timestamp("2012Q2")
Out[62]: Timestamp('2012-04-01 00:00:00')

In [63]: pd.Timestamp("2014")
Out[63]: Timestamp('2014-01-01 00:00:00')

In [64]: pd.DatetimeIndex(["2012Q2", "2014"])
Out[64]: DatetimeIndex(['2012-04-01', '2014-01-01'], dtype='datetime64[ns]', freq=None)

Note

If you want to perform calculations based on today's date, use Timestamp.now() and pandas.tseries.offsets.

In [65]: import pandas.tseries.offsets as offsets

In [66]: pd.Timestamp.now()
Out[66]: Timestamp('2024-04-10 17:55:56.541543')

In [67]: pd.Timestamp.now() + offsets.DateOffset(years=1)
Out[67]: Timestamp('2025-04-10 17:55:56.542277')

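Changes to Index comparisons

The equality operator on Index should behave similarly to Series (GH 9947, GH 10637)

Starting in v0.17.0, comparing Index objects of different lengths will raise a ValueError. This is consistent with the behavior of Series.

Previous behavior: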

In [2]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
Out[2]: array([ True, False, False], dtype=bool)

In [3]: pd.Index([1, 2, 3]) == pd.Index([2])
Out[3]: array([False,  True, False], dtype=bool)

In [4]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
Out[4]: False

New behavior:

In [8]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
Out[8]: array([ True, False, False], dtype=bool)

In [9]: pd.Index([1, 2, 3]) == pd.Index([2])
ValueError: Lengths must match to compare

In [10]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
ValueError: Lengths must match to compare

Note that this differs from the numpy behavior, where a comparison can be broadcast:

In [68]: np.array([1, 2, 3]) == np.array([1])
Out[68]: array([ True, False, False])

or it can return False if broadcasting cannot be done:

In [11]: np.array([1, 2, 3]) == np.array([1, 2])
Out[11]: False
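When two Index objects may differ in length, a sketch of two alternatives that do not raise: Index.equals answers "are these the same index?" with a single boolean, and Index.isin gives an elementwise membership mask. The variable names are illustrative.

```python
import pandas as pd

a = pd.Index([1, 2, 3])
b = pd.Index([1, 2])

# Elementwise == raises for mismatched lengths.
try:
    a == b
    raised = False
except ValueError:
    raised = True

# Whole-object comparison: never raises, returns a single bool.
same = a.equals(b)

# Elementwise membership of a's labels in b.
membership = a.isin(b)
print(raised, same, membership)
```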

Changes to boolean comparisons vs. None

Boolean comparisons of a Series with None are now equivalent to comparing with np.nan, rather than raising TypeError. (GH 1079).

In [69]: s = pd.Series(range(3), dtype="float")

In [70]: s.iloc[1] = None

In [71]: s
Out[71]: 
0    0.0
1    NaN
2    2.0
Length: 3, dtype: float64

Previous behavior:

In [5]: s == None
TypeError: Could not compare <type 'NoneType'> type with Series

New behavior:

In [72]: s == None
Out[72]: 
0    False
1    False
2    False
Length: 3, dtype: bool

Usually you simply want to know which values are null.

In [73]: s.isnull()
Out[73]: 
0    False
1     True
2    False
Length: 3, dtype: bool

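Warning

You will generally want to use isnull/notnull for these kinds of comparisons, because isnull/notnull tells you which elements are null. Keep in mind that nan values do not compare equal to each other, but None values do. Note that pandas/numpy rely on the fact that np.nan != np.nan, and treat None like np.nan.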

In [74]: None == None
Out[74]: True

In [75]: np.nan == np.nan
Out[75]: False

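HDFStore dropna behavior

The default behavior for HDFStore write functions with format='table' is now to keep rows that are all missing. Previously, the behavior was to drop rows in which everything except the index was missing. The previous behavior can be replicated using the dropna=True option. (GH 9382)

Previous behavior: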

In [76]: df_with_missing = pd.DataFrame(
 ....:    {"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]}
 ....: )
 ....: 

In [77]: df_with_missing
Out[77]: 
 col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

[3 rows x 2 columns]

In [27]:
df_with_missing.to_hdf('file.h5',
 key='df_with_missing',
 format='table',
 mode='w')

In [28]: pd.read_hdf('file.h5', 'df_with_missing')

Out [28]:
 col1  col2
 0     0     1
 2     2   NaN

New behavior:

In [78]: df_with_missing.to_hdf("file.h5", key="df_with_missing", format="table", mode="w")

In [79]: pd.read_hdf("file.h5", "df_with_missing")
Out[79]: 
 col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

[3 rows x 2 columns]

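See the docs for more details.

Changes to `display.precision` option

The display.precision option has been clarified to refer to decimal places (GH 10451)

Earlier versions of pandas would format floating point numbers with one less decimal place than the value in display.precision.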

In [1]: pd.set_option('display.precision', 2)

In [2]: pd.DataFrame({'x': [123.456789]})
Out[2]:
 x
0  123.5

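If precision is interpreted as "significant figures", this works for scientific notation, but the same interpretation does not work for values in standard formatting. It was also inconsistent with how numpy handles formatting.

Going forward, the value of display.precision directly controls the number of places after the decimal point, for both regular formatting and scientific notation, similar to how numpy's precision print option works.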

In [80]: pd.set_option("display.precision", 2)

In [81]: pd.DataFrame({"x": [123.456789]})
Out[81]: 
 x
0  123.46

[1 rows x 1 columns]
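To change the precision only temporarily, the option can be set inside pd.option_context rather than globally. This is a small sketch; the value and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": [123.456789]})

# Inside the context, two places after the decimal point are shown.
with pd.option_context("display.precision", 2):
    two_places = repr(df)

# A different setting in a separate context; the global option is untouched.
with pd.option_context("display.precision", 4):
    four_places = repr(df)

print(two_places)
print(four_places)
```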

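To preserve the output behavior of prior versions, the default value of display.precision has been reduced from 7 to 6.

Changes to `Categorical.unique`

Categorical.unique now returns new Categoricals with categories and codes that are unique, rather than returning np.array (GH 10508)

unordered category: values and categories are sorted by order of appearance.
ordered category: values are sorted by order of appearance, categories keep their existing order.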

In [82]: cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"], ordered=True)

In [83]: cat
Out[83]: 
['C', 'A', 'B', 'C']
Categories (3, object): ['A' < 'B' < 'C']

In [84]: cat.unique()
Out[84]: 
['C', 'A', 'B']
Categories (3, object): ['A' < 'B' < 'C']

In [85]: cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"])

In [86]: cat
Out[86]: 
['C', 'A', 'B', 'C']
Categories (3, object): ['A', 'B', 'C']

In [87]: cat.unique()
Out[87]: 
['C', 'A', 'B']
Categories (3, object): ['A', 'B', 'C']

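Changes to `bool` passed as `header` in parsers

In earlier versions of pandas, if a bool was passed as the header argument of read_csv, read_excel, or read_html, it was implicitly converted to an integer, resulting in header=0 for False and header=1 for True (GH 6113)

A bool input to header will now raise a TypeError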

In [29]: df = pd.read_csv('data.csv', header=False)
TypeError: Passing a bool to header is invalid. Use header=None for no header or
header=int or list-like of ints to specify the row(s) making up the column names
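The two supported spellings named in the error message can be sketched with an in-memory CSV: header=0 uses the first row as column names, while header=None treats every row as data. The CSV text is illustrative.

```python
import io

import pandas as pd

raw = "a,b\n1,2\n3,4\n"

# First row supplies the column names.
with_header = pd.read_csv(io.StringIO(raw), header=0)

# No header row: columns are numbered and "a,b" becomes a data row.
no_header = pd.read_csv(io.StringIO(raw), header=None)

# A bool is rejected instead of being silently converted to an integer.
try:
    pd.read_csv(io.StringIO(raw), header=False)
    raised = False
except TypeError:
    raised = True

print(with_header.shape, no_header.shape, raised)
```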

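Other API changes

Line and kde plots with subplots=True now use default colors, not all black. Specify color='k' to draw all lines in black (GH 9894)
Calling the .value_counts() method on a Series with a categorical dtype now returns a Series with a CategoricalIndex (GH 10704)
The metadata properties of subclasses of pandas objects will now be serialized (GH 10553).
groupby using Categorical follows the same rule as Categorical.unique described above (GH 10508).
Constructing a DataFrame with an array of complex64 dtype previously meant the corresponding column was automatically promoted to the complex128 dtype. pandas will now preserve the itemsize of the input for complex data (GH 10952).
Some numeric reduction operators raise ValueError, rather than TypeError, on object types that include strings and numbers (GH 11131).
Passing the currently unsupported chunksize argument to read_excel or ExcelFile.parse will now raise NotImplementedError (GH 8011).
Allow an ExcelFile object to be passed into read_excel (GH 11198).
DatetimeIndex.union does not infer freq if self and the input both have None as freq (GH 11086).
NaT's methods now either raise ValueError, or return np.nan or NaT (GH 9513).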

Behavior	Methods
return `np.nan`	`weekday`, `isoweekday`
return `NaT`	`date`, `now`, `replace`, `to_datetime`, `today`
return `np.datetime64('NaT')`	`to_datetime64` (unchanged)
raise `ValueError`	All other public methods (names not beginning with underscores)

Deprecations

For Series the following indexing functions are deprecated (GH 10177).

Deprecated function	Replacement
`.irow(i)`	`.iloc[i]` or `.iat[i]`
`.iget(i)`	`.iloc[i]` or `.iat[i]`
`.iget_value(i)`	`.iloc[i]` or `.iat[i]`

For DataFrame the following indexing functions are deprecated (GH 10177).

Deprecated function	Replacement
`.irow(i)`	`.iloc[i]`
`.iget_value(i, j)`	`.iloc[i, j]` or `.iat[i, j]`
`.icol(j)`	`.iloc[:, j]`
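The replacements in the tables above can be sketched as follows; .iloc selects by integer position and .iat is the scalar-only counterpart. The data is illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30])
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Series: replaces .irow(i), .iget(i) and .iget_value(i)
second_value = s.iloc[1]
second_scalar = s.iat[1]

# DataFrame: replaces .irow(i)
first_row = df.iloc[0]

# DataFrame: replaces .iget_value(i, j)
cell = df.iat[0, 1]

# DataFrame: replaces .icol(j)
second_column = df.iloc[:, 1]

print(second_value, cell)
```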

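Note

These indexing functions have been deprecated in the documentation since 0.11.0.

Categorical.name was deprecated to make Categorical more numpy.ndarray like. Use Series(cat, name="whatever") instead (GH 10482).
Setting missing values (NaN) in a Categorical's categories will issue a warning (GH 10748). You can still have missing values in the values.
drop_duplicates and duplicated's take_last keyword was deprecated in favor of keep (GH 6511, GH 8505).
Series.nsmallest and nlargest's take_last keyword was deprecated in favor of keep. (GH 10792)
DataFrame.combineAdd and DataFrame.combineMult are deprecated. They can easily be replaced by using the add and mul methods: DataFrame.add(other, fill_value=0) and DataFrame.mul(other, fill_value=1.) (GH 10735).
TimeSeries is deprecated in favor of Series (note that this has been an alias since 0.13.0) (GH 10890).
SparsePanel is deprecated and will be removed in a future version (GH 11157).
Series.is_time_series is deprecated in favor of Series.index.is_all_dates (GH 11135).
Legacy offsets (like 'A@JAN') are deprecated (note that these have been aliases since 0.8.0) (GH 10878).
WidePanel is deprecated in favor of Panel, and LongPanel in favor of DataFrame (note that these have been aliases since before 0.11.0) (GH 10892).
DataFrame.convert_objects has been deprecated in favor of the type-specific functions pd.to_datetime, pd.to_timedelta and pd.to_numeric (new in 0.17.0) (GH 11133).

Removal of prior version deprecations/changes

Removal of the na_last parameter from Series.order() and Series.sort(), in favor of na_position. (GH 5231)
Removal of percentile_width from .describe(), in favor of percentiles. (GH 7088)
Removal of the colSpace parameter from DataFrame.to_string(), in favor of col_space, circa version 0.8.0.

Removal of automatic time-series broadcasting (GH 2304)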

In [88]: np.random.seed(1234)

In [89]: df = pd.DataFrame(
 ....:    np.random.randn(5, 2),
 ....:    columns=list("AB"),
 ....:    index=pd.date_range("2013-01-01", periods=5),
 ....: )
 ....: 

In [90]: df
Out[90]: 
 A         B
2013-01-01  0.471435 -1.190976
2013-01-02  1.432707 -0.312652
2013-01-03 -0.720589  0.887163
2013-01-04  0.859588 -0.636524
2013-01-05  0.015696 -2.242685

[5 rows x 2 columns]

Previously

In [3]: df + df.A
FutureWarning: TimeSeries broadcasting along DataFrame index by default is deprecated.
Please use DataFrame.<op> to explicitly broadcast arithmetic operations along the index

Out[3]:
 A         B
2013-01-01  0.942870 -0.719541
2013-01-02  2.865414  1.120055
2013-01-03 -1.441177  0.166574
2013-01-04  1.719177  0.223065
2013-01-05  0.031393 -2.226989

Current

In [91]: df.add(df.A, axis="index")
Out[91]: 
 A         B
2013-01-01  0.942870 -0.719541
2013-01-02  2.865414  1.120055
2013-01-03 -1.441177  0.166574
2013-01-04  1.719177  0.223065
2013-01-05  0.031393 -2.226989

[5 rows x 2 columns]

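Removed the table keyword in HDFStore.put/append, in favor of using format= (GH 4645)
Removed kind in read_excel/ExcelFile, as it was unused (GH 4712)
Removed the infer_type keyword from pd.read_html, as it was unused (GH 4770, GH 7032)
Removed the offset and timeRule keywords from Series.tshift/shift, in favor of freq (GH 4853, GH 4864)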
删除pd.load/pd.save别名，改用pd.to_pickle/pd.read_pickle (GH 3787) ### 排序 API 的更改

排序 API 存在长期的不一致性。(GH 9816, GH 8239).

这是 0.17.0 之前 API 的摘要：

Series.sort是INPLACE的，而DataFrame.sort返回一个新对象。
Series.order返回一个新对象
可以使用Series/DataFrame.sort_index通过传递by关键字来按值排序。
Series/DataFrame.sortlevel仅在MultiIndex上对索引进行排序。

为了解决这些问题，我们已经重构了 API：

我们引入了一个新方法，DataFrame.sort_values()，它是DataFrame.sort()、Series.sort()和Series.order()的合并，用于处理值的排序。
现有的方法Series.sort()、Series.order()和DataFrame.sort()已被弃用，并将在未来的版本中移除。
DataFrame.sort_index()的by参数已被弃用，并将在未来的版本中移除。
现有方法.sort_index()将增加level关键字以启用级别排序。

现在我们有了两种不同且不重叠的排序方法。*标记了将显示FutureWarning的项目。

要按值排序：

以前	替换
* `Series.order()`	`Series.sort_values()`
* `Series.sort()`	`Series.sort_values(inplace=True)`
* `DataFrame.sort(columns=...)`	`DataFrame.sort_values(by=...)`

要按索引排序：

以前	替换
`Series.sort_index()`	`Series.sort_index()`
`Series.sortlevel(level=...)`	`Series.sort_index(level=...')
`DataFrame.sort_index()`	`DataFrame.sort_index()`
`DataFrame.sortlevel(level=...)`	`DataFrame.sort_index(level=...)`
* `DataFrame.sort()`	`DataFrame.sort_index()`

我们还已经弃用和更改了两个类似于 Series 的类Index和Categorical中的类似方法。

以前	替换
* `Index.order()`	`Index.sort_values()`
* `Categorical.order()`	`Categorical.sort_values()`

`to_datetime`和`to_timedelta`的更改

错误处理

pd.to_datetime错误处理的默认值已更改为errors='raise'。在之前的版本中是errors='ignore'。此外，coerce参数已被弃用，改用errors='coerce'。这意味着无效的解析将引发错误，而不是像以前的版本那样返回原始输入。(GH 10636)

以前的行为：

In [2]: pd.to_datetime(['2009-07-31', 'asd'])
Out[2]: array(['2009-07-31', 'asd'], dtype=object)

新的行为：

In [3]: pd.to_datetime(['2009-07-31', 'asd'])
ValueError: Unknown string format

当然你也可以强制执行这个。

In [61]: pd.to_datetime(["2009-07-31", "asd"], errors="coerce")
Out[61]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)

要保持以前的行为，可以使用errors='ignore'：

In [4]: pd.to_datetime(["2009-07-31", "asd"], errors="ignore")
Out[4]: Index(['2009-07-31', 'asd'], dtype='object')

此外，pd.to_timedelta已经获得了类似的 API，即errors='raise'|'ignore'|'coerce'，并且coerce关键字已被弃用，改用errors='coerce'。

一致的解析

to_datetime，Timestamp和DatetimeIndex的字符串解析已经统一。 (GH 7599)

在 v0.17.0 之前，Timestamp和to_datetime可能会使用今天的日期错误地解析年份日期字符串，否则DatetimeIndex将使用年初。在某些DatetimeIndex可以解析的日期时间字符串（如季度字符串）中，Timestamp和to_datetime可能会引发ValueError。

先前的行为：

In [1]: pd.Timestamp('2012Q2')
Traceback
 ...
ValueError: Unable to parse 2012Q2

# Results in today's date.
In [2]: pd.Timestamp('2014')
Out [2]: 2014-08-12 00:00:00

v0.17.0 可以解析如下。它也适用于DatetimeIndex。

新行为：

In [62]: pd.Timestamp("2012Q2")
Out[62]: Timestamp('2012-04-01 00:00:00')

In [63]: pd.Timestamp("2014")
Out[63]: Timestamp('2014-01-01 00:00:00')

In [64]: pd.DatetimeIndex(["2012Q2", "2014"])
Out[64]: DatetimeIndex(['2012-04-01', '2014-01-01'], dtype='datetime64[ns]', freq=None)

注意

如果您想要基于今天的日期执行计算，请使用Timestamp.now()和pandas.tseries.offsets。

In [65]: import pandas.tseries.offsets as offsets

In [66]: pd.Timestamp.now()
Out[66]: Timestamp('2024-04-10 17:55:56.541543')

In [67]: pd.Timestamp.now() + offsets.DateOffset(years=1)
Out[67]: Timestamp('2025-04-10 17:55:56.542277')

错误处理

pd.to_datetime的错误处理默认值已更改为errors='raise'。在之前的版本中，它是errors='ignore'。此外，coerce参数已弃用，改为errors='coerce'。这意味着无效的解析将引发而不是返回原始输入，与以前的版本不同。 (GH 10636)

先前的行为：

In [2]: pd.to_datetime(['2009-07-31', 'asd'])
Out[2]: array(['2009-07-31', 'asd'], dtype=object)

新行为：

In [3]: pd.to_datetime(['2009-07-31', 'asd'])
ValueError: Unknown string format

当然，您也可以强制执行这一点。

In [61]: pd.to_datetime(["2009-07-31", "asd"], errors="coerce")
Out[61]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)

要保持先前的行为，您可以使用errors='ignore'：

In [4]: pd.to_datetime(["2009-07-31", "asd"], errors="ignore")
Out[4]: Index(['2009-07-31', 'asd'], dtype='object')

此外，pd.to_timedelta已经获得了相似的 API，即errors='raise'|'ignore'|'coerce'，并且coerce关键字已经弃用，改为errors='coerce'。

一致的解析

to_datetime，Timestamp和DatetimeIndex的字符串解析已经统一。 (GH 7599)

先前的行为：

In [1]: pd.Timestamp('2012Q2')
Traceback
 ...
ValueError: Unable to parse 2012Q2

# Results in today's date.
In [2]: pd.Timestamp('2014')
Out [2]: 2014-08-12 00:00:00

v0.17.0 可以解析如下。它也适用于DatetimeIndex。

新行为：

In [62]: pd.Timestamp("2012Q2")
Out[62]: Timestamp('2012-04-01 00:00:00')

In [63]: pd.Timestamp("2014")
Out[63]: Timestamp('2014-01-01 00:00:00')

In [64]: pd.DatetimeIndex(["2012Q2", "2014"])
Out[64]: DatetimeIndex(['2012-04-01', '2014-01-01'], dtype='datetime64[ns]', freq=None)

注意

如果您想要基于今天的日期执行计算，请使用Timestamp.now()和pandas.tseries.offsets。

In [65]: import pandas.tseries.offsets as offsets

In [66]: pd.Timestamp.now()
Out[66]: Timestamp('2024-04-10 17:55:56.541543')

In [67]: pd.Timestamp.now() + offsets.DateOffset(years=1)
Out[67]: Timestamp('2025-04-10 17:55:56.542277')

索引比较的更改

Index上的等号操作应该与Series类似 (GH 9947, GH 10637)

从 v0.17.0 开始，比较长度不同的Index对象将引发ValueError。这是为了与Series的行为一致。

先前的行为：

In [2]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
Out[2]: array([ True, False, False], dtype=bool)

In [3]: pd.Index([1, 2, 3]) == pd.Index([2])
Out[3]: array([False,  True, False], dtype=bool)

In [4]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
Out[4]: False

新行为：

In [8]: pd.Index([1, 2, 3]) == pd.Index([1, 4, 5])
Out[8]: array([ True, False, False], dtype=bool)

In [9]: pd.Index([1, 2, 3]) == pd.Index([2])
ValueError: Lengths must match to compare

In [10]: pd.Index([1, 2, 3]) == pd.Index([1, 2])
ValueError: Lengths must match to compare

请注意，这与numpy的行为不同，其中比较可以广播：

In [68]: np.array([1, 2, 3]) == np.array([1])
Out[68]: array([ True, False, False])

或者如果无法进行广播，则可以返回 False：

In [11]: np.array([1, 2, 3]) == np.array([1, 2])
Out[11]: False

对布尔比较与`None`的更改

与None比较Series的布尔比较现在等效于与np.nan比较，而不是引发TypeError。 (GH 1079).

In [69]: s = pd.Series(range(3), dtype="float")

In [70]: s.iloc[1] = None

In [71]: s
Out[71]: 
0    0.0
1    NaN
2    2.0
Length: 3, dtype: float64

先前的行为：

In [5]: s == None
TypeError: Could not compare <type 'NoneType'> type with Series

新行为：

In [72]: s == None
Out[72]: 
0    False
1    False
2    False
Length: 3, dtype: bool

通常，您只想知道哪些值为空。

In [73]: s.isnull()
Out[73]: 
0    False
1     True
2    False
Length: 3, dtype: bool

警告

通常您会希望使用 isnull/notnull 进行这些类型的比较，因为 isnull/notnull 告诉您哪些元素为空。需要注意的是 nan 不相等，但 None 相等。请注意，pandas/numpy 使用 np.nan != np.nan 这一事实，并将 None 视为 np.nan。

In [74]: None == None
Out[74]: True

In [75]: np.nan == np.nan
Out[75]: False

HDFStore dropna 行为

使用 format='table' 的 HDFStore 写入函数的默认行为现在是保留所有丢失的行。以前，行为是删除所有丢失的行，除了索引。以前的行为可以使用 dropna=True 选项复制。（GH 9382）

以前的行为：

In [76]: df_with_missing = pd.DataFrame(
 ....:    {"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]}
 ....: )
 ....: 

In [77]: df_with_missing
Out[77]: 
 col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

[3 rows x 2 columns]

In [27]:
df_with_missing.to_hdf('file.h5',
 key='df_with_missing',
 format='table',
 mode='w')

In [28]: pd.read_hdf('file.h5', 'df_with_missing')

Out [28]:
 col1  col2
 0     0     1
 2     2   NaN

新行为：

In [78]: df_with_missing.to_hdf("file.h5", key="df_with_missing", format="table", mode="w")

In [79]: pd.read_hdf("file.h5", "df_with_missing")
Out[79]: 
 col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

[3 rows x 2 columns]

更多详情请参阅文档。

对 `display.precision` 选项的更改

display.precision 选项已经明确指出是指小数位数（GH 10451）。

早期版本的 pandas 会将浮点数格式化为比 display.precision 中的值少一个小数位数。

In [1]: pd.set_option('display.precision', 2)

In [2]: pd.DataFrame({'x': [123.456789]})
Out[2]:
 x
0  123.5

如果将精度解释为“有效数字”，这对科学计数法确实有效，但对于标准格式的值，相同的解释则不起作用。这也与 numpy 处理格式的方式不一致。

未来，display.precision 的值将直接控制小数点后的位数，适用于常规格式和科学计数法，类似于 numpy 的 precision 打印选项的工作方式。

In [80]: pd.set_option("display.precision", 2)

In [81]: pd.DataFrame({"x": [123.456789]})
Out[81]: 
 x
0  123.46

[1 rows x 1 columns]

为了保留与之前版本的输出行为相同，display.precision 的默认值已从 7 减少到 6。

对 `Categorical.unique` 的更改

Categorical.unique 现在返回具有唯一 categories 和 codes 的新 Categoricals，而不是返回 np.array （GH 10508）

无序类别：值和类别按出现顺序排序。
有序类别：值按出现顺序排序，类别保留现有顺序。

In [82]: cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"], ordered=True)

In [83]: cat
Out[83]: 
['C', 'A', 'B', 'C']
Categories (3, object): ['A' < 'B' < 'C']

In [84]: cat.unique()
Out[84]: 
['C', 'A', 'B']
Categories (3, object): ['A' < 'B' < 'C']

In [85]: cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"])

In [86]: cat
Out[86]: 
['C', 'A', 'B', 'C']
Categories (3, object): ['A', 'B', 'C']

In [87]: cat.unique()
Out[87]: 
['C', 'A', 'B']
Categories (3, object): ['A', 'B', 'C']

更改传递给解析器中的 `header` 的 `bool`

在 pandas 的早期版本中，如果 read_csv、read_excel 或 read_html 的 header 参数传递了一个布尔值，则它会被隐式转换为整数，结果是 False 对应 header=0，True 对应 header=1 （GH 6113）

header 的 bool 输入现在会引发 TypeError。

In [29]: df = pd.read_csv('data.csv', header=False)
TypeError: Passing a bool to header is invalid. Use header=None for no header or
header=int or list-like of ints to specify the row(s) making up the column names

其他 API 更改

使用 subplots=True 时，线条和 kde 图现在使用默认颜色，而不是全黑色。指定 color='k' 来绘制所有线条为黑色（GH 9894）。
对具有 categorical dtype 的 Series 调用 .value_counts() 方法现在将返回一个具有 CategoricalIndex 的 Series（GH 10704）
pandas 对象的子类的元数据属性现在将被序列化（GH 10553）。
使用 Categorical 进行 groupby 遵循与上述 Categorical.unique 相同的规则（GH 10508）
当使用 complex64 dtype 的数组构建 DataFrame 时，以前会自动将相应列提升为 complex128 dtype。pandas 现在会保留复杂数据的输入项大小（GH 10952）
一些数值缩减运算符在包含字符串和数字的对象类型上会返回 ValueError 而不是 TypeError（GH 11131）
将当前不支持的 chunksize 参数传递给 read_excel 或 ExcelFile.parse 现在会引发 NotImplementedError（GH 8011）
允许将 ExcelFile 对象传递给 read_excel（GH 11198）
如果 self 和输入的 freq 都为 None，则 DatetimeIndex.union 不会推断 freq（GH 11086）

NaT 的方法现在要么引发 ValueError，要么返回 np.nan 或 NaT（GH 9513）

行为	方法
返回 `np.nan`	`weekday`、`isoweekday`
返回 `NaT`	`date`、`now`、`replace`、`to_datetime`、`today`
返回 `np.datetime64('NaT')`	`to_datetime64`（不变）
引发 `ValueError`	所有其他公共方法（名称不以下划线开头）

Deprecations

For Series, the following indexing functions are deprecated (GH 10177).

Deprecated function	Replacement
`.irow(i)`	`.iloc[i]` or `.iat[i]`
`.iget(i)`	`.iloc[i]` or `.iat[i]`
`.iget_value(i)`	`.iloc[i]` or `.iat[i]`

For DataFrame, the following indexing functions are deprecated (GH 10177).

Deprecated function	Replacement
`.irow(i)`	`.iloc[i]`
`.iget_value(i, j)`	`.iloc[i, j]` or `.iat[i, j]`
`.icol(j)`	`.iloc[:, j]`

Note

These indexing functions have been deprecated in the documentation since 0.11.0.
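The replacements listed for the deprecated indexing functions can be sketched as follows. The sample Series and DataFrame are made up for illustration.

```python
import pandas as pd

# Illustrative data (assumed for this sketch).
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r0", "r1"])

# Series: positional access replaces .irow(i), .iget(i) and .iget_value(i).
second_value = s.iloc[1]
second_value_scalar = s.iat[1]   # scalar-only accessor

# DataFrame: row by position replaces .irow(i).
first_row = df.iloc[0]

# DataFrame: single cell by position replaces .iget_value(i, j).
cell = df.iloc[0, 1]
cell_scalar = df.iat[0, 1]

# DataFrame: column by position replaces .icol(j).
second_col = df.iloc[:, 1]

print(second_value, cell, list(first_row), list(second_col))
```

.iloc accepts slices and lists as well as scalars, while .iat is restricted to a single position and is intended for scalar lookups.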

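Categorical.name is deprecated to make Categorical more like numpy.ndarray. Use Series(cat, name="whatever") instead (GH 10482).
Setting missing values (NaN) in the categories of a Categorical now issues a warning (GH 10748). Missing values are still allowed in the values.
The take_last keyword of drop_duplicates and duplicated is deprecated in favor of keep (GH 6511, GH 8505).
The take_last keyword of Series.nsmallest and nlargest is deprecated in favor of keep (GH 10792).
DataFrame.combineAdd and DataFrame.combineMult are deprecated. They can easily be replaced by the add and mul methods: DataFrame.add(other, fill_value=0) and DataFrame.mul(other, fill_value=1.) (GH 10735).
TimeSeries is deprecated in favor of Series (note that it has been an alias since 0.13.0) (GH 10890).
SparsePanel is deprecated and will be removed in a future version (GH 11157).
Series.is_time_series is deprecated in favor of Series.index.is_all_dates (GH 11135).
Legacy offsets (such as 'A@JAN') are deprecated (note that these have been aliases since 0.8.0) (GH 10878).
WidePanel is deprecated in favor of Panel, and LongPanel in favor of DataFrame (note that these have been aliases since before 0.11.0) (GH 10892).
DataFrame.convert_objects is deprecated in favor of the type-specific functions pd.to_datetime, pd.to_timestamp and pd.to_numeric (new in 0.17.0) (GH 11133).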
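A brief sketch of three of the suggested replacements: the keep keyword, add/mul with fill_value, and pd.to_numeric. The data and variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# keep= replaces take_last= in drop_duplicates and duplicated.
df = pd.DataFrame({"k": ["x", "y", "x"], "v": [1, 2, 1]})
last_kept = df.drop_duplicates(keep="last")   # keeps the last of each duplicate group
dup_mask = df.duplicated(keep="first")        # flags every repeat after the first

# add/mul with fill_value replace combineAdd/combineMult:
# a value missing on one side is filled before the operation.
left = pd.DataFrame({"a": [1.0, np.nan]})
right = pd.DataFrame({"a": [10.0, 20.0]})
summed = left.add(right, fill_value=0)
product = left.mul(right, fill_value=1.0)

# Type-specific conversion instead of convert_objects.
numbers = pd.to_numeric(pd.Series(["1", "2.5"]))

print(list(last_kept.index), list(dup_mask))
print(summed["a"].tolist(), product["a"].tolist(), numbers.tolist())
```

The fill values 0 and 1 are the identity elements for addition and multiplication, which is why they leave the present operand unchanged.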

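Removal of prior version deprecations/changes

Removal of the na_last parameter from Series.order() and Series.sort(), in favor of na_position (GH 5231).
Removal of percentile_width from .describe(), in favor of percentiles (GH 7088).
Removal of the colSpace parameter from DataFrame.to_string(), in favor of col_space, deprecated around version 0.8.0.

Removal of automatic time-series broadcasting (GH 2304)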

In [88]: np.random.seed(1234)

In [89]: df = pd.DataFrame(
 ....:    np.random.randn(5, 2),
 ....:    columns=list("AB"),
 ....:    index=pd.date_range("2013-01-01", periods=5),
 ....: )
 ....: 

In [90]: df
Out[90]: 
                   A         B
2013-01-01  0.471435 -1.190976
2013-01-02  1.432707 -0.312652
2013-01-03 -0.720589  0.887163
2013-01-04  0.859588 -0.636524
2013-01-05  0.015696 -2.242685

[5 rows x 2 columns]

Previously

In [3]: df + df.A
FutureWarning: TimeSeries broadcasting along DataFrame index by default is deprecated.
Please use DataFrame.<op> to explicitly broadcast arithmetic operations along the index

Out[3]:
                   A         B
2013-01-01  0.942870 -0.719541
2013-01-02  2.865414  1.120055
2013-01-03 -1.441177  0.166574
2013-01-04  1.719177  0.223065
2013-01-05  0.031393 -2.226989

Current

In [91]: df.add(df.A, axis="index")
Out[91]: 
                   A         B
2013-01-01  0.942870 -0.719541
2013-01-02  2.865414  1.120055
2013-01-03 -1.441177  0.166574
2013-01-04  1.719177  0.223065
2013-01-05  0.031393 -2.226989

[5 rows x 2 columns]

Removal of the table keyword in HDFStore.put/append, in favor of format= (GH 4645).
Removal of kind in read_excel/ExcelFile, as it was unused (GH 4712).
Removal of the infer_type keyword from pd.read_html, as it was unused (GH 4770, GH 7032).
Removal of the offset and timeRule keywords from Series.tshift/shift, in favor of freq (GH 4853, GH 4864).
Removal of the pd.load/pd.save aliases, in favor of pd.read_pickle/pd.to_pickle (GH 3787).
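Two of the replacements named in the list above can be sketched as below: a pickle round trip in place of pd.save/pd.load, and freq= in place of the removed shift keywords. The temporary file path and sample Series are assumptions made for this example.

```python
import os
import tempfile

import pandas as pd

# Illustrative daily series (assumed for this sketch).
s = pd.Series([1, 2, 3], index=pd.date_range("2013-01-01", periods=3))

# pd.to_pickle / pd.read_pickle take the place of pd.save / pd.load.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "series.pkl")
    pd.to_pickle(s, path)
    restored = pd.read_pickle(path)

# freq= shifts the index by one day and leaves the values in place,
# which is what the removed offset/timeRule keywords used to express.
shifted = s.shift(1, freq="D")

print(restored.equals(s))
print(shifted.index[0], shifted.iloc[0])
```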

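Performance improvements

Development support for benchmarking with the Air Speed Velocity library (GH 8361).
Added vbench benchmarks for alternative ExcelWriter engines and for reading Excel files (GH 7171).
Performance improvements in Categorical.value_counts (GH 10804).
Performance improvements in SeriesGroupBy.nunique, SeriesGroupBy.value_counts and SeriesGroupBy.transform (GH 10820, GH 11077).
Performance improvements in DataFrame.drop_duplicates with integer dtypes (GH 10917).
Performance improvements in DataFrame.duplicated with wide frames (GH 10161, GH 11180).
4x improvement in timedelta string parsing (GH 6755, GH 10426).
8x improvement in timedelta64 and datetime64 operations (GH 6755).
Significantly improved performance of indexing a MultiIndex with slicers (GH 10287).
8x improvement in iloc with list-like input (GH 10791).
Improved performance of Series.isin for datetime-like and integer Series (GH 10287).
20x improvement in concat of Categoricals when the categories are identical (GH 10587).
Improved performance of to_datetime when the specified format string is ISO8601 (GH 10178).
2x improvement in Series.value_counts for float dtype (GH 10821).
Enabled infer_datetime_format in to_datetime when date components are not zero-padded (GH 11142).
Fixed a regression from 0.16.1 in constructing a DataFrame from a nested dictionary (GH 11084).
Performance improvements in addition and subtraction of DateOffset with Series or DatetimeIndex (GH 10744, GH 11205).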

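Bug fixes

Bug in incorrect computation of .mean() on timedelta64[ns] because of overflow (GH 9442)
Bug in .isin on older versions of numpy (GH 11232)
Bug in DataFrame.to_html(index=False) rendering an unnecessary name row (GH 10344)
Bug in DataFrame.to_latex() where the column_format argument could not be passed (GH 9402)
Bug in DatetimeIndex when localizing with NaT (GH 10477)
Bug in Series.dt operations in preserving metadata (GH 10477)
Bug in preserving NaT when it is passed in an otherwise invalid to_datetime construction (GH 10477)
Bug in DataFrame.apply when the function returns a categorical Series (GH 9573)
Bug in to_datetime when invalid dates and formats are supplied (GH 10154)
Bug in Index.drop_duplicates dropping the name(s) (GH 10115)
Bug in Series.quantile dropping the name (GH 10881)
Bug in pd.Series when setting a value on an empty Series whose index has a frequency (GH 10193)
Bug in pd.Series.interpolate with invalid order keyword values (GH 10633)
Bug in DataFrame.plot raising ValueError when a color name is specified by multiple characters (GH 10387)
Bug in Index construction with a mixed list of tuples (GH 10697)
Bug in DataFrame.reset_index when the index contains NaT (GH 10388)
Bug in ExcelReader when the worksheet is empty (GH 6403)
Bug in BinGrouper.group_info where the returned values are not compatible with the base class (GH 10914)
Bug in clearing the cache on DataFrame.pop and a subsequent inplace operation (GH 10912)
Bug in indexing with a mixed-integer Index causing an ImportError (GH 10610)
Bug in Series.count when the index has nulls (GH 10946)
Bug in pickling of a DatetimeIndex with a non-regular freq (GH 11002)
Bug causing DataFrame.where to not respect the axis parameter when the frame has a symmetric shape (GH 9736)
Bug in Table.select_column where the name is not preserved (GH 10392)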
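Bug in offsets.generate_range where start and end have finer precision than offset (GH 9907)
Bug in pd.rolling_* where Series.name would be lost in the output (GH 10565)
Bug in stack when the index or columns are not unique (GH 10417)
Bug in setting a Panel when an axis has a MultiIndex (GH 10360)
Bug in USFederalHolidayCalendar where USMemorialDay and USMartinLutherKingJr were incorrect (GH 10278 and GH 9760)
Bug in .sample() where the returned object, if set, gives an unnecessary SettingWithCopyWarning (GH 10738)
Bug in .sample() where weights passed as a Series were not aligned along the axis before being treated positionally, potentially causing problems if the weight indices were not aligned with the sampled object (GH 10738)
Regression fixed in (GH 9311, GH 6620, GH 9345), where groupby with a datetime-like was converting to float with certain aggregators (GH 10979)
Bug in DataFrame.interpolate with axis=1 and inplace=True (GH 10395)
Bug in io.sql.get_schema when specifying multiple columns as the primary key (GH 10385)
Bug in groupby(sort=False) with a datetime-like Categorical raising ValueError (GH 10505)
Bug in groupby(axis=1) with filter() throwing IndexError (GH 11041)
Bug in test_categorical on big-endian builds (GH 10425)
Bug in Series.shift and DataFrame.shift not supporting categorical data (GH 9416)
Bug in Series.map using a categorical Series raising AttributeError (GH 10324)
Bug in MultiIndex.get_level_values including a Categorical raising AttributeError (GH 10460)
Bug in pd.get_dummies with sparse=True not returning a SparseDataFrame (GH 10531)
Bug in Index subtypes (such as PeriodIndex) not returning their own type from the .drop and .insert methods (GH 10620)
Bug in algos.outer_join_indexer when the right array is empty (GH 10618)
Bug in filter (regression from 0.16.0) and transform when grouping on multiple keys, one of which is datetime-like (GH 10114)
Bug in to_datetime and to_timedelta causing the Index name to be lost (GH 10875)
Bug in len(DataFrame.groupby) causing IndexError when there is a column containing only NaNs (GH 11016)
Bug that caused a segfault when resampling an empty Series (GH 10228)
Bug in DatetimeIndex and PeriodIndex.value_counts resetting the name of the result while retaining it in the result's Index (GH 10150)
Bug in pd.eval with the numexpr engine coercing 1-element numpy arrays to scalars (GH 10546)
Bug in pd.concat with axis=0 when a column is of dtype category (GH 10177)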
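Bug in read_msgpack where the input type is not always checked (GH 10369, GH 10630)
Bug in pd.read_csv with the kwargs index_col=False, index_col=['a', 'b'] or dtype (GH 10413, GH 10467, GH 10577)
Bug in Series.from_csv with the header kwarg not setting Series.name or Series.index.name (GH 10483)
Bug in groupby.var which caused the variance to be inaccurate for small float values (GH 10448)
Bug in Series.plot(kind='hist') where the Y label was not informative (GH 10485)
Bug in read_csv when using a converter that generates a uint8 type (GH 9266)
Bug causing a memory leak in time-series line and area plots (GH 9003)
Bug when setting a Panel sliced along the major or minor axes when the right-hand side is a DataFrame (GH 11014)
Bug that returned None instead of raising NotImplementedError when operator functions (e.g. .add) of Panel are not implemented (GH 7692)
Bug in line and kde plots not accepting multiple colors when subplots=True (GH 9894)
Bug in left and right align of Series with MultiIndex, which may be inverted (GH 10665)
Bug in left and right join with MultiIndex, which may be inverted (GH 10741)
Bug in read_stata when reading a file with a different order set in columns (GH 10757)
Bug in Categorical, which may not be represented properly when a category contains tz or Period (GH 10713)
Bug in Categorical.__iter__, which may not return the correct datetime and Period (GH 10713)
Bug in indexing with a PeriodIndex on an object with a PeriodIndex (GH 4125)
Bug in read_csv with engine='c': an EOF preceded by a comment, blank line, etc. was not handled correctly (GH 10728, GH 10548)
Reading "famafrench" data via DataReader resulted in an HTTP 404 error because the website URL had changed (GH 10591).
Bug in read_msgpack where the DataFrame to decode has duplicate column names (GH 9618)
Bug in io.common.get_filepath_or_buffer which caused reading of valid S3 files to fail if the bucket also contained keys for which the user does not have read permission (GH 10604)
Bug in vectorized setting of timestamp columns with python datetime.date and numpy datetime64 (GH 10408, GH 10412)
Bug in Index.take, which may add an unnecessary freq attribute (GH 10791)
Bug in merge with an empty DataFrame, which may raise IndexError (GH 10824)
Bug in to_latex raising an unexpected keyword argument error for some documented arguments (GH 10888)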
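Bug in indexing of a large DataFrame where IndexError is uncaught (GH 10645 and GH 10692)
Bug in read_csv when using the nrows or chunksize parameters if the file contains only a header line (GH 9535)
Bug in serialization of category types in HDF5 in the presence of alternate encodings (GH 10366)
Bug in pd.DataFrame when constructing an empty DataFrame with a string dtype (GH 9428)
Bug in pd.DataFrame.diff when the DataFrame is not consolidated (GH 10907)
Bug in pd.unique for arrays with the datetime64 or timedelta64 dtype, where an array with object dtype was returned instead of the original dtype (GH 9431)
Bug in Timedelta raising an error when slicing from 0s (GH 10583)
Bug in DatetimeIndex.take and TimedeltaIndex.take, which may not raise IndexError for an invalid index (GH 10295)
Bug in Series([np.nan]).astype('M8[ms]'), which now returns Series([pd.NaT]) (GH 10747)
Bug in PeriodIndex.order resetting freq (GH 10295)
Bug in date_range when freq divides end as nanos (GH 10885)
Bug in iloc allowing memory outside the bounds of a Series to be accessed with negative integers (GH 10779)
Bug in read_msgpack where the encoding is not respected (GH 10581)
Bug preventing access to the first index when using iloc with a list containing the appropriate negative integer (GH 10547, GH 10779)
Bug in the TimedeltaIndex formatter causing an error when trying to save a DataFrame with a TimedeltaIndex using to_csv (GH 10833)
Bug in DataFrame.where when handling Series slicing (GH 10218, GH 9558)
Bug where pd.read_gbq throws ValueError when zero rows are returned (GH 10273)
Bug in to_json which caused a segmentation fault when serializing a 0-rank ndarray (GH 9576)
Bug in plotting functions, which may raise IndexError when plotted on a GridSpec (GH 10819)
Bug in plot results, which may show unnecessary minor tick labels (GH 10657)
Bug in groupby with incorrect computation for aggregations on a DataFrame with NaT (e.g. first, last, min) (GH 10590, GH 11010)
Bug when constructing a DataFrame where passing a dictionary with only scalar values and specifying columns did not raise an error (GH 10856)
Bug in .var() causing roundoff errors for highly similar values (GH 10242)
Bug in DataFrame.plot(subplots=True) with duplicated columns producing incorrect results (GH 10962)
Bug in Index arithmetic, which may result in an incorrect class (GH 10638)
Bug in date_range returning an empty result if freq is negative annually, quarterly or monthly (GH 11018)
Bug in DatetimeIndex not being able to infer a negative freq (GH 11018)
Removed use of some deprecated numpy comparison operations, mainly in tests (GH 10569)
Bug in Index where the dtype may not be applied properly (GH 11017)
Bug in io.gbq when testing for the minimum google api client version (GH 10652)
Bug in DataFrame construction from a nested dict with timedelta keys (GH 11129)
Bug in .fillna, which may raise TypeError when the data contains a datetime dtype (GH 7095, GH 11153)
Bug in .groupby when the number of keys to group by is the same as the length of the index (GH 11185)
Bug in convert_objects where converted values might not be returned if all values are null and coerce is set (GH 9589)
Bug in convert_objects where the copy keyword was not respected (GH 9589)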

Contributors

A total of 112 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Alex Rothberg
Andrea Bedini +
Andrew Rosenfeld
Andy Hayden
Andy Li +
Anthonios Partheniou +
Artemy Kolchinsky
Bernard Willers
Charlie Clark +
Chris +
Chris Whelan
Christoph Gohlke +
Christopher Whelan
Clark Fitzgerald
Clearfield Christopher +
Dan Ringwalt +
Daniel Ni +
Data & Code Expert Experimenting with Code on Data +
David Cottrell
David John Gagne +
David Kelly +
ETF +
Eduardo Schettino +
Egor +
Egor Panfilov +
Evan Wright
Frank Pinter +
Gabriel Araujo +
Garrett-R
Gianluca Rossi +
Guillaume Gay
Guillaume Poulin
Harsh Nisar +
Ian Henriksen +
Ian Hoegen +
Jaidev Deshpande +
Jan Rudolph +
Jan Schulz
Jason Swails +
Jeff Reback
Jonas Buyl +
Joris Van den Bossche
Joris Vankerschaver +
Josh Levy-Kramer +
Julien Danjou
Ka Wo Chen
Karrie Kehoe +
Kelsey Jordahl
Kerby Shedden
Kevin Sheppard
Lars Buitinck
Leif Johnson +
Luis Ortiz +
Mac +
Matt Gambogi +
Matt Savoie +
Matthew Gilbert +
Maximilian Roos +
Michelangelo D’Agostino +
Mortada Mehyar
Nick Eubank
Nipun Batra
Ondřej Čertík
Phillip Cloud
Pratap Vardhan +
Rafal Skolasinski +
Richard Lewis +
Rinoc Johnson +
Rob Levy
Robert Gieseke
Safia Abdalla +
Samuel Denny +
Saumitra Shahapure +
Sebastian Pölsterl +
Sebastian Rubbert +
Sheppard, Kevin +
Sinhrks
Siu Kwan Lam +
Skipper Seabold
Spencer Carrucciu +
Stephan Hoyer
Stephen Hoover +
Stephen Pascoe +
Terry Santegoeds +
Thomas Grainger
Tjerk Santegoeds +
Tom Augspurger
Vincent Davis +
Winterflower +
Yaroslav Halchenko
Yuan Tang (Terry) +
agijsberts
ajcr +
behzad nouri
cel4
chris-b1 +
cyrusmaher +
davidovitch +
ganego +
jreback
juricast +
larvian +
maximilianr +
msund +
rekcahpassyla
robertzk +
scls19fr
seth-p
sinhrks
springcoil +
terrytangyuan +
tzinckgraf +
