Binarization and Binning

sklearn.preprocessing.Binarizer

from sklearn.preprocessing import Binarizer
import pandas as pd
# Load the preprocessed dataset; the first CSV column is used as the row index
data = pd.read_csv("./data_full", index_col=0)
data
Age Survived Sex_female Sex_male Embarked_C Embarked_Q Embarked_S
0 22.000000 0.0 0.0 1.0 0.0 0.0 1.0
1 38.000000 2.0 1.0 0.0 1.0 0.0 0.0
2 26.000000 2.0 1.0 0.0 0.0 0.0 1.0
3 35.000000 2.0 1.0 0.0 0.0 0.0 1.0
4 35.000000 0.0 0.0 1.0 0.0 0.0 1.0
... ... ... ... ... ... ... ...
884 25.000000 0.0 0.0 1.0 0.0 0.0 1.0
885 39.000000 0.0 1.0 0.0 0.0 0.0 1.0
886 27.000000 0.0 1.0 0.0 0.0 0.0 1.0
887 19.000000 2.0 0.0 1.0 1.0 0.0 0.0
888 29.699118 0.0 0.0 1.0 0.0 1.0 0.0

887 rows × 7 columns

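data1 = data.copy()
# Binarizer expects a 2D array, so reshape the Age column to shape (n_samples, 1)
age = data1.iloc[:,0].values.reshape(-1,1)
# Ages above 30 become 1; ages of 30 or below become 0
result = Binarizer(threshold=30).fit_transform(age)
# Flatten the (n_samples, 1) result back to 1D before writing it into the column
data1.iloc[:,0] = result.ravel()
data1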
Age Survived Sex_female Sex_male Embarked_C Embarked_Q Embarked_S
0 0.0 0.0 0.0 1.0 0.0 0.0 1.0
1 1.0 2.0 1.0 0.0 1.0 0.0 0.0
2 0.0 2.0 1.0 0.0 0.0 0.0 1.0
3 1.0 2.0 1.0 0.0 0.0 0.0 1.0
4 1.0 0.0 0.0 1.0 0.0 0.0 1.0
... ... ... ... ... ... ... ...
884 0.0 0.0 0.0 1.0 0.0 0.0 1.0
885 1.0 0.0 1.0 0.0 0.0 0.0 1.0
886 0.0 0.0 1.0 0.0 0.0 0.0 1.0
887 0.0 2.0 0.0 1.0 1.0 0.0 0.0
888 0.0 0.0 0.0 1.0 0.0 1.0 0.0

887 rows × 7 columns
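
The threshold rule can be checked on a few values. The sketch below is a minimal illustration, assuming a small made-up array of ages (not taken from the dataset above). It shows that Binarizer maps values strictly greater than the threshold to 1 and everything else, including values equal to the threshold, to 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical ages chosen to sit around the threshold; not from the dataset above
ages = np.array([[22.0], [30.0], [30.5], [38.0]])

# Values strictly greater than the threshold become 1, all others become 0
binarized = Binarizer(threshold=30).fit_transform(ages)

# The same rule written directly in NumPy, for comparison
manual = (ages > 30).astype(float)

print(binarized.ravel())
print(np.array_equal(binarized, manual))
```

An age of exactly 30 ends up in the 0 group, which matters when the cut-off is itself a common value in the data.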

sklearn.preprocessing.KBinsDiscretizer

from sklearn.preprocessing import KBinsDiscretizer
data2 = data.copy()
age = data2.iloc[:,0].values.reshape(-1,1)
# Five equal-width bins, encoded as ordinal codes 0 to 4
result2 = KBinsDiscretizer(n_bins=5,encode="ordinal",strategy="uniform").fit_transform(age)
# Distinct bin codes that actually occur
set(result2.ravel())
{0.0, 1.0, 2.0, 3.0, 4.0}
# Five equal-frequency (quantile) bins, one-hot encoded; the result is a sparse matrix
result3 = KBinsDiscretizer(n_bins=5, encode="onehot",strategy="quantile").fit_transform(age)
result3.toarray()
array([[0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1.],
       [0., 1., 0., 0., 0.],
       ...,
       [0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0.]])
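
The two strategies used above can split the same column very differently. The following is a minimal sketch, assuming a small made-up array with one large outlier (not taken from the dataset above). It compares the fitted bin_edges_ and the resulting codes for strategy="uniform" and strategy="quantile", and checks the shape of the one-hot output.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical skewed values with one outlier; not from the dataset above
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float).reshape(-1, 1)

# Equal-width bins: edges are evenly spaced between the min and the max
uniform = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
uniform_codes = uniform.fit_transform(values).ravel()

# Equal-frequency bins: edges are placed at quantiles of the data
quantile = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
quantile_codes = quantile.fit_transform(values).ravel()

# One-hot encoding returns a sparse matrix with one column per bin
onehot = KBinsDiscretizer(n_bins=3, encode="onehot", strategy="quantile")
onehot_dense = onehot.fit_transform(values).toarray()

print("uniform edges: ", uniform.bin_edges_[0])
print("uniform codes: ", uniform_codes)
print("quantile edges:", quantile.bin_edges_[0])
print("quantile codes:", quantile_codes)
print("one-hot shape: ", onehot_dense.shape)
```

With equal-width bins the outlier stretches the range, so nine of the ten values land in the first bin and the middle bin stays empty. Quantile bins spread the samples across all three bins.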

Author: Hovey

Source: https://www.cnblogs.com/thankcat/p/17299251.html

License: This work is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International license.
