The sklearn package

Official scikit-learn learning resources
https://scikit-learn.org/stable/user_guide.html
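1 Supervised learning
1.1 Linear models
1.2 Linear and quadratic discriminant analysis
1.3 Kernel ridge regression
1.4 Support vector machines (SVM)
1.5 Stochastic gradient descent
1.6 Nearest neighbors
1.7 Gaussian processes
1.8 Cross decomposition
1.9 Naive Bayes
1.10 Decision trees
1.11 Ensemble methods
1.12 Multiclass and multilabel algorithms
1.13 Feature selection
1.14 Semi-supervised learning
1.15 Isotonic regression
1.16 Probability calibration
1.17 Neural network models (supervised)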

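2 Unsupervised learning
2.1 Gaussian mixture models
2.2 Manifold learning
2.3 Clustering
2.4 Biclustering
2.5 Matrix factorization
2.6 Covariance estimation
2.7 Novelty and outlier detection
2.8 Density estimation
2.9 Neural network models (unsupervised)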

3 Model selection and evaluation
3.1 Cross-validation
3.2 Hyper-parameter tuning
3.3 Metrics and scoring
3.4 Model persistence
3.5 Validation curves

4 Inspection
4.1 Partial dependence plots
4.2 Permutation feature importance

5 Visualizations

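6 Dataset transformations
6.1 Pipelines
6.2 Feature extraction
6.3 Preprocessing data
6.4 Imputation of missing values
6.5 Unsupervised dimensionality reduction
6.6 Random projection
6.7 Kernel approximation
6.8 Pairwise metrics, affinities and kernels
6.9 Transforming the prediction target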

7 Datasets

6.3 Preprocessing data
https://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling
The difference between normalization, regularization, and standardization
https://blog.csdn.net/tianguiyuyu/article/details/80694669
6.3.1 Standardization, or mean removal and variance scaling (zero mean, unit variance)
preprocessing.scale
preprocessing.StandardScaler: once fitted on the training samples, the same transformation can be applied to the test samples
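The fit-on-train, apply-to-test pattern can be sketched as follows. This is a minimal example; the toy arrays X_train and X_test are invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, assumed for illustration only.
X_train = np.array([[1.0, -1.0, 2.0],
                    [2.0, 0.0, 0.0],
                    [0.0, 1.0, -1.0]])
X_test = np.array([[-1.0, 1.0, 0.0]])

# Learn the per-feature mean and standard deviation from the training set only.
scaler = StandardScaler().fit(X_train)

# Apply the same shift and scale to both sets.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(scaler.mean_)    # per-feature means learned from X_train
print(X_test_scaled)   # test data scaled with the training statistics
```

Reusing the fitted scaler keeps test-set statistics out of the transformation, which is the usual reason to prefer the class over preprocessing.scale.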
6.3.1.1. Scaling features to a range
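preprocessing.MinMaxScaler scales each feature to a given range between a minimum and a maximum value (default [0, 1])
preprocessing.MaxAbsScaler divides each feature by its maximum absolute value, so the training data lies in [-1, 1]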
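A short comparison of the two range scalers, as a sketch on made-up data (the array X is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler

# Toy data, assumed for illustration only.
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# Each feature is mapped linearly onto feature_range.
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Each feature is divided by its largest absolute value; no centering happens.
X_maxabs = MaxAbsScaler().fit_transform(X)

print(X_minmax)
print(X_maxabs)
```

Because MaxAbsScaler does not shift the data, zeros stay zeros, which is why it also suits sparse input.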
6.3.1.2. Scaling sparse data
preprocessing.MaxAbsScaler (used through the Transformer API, i.e. fit/transform)
preprocessing.maxabs_scale
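As a sketch of scaling sparse input, the example below applies both the class and the function to a small SciPy CSR matrix. The matrix values are invented for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler, maxabs_scale

# Small sparse matrix, assumed for illustration only.
X = sparse.csr_matrix(np.array([[0.0, 4.0, 0.0],
                                [2.0, 0.0, 0.0],
                                [0.0, -8.0, 1.0]]))

# Transformer API: fit learns the per-column maximum absolute value.
X_scaled = MaxAbsScaler().fit_transform(X)

# Function form: one-off scaling without keeping a fitted object.
X_scaled_fn = maxabs_scale(X)

print(sparse.issparse(X_scaled))  # the result stays sparse
print(X_scaled.toarray())
```

No centering is performed, so the zero entries are preserved and the matrix does not become dense.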
6.3.1.3. Scaling data with outliers
robust_scale
RobustScaler (used through the Transformer API, i.e. fit/transform)
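A minimal sketch of robust scaling on data with one extreme value. The single-column array is invented for illustration; with default settings RobustScaler centers on the median and scales by the interquartile range.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, robust_scale

# One feature with an obvious outlier (toy data, assumed for illustration).
X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

# Defaults: subtract the median, divide by the 25th-75th percentile range.
scaler = RobustScaler().fit(X)
X_robust = scaler.transform(X)

# Function form of the same operation.
X_robust_fn = robust_scale(X)

print(scaler.center_, scaler.scale_)
print(X_robust.ravel())
```

The outlier is still far from the rest after scaling, but it no longer distorts the center and scale used for the other samples.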
6.3.1.4. Centering kernel matrices
KernelCenterer
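To illustrate what kernel centering does, the sketch below uses a linear kernel, for which centering the kernel matrix is equivalent to centering the features first. The data is invented for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels
from sklearn.preprocessing import KernelCenterer

# Toy data, assumed for illustration only.
X = np.array([[1.0, -2.0, 2.0],
              [-2.0, 1.0, 3.0],
              [4.0, 1.0, -2.0]])

# Linear kernel: K[i, j] is the dot product of samples i and j.
K = pairwise_kernels(X, metric="linear")

# Center the kernel matrix without ever computing the feature map explicitly.
K_centered = KernelCenterer().fit_transform(K)

# For the linear kernel this matches the kernel of mean-centered features.
X_centered = X - X.mean(axis=0)
print(np.allclose(K_centered, X_centered @ X_centered.T))
```

For non-linear kernels the feature map is implicit, so centering the kernel matrix directly is the practical route.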
6.3.2. Non-linear transformation
6.3.2.1. Mapping to a Uniform distribution
QuantileTransformer
quantile_transform
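A sketch of mapping a skewed feature to a uniform distribution. The exponential sample, n_quantiles=100, and the fixed random seeds are choices made for this illustration only.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, quantile_transform

# Skewed toy feature (assumed for illustration).
rng = np.random.RandomState(0)
X = rng.exponential(size=(1000, 1))

# Map each feature through its empirical CDF onto [0, 1].
qt = QuantileTransformer(n_quantiles=100,
                         output_distribution="uniform",
                         random_state=0)
X_uniform = qt.fit_transform(X)

# Function form, with the same settings.
X_uniform_fn = quantile_transform(X, n_quantiles=100, random_state=0, copy=True)

print(X_uniform.min(), X_uniform.max(), X_uniform.mean())
```

Passing output_distribution="normal" instead would map the same feature to a Gaussian shape.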
6.3.2.2. Mapping to a Gaussian distribution
PowerTransformer
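A minimal sketch of PowerTransformer on a log-normal sample, chosen here because Box-Cox requires strictly positive input. The data and seed are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Strictly positive, right-skewed toy feature (assumed for illustration).
rng = np.random.RandomState(0)
X = rng.lognormal(size=(500, 1))

# Box-Cox needs positive data; method="yeo-johnson" also accepts zeros and negatives.
pt = PowerTransformer(method="box-cox", standardize=True)
X_gauss = pt.fit_transform(X)

print(pt.lambdas_)                    # fitted power parameter per feature
print(X_gauss.mean(), X_gauss.std())  # standardized output
```

For log-normal data the fitted lambda should come out close to 0, which corresponds to a log transform.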
6.3.3. Normalization
Normalization is the process of scaling individual samples to have unit norm.
normalize
Normalizer (used through the Transformer API, i.e. fit/transform)
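The per-sample nature of normalization can be sketched as follows; the rows of X are invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, normalize

# Toy data, assumed for illustration only.
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [0.0, -5.0]])

# Each ROW is rescaled to unit L2 norm; nothing is learned during fit.
X_l2 = Normalizer(norm="l2").fit_transform(X)

# Function form, here with the L1 norm instead.
X_l1 = normalize(X, norm="l1")

print(X_l2)
print(X_l1)
```

Unlike the scalers in 6.3.1, which work column by column, this operates on each sample independently.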
6.3.4. Encoding categorical features
OrdinalEncoder (ordinal encoding: each category becomes an integer code)
OneHotEncoder
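A sketch contrasting the two encoders on a small categorical table. The category values are made up for illustration; handle_unknown="ignore" is one possible choice for unseen categories, not the default.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy categorical data, assumed for illustration only.
X = [["male", "from US"],
     ["female", "from Europe"],
     ["female", "from US"]]

# One integer code per category, assigned in sorted order within each column.
ord_enc = OrdinalEncoder()
X_ord = ord_enc.fit_transform(X)

# One binary column per category; the result is sparse by default.
onehot = OneHotEncoder(handle_unknown="ignore")
X_onehot = onehot.fit_transform(X).toarray()

# With handle_unknown="ignore", an unseen category encodes as all zeros.
X_new = onehot.transform([["male", "from Asia"]]).toarray()

print(ord_enc.categories_)
print(X_ord)
print(X_onehot)
print(X_new)
```

Ordinal codes imply an ordering that may not exist in the data, so one-hot encoding is usually the safer choice for nominal features fed to linear models.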
6.3.5. Discretization
For instance, pre-processing with a discretizer can introduce nonlinearity to linear models.
6.3.5.1. K-bins discretization
KBinsDiscretizer supports three binning strategies. The 'uniform' strategy uses constant-width bins. The 'quantile' strategy uses quantile values so that the bins in each feature are equally populated. The 'kmeans' strategy defines bins based on a k-means clustering procedure performed on each feature independently.
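The three strategies can be compared side by side, as in this sketch. The single-feature array, n_bins=3, and encode="ordinal" are choices made for illustration; recent scikit-learn releases may emit a FutureWarning about the quantile computation method.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# One feature with an uneven spread (toy data, assumed for illustration).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]])

binned = {}
for strategy in ("uniform", "quantile", "kmeans"):
    # encode="ordinal" returns the bin index instead of a one-hot matrix.
    est = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy=strategy)
    binned[strategy] = est.fit_transform(X).ravel()
    print(strategy, est.bin_edges_[0], binned[strategy])
```

With 'uniform', the large value 10 stretches the bin width, so most samples land in the first bin; the other two strategies adapt the edges to where the data actually lies.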
6.3.5.2. Feature binarization
preprocessing.Binarizer(threshold=1.1)
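A minimal sketch of the thresholding behavior, using the same threshold of 1.1 on an invented array:

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Toy data, assumed for illustration only.
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# Values strictly greater than the threshold become 1, all others become 0.
binarizer = Binarizer(threshold=1.1)
X_bin = binarizer.fit_transform(X)

print(X_bin)
```

Only the two entries equal to 2.0 exceed 1.1, so they are the only ones mapped to 1.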
6.3.6. Imputation of missing values
6.3.7. Generating polynomial features
from sklearn.preprocessing import PolynomialFeatures
PolynomialFeatures(degree=3, interaction_only=True)
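To show what interaction_only=True produces, here is a sketch on a small integer array invented for illustration. With three input features and degree=3, the output columns are 1, x1, x2, x3, x1*x2, x1*x3, x2*x3, and x1*x2*x3.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy data with three features, assumed for illustration only.
X = np.arange(9).reshape(3, 3)

# interaction_only=True keeps products of distinct features
# and drops pure powers such as x1**2 or x1**3.
poly = PolynomialFeatures(degree=3, interaction_only=True)
X_poly = poly.fit_transform(X)

print(X_poly.shape)
print(X_poly)
```

Without interaction_only, the same call would also generate the squared and cubed terms, giving 20 columns instead of 8.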
6.3.8. Custom transformers
FunctionTransformer converts an existing Python function into a transformer to assist in data cleaning or processing.
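Wrapping a plain function as a transformer can be sketched with FunctionTransformer. The choice of np.log1p with np.expm1 as its inverse, and the toy array, are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Toy non-negative data, assumed for illustration only.
X = np.array([[0.0, 1.0],
              [2.0, 3.0]])

# Wrap log(1 + x) and its inverse so they can be used like any other transformer.
transformer = FunctionTransformer(np.log1p, inverse_func=np.expm1, validate=True)
X_log = transformer.fit_transform(X)

# inverse_transform applies inverse_func and recovers the original values.
X_back = transformer.inverse_transform(X_log)

print(X_log)
print(X_back)
```

Once wrapped, the function can be placed in a Pipeline alongside the built-in preprocessing steps.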

posted on 2019-12-16 17:02  静静的白桦林_andy
