Stay Hungry,Stay Foolish!

Transforming the prediction target of sklearn

concept

https://scikit-learn.org/stable/modules/preprocessing_targets.html#preprocessing-targets

对于监督性学习,其目标值需要进行转化,才能作为模型的目标,或者更加有效地适应模型。

These are transformers that are not intended to be used on features, only on supervised learning targets.

See also Transforming target in regression if you want to transform the prediction target for learning, but evaluate the model in the original (untransformed) space.

 

模型自适应

https://scikit-learn.org/stable/tutorial/basic/tutorial.html#model-persistence

有的模型,其目标支持,原始类型(字符串,或者数值类型)。如下所示。

对于这种模型,转换并不是必要的,但是对目标的转换时一种更加通用的做法。

 

复制代码
>>> from sklearn import datasets
>>> from sklearn.svm import SVC
>>> iris = datasets.load_iris()
>>> clf = SVC()
>>> clf.fit(iris.data, iris.target)
SVC()

>>> list(clf.predict(iris.data[:3]))
[0, 0, 0]

>>> clf.fit(iris.data, iris.target_names[iris.target])
SVC()

>>> list(clf.predict(iris.data[:3]))
['setosa', 'setosa', 'setosa']
复制代码

 

 

LabelBinarizer

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html#sklearn.preprocessing.LabelBinarizer

     一些回归和二值分类算法,需要使用此工具,将目标转换,进而支持multiclass分类。

Binarize labels in a one-vs-all fashion.

Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.

At learning time, this simply consists in learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belong or does not belong to the class). LabelBinarizer makes this process easy with the transform method.

At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.

 

code:

复制代码
from sklearn import preprocessing
import numpy as np

lb = preprocessing.LabelBinarizer()
lb.fit([1, 2, 6, 4, 2])

print(lb.classes_)

print(lb.transform([1, 6]))

transformed_label =  np.array([[1, 0, 0, 0],[0, 0, 0, 1]]) 

print(lb.inverse_transform(transformed_label))
复制代码

output

[1 2 4 6]
[[1 0 0 0]
 [0 0 0 1]]
[1 6]

 

MultiLabelBinarizer

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer

      将多标记的目标转换为 二值型目标。

Transform between iterable of iterables and a multilabel format.

Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.

 

In multilabel learning, the joint set of binary classification tasks is expressed with a label binary indicator array: each sample is one row of a 2d array of shape (n_samples, n_classes) with binary values where the one, i.e. the non zero elements, corresponds to the subset of labels for that sample. An array such as np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]]) represents label 0 in the first sample, labels 1 and 2 in the second sample, and no labels in the third sample.

Producing multilabel data as a list of sets of labels may be more intuitive. The MultiLabelBinarizer transformer can be used to convert between a collection of collections of labels and the indicator format:

 

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
>>> MultiLabelBinarizer().fit_transform(y)
array([[0, 0, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 0, 0]])

 

LabelEncoder

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder

      将标签(字符 或者 数字), 转化为紧致型的 multiclass目标。

 

Encode target labels with value between 0 and n_classes-1.

This transformer should be used to encode target values, i.e. y, and not the input X.

Read more in the User Guide.

 

复制代码
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2])
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
复制代码
复制代码
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1])
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
复制代码

 

posted @   lightsong  阅读(97)  评论(0编辑  收藏  举报
编辑推荐:
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· 没有源码,如何修改代码逻辑?
· 一个奇形怪状的面试题:Bean中的CHM要不要加volatile?
· [.NET]调用本地 Deepseek 模型
阅读排行:
· 全网最简单!3分钟用满血DeepSeek R1开发一款AI智能客服,零代码轻松接入微信、公众号、小程
· .NET 10 首个预览版发布,跨平台开发与性能全面提升
· 《HelloGitHub》第 107 期
· 全程使用 AI 从 0 到 1 写了个小工具
· 从文本到图像:SSE 如何助力 AI 内容实时呈现?(Typescript篇)
历史上的今天:
2016-12-28 JQuery DOM clone(true),对于克隆对象事件触发后,处理函数中this指代克隆对象
2016-12-28 XML和JSON数据格式对比
千山鸟飞绝,万径人踪灭
点击右上角即可分享
微信分享提示