Transforming the prediction target of sklearn
concept
https://scikit-learn.org/stable/modules/preprocessing_targets.html#preprocessing-targets
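In supervised learning, target values often need to be transformed before they can serve as a model's target, or so that they fit the model more effectively.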
These are transformers that are not intended to be used on features, only on supervised learning targets.
See also Transforming target in regression if you want to transform the prediction target for learning, but evaluate the model in the original (untransformed) space.
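The "transform for learning, evaluate in the original space" idea mentioned above can be sketched with `TransformedTargetRegressor` from `sklearn.compose`. The synthetic data, the `log`/`exp` pair, and the choice of `LinearRegression` are assumptions made for illustration only; they do not come from the original post.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic, strictly positive target that is linear in log space
# (illustrative data, not from the original post).
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.exp(0.5 * X[:, 0] + 1.0)

# The regressor is trained on log(y); predictions are mapped back with exp,
# so predict() and score() work in the original, untransformed space.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)

pred = model.predict(X[:3])
print(pred)
print(y[:3])
```

Because the target here is exactly linear after the log transform, the predictions should match `y[:3]` up to floating-point error.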
Models that adapt to raw targets
https://scikit-learn.org/stable/tutorial/basic/tutorial.html#model-persistence
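Some models accept targets in their original type (strings or numeric values), as shown below.
For such models the transformation is not required, but transforming the target explicitly is the more general approach.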
>>> from sklearn import datasets
>>> from sklearn.svm import SVC
>>> iris = datasets.load_iris()
>>> clf = SVC()
>>> clf.fit(iris.data, iris.target)
SVC()
>>> list(clf.predict(iris.data[:3]))
[0, 0, 0]
>>> clf.fit(iris.data, iris.target_names[iris.target])
SVC()
>>> list(clf.predict(iris.data[:3]))
['setosa', 'setosa', 'setosa']
LabelBinarizer
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html#sklearn.preprocessing.LabelBinarizer
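Some regression and binary classification algorithms need this tool to transform the target so that they can support multiclass classification.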
Binarize labels in a one-vs-all fashion.
Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.
At learning time, this simply consists in learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belong or does not belong to the class). LabelBinarizer makes this process easy with the transform method.
At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
code:
from sklearn import preprocessing
import numpy as np

lb = preprocessing.LabelBinarizer()
lb.fit([1, 2, 6, 4, 2])  # learn the sorted set of distinct classes
print(lb.classes_)
print(lb.transform([1, 6]))  # one indicator column per class
transformed_label = np.array([[1, 0, 0, 0], [0, 0, 0, 1]])
print(lb.inverse_transform(transformed_label))  # back to the original labels
output
[1 2 4 6]
[[1 0 0 0]
 [0 0 0 1]]
[1 6]
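The one-vs-all scheme described above can be sketched end to end as follows. The use of `Ridge` as the per-class regressor and of the iris dataset are assumptions chosen for illustration; any regressor or binary classifier that outputs a confidence per class would play the same role.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Ridge
from sklearn.preprocessing import LabelBinarizer

iris = load_iris()
X = iris.data
y = iris.target_names[iris.target]  # string class labels

# Learning time: turn multiclass labels into one binary column per class.
lb = LabelBinarizer()
Y = lb.fit_transform(y)  # shape (n_samples, n_classes)

# Ridge fits each target column independently, i.e. one regressor per class
# (Ridge is an illustrative choice, not prescribed by the original text).
reg = Ridge(alpha=1.0)
reg.fit(X, Y)

# Prediction time: each column holds the confidence for one class;
# inverse_transform picks the class with the greatest value.
scores = reg.predict(X)
predictions = lb.inverse_transform(scores)

# The same decision written out by hand.
manual = lb.classes_[scores.argmax(axis=1)]
print(predictions[:3], manual[:3])
```

For multiclass targets, `inverse_transform` on continuous scores amounts to an argmax over the columns, which is what the `manual` line reproduces.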
MultiLabelBinarizer
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer
Converts multilabel targets into binary targets.
Transform between iterable of iterables and a multilabel format.
Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.
In multilabel learning, the joint set of binary classification tasks is expressed with a label binary indicator array: each sample is one row of a 2d array of shape (n_samples, n_classes) with binary values, where the ones, i.e. the non-zero elements, correspond to the subset of labels for that sample. An array such as
np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
represents label 0 in the first sample, labels 1 and 2 in the second sample, and no labels in the third sample.
Producing multilabel data as a list of sets of labels may be more intuitive. The MultiLabelBinarizer transformer can be used to convert between a collection of collections of labels and the indicator format:
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]
>>> MultiLabelBinarizer().fit_transform(y)
array([[0, 0, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 0, 0]])
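The conversion also works in the other direction and with non-numeric labels. A minimal sketch, assuming a small made-up list of genre sets (the labels are illustrative only):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each sample carries a set of string labels (made-up example data).
samples = [{"sci-fi", "thriller"}, {"comedy"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(samples)

print(list(mlb.classes_))  # columns follow the sorted class order
print(Y)                   # binary indicator matrix, one row per sample

# inverse_transform maps indicator rows back to tuples of labels.
restored = mlb.inverse_transform(Y)
print(restored)
```

The columns of `Y` follow `mlb.classes_`, so keep the fitted transformer around if predictions in indicator format need to be decoded later.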
LabelEncoder
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder
Converts labels (strings or numbers) into compact multiclass targets.
Encode target labels with value between 0 and n_classes-1.
This transformer should be used to encode target values, i.e. y, and not the input X. Read more in the User Guide.
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2])
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1])
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
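In practice the encoder usually wraps a model: encode the labels, train on the integers, then decode the predictions. The sketch below assumes the iris dataset and a `DecisionTreeClassifier`, both chosen only for illustration, and also shows that labels not seen during `fit` are rejected by `transform`.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
y_names = iris.target_names[iris.target]  # string labels

# Encode string labels as integers in the range 0..n_classes-1.
le = LabelEncoder()
y_encoded = le.fit_transform(y_names)

# Train on the integer targets (the classifier is an illustrative choice).
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, y_encoded)

# Decode integer predictions back to the original label names.
decoded = le.inverse_transform(clf.predict(iris.data[:3]))
print(list(decoded))

# A label that was not present during fit cannot be encoded.
try:
    le.transform(["unknown-species"])
    unseen_error = None
except ValueError as exc:
    unseen_error = str(exc)
print(unseen_error)
```

If unseen labels can occur at prediction time, they need to be handled before calling `transform`, since the encoder itself raises a `ValueError`.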