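Understanding sklearn.preprocessing.MinMaxScaler
Formula
MinMaxScaler is a very useful tool that rescales each feature of a dataset to a fixed range.
Let's start with the simple case, scaling to [0, 1], whose formula is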
\(X_{scaled} = \frac{x-x_{min}}{x_{max}-x_{min}}\)
MinMaxScaler can scale to an arbitrary range [MIN, MAX], so the more general formula is
\(X_{std} = \frac{x-x_{min}}{x_{max}-x_{min}}\)
\(X_{scaled} = X_{std} \cdot (MAX-MIN) + MIN\)
When \(MIN\) and \(MAX\) are 0 and 1, the formula reduces to [0, 1] scaling.
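As a small worked example of the two-step formula, here is a minimal pure-Python sketch. The helper name `min_max_scale` and the sample data are made up for illustration and are not part of scikit-learn. It first maps the values to [0, 1], then multiplies by the width of the target range and shifts by MIN.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Scale a list of numbers into feature_range (hypothetical helper).

    Assumes the values are not all equal, so x_max - x_min is non-zero.
    """
    lo, hi = feature_range
    x_min, x_max = min(values), max(values)
    # Step 1: map every value into [0, 1].
    std = [(x - x_min) / (x_max - x_min) for x in values]
    # Step 2: stretch to the width of the target range, then shift by MIN.
    return [s * (hi - lo) + lo for s in std]


data = [1.0, 2.0, 3.0, 5.0]
unit = min_max_scale(data)                             # default range [0, 1]
wide = min_max_scale(data, feature_range=(-1.0, 1.0))  # range [-1, 1]
print(unit)
print(wide)
```

With the default range, step 2 multiplies by 1 and adds 0, so the result is the same as plain [0, 1] scaling.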
Code
Now let's look at the source code.
def transform(self, X):
    """Scale features of X according to feature_range.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input data that will be transformed.

    Returns
    -------
    Xt : array-like of shape (n_samples, n_features)
        Transformed data.
    """
    check_is_fitted(self)

    X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES,
                    force_all_finite="allow-nan")

    X *= self.scale_
    X += self.min_
    return X

The attributes it uses are described in the class docstring:

"""
min_ : ndarray of shape (n_features,)
    Per feature adjustment for minimum. Equivalent to
    ``min - X.min(axis=0) * self.scale_``

scale_ : ndarray of shape (n_features,)
    Per feature relative scaling of the data. Equivalent to
    ``(max - min) / (X.max(axis=0) - X.min(axis=0))``

    .. versionadded:: 0.17
       *scale_* attribute.

data_min_ : ndarray of shape (n_features,)
    Per feature minimum seen in the data

    .. versionadded:: 0.17
       *data_min_*

data_max_ : ndarray of shape (n_features,)
    Per feature maximum seen in the data

    .. versionadded:: 0.17
       *data_max_*
"""
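Here scale_ corresponds to \(\frac{MAX-MIN}{x_{max}-x_{min}}\), so min_ corresponds to \(MIN-x_{min} \cdot \frac{MAX-MIN}{x_{max}-x_{min}}\). Substituting both into X * scale_ + min_ gives \((x-x_{min}) \cdot \frac{MAX-MIN}{x_{max}-x_{min}} + MIN\), which is exactly the general formula. Precomputing these two parameters reduces transform to one multiplication and one addition, and keeps the inverse transform, inverse_transform, equally simple.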
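The correspondence between scale_, min_ and the formula can be checked against a fitted scaler. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the sample matrix and the target range (-1, 1) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales (made-up data).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 40.0]])
lo, hi = -1.0, 1.0  # MIN and MAX of the target range

scaler = MinMaxScaler(feature_range=(lo, hi)).fit(X)

# Recompute the fitted attributes by hand from the formulas.
expected_scale = (hi - lo) / (X.max(axis=0) - X.min(axis=0))
expected_min = lo - X.min(axis=0) * expected_scale

assert np.allclose(scaler.scale_, expected_scale)
assert np.allclose(scaler.min_, expected_min)

# transform is then a single multiply-add per feature.
manual = X * scaler.scale_ + scaler.min_
assert np.allclose(scaler.transform(X), manual)
```

Each column of `manual` spans exactly the target range, since the per-feature minimum maps to MIN and the maximum to MAX.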
def inverse_transform(self, X):
    """Undo the scaling of X according to feature_range.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Input data that will be transformed. It cannot be sparse.

    Returns
    -------
    Xt : array-like of shape (n_samples, n_features)
        Transformed data.
    """
    check_is_fitted(self)

    X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES,
                    force_all_finite="allow-nan")

    X -= self.min_
    X /= self.scale_
    return X
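To see that inverse_transform simply undoes the multiply-add, a round trip can be compared with the manual computation. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the data and the range (0, 5) are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 40.0]])

scaler = MinMaxScaler(feature_range=(0.0, 5.0))
X_scaled = scaler.fit_transform(X)

# Library round trip: scale, then undo the scaling.
X_back = scaler.inverse_transform(X_scaled)

# The same inverse by hand: subtract min_, then divide by scale_.
manual_back = (X_scaled - scaler.min_) / scaler.scale_

assert np.allclose(X_back, X)
assert np.allclose(manual_back, X_back)
```

Because the default `copy=True` is used, the input arrays are left unchanged even though the method bodies operate in place on their working copy.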