Common Loss Functions and Frequently Used Model Evaluation Metrics
1 Loss Function
1.1 0-1 Loss
$$L(y_{i},f(x_{i}))=\begin{cases}
1, & y_{i}\neq f(x_{i})\\
0, & y_{i}=f(x_{i})
\end{cases}$$
The 0-1 loss is simple and is commonly used in classification tasks: if the predicted label matches the true label, the loss is 0; otherwise it is 1. If exact equality is too strict, the condition can be relaxed so that a prediction counts as correct when the absolute difference between the true value and the predicted value is below a threshold t:
$$L(y_{i},f(x_{i}))=\begin{cases}
1, & \left | y_{i}-f(x_{i}) \right | \geqslant t\\
0, & \left | y_{i}-f(x_{i}) \right | < t
\end{cases}$$
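Both variants of the 0-1 loss can be sketched in a few lines of Python. The function names and the sample data below are illustrative assumptions, not part of any library.

```python
def zero_one_loss(y_true, y_pred):
    """Return 0 if the prediction equals the label, otherwise 1."""
    return 0 if y_true == y_pred else 1


def thresholded_zero_one_loss(y_true, y_pred, t):
    """Return 0 if |y_true - y_pred| < t, otherwise 1."""
    return 0 if abs(y_true - y_pred) < t else 1


labels = [1, 0, 1, 1]
predictions = [1, 1, 1, 0]
# The mean 0-1 loss over a dataset is the misclassification rate.
error_rate = sum(zero_one_loss(y, p) for y, p in zip(labels, predictions)) / len(labels)
print(error_rate)  # 0.5
```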
1.2 Squared Loss Function
$$L(y_{i},f(x_{i}))=(y_{i}-f(x_{i}))^{2}$$
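Loss function of linear regression with L2 regularization (ridge regression), where N is the number of samples and λ is the regularization strength:
$$L(\omega)=\frac{1}{2N}\sum_{i=1}^{N}(y_{i}-\omega^{T}x_{i})^{2}+\frac{\lambda}{2}\left \| \omega \right \|^{2}$$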
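As a rough illustration, the regularized squared loss can be evaluated directly from its definition. This is a minimal sketch in plain Python; the name `ridge_loss` and the toy data are assumptions made for the example.

```python
def ridge_loss(w, X, y, lam):
    """Squared-error term with the 1/(2N) factor plus (lam / 2) * ||w||^2."""
    n = len(X)
    squared_error = sum(
        (y_i - sum(w_j * x_j for w_j, x_j in zip(w, x_i))) ** 2
        for x_i, y_i in zip(X, y)
    )
    penalty = sum(w_j ** 2 for w_j in w)
    return squared_error / (2 * n) + lam / 2 * penalty


X = [[1.0, 2.0], [2.0, 0.0]]   # two samples, two features
y = [5.0, 3.0]
w = [1.0, 2.0]
# Predictions are 5.0 and 2.0, so the squared errors are 0 and 1.
print(ridge_loss(w, X, y, lam=0.5))  # 1.5
```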
1.3 Absolute Loss Function
$$L(y_{i},f(x_{i}))=\left | y_{i}-f(x_{i}) \right |$$
1.4 Log Loss (Cross-Entropy Loss)
$$L(y_{i},f(x_{i}))=-\log P(y_{i}\mid x_{i})$$
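For binary classification, the log loss of a single sample is the negative log of the probability the model assigned to the true label. The sketch below assumes labels in {0, 1} and a predicted probability for the positive class; the clipping constant `eps` is an assumption added to avoid taking log(0).

```python
import math


def log_loss(y_true, prob_positive, eps=1e-15):
    """Negative log-probability of the true label (y_true is 0 or 1)."""
    p = min(max(prob_positive, eps), 1 - eps)  # clip to avoid log(0)
    return -math.log(p if y_true == 1 else 1 - p)


# A confident correct prediction costs little; a confident wrong one costs a lot.
print(log_loss(1, 0.9))
print(log_loss(1, 0.1))
```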
1.5 Hinge Loss
$$L(y_{i},f(x_{i}))=\max(0,1-y_{i}f(x_{i}))$$
1.6 Exponential Loss Function
$$L(y_{i},f(x_{i}))=\exp(-y_{i}f(x_{i}))$$
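The hinge and exponential losses both depend only on the margin, the product of the label and the raw model score. The sketch below assumes labels in {-1, +1}, the usual convention for these two losses; the function names are illustrative.

```python
import math


def hinge_loss(y, score):
    """max(0, 1 - y * f(x)); zero once the margin is at least 1."""
    return max(0.0, 1.0 - y * score)


def exponential_loss(y, score):
    """exp(-y * f(x)); always positive, grows quickly for wrong predictions."""
    return math.exp(-y * score)


print(hinge_loss(1, 2.0))    # 0.0
print(hinge_loss(1, 0.5))    # 0.5
print(hinge_loss(-1, 0.5))   # 1.5
```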
2 Classification Evaluation Metrics
Confusion matrix:

| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | TP | FN |
| Actual negative | FP | TN |

- P: number of actual positive samples, P = TP + FN
- N: number of actual negative samples, N = FP + TN
2.1 Accuracy
$$Accuracy=\frac{TP+TN}{P+N}$$
2.2 Precision
$$Precision=\frac{TP}{TP+FP}$$
2.3 Recall
$$Recall=\frac{TP}{P}$$
2.4 F1
$$F1=2\times\frac{Precision\times Recall}{Precision+Recall}$$
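The four metrics above follow directly from the confusion matrix counts. A minimal sketch, with hypothetical counts and no guarding against zero denominators:

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (accuracy, precision, recall, f1) from confusion matrix counts."""
    p = tp + fn          # actual positives
    n = fp + tn          # actual negatives
    accuracy = (tp + tn) / (p + n)
    precision = tp / (tp + fp)
    recall = tp / p
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


accuracy, precision, recall, f1 = classification_metrics(tp=40, fn=10, fp=20, tn=30)
print(accuracy, recall)  # 0.7 0.8
```

With these counts, precision is 40/60 and F1 is 80/110, which shows that F1 can also be written as 2TP / (2TP + FP + FN).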
2.5 ROC
The ROC (Receiver Operating Characteristic) curve is plotted with the FPR (False Positive Rate) on the horizontal axis and the TPR (True Positive Rate) on the vertical axis.
$$TPR=\frac{TP}{P}$$
$$FPR=\frac{FP}{N}$$
The Process of Generating an ROC Curve:
- Sort the predicted probability scores of the samples in descending order.
- From highest to lowest, we sequentially use each score as a threshold. Samples with a score greater than or equal to the threshold are predicted as positive, while those with a score less than the threshold are predicted as negative. We then calculate the TPR and FPR at each threshold and plot the ROC curve.
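The two steps above can be sketched as follows. The function name `roc_points` and the sample scores are assumptions for illustration; the sketch assumes labels in {0, 1} with at least one sample of each class, and recomputes the counts at every threshold for clarity rather than speed.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs: the (0, 0) start plus one point per distinct threshold."""
    p = sum(1 for y in y_true if y == 1)
    n = len(y_true) - p
    points = [(0.0, 0.0)]
    # Use each distinct score, from highest to lowest, as the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n, tp / p))
    return points


y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0), where nothing is predicted positive, and ends at (1, 1), where everything is.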
2.6 AUC
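AUC (Area Under the Curve) is the area under the ROC curve. It ranges from 0 to 1: 0.5 corresponds to random guessing, and 1 corresponds to a model that ranks every positive sample above every negative sample.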
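The area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The sketch below uses that pairwise view instead of integrating the curve; the function name and data are illustrative, and the quadratic loop is only suitable for small inputs.

```python
def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5   # ties count as half
    return correct / (len(positives) * len(negatives))


y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = auc_by_ranking(y_true, scores)
print(auc)  # 5 of the 6 positive/negative pairs are ordered correctly
```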
3 Regression Evaluation Metrics
3.1 Mean Absolute Error (MAE)
- Formula: MAE = (1/n) * Σ|yᵢ - ŷᵢ|
- Interpretation: Average absolute difference between predicted and actual values
- Range: 0 to ∞ (lower is better)
- Best for: When all errors are equally important
- Robustness: Less sensitive to outliers than MSE
3.2 Mean Squared Error (MSE)
- Formula: MSE = (1/n) * Σ(yᵢ - ŷᵢ)²
- Interpretation: Average squared difference between predicted and actual values
- Range: 0 to ∞ (lower is better)
- Best for: When large errors should be penalized more heavily
- Note: Not in same units as target variable (squared units)
3.3 Root Mean Squared Error (RMSE)
- Formula: RMSE = √(MSE)
- Interpretation: Square root of MSE, which brings the error back to the original units
- Range: 0 to ∞ (lower is better)
- Best for: Most common metric; interpretable in original units
- Property: More sensitive to outliers than MAE
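The three error metrics in sections 3.1 to 3.3 can be computed side by side. A minimal sketch in plain Python with made-up values; the function names are assumptions for the example.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]   # errors: 1, 0, -2, 0
print(mae(y_true, y_pred))   # 0.75
print(mse(y_true, y_pred))   # 1.25
print(rmse(y_true, y_pred))  # square root of 1.25
```

The single error of size 2 contributes 4 of the 5 units of squared error, which is why MSE and RMSE react more strongly to outliers than MAE.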
3.4 R² Score (Coefficient of Determination)
- Formula: R² = 1 - (SS_res / SS_tot)
  - SS_res = Σ(yᵢ - ŷᵢ)² (residual sum of squares)
  - SS_tot = Σ(yᵢ - ȳ)² (total sum of squares)
- Interpretation: Proportion of variance explained by model
- Range: -∞ to 1 (higher is better)
  - 1 = Perfect fit
  - 0 = Model as good as mean prediction
  - Negative = Worse than mean prediction
- Best for: Comparing models on different datasets
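The three reference points in the range (1, 0, and negative) can be reproduced with a small sketch. The function name and data are illustrative, and the sketch assumes the targets are not all identical, since SS_tot would then be zero.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, [1.0, 2.0, 3.0, 4.0]))  # 1.0  (perfect fit)
print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0  (always predicting the mean)
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```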
3.5 Mean Absolute Percentage Error (MAPE)
- Formula: MAPE = (100%/n) * Σ|(yᵢ - ŷᵢ)/yᵢ|
- Interpretation: Average percentage error
- Range: 0% to ∞% (lower is better)
- Best for: When relative errors matter more than absolute
- Limitation: Undefined when yᵢ = 0; biased for low values
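A minimal sketch of MAPE that makes the zero-target limitation explicit by raising an error. The function name and the choice to raise `ValueError` are assumptions for the example.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when a true value is 0")
    total = sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Errors of 10 on a target of 100 and 20 on a target of 200 are both 10% errors.
result = mape([100.0, 200.0], [110.0, 180.0])
print(result)  # about 10.0
```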
3.6 Adjusted R²
- Formula: Adjusted R² = 1 - [(1 - R²)(n-1)/(n-p-1)]
  - n = number of samples
  - p = number of features
- Interpretation: R² adjusted for number of predictors
- Best for: Comparing models with different numbers of features
- Property: Penalizes adding irrelevant features
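The penalty is easy to see numerically: with R² held fixed, increasing the number of features lowers the adjusted value. A small sketch with hypothetical numbers, assuming n > p + 1:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p features (requires n > p + 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


few_features = adjusted_r2(0.8, n=100, p=5)
many_features = adjusted_r2(0.8, n=100, p=20)
print(round(few_features, 4), round(many_features, 4))
```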
3.7 Mean Absolute Scaled Error (MASE)
- Formula: MASE = MAE / (MAE of naive forecast)
- Interpretation: Error relative to a simple baseline
- Range: 0 to ∞ (lower is better)
  - MASE < 1 = Better than naive forecast
  - MASE = 1 = Same as naive forecast
  - MASE > 1 = Worse than naive forecast
- Best for: Time series; comparing across different scales
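One common choice for the naive forecast, assumed here, is the one-step "previous value" forecast evaluated on the training series, so the denominator is the mean absolute difference between consecutive training values. The function name and data are illustrative.

```python
def mase(y_train, y_true, y_pred):
    """MAE of the forecast divided by the MAE of a one-step naive forecast
    on the training series (assumes at least two non-constant training values)."""
    forecast_mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)
    naive_errors = [abs(y_train[t] - y_train[t - 1]) for t in range(1, len(y_train))]
    naive_mae = sum(naive_errors) / len(naive_errors)
    return forecast_mae / naive_mae


y_train = [10.0, 12.0, 11.0, 13.0]   # naive errors: 2, 1, 2
score = mase(y_train, y_true=[14.0, 15.0], y_pred=[13.0, 15.0])
print(score)  # below 1, so better than the naive forecast on this toy data
```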
4 Cluster Algorithm Metrics
4.1 Internal Metrics (No ground truth labels required)
4.1.1 Silhouette Coefficient
- Measures: Cohesion (within-cluster) vs Separation (between-cluster)
- Formula:
  - For each point i: s(i) = (b(i) - a(i)) / max(a(i), b(i))
    - a(i) = average distance to other points in same cluster
    - b(i) = min average distance to points in another cluster
  - Overall: Average of s(i) for all points
- Range: -1 to 1
  - +1 = Well-clustered (dense and separated)
  - 0 = Overlapping clusters
  - -1 = Misclassified
- Best for: Determining optimal k (number of clusters)
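The definition translates almost line by line into code. The sketch below is an unoptimized illustration using Euclidean distance (`math.dist`, Python 3.8+); the function name, the toy points, and the convention of scoring singleton clusters as 0 are assumptions. It needs at least two clusters.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention assumed here for singleton clusters
            continue
        # a(i): mean distance to the other points in the same cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the points of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])   # matches the two visible groups
bad = silhouette_score(points, [0, 1, 0, 1])    # splits each group in half
print(good, bad)
```

The labeling that matches the two visible groups scores close to 1, while the labeling that cuts across them scores below 0.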
4.1.2 Davies-Bouldin Index
- Measures: Average similarity between each cluster and the cluster most similar to it
- Formula:
  - DB = (1/k) * Σ_i max_{j≠i} R_ij
    - R_ij = (s_i + s_j) / d(c_i, c_j)
    - s_i = average distance from the points in cluster i to its centroid c_i
    - d(c_i, c_j) = distance between the centroids of clusters i and j
- Range: 0 to ∞ (lower is better)
- Best for: Compact, well-separated clusters
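A minimal sketch of the index using Euclidean distance and cluster centroids. The helper names and the toy points are assumptions for illustration; it needs at least two clusters with distinct centroids.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def davies_bouldin(points, labels):
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centers = {k: centroid(v) for k, v in clusters.items()}
    # s_i: mean distance from the points of cluster i to its centroid
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # For each cluster, keep the worst (largest) similarity R_ij.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
db = davies_bouldin(points, [0, 0, 1, 1])
print(db)  # about 0.1: scatter 0.5 + 0.5 over a centroid distance of 10
```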
4.1.3 Calinski-Harabasz Index (Variance Ratio Criterion)
- Measures: Ratio of between-cluster to within-cluster dispersion
- Formula:
  - CH = [SS_B/(k-1)] / [SS_W/(n-k)]
    - SS_B = between-cluster sum of squares
    - SS_W = within-cluster sum of squares
- Range: 0 to ∞ (higher is better)
- Best for: Finding k when clusters are dense and separated
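The sketch below computes the two sums of squares explicitly, taking SS_B as the size-weighted squared distance from each cluster centroid to the overall centroid and SS_W as the squared distance from each point to its own centroid. Names and data are illustrative; it assumes 1 < k < n and a non-zero SS_W.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def calinski_harabasz(points, labels):
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    ss_b = 0.0
    ss_w = 0.0
    for members in clusters.values():
        center = centroid(members)
        ss_b += len(members) * math.dist(center, overall) ** 2
        ss_w += sum(math.dist(p, center) ** 2 for p in members)
    return (ss_b / (k - 1)) / (ss_w / (n - k))


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
ch = calinski_harabasz(points, [0, 0, 1, 1])
print(ch)  # SS_B = 100 and SS_W = 1, so (100 / 1) / (1 / 2) = 200
```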
4.1.4 Dunn Index
- Measures: Ratio of minimum inter-cluster distance to maximum intra-cluster distance
- Formula:
  - DI = min(inter-cluster distance) / max(intra-cluster diameter)
- Range: 0 to ∞ (higher is better)
- Best for: Identifying compact, well-separated clusters
- Limitation: Sensitive to noise and outliers
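Several definitions of inter-cluster distance and cluster diameter are in use. The sketch below assumes one common choice: the smallest distance between two points in different clusters, divided by the largest distance between two points in the same cluster. It requires at least two clusters and at least one cluster with two or more distinct points.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    pairs = list(combinations(zip(points, labels), 2))
    # Closest pair of points that belong to different clusters.
    min_between = min(math.dist(p, q) for (p, a), (q, b) in pairs if a != b)
    # Largest cluster diameter: farthest pair inside a single cluster.
    max_within = max(math.dist(p, q) for (p, a), (q, b) in pairs if a == b)
    return min_between / max_within


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
di = dunn_index(points, [0, 0, 1, 1])
print(di)  # 10.0
```

Because the index is built from a single minimum and a single maximum, one stray point can change it drastically, which is the sensitivity noted in the list above.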
4.2 External Metrics (Ground truth labels available)
4.2.1 Adjusted Rand Index (ARI)
- Measures: Similarity between two clusterings (corrected for chance)
- Interpretation: Similarity score between predicted and true labels
- Range: -0.5 to 1
  - 1 = Perfect match
  - 0 = Expected score for random labeling
  - Negative = Less agreement than expected by chance
- Best for: Comparing to ground truth without assumptions about cluster structure
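The ARI can be computed from pair counts in the contingency table of the two labelings. The sketch below follows the standard pair-counting formula and uses `math.comb` (Python 3.8+); the function name is illustrative, and the degenerate case where both labelings put everything in one cluster (zero denominator) is not handled.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    # Pairs that are together in both labelings.
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    # Pairs that are together in the true / predicted labeling.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)


truth = [0, 0, 1, 1]
same = adjusted_rand_index(truth, [1, 1, 0, 0])      # same partition, renamed labels
opposite = adjusted_rand_index(truth, [0, 1, 0, 1])  # every true pair is split
print(same, opposite)
```

The score ignores label names, so the renamed partition still gets 1.0.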
4.2.2 Normalized Mutual Information (NMI)
- Measures: Mutual information between clusterings, normalized
- Formula:
  - NMI = 2 * I(X;Y) / [H(X) + H(Y)]
    - I = Mutual information
    - H = Entropy
- Range: 0 to 1
  - 1 = Perfect correlation
  - 0 = Independent clusterings
- Best for: Information-theoretic comparison
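A minimal sketch of the formula with entropies and mutual information estimated from label counts. The function names are illustrative; the case where both labelings have zero entropy (a single cluster each) would divide by zero and is not handled.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed label pairs.
    return sum(
        c / n * math.log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in Counter(zip(x, y)).items()
    )


def nmi(x, y):
    return 2 * mutual_information(x, y) / (entropy(x) + entropy(y))


truth = [0, 0, 1, 1]
print(nmi(truth, [1, 1, 0, 0]))  # identical partitions with renamed labels
print(nmi(truth, [0, 1, 0, 1]))  # statistically independent partitions
```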
4.2.3 Homogeneity, Completeness, V-Measure
- Homogeneity: Each cluster contains only members of a single class
- Completeness: All members of a given class are in the same cluster
- V-Measure: Harmonic mean of homogeneity and completeness
- Range: 0 to 1 (higher is better)
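These three scores are usually defined through conditional entropy: homogeneity is 1 - H(class | cluster) / H(class), and completeness is 1 - H(cluster | class) / H(cluster). The sketch below assumes those definitions; function names and data are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from label counts."""
    n = len(target)
    given_counts = Counter(given)
    return -sum(
        c / n * math.log(c / given_counts[g])
        for (t, g), c in Counter(zip(target, given)).items()
    )


def homogeneity_completeness_v(classes, clusters):
    h_class, h_cluster = entropy(classes), entropy(clusters)
    homogeneity = 1.0 if h_class == 0 else 1 - conditional_entropy(classes, clusters) / h_class
    completeness = 1.0 if h_cluster == 0 else 1 - conditional_entropy(clusters, classes) / h_cluster
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Every point in its own cluster: each cluster is pure, but every class is split up.
h, c, v = homogeneity_completeness_v([0, 0, 1, 1], [0, 1, 2, 3])
print(h, c, v)
```

Over-splitting keeps homogeneity at 1 but halves completeness in this example, and the V-Measure lands between the two.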
4.2.4 Fowlkes-Mallows Index (FMI)
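- Measures: Geometric mean of pairwise precision and recall
- Formula: FMI = TP / √[(TP+FP)(TP+FN)], where the counts refer to pairs of points
  - TP = pairs in the same cluster in both the true and the predicted labeling
  - FP = pairs in the same cluster only in the predicted labeling
  - FN = pairs in the same cluster only in the true labeling
- Range: 0 to 1 (higher is better)
- Best for: Pairwise comparison of a clustering against ground truth labels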
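Because the counts refer to pairs of points, they can be read off the contingency table of the two labelings. A minimal sketch using `math.comb` (Python 3.8+); the function name is illustrative, and it assumes each labeling has at least one cluster with two or more points so that the denominator is non-zero.

```python
import math
from collections import Counter


def fowlkes_mallows(labels_true, labels_pred):
    # TP: pairs placed together in both labelings.
    tp = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # TP + FP: pairs placed together in the predicted labeling.
    pred_pairs = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    # TP + FN: pairs placed together in the true labeling.
    true_pairs = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    return tp / math.sqrt(pred_pairs * true_pairs)


truth = [0, 0, 1, 1]
print(fowlkes_mallows(truth, [1, 1, 0, 0]))  # 1.0
print(fowlkes_mallows(truth, [0, 0, 0, 1]))  # 1 / sqrt(6)
```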