Common Loss Functions and Frequently Used Model Evaluation Metrics

1 Loss Function

1.1 0-1 Loss

$$L(y_{i},f(x_{i}))=\begin{cases}
1, & y_{i}\neq f(x_{i})\\
0, & y_{i}=f(x_{i})
\end{cases}$$

The 0-1 loss is simple and commonly used in classification tasks: if the predicted label matches the true label, the loss is 0; otherwise it is 1. If exact equality is too strict (for example, when the outputs are real-valued), the condition can be relaxed by comparing the absolute difference between the true value and the predicted value against a threshold t:

$$L(y_{i},f(x_{i}))=\begin{cases}
1, & \mid y_{i}-f(x_{i})\mid \geqslant t\\
0, & \mid y_{i}-f(x_{i})\mid < t
\end{cases}$$
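A minimal Python sketch of both variants, assuming labels and predictions arrive as plain sequences. The function name `zero_one_loss` and the choice to average the per-sample losses over the dataset are illustrative assumptions, not part of the definitions above.

```python
def zero_one_loss(y_true, y_pred, t=None):
    """Mean 0-1 loss over a dataset.

    With t=None the strict definition is used (loss 1 unless y == f(x)).
    With a threshold t, a prediction counts as wrong when |y - f(x)| >= t.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if t is None:
        losses = [0 if y == p else 1 for y, p in zip(y_true, y_pred)]
    else:
        losses = [1 if abs(y - p) >= t else 0 for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)
```

For example, `zero_one_loss([1, 0, 1, 1], [1, 1, 1, 0])` counts two mismatches out of four samples.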

1.2 Squared Loss Function

 $$L(y_{i},f(x_{i}))=(y_{i}-f(x_{i}))^{2}$$

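Example: linear regression loss with L2 regularization (ridge regression), where λ controls the strength of the penalty on the weights:

$$L(\omega,x)=\frac{1}{2N}\sum_{i=1}^{N}(y_{i}-\omega^{T}x_{i})^{2}+\frac{\lambda}{2}\left \| \omega \right \|^{2}$$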
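The regularized loss can be evaluated directly from the formula. The sketch below assumes `w` is a list of weights, `X` a list of feature rows, and `lam` the regularization strength λ; the name `ridge_loss` is hypothetical.

```python
def ridge_loss(w, X, y, lam):
    """Squared-error loss averaged over N samples plus an L2 penalty on w."""
    n = len(X)
    squared_residuals = sum(
        (yi - sum(wj * xj for wj, xj in zip(w, xi))) ** 2
        for xi, yi in zip(X, y)
    )
    penalty = sum(wj ** 2 for wj in w)  # squared L2 norm of w
    return squared_residuals / (2 * n) + lam / 2 * penalty
```

With `lam = 0` this reduces to half the mean squared error of the linear model.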

1.3 Absolute Loss Function

$$L(y_{i},f(x_{i}))=\left | y_{i}-f(x_{i}) \right |$$

1.4 Log Loss (Cross-Entropy Loss)

 $$L(y_{i},f(x_{i}))=-\log P(y_{i}|x_{i})$$
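For the binary case the log loss can be sketched as follows. This assumes labels in {0, 1} and that `p_pred` holds the predicted probability of class 1; the clipping constant `eps` is an added safeguard against log(0), not part of the definition.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)        # keep p strictly inside (0, 1)
        prob_of_true_label = p if y == 1 else 1 - p
        total += -math.log(prob_of_true_label)
    return total / len(y_true)
```

A model that always predicts 0.5 gets a loss of log 2, while confident correct predictions drive the loss toward 0.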

1.5 Hinge Loss

$$L(y_{i},f(x_{i}))=\max(0,1-y_{i}f(x_{i}))$$

1.6 Exponential Loss Function

$$L(y_{i},f(x_{i}))=\exp(-y_{i}f(x_{i}))$$
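Both the hinge loss and the exponential loss depend only on the margin y·f(x). The sketch below assumes labels in {-1, +1} and a real-valued score f(x), which is the usual convention for these two losses; the function names are illustrative.

```python
import math

def hinge_loss(y, score):
    """Zero once the margin y * score reaches 1, linear below that."""
    return max(0.0, 1.0 - y * score)

def exponential_loss(y, score):
    """Decays exponentially with the margin; grows fast for wrong predictions."""
    return math.exp(-y * score)
```

A correctly classified sample with margin 2 has hinge loss 0 but still a small positive exponential loss.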

 

2 Classification Evaluation Metrics

 

Confusion Matrix

                   Predicted positive   Predicted negative
Actual positive    TP                   FN
Actual negative    FP                   TN

 

  • P = TP + FN: all actual positive samples
  • N = FP + TN: all actual negative samples

 

2.1 Accuracy

 $$Accuracy=\frac{TP+TN}{P+N}$$

2.2 Precision

 $$Precision=\frac{TP}{TP+FP}$$

2.3 Recall

 $$Recall=\frac{TP}{P}$$

2.4 F1

$$F1=2\cdot\frac{Precision\cdot Recall}{Precision+Recall}$$
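The four metrics above can be computed from the confusion-matrix counts. This is a sketch assuming binary labels with `1` as the positive class; the function names and the choice to return 0.0 when a denominator is zero are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_pred):
        if p == positive:
            if y == positive:
                tp += 1
            else:
                fp += 1
        else:
            if y == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # (TP+TN)/(P+N)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0         # TP/P
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```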

2.5 ROC

 

  In ROC space, the FPR (False Positive Rate) is set as the horizontal axis, and the TPR (True Positive Rate) is set as the vertical axis.

 $$TPR=\frac{TP}{P}$$

 $$FPR=\frac{FP}{N}$$

  The Process of Generating an ROC Curve:

  • Sort the predicted probability scores of the samples in descending order.
  • From highest to lowest, we sequentially use each score as a threshold. Samples with a score greater than or equal to the threshold are predicted as positive, while those with a score less than the threshold are predicted as negative. We then calculate the TPR and FPR at each threshold and plot the ROC curve.
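The procedure above can be sketched as follows, assuming binary labels in {0, 1} with at least one positive and one negative sample. The function name `roc_points` and the added starting point (0, 0) are illustrative choices.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR), one per distinct threshold."""
    p = sum(1 for y in y_true if y == 1)   # all positive samples
    n = len(y_true) - p                    # all negative samples
    points = [(0.0, 0.0)]                  # threshold above every score
    # Use each distinct score as a threshold, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n, tp / p))
    return points
```

This recomputes the counts for every threshold, which is quadratic in the number of samples; it is meant to mirror the description, not to be efficient.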

 

2.6 AUC

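AUC (Area Under the Curve) is the area under the ROC curve. It ranges from 0 to 1: a value of 1 means every positive sample is scored above every negative sample, and 0.5 corresponds to random guessing.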
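One way to compute the area without drawing the curve uses the equivalence between AUC and the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below assumes binary labels in {0, 1} and at least one sample of each class; the function name `auc` is illustrative.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))
```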

3 Regression Evaluation Metrics

3.1 Mean Absolute Error (MAE)

  • Formula: MAE = (1/n) * Σ|yᵢ - ŷᵢ|

  • Interpretation: Average absolute difference between predicted and actual values

  • Range: 0 to ∞ (lower is better)

  • Best for: When all errors are equally important

  • Robustness: Less sensitive to outliers than MSE

3.2 Mean Squared Error (MSE)

  • Formula: MSE = (1/n) * Σ(yᵢ - ŷᵢ)²

  • Interpretation: Average squared difference between predicted and actual values

  • Range: 0 to ∞ (lower is better)

  • Best for: When large errors should be penalized more heavily

  • Note: Not in same units as target variable (squared units)

3.3 Root Mean Squared Error (RMSE)

  • Formula: RMSE = √(MSE)

  • Interpretation: Square root of MSE - back to original units

  • Range: 0 to ∞ (lower is better)

  • Best for: Most common metric; interpretable in original units

  • Property: More sensitive to outliers than MAE
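MAE, MSE, and RMSE can be computed side by side to see how they differ on the same predictions. A minimal sketch with illustrative function names, assuming equal-length sequences of numbers:

```python
import math

def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Square root brings the error back to the units of the target.
    return math.sqrt(mse(y_true, y_pred))
```

On data with one large error, RMSE exceeds MAE because squaring weights that error more heavily.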

3.4 R² Score (Coefficient of Determination)

  • Formula: R² = 1 - (SS_res / SS_tot)

    • SS_res = Σ(yᵢ - ŷᵢ)² (residual sum of squares)

    • SS_tot = Σ(yᵢ - ȳ)² (total sum of squares)

  • Interpretation: Proportion of variance explained by model

  • Range: -∞ to 1 (higher is better)

    • 1 = Perfect fit

    • 0 = Model as good as mean prediction

    • Negative = Worse than mean prediction

  • Best for: A scale-free summary of fit; comparing models trained on the same target variable
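A sketch of R² computed from the two sums of squares, assuming the true values are not all identical (otherwise SS_tot is zero). The function name `r2_score` is chosen for illustration.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot
```

Predicting the mean for every sample gives exactly 0, and predictions worse than the mean give a negative value.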

3.5 Mean Absolute Percentage Error (MAPE)

  • Formula: MAPE = (100%/n) * Σ|(yᵢ - ŷᵢ)/yᵢ|

  • Interpretation: Average percentage error

  • Range: 0% to ∞% (lower is better)

  • Best for: When relative errors matter more than absolute

  • Limitation: Undefined when yᵢ = 0; inflated when actual values are close to zero
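A minimal sketch of MAPE. Raising an error on a zero actual value is one possible way to handle the undefined case and is an assumption of this example.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when an actual value is 0")
    ratios = [abs((y - p) / y) for y, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)
```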

3.6 Adjusted R²

  • Formula: Adjusted R² = 1 - [(1 - R²)(n-1)/(n-p-1)]

    • n = number of samples

    • p = number of features

  • Interpretation: R² adjusted for number of predictors

  • Best for: Comparing models with different numbers of features

  • Property: Penalizes adding irrelevant features

3.7 Mean Absolute Scaled Error (MASE)

  • Formula: MASE = MAE / (MAE of naive forecast)

  • Interpretation: Error relative to simple baseline

  • Range: 0 to ∞ (lower is better)

    • <1 = Better than naive forecast

    • >1 = Worse than naive forecast

  • Best for: Time series; comparing across different scales
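A sketch of MASE for a non-seasonal time series. It assumes the naive forecast is the one-step "previous value" forecast evaluated on the training series `y_train`, which is a common choice but only one possible baseline; the function name and arguments are illustrative.

```python
def mase(y_true, y_pred, y_train):
    """MAE of the model divided by the in-sample MAE of the naive forecast."""
    mae_model = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)
    # Naive forecast: predict each training value by the one before it.
    naive_errors = [abs(y_train[t] - y_train[t - 1])
                    for t in range(1, len(y_train))]
    mae_naive = sum(naive_errors) / len(naive_errors)
    return mae_model / mae_naive
```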

 

4 Cluster Algorithm Metrics

4.1 Internal Metrics (No ground truth labels required)

4.1.1 Silhouette Coefficient

  • Measures: Cohesion (within-cluster) vs Separation (between-cluster)

  • Formula:

    • For each point i: s(i) = (b(i) - a(i)) / max(a(i), b(i))

      • a(i) = average distance to other points in same cluster

      • b(i) = min average distance to points in another cluster

    • Overall: Average of s(i) for all points

  • Range: -1 to 1

    • +1 = Well-clustered (dense and separated)

    • 0 = Overlapping clusters

    • -1 = Point is likely assigned to the wrong cluster

  • Best for: Determining optimal k (number of clusters)
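The silhouette computation can be sketched directly from the definitions of a(i) and b(i). This assumes Euclidean distance, points given as coordinate tuples, at least two clusters, and the convention s(i) = 0 for a point that is alone in its cluster. It requires Python 3.8+ for `math.dist`.

```python
import math

def silhouette_score(points, labels):
    """Average silhouette coefficient s(i) over all points."""
    clusters = {}
    for pt, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(pt)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for pt, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)      # convention for singleton clusters
            continue
        # a(i): mean distance to the other points in the same cluster
        # (the distance of pt to itself is 0, so it does not affect the sum).
        a = sum(math.dist(pt, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(pt, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

To pick k, one would run the clustering for several values of k and keep the one with the highest average score.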

4.1.2 Davies-Bouldin Index

  • Measures: Average similarity between clusters

  • Formula:

    • DB = (1/k) * Σ_i max_{j≠i} R_ij (for each cluster i, take its most similar other cluster j)

    • R_ij = (s_i + s_j) / d(c_i, c_j)

      • s_i = average distance of the points in cluster i to its centroid c_i

      • d(c_i, c_j) = distance between centroids

  • Range: 0 to ∞ (lower is better)

  • Best for: Compact, well-separated clusters
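A sketch of the Davies-Bouldin index, assuming Euclidean distance, s_i measured as the mean distance to the cluster centroid, and at least two clusters with distinct centroids. It requires Python 3.8+ for `math.dist`.

```python
import math

def davies_bouldin(points, labels):
    clusters = {}
    for pt, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(pt)
    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        lab: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for lab, pts in clusters.items()
    }
    # s_i: mean distance of the cluster's points to its centroid.
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    labs = list(clusters)
    total = 0.0
    for i in labs:
        # Worst-case similarity R_ij of cluster i against every other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in labs if j != i
        )
    return total / len(labs)
```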

4.1.3 Calinski-Harabasz Index (Variance Ratio Criterion)

  • Measures: Ratio of between-cluster to within-cluster dispersion

  • Formula:

    • CH = [SS_B/(k-1)] / [SS_W/(n-k)]

    • SS_B = between-cluster sum of squares

    • SS_W = within-cluster sum of squares

  • Range: 0 to ∞ (higher is better)

  • Best for: Finding k when clusters are dense and separated
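A sketch of the Calinski-Harabasz index, assuming Euclidean distance, SS_B weighted by cluster size, and 2 ≤ k < n with a non-zero within-cluster sum of squares. It requires Python 3.8+ for `math.dist`.

```python
import math

def calinski_harabasz(points, labels):
    n = len(points)
    clusters = {}
    for pt, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(pt)
    k = len(clusters)
    overall = tuple(sum(coord) / n for coord in zip(*points))
    ss_b = 0.0   # between-cluster sum of squares
    ss_w = 0.0   # within-cluster sum of squares
    for pts in clusters.values():
        centroid = tuple(sum(coord) / len(pts) for coord in zip(*pts))
        ss_b += len(pts) * math.dist(centroid, overall) ** 2
        ss_w += sum(math.dist(p, centroid) ** 2 for p in pts)
    return (ss_b / (k - 1)) / (ss_w / (n - k))
```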

4.1.4 Dunn Index

  • Measures: Ratio of minimum inter-cluster distance to maximum intra-cluster distance

  • Formula:

    • DI = min(inter-cluster distance) / max(intra-cluster diameter)

  • Range: 0 to ∞ (higher is better)

  • Best for: Identifying compact, well-separated clusters

  • Limitation: Sensitive to noise and outliers
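Several definitions of inter-cluster distance and cluster diameter are in use. The sketch below assumes one common pairing: the inter-cluster distance is the smallest distance between two points in different clusters, and the diameter is the largest distance between two points in the same cluster. It also assumes Euclidean distance, at least two clusters, and at least one cluster with two distinct points.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    min_inter = math.inf   # smallest distance across clusters
    max_diam = 0.0         # largest distance within a cluster
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        if lp == lq:
            max_diam = max(max_diam, d)
        else:
            min_inter = min(min_inter, d)
    return min_inter / max_diam
```

Because both terms are extremes over pairs, a single outlier can change the result, which matches the limitation noted above.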

4.2 External Metrics (Ground truth labels available)

4.2.1 Adjusted Rand Index (ARI)

  • Measures: Similarity between two clusterings (corrected for chance)

  • Interpretation: Similarity score between predicted and true labels

  • Range: -0.5 to 1

    • 1 = Perfect match

    • 0 = Random labeling (expected value)

    • Negative = Less agreement than expected by chance

  • Best for: Comparing to ground truth without assumptions
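ARI can be computed from the contingency table of the two labelings using pair counts. This is a sketch of the standard pair-counting formula, assuming at least two samples; it requires Python 3.8+ for `math.comb`, and the function name is illustrative.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    # Pairs that share a cell of the contingency table (same class, same cluster).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0   # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)
```

The score ignores the actual label values: swapping cluster names does not change it.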

4.2.2 Normalized Mutual Information (NMI)

  • Measures: Mutual information between clusterings, normalized

  • Formula:

    • NMI = 2 * I(X;Y) / [H(X) + H(Y)]

    • I = Mutual information

    • H = Entropy

  • Range: 0 to 1

    • 1 = Perfect correlation

    • 0 = Independent clusterings

  • Best for: Information-theoretic comparison
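A sketch of NMI with the normalization given above, 2·I(X;Y) / [H(X) + H(Y)], using natural logarithms. Returning 1.0 when both entropies are zero is a convention assumed here, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )

def nmi(a, b):
    h_sum = entropy(a) + entropy(b)
    if h_sum == 0:
        return 1.0   # both labelings consist of a single cluster
    return 2 * mutual_information(a, b) / h_sum
```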

4.2.3 Homogeneity, Completeness, V-Measure

  • Homogeneity: Each cluster contains only members of a single class

  • Completeness: All members of a given class are in same cluster

  • V-Measure: Harmonic mean of homogeneity and completeness

  • Range: 0 to 1 (higher is better)
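The three scores can be sketched with entropies. The example uses the identities homogeneity = I(C;K)/H(C) and completeness = I(C;K)/H(K), where C are the true classes and K the predicted clusters, and takes the unweighted harmonic mean for the V-Measure. Treating a zero entropy as a score of 1.0 is an assumed convention.

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def _mutual_information(a, b):
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in Counter(zip(a, b)).items()
    )

def homogeneity_completeness_v(labels_true, labels_pred):
    mi = _mutual_information(labels_true, labels_pred)
    h_true, h_pred = _entropy(labels_true), _entropy(labels_pred)
    homogeneity = mi / h_true if h_true > 0 else 1.0
    completeness = mi / h_pred if h_pred > 0 else 1.0
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v
```

Splitting one true class across two clusters keeps homogeneity at 1 but lowers completeness, as in `homogeneity_completeness_v([0, 0, 1, 1], [0, 0, 1, 2])`.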

4.2.4 Fowlkes-Mallows Index (FMI)

  • Measures: Geometric mean of precision and recall for pairwise clustering

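  • Formula: FMI = TP / √[(TP+FP)(TP+FN)], where TP, FP, and FN are counted over pairs of points (TP = pairs placed in the same cluster in both the true and the predicted labeling)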

  • Range: 0 to 1 (higher is better)

  • Best for: Binary clustering comparison
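A sketch of FMI by explicit pair counting. Here a pair is a true positive when both labelings put the two points together, a false positive when only the predicted labeling does, and a false negative when only the true labeling does. Returning 0.0 when there are no true-positive pairs is an assumed convention.

```python
import math
from itertools import combinations

def fowlkes_mallows(labels_true, labels_pred):
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    return tp / math.sqrt((tp + fp) * (tp + fn))
```

The loop visits every pair, so the cost grows quadratically with the number of samples.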

posted @ 2019-04-26 15:34  ylxn