Some Basic Concepts in Machine Learning

Model evaluation

Error

Training error (also called empirical error) vs. generalization error

Fitting

overfitting, underfitting

Evaluation methods

  • Hold-out: \(D=S \cup T,~S\cap T=\emptyset\), where \(S\) is the training set and \(T\) is the testing set. Use stratified sampling so that \(S\) and \(T\) keep the same class proportions as \(D\).
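A minimal sketch of a stratified hold-out split in pure Python (the function name and the split fraction are my own illustration, not from the notes):

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_frac=0.3, seed=0):
    """Split indices into train/test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)          # group sample indices by class
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # same fraction per class
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)
```

With `test_frac=0.5` and labels `["a"]*6 + ["b"]*4`, the test set receives 3 "a" cases and 2 "b" cases, mirroring the 6:4 class ratio.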

  • Cross validation: \(D=\bigcup_k D_k,~D_i\cap D_j=\emptyset ~ (i\neq j)\); train on \(k-1\) folds and test on the remaining one, rotating through all \(k\) folds (k-fold cross validation). When \(k=\#D\), we call it LOO: Leave-one-out.
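K-fold cross validation reduces to index bookkeeping; a sketch (the function names are illustrative, not from the notes):

```python
def kfold_indices(m, k):
    """Partition indices 0..m-1 into k disjoint folds D_1..D_k."""
    return [list(range(i, m, k)) for i in range(k)]

def kfold_splits(m, k):
    """Yield (train_idx, test_idx): test on one fold, train on the rest."""
    folds = kfold_indices(m, k)
    for j in range(k):
        test = folds[j]
        train = [i for f, fold in enumerate(folds) if f != j for i in fold]
        yield train, test
```

Calling `kfold_splits(m, m)` gives singleton test folds, i.e. leave-one-out.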

  • Bootstrapping: given a sample set \(D\) with \(\#D=m\), sample uniformly with replacement from \(D\): each draw copies one case into \(D'\). Repeat \(m\) times to get a new set \(D' ~ (\#D' = m)\). \(D'\) becomes the training set, \(D - D'\) the testing set. Each case is missed by a single draw with probability \(1-1/m\), so about \((1-1/m)^m \approx 1/e \approx 36.8\%\) of \(D\) never appears in \(D'\); evaluating on these left-out cases is the out-of-bag estimate.
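A minimal bootstrap sampler, with the out-of-bag cases as the test set (the function name is my own):

```python
import random

def bootstrap_split(m, seed=0):
    """Draw m indices with replacement; out-of-bag indices form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(m) for _ in range(m)]  # D', a multiset of size m
    seen = set(train)
    oob = [i for i in range(m) if i not in seen]  # D - D', out-of-bag cases
    return train, oob
```

For large `m`, the out-of-bag fraction `len(oob) / m` concentrates around \(1/e \approx 0.368\).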

  • Parameter tuning: search candidate values over a range with a chosen step size; select the best setting on a validation set held out from the training data.

  • Performance measure:

    • Samples \(D=\{(x_1,y_1),\dots,(x_m,y_m)\}\), where \(y_i\) is the true label of \(x_i\); learner \(f\), prediction \(f(x)\), true label \(y\).

    • Regression: mean squared error.

      \[E(f;D)=\frac{1}{m}\sum_{i=1}^m \big(f(x_i)-y_i\big)^2 ~. \]

      For a data distribution \(\mathcal{D}\) with density \(p(\cdot)\), the generalization version is

      \[E(f;\mathcal{D})=\int_{x \sim \mathcal{D}} \big(f(x)-y\big)^2 p(x)\,dx ~. \]
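The empirical MSE translates directly to code (a sketch; the name is my own):

```python
def mse(preds, targets):
    """Empirical mean squared error E(f; D) over m samples."""
    m = len(preds)
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / m

# Example: squared errors 0, 0, 4 averaged over 3 samples.
mse([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])
```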

    • Error rate:

      \[E(f;D)=\frac{1}{m} \sum_{i=1}^m \mathbb{I}\big(f(x_i)\neq y_i\big) ~. \]

    • Accuracy rate:

      \[\mathrm{acc}(f; D)=\frac{1}{m}\sum_{i=1}^m \mathbb{I}\big(f(x_i) = y_i\big) = 1-E(f; D) ~. \]
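Error rate and accuracy as one-liners (illustrative names, not from the notes):

```python
def error_rate(preds, targets):
    """Fraction of samples where the prediction disagrees with the label."""
    return sum(p != t for p, t in zip(preds, targets)) / len(preds)

def accuracy(preds, targets):
    """Accuracy is the complement of the error rate."""
    return 1.0 - error_rate(preds, targets)
```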

    • Precision and Recall (TP, FP, FN, TN: true/false positives/negatives from the confusion matrix):

    \[P=\frac{TP}{TP+FP}\\ R=\frac{TP}{TP+FN} \]
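Computing P and R from raw binary predictions (a sketch; the helper names are my own):

```python
def confusion_counts(preds, targets, positive=1):
    """Count TP, FP, FN, TN for the given positive class."""
    tp = sum(p == positive and t == positive for p, t in zip(preds, targets))
    fp = sum(p == positive and t != positive for p, t in zip(preds, targets))
    fn = sum(p != positive and t == positive for p, t in zip(preds, targets))
    tn = sum(p != positive and t != positive for p, t in zip(preds, targets))
    return tp, fp, fn, tn

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)
```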

    • P-R curve
    • Break-Even Point (BEP): the point on the P-R curve where P = R.
    • \(F_\beta\) measure:

    \[\frac{1}{F_\beta} = \frac{1}{1+\beta^2}\left( \frac{1}{P}+\frac{\beta^2}{R} \right) \]
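The \(F_\beta\) measure in code (the function name is my own; \(\beta > 1\) weights recall more heavily, \(\beta = 1\) gives the harmonic mean \(F_1\)):

```python
def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```

The reciprocal form quoted in the notes agrees with this: `1 / f_beta(p, r, beta)` equals `(1 / (1 + beta**2)) * (1/p + beta**2 / r)`.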

    • macro-P, macro-R, macro-\(F_\beta\): compute \(P_i, R_i\) on each of the \(n\) confusion matrices, then average:

    \[\text{macro-}P=\frac{1}{n} \sum_{i=1}^n P_i, \quad \text{macro-}R=\frac{1}{n} \sum_{i=1}^n R_i, \quad \text{macro-}F_\beta = F_\beta(\text{macro-}P, \text{macro-}R) ~. \]

    • micro-P, micro-R: average the entries of the confusion matrices first, then compute:

    \[\text{micro-}P = \frac{\overline{TP}} {\overline{TP}+\overline{FP}}, \quad \text{micro-}R = \frac{\overline{TP}} {\overline{TP}+\overline{FN}} ~. \]
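Macro and micro averaging over several confusion matrices can be sketched as follows (representing each matrix as a `(TP, FP, FN)` triple is my own input convention):

```python
def macro_pr(confusions):
    """Macro: compute P_i, R_i per confusion matrix, then average them."""
    ps = [tp / (tp + fp) for tp, fp, fn in confusions]
    rs = [tp / (tp + fn) for tp, fp, fn in confusions]
    n = len(confusions)
    return sum(ps) / n, sum(rs) / n

def micro_pr(confusions):
    """Micro: average the TP/FP/FN entries first, then compute P and R."""
    n = len(confusions)
    tp = sum(c[0] for c in confusions) / n
    fp = sum(c[1] for c in confusions) / n
    fn = sum(c[2] for c in confusions) / n
    return tp / (tp + fp), tp / (tp + fn)
```

On skewed matrices the two disagree, which is the point: macro weights every matrix equally, micro weights every case equally.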

    • ROC: Receiver Operating Characteristic curve; x-axis: FPR (false positive rate), y-axis: TPR (true positive rate), where TPR = R,

    \[TPR=\frac{TP}{TP+FN} ~, \quad FPR=\frac{FP}{FP+TN} ~. \]
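ROC points come from sweeping a decision threshold over the learner's scores; a sketch (assuming binary 0/1 targets with both classes present; names are my own):

```python
def roc_points(scores, targets):
    """Sweep the threshold over all distinct scores; return (FPR, TPR) points."""
    pos = sum(t == 1 for t in targets)
    neg = len(targets) - pos
    points = []
    for thresh in sorted(set(scores), reverse=True):
        preds = [1 if s >= thresh else 0 for s in scores]
        tp = sum(p == 1 and t == 1 for p, t in zip(preds, targets))
        fp = sum(p == 1 and t == 0 for p, t in zip(preds, targets))
        points.append((fp / neg, tp / pos))
    return points
```

Lowering the threshold admits more positives, so both FPR and TPR increase monotonically along the returned points.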

posted @ 2018-03-17 10:47  小鑫同学
