Some Statistics

Mean:

\[\overline{x}=\frac{1}{n}\sum_{i=1}^{n} x_i \]

Variance: measures how far the samples of a single variable deviate from their mean

\[D(x)=\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2 \]
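A minimal sketch of the two formulas above in plain Python. The function names `mean` and `variance` are illustrative choices, not from the original post. The variance divides by n, as in the formula above, not by n-1.

```python
def mean(xs):
    """Arithmetic mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def variance(xs):
    """Variance with the 1/n normalization used in the formula above."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


data = [1, 2, 3, 4]
print(mean(data))      # 2.5
print(variance(data))  # 1.25
```

The standard library's `statistics.pvariance` uses the same 1/n convention, while `statistics.variance` divides by n-1.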

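Covariance: reflects how strongly two random variables vary together; in the formulas below, E(·) is taken as the average over the n samples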

\[\begin{aligned} \text{Cov}(x,y) &= E[(X-E(X))(Y-E(Y))] \\ &= \frac{1}{n}\sum_{i=1}^{n} (x_i-\overline{x})(y_i-\overline{y}) \\ &= E(XY) - E(X)\cdot E(Y) \\ &= (\frac{1}{n}\sum_{i=1}^{n} x_i y_i) -\overline{x}\cdot\overline{y} \end{aligned} \]
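As a quick check that the centered form and the E(XY) - E(X)E(Y) form above give the same number, here is a small sketch in plain Python. The function names and the sample data are made up for illustration, and expectations are taken as averages over the n samples.

```python
def covariance(xs, ys):
    """Centered form: (1/n) * sum of (x_i - mean_x) * (y_i - mean_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def covariance_shortcut(xs, ys):
    """Shortcut form: mean of x_i * y_i minus the product of the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mx * my


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 9]
print(covariance(xs, ys))           # 2.875
print(covariance_shortcut(xs, ys))  # 2.875
```

The two forms are algebraically identical. In floating point, the shortcut form can lose precision when the means are large relative to the spread, so the centered form is usually the safer choice.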

Pearson Correlation Coefficient (PCC): the covariance standardized by the two standard deviations, which removes the effect of units

Equivalently, it is the cosine of the angle between the two n-dimensional vectors after the data are centered

\[\begin{aligned} \rho(x,y) &= \frac{\text{Cov}(x,y)}{\sqrt{D(x)}\sqrt{D(y)}} \\ &= \frac{\sum_{i=1}^{n} (x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\overline{x})^2}\sqrt{\sum_{i=1}^{n}(y_i-\overline{y})^2}} \end{aligned}\]
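The second line of the formula can be sketched directly as a cosine between centered vectors. This is an illustrative implementation with a hypothetical function name `pearson`. It assumes neither input is constant, since a constant input makes the denominator zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation as the cosine of the angle between centered vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]  # centered x
    dy = [y - my for y in ys]  # centered y
    dot = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    return dot / (norm_x * norm_y)  # assumes both norms are non-zero


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 9]
r = pearson(xs, ys)
# Rescaling y (a change of units) leaves the coefficient unchanged.
r_scaled = pearson(xs, [100 * y for y in ys])
print(math.isclose(r, r_scaled))  # True
```

The 1/n factors of the covariance and of the two variances cancel, which is why the sums appear without normalization.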

posted @ 2022-08-21 02:18  4thirteen2one