ZhangZhihui's Blog  

Pearson vs. Spearman Correlation: When to Use Each?

Both Pearson and Spearman correlations measure the relationship between two variables, but they are used in different situations based on the type of data and assumptions.


1. Pearson Correlation (皮尔逊相关系数)

  • Measures the linear relationship between two continuous variables.
  • Assumes the relationship is linear. Normality (strictly, bivariate normality) matters mainly for significance tests and confidence intervals on r, not for computing r itself.
  • Returns a value between -1 and 1:
    • +1 → Perfect positive linear correlation
    • 0 → No linear correlation (a non-linear relationship may still exist)
    • -1 → Perfect negative linear correlation

Formula:

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \cdot \sqrt{\sum (Y_i - \bar{Y})^2}}$$

where:

  • $X_i, Y_i$ are the data points,
  • $\bar{X}, \bar{Y}$ are the means of $X$ and $Y$.
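
The formula can be traced term by term in a few lines of Python. This is only a minimal sketch using the standard library; the function name `pearson_r` and the study-hours data are made up for illustration and do not come from any particular package.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed directly from the formula for r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the square roots of the summed squared deviations
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / (math.sqrt(ss_x) * math.sqrt(ss_y))

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = pearson_r(hours, scores)
print(round(r, 3))
```

In practice a library routine (for example `scipy.stats.pearsonr`, if SciPy is available) would normally be preferred; the hand-written version is just meant to make each part of the formula visible.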

When to Use Pearson?

  • When data is continuous (e.g., height, weight, temperature).
  • When the relationship between variables is linear.
  • When data is roughly normally distributed and free of extreme outliers (this matters most if you plan to test the significance of r).

Example:

  • Relationship between height and weight.
  • Relationship between study hours and exam scores (assuming a linear trend).

2. Spearman Correlation (斯皮尔曼秩相关系数)

  • Measures the monotonic relationship between two variables (not necessarily linear).
  • Works for ordinal, interval, or ratio data.
  • Does not assume normality.
  • Instead of using raw values, it ranks the data before computing correlation.

Formula:

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where:

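  • $d_i$ is the difference between the ranks of $X_i$ and $Y_i$,
  • $n$ is the number of observations.

This shortcut formula is exact only when there are no tied ranks. With ties, compute the Pearson correlation of the ranks instead, giving tied values their average rank.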
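
As a rough illustration of the rank-difference formula, the sketch below ranks each variable and plugs the squared rank differences into the expression for $r_s$. It assumes there are no tied values; the helper names `ranks` and `spearman_rs` and the income/happiness numbers are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. This sketch assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rs(x, y):
    """Spearman correlation via the rank-difference formula (no ties)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rank_x, rank_y = ranks(x), ranks(y)
    # d_i is the difference between the two ranks of observation i
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))

# Hypothetical data: income (thousands) vs. self-reported happiness
income = [20, 35, 50, 80, 200]
happiness = [3.1, 4.0, 3.8, 4.5, 4.6]
rs = spearman_rs(income, happiness)
print(ranks(happiness))
print(round(rs, 3))
```

Only the ordering of the values enters the calculation, so the very large income of 200 counts no more than any other top-ranked value.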

When to Use Spearman?

  • When data is not normally distributed.
  • When data has outliers (Spearman is less sensitive to them because it works on ranks rather than raw values).
  • When the relationship is monotonic but not necessarily linear.
  • When working with ranked (ordinal) data.

Example:

  • Relationship between customer satisfaction and service rating (both rated on a 1-5 scale).
  • Relationship between income and happiness (where higher income generally means higher happiness, but not in a strict linear way).

Key Differences:

Feature | Pearson Correlation | Spearman Correlation
Measures | Linear relationship | Monotonic relationship
Data Type | Continuous | Continuous or ordinal
Normality Assumption | Relevant mainly for significance tests on r | None
Sensitivity to Outliers | High (outliers can distort results) | Lower (ranks reduce the effect of outliers)
Best Use Case | When the relationship is approximately linear | When the relationship is monotonic but not necessarily linear
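
The difference in outlier sensitivity can be checked on a small made-up data set. The sketch below is an assumption-laden illustration, not a benchmark: the helper functions are hypothetical, the data are invented, and the rank-difference formula is used for Spearman because the example contains no ties.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return numerator / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank values from 1 (smallest) to n, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rs(x, y):
    """Spearman correlation via the rank-difference formula (no ties)."""
    n = len(x)
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))

x = list(range(1, 11))
y_clean = list(range(1, 11))                    # perfectly linear
y_outlier = [1, 2, 3, 4, 50, 6, 7, 8, 9, 10]    # one value replaced by an outlier

r_clean, rs_clean = pearson_r(x, y_clean), spearman_rs(x, y_clean)
r_out, rs_out = pearson_r(x, y_outlier), spearman_rs(x, y_outlier)

print("clean:   Pearson", round(r_clean, 2), "Spearman", round(rs_clean, 2))
print("outlier: Pearson", round(r_out, 2), "Spearman", round(rs_out, 2))
```

On this data a single extreme value pulls Pearson down to roughly 0.15, while Spearman stays around 0.82, because the outlier can shift its own rank by at most a few positions.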

Which One to Use?

  • If your data is normally distributed and you expect a linear relationship → Use Pearson.
  • If your data is not normally distributed, contains outliers, or has a monotonic but not linear relationship → Use Spearman.
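
To see the "monotonic but not linear" case concretely, here is a small sketch that computes Spearman as the Pearson correlation of the ranks, which is what ranking the data before computing the correlation amounts to. The cubic data and the helper names are assumptions chosen purely for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from deviations about the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return numerator / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank values from 1 (smallest) to n, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rs(x, y):
    """Spearman correlation computed as Pearson on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical monotonic but clearly non-linear relationship: y = x^3
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

r_linear = pearson_r(x, y)
rs_monotonic = spearman_rs(x, y)
print("Pearson: ", round(r_linear, 3))
print("Spearman:", round(rs_monotonic, 3))
```

Because y always increases with x, Spearman reports a perfect monotonic association, while Pearson falls short of 1 since the points do not lie on a straight line.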
posted by ZhangZhihuiAAA