Pearson vs. Spearman Correlation: When to Use Each?
Both Pearson and Spearman correlations measure the relationship between two variables, but they are used in different situations based on the type of data and assumptions.
1. Pearson Correlation Coefficient
- Measures the linear relationship between two continuous variables.
- Assumes a linear relationship. Significance tests and confidence intervals for r additionally assume that the variables are approximately normally distributed.
- Returns a value between -1 and 1:
  - +1 → Perfect positive linear correlation
  - 0 → No linear correlation (a nonlinear relationship may still exist)
  - -1 → Perfect negative linear correlation
Formula:
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \cdot \sqrt{\sum (Y_i - \bar{Y})^2}}$$
where:
- $X_i, Y_i$ are the individual data points,
- $\bar{X}, \bar{Y}$ are the means of $X$ and $Y$.
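As a minimal sketch, the formula above can be computed directly in plain Python. The function name `pearson_r` and the sample data are illustrative assumptions, not part of any library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed term by term from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the summed squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (math.sqrt(ss_x) * math.sqrt(ss_y))

# Made-up data: each y is an exact linear function of x.
xs = [1, 2, 3, 4, 5]
ys_up = [2 * x + 1 for x in xs]     # increasing line
ys_down = [10 - 3 * x for x in xs]  # decreasing line

r_up = pearson_r(xs, ys_up)
r_down = pearson_r(xs, ys_down)
print(r_up, r_down)
```

For exactly linear data like this, the two results should be +1 and -1 up to floating-point rounding.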
When to Use Pearson?
- When data is continuous (e.g., height, weight, temperature).
- When the relationship between variables is linear.
- When the data are approximately normally distributed and free of extreme outliers (this matters mainly for significance tests on r).
Examples:
- Relationship between height and weight.
- Relationship between study hours and exam scores (assuming a linear trend).
2. Spearman Rank Correlation Coefficient
- Measures the monotonic relationship between two variables (not necessarily linear).
- Works for ordinal, interval, or ratio data.
- Does not assume normality.
- Instead of using raw values, it replaces each value with its rank and then computes the Pearson correlation of the ranks.
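The ranking step can be sketched in plain Python as follows. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the data are illustrative assumptions. Tied values receive the average of their positions, which is the usual convention.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up monotonic but nonlinear data.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
tied = average_ranks([10, 20, 20, 30])  # the two 20s share rank 2.5
print(rho, tied)
```

Because the cubic data is strictly increasing, both variables get the same ranks and rho comes out as 1 up to floating-point rounding, even though the relationship is not linear.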
Formula: