Python scikit-learn (metrics): difference between r2_score and explained_variance_score?
I noticed that 'r2_score' and 'explained_variance_score' are both built-in sklearn.metrics methods for regression problems.
I was always under the impression that r2_score is the percentage of variance explained by the model. How is it different from 'explained_variance_score'?
When would you choose one over the other?
Thanks!
OK, look at this example:
    import numpy as np
    from sklearn import metrics

    # data
    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    print(metrics.explained_variance_score(y_true, y_pred))  # 0.957173447537
    print(metrics.r2_score(y_true, y_pred))                  # 0.948608137045

    # what explained_variance_score really is
    print(1 - np.cov(np.array(y_true) - np.array(y_pred)) / np.cov(y_true))
    # 0.95717344753747324

    # what r^2 really is
    print(1 - ((np.array(y_true) - np.array(y_pred)) ** 2).sum() / (4 * np.array(y_true).std() ** 2))
    # 0.94860813704496794

    # notice that the mean residual is not 0
    print((np.array(y_true) - np.array(y_pred)).mean())  # -0.25

    # if the predicted values are different, such that the mean residual IS 0:
    y_pred = [2.5, 0.0, 2, 7]
    print((np.array(y_true) - np.array(y_pred)).mean())  # 0.0

    # then the two scores become the same
    print(metrics.explained_variance_score(y_true, y_pred))  # 0.982869379015
    print(metrics.r2_score(y_true, y_pred))                  # 0.982869379015
So, when the mean residual is 0, they are the same. Which one to choose depends on your needs, that is: is the mean residual supposed to be 0?
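To spell out what each metric actually computes, here is a minimal restatement of the session above in plain NumPy; the manual formulas are my own paraphrase of the definitions in sklearn's documentation:

    import numpy as np
    from sklearn import metrics

    y_true = np.array([3, -0.5, 2, 7])
    y_pred = np.array([2.5, 0.0, 2, 8])
    resid = y_true - y_pred

    # r2_score penalizes the full mean squared residual:
    r2_manual = 1 - (resid ** 2).mean() / y_true.var()
    # explained_variance_score only penalizes the variance of the residuals,
    # so a constant offset in the predictions is "forgiven":
    evs_manual = 1 - resid.var() / y_true.var()

    print(np.isclose(r2_manual, metrics.r2_score(y_true, y_pred)))                   # True
    print(np.isclose(evs_manual, metrics.explained_variance_score(y_true, y_pred)))  # True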
Most of the answers I found (including here) emphasize the difference between R2 and the Explained Variance Score, namely: the mean residual (i.e., the mean of the errors).
However, an important question is left behind: why on earth would I need to consider the mean of the errors?
Refresher:
R2 is the Coefficient of Determination, which measures the amount of variation explained by the (least-squares) linear regression.
You can look at it from a different angle, for the purpose of evaluating the predicted values of y, like this:

Variance(y_actual) × R2 = Variance(y_predicted)

So, intuitively, the closer R2 is to 1, the more y_actual and y_predicted will have the same variance (i.e., the same spread).
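This identity holds exactly for an ordinary least-squares fit with an intercept, evaluated on its own training data. Here is a minimal sketch to check it (synthetic data; all names are my own choices, not anything from the original answer):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=2.0, size=200)

    y_hat = LinearRegression().fit(X, y).predict(X)
    r2 = r2_score(y, y_hat)

    # Variance(y_actual) * R2 == Variance(y_predicted) for the OLS fit:
    print(np.allclose(y.var() * r2, y_hat.var()))  # True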
As previously mentioned, the main difference is the mean of the errors; and if we look at the formulas, we find that this is true:
R2 = 1 - [(Sum of Squared Residuals / n) / Variance(y_actual)]

Explained Variance Score = 1 - [Variance(y_predicted - y_actual) / Variance(y_actual)]

in which:

Variance(y_predicted - y_actual) = (Sum of Squared Residuals / n) - (Mean Error)²
So, obviously, the only difference is that the second formula subtracts the squared mean error from the mean squared residual! ... But why?
When we compare the R2 score with the Explained Variance Score, we are basically checking the mean error; so if R2 = Explained Variance Score, that means: the mean error is zero!
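In fact, rearranging the two formulas above gives: Explained Variance Score - R2 = (Mean Error)² / Variance(y_actual). A quick check of this on the question's numbers (a sketch; variable names are my own):

    import numpy as np
    from sklearn.metrics import r2_score, explained_variance_score

    y_true = np.array([3, -0.5, 2, 7])
    y_pred = np.array([2.5, 0.0, 2, 8])

    mean_error = (y_true - y_pred).mean()
    gap = explained_variance_score(y_true, y_pred) - r2_score(y_true, y_pred)

    # the gap between the two scores is exactly (mean error)^2 / Variance(y_true)
    print(np.isclose(gap, mean_error ** 2 / y_true.var()))  # True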
The mean error reflects the tendency of our estimator, that is: biased vs. unbiased estimation.
In Summary:
If you want an unbiased estimator, so that our model is neither underestimating nor overestimating, you may consider taking the mean of the errors into account.
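To see why this matters in practice, consider a predictor that is systematically off by a constant: the Explained Variance Score does not change at all, while R2 drops. A small illustrative sketch (synthetic data, my own construction):

    import numpy as np
    from sklearn.metrics import r2_score, explained_variance_score

    rng = np.random.default_rng(42)
    y_true = rng.normal(loc=10.0, scale=3.0, size=500)
    y_pred = y_true + rng.normal(scale=1.0, size=500)  # unbiased predictions
    y_biased = y_pred + 2.0                            # systematically overestimates

    for name, pred in [("unbiased", y_pred), ("biased", y_biased)]:
        print(name,
              "R2 =", round(r2_score(y_true, pred), 3),
              "EVS =", round(explained_variance_score(y_true, pred), 3))
    # EVS is identical for both (a constant shift leaves the residual variance
    # unchanged), but R2 is much lower for the biased predictor.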