Six statistical mistakes that are nonetheless in common use

arXiv:2209.09073 (cross-list from physics.data-an)
Six textbook mistakes in data analysis
Comments: 15 pages, 7 figures
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Instrumentation and Methods for Astrophysics (astro-ph.IM)

This article discusses a number of incorrect statements appearing in textbooks on data analysis, machine learning, or computational methods; the common theme in all these cases is the relevance and application of statistics to the study of scientific or engineering data; these mistakes are also quite prevalent in the research literature. Crucially, we do not address errors made by an individual author, focusing instead on mistakes that are widespread in the introductory literature. After some background on frequentist and Bayesian linear regression, we turn to our six paradigmatic cases, providing in each instance a specific example of the textbook mistake, pointers to the specialist literature where the topic is handled properly, along with a correction that summarizes the salient points. The mistakes (and corrections) are broadly relevant to any technical setting where statistical techniques are used to draw practical conclusions, ranging from topics introduced in an elementary course on experimental measurements all the way to more involved approaches to regression.


A. Maximum-likelihood parameter estimation

With small samples, the MLE should not be used to estimate uncertainties.
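A familiar illustration of this note, assuming Gaussian data (it may not be the exact case the paper works through): the maximum-likelihood estimate of the variance divides by N rather than N-1, so on average it comes out as (N-1)/N times the true variance. The shortfall is largest at small N. The function names below are made up for this sketch.

```python
import random

def mle_variance(xs):
    """Maximum-likelihood variance estimate for Gaussian data (divides by N)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def average_mle_variance(n, trials=10000, seed=0):
    """Average the MLE variance over many samples of size n drawn from N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += mle_variance(sample)
    return total / trials

# True variance is 1; the MLE average should sit near (n - 1) / n instead.
for n in (3, 10, 30):
    print(n, round(average_mle_variance(n), 3), "theory:", round((n - 1) / n, 3))
```

At N = 3 the estimate is low by about a third on average, so an error bar built from it would be too narrow.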

B. Chi-squared statistic and quality of fit
Do not chase χ²/dof = 1 too hard.
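One way to see why: if the model and the error bars are both right, χ² follows a chi-squared distribution with mean dof and variance 2·dof, so χ²/dof scatters around 1 with standard deviation sqrt(2/dof). A small sketch of how wide that scatter is (the helper name is hypothetical):

```python
import math

def reduced_chi2_spread(dof):
    """Standard deviation of chi^2/dof for a correct model with `dof` degrees of freedom."""
    return math.sqrt(2.0 / dof)

for dof in (5, 20, 100):
    s = reduced_chi2_spread(dof)
    print(f"dof={dof}: one-sigma band for chi^2/dof is {1 - s:.2f} to {1 + s:.2f}")
```

With few degrees of freedom, a value noticeably different from 1 is entirely ordinary, and tuning a fit until the ratio hits exactly 1 is not evidence of a good model.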

C. Confidence intervals for model parameters
A conceptual issue. If you really want a range you can attach a degree of belief to, use Bayesian statistics.

D. Empirical rule in the multivariate case
This mistake is very common.
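A quick numerical check of what goes wrong, assuming the mistake in question is carrying the 1D 68-95-99.7 rule over to a joint region for two Gaussian parameters. In two dimensions the probability inside the k-sigma ellipse is the chi-squared CDF with 2 degrees of freedom, which has the closed form 1 - exp(-k²/2). The function names are only for this sketch.

```python
import math

def coverage_1d(k):
    """Probability that a 1D Gaussian variable lies within k sigma of its mean."""
    return math.erf(k / math.sqrt(2.0))

def coverage_2d(k):
    """Probability that a 2D Gaussian point lies inside the k-sigma ellipse
    (Mahalanobis distance <= k), i.e. the chi-squared CDF with 2 dof at k^2."""
    return 1.0 - math.exp(-0.5 * k * k)

def radius_2d(prob):
    """Mahalanobis radius of the 2D ellipse that contains probability `prob`."""
    return math.sqrt(-2.0 * math.log(1.0 - prob))

for k in (1, 2, 3):
    print(f"{k} sigma: 1D {coverage_1d(k):.4f}, 2D {coverage_2d(k):.4f}")
print(f"2D radius needed for 1D one-sigma coverage: {radius_2d(coverage_1d(1)):.3f}")
```

The 1-sigma ellipse holds only about 39% of the probability, not 68%; to enclose 68% jointly the ellipse has to be scaled up to roughly 1.5 sigma.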

E. What is random in frequentist vs Bayesian regression
A conceptual mistake.

F. Posterior predictive distribution, noise, and samples
A conceptual mistake.


posted @ 2022-09-20 12:19  zouyc