Double Descent Phenomenon
https://openai.com/index/deep-double-descent/
There seems to be little point in me translating the OpenAI blog post into Chinese that nobody would be able to follow anyway.
To summarize its main points:
The charts above show test and train error as a function of both model size and number of optimization steps. For a given number of optimization steps (fixed y-coordinate), test and train error exhibit model-size double descent. For a given model size (fixed x-coordinate), as training proceeds, test and train error decreases, increases, and decreases again; we call this phenomenon epoch-wise double descent.
In general, the peak of test error appears systematically when models are just barely able to fit the train set.
The OpenAI post introduces the concept of an interpolation threshold, which describes the point at which the model is just barely able to fit the training set. It is not hard to see that, for a fixed number of epochs, there is a model size such that models smaller than it cannot fit the training set (they underfit), while models larger than it are over-parameterized and fit it with capacity to spare; the same holds for a fixed model size as the number of optimization steps varies.
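To make the interpolation threshold concrete, here is a minimal toy sketch of my own (NumPy random-features least-squares regression on noisy labels, not OpenAI's ResNet/CIFAR experiments): sweep the number of random features across the number of training samples and print train/test error. Near n_features ≈ n_train the model can only just interpolate the training set, and that is where the test error typically peaks before descending again; the exact numbers depend on the seed and the noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy linear teacher: y = <w_true, x> + noise.
n_train, n_test, d = 100, 1000, 20
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
w_true = rng.standard_normal(d)
y_tr = X_tr @ w_true + 0.5 * rng.standard_normal(n_train)
y_te = X_te @ w_true + 0.5 * rng.standard_normal(n_test)

def relu_features(X, W):
    """Fixed random ReLU features: phi(X) = max(X @ W, 0)."""
    return np.maximum(X @ W, 0.0)

# Sweep the number of random features across the interpolation threshold
# (n_features ~ n_train = 100). lstsq gives the least-squares fit, and the
# minimum-norm solution once the system becomes under-determined.
for n_features in [10, 50, 90, 100, 110, 150, 300, 1000]:
    W = rng.standard_normal((d, n_features)) / np.sqrt(d)
    Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
    beta, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    train_mse = np.mean((Phi_tr @ beta - y_tr) ** 2)
    test_mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"features {n_features:5d}  train MSE {train_mse:8.4f}  test MSE {test_mse:8.4f}")
```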
Our intuition is that, for models at the interpolation threshold, there is effectively only one model that fits the train data, and forcing it to fit even slightly noisy or misspecified labels will destroy its global structure. That is, there are no “good models” which both interpolate the train set and perform well on the test set. However, in the over-parameterized regime, there are many models that fit the train set and there exist such good models. Moreover, the implicit bias of stochastic gradient descent (SGD) leads it to such good models, for reasons we don’t yet understand.
We leave fully understanding the mechanisms behind double descent in deep neural networks as an important open question.
OpenAI's takeaway: this phenomenon tells us that "There is a regime where training longer reverses overfitting." That regime is one way we may be able to exploit the phenomenon in practice.
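For the epoch-wise direction, here is a similarly hedged sketch of how one might look for that "training longer reverses overfitting" regime: keep training an over-parameterized MLP on label-noised data well past the first test-error minimum and log the test error as training continues. This is a toy scikit-learn setup of my own, not OpenAI's; on such a small problem a clear second descent is not guaranteed.

```python
import warnings
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Small synthetic classification task; flip 15% of the training labels
# so the over-parameterized network has noise it can eventually memorize.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
flip = rng.random(len(y_tr)) < 0.15
y_tr = np.where(flip, 1 - y_tr, y_tr)

# warm_start=True + max_iter=1: each call to fit() runs one more epoch of SGD.
clf = MLPClassifier(hidden_layer_sizes=(512, 512), solver="sgd",
                    learning_rate_init=0.01, max_iter=1, warm_start=True,
                    n_iter_no_change=1000, random_state=0)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the per-call ConvergenceWarning
    for epoch in range(1, 201):
        clf.fit(X_tr, y_tr)
        if epoch % 10 == 0:
            print(f"epoch {epoch:3d}  "
                  f"train err {1 - clf.score(X_tr, y_tr):.3f}  "
                  f"test err {1 - clf.score(X_te, y_te):.3f}")
```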
I found a few papers online that analyze the cause behind this, but at a glance they lean heavily on convex-analysis machinery, and I cannot follow them...