Double Descent Phenomenon
https://openai.com/index/deep-double-descent/
There seems to be little point in me translating the OpenAI blog post into Chinese that nobody would be able to follow anyway.
To summarize its main points:
The charts above show test and train error as a function of both model size and number of optimization steps. For a given number of optimization steps (fixed y-coordinate), test and train error exhibit model-size double descent. For a given model size (fixed x-coordinate), as training proceeds, test and train error decreases, increases, and decreases again; we call this phenomenon epoch-wise double descent.
In general, the peak of test error appears systematically when models are just barely able to fit the train set.
The OpenAI post introduces the concept of an interpolation threshold, which describes the point at which the model is just barely able to fit the training set. It is not hard to see that, for a fixed number of epochs, there is a model size such that models smaller than it cannot fit the training set (they underfit), while models larger than it are over-parameterized and fit it with capacity to spare; the same holds for a fixed model size as the number of optimization steps varies.
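To make the interpolation threshold concrete, here is a minimal toy sketch of my own (NumPy random-features least-squares regression on noisy labels, not OpenAI's ResNet/CIFAR experiments): sweep the number of random features across the number of training samples and print train/test error. Near n_features ≈ n_train the model can only just interpolate the training set, and that is where the test error typically peaks before descending again; the exact numbers depend on the seed and the noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy linear teacher: y = <w_true, x> + noise.
n_train, n_test, d = 100, 1000, 20
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
w_true = rng.standard_normal(d)
y_tr = X_tr @ w_true + 0.5 * rng.standard_normal(n_train)
y_te = X_te @ w_true + 0.5 * rng.standard_normal(n_test)

def relu_features(X, W):
    """Fixed random ReLU features: phi(X) = max(X @ W, 0)."""
    return np.maximum(X @ W, 0.0)

# Sweep the number of random features across the interpolation threshold
# (n_features ~ n_train = 100). lstsq gives the least-squares fit, and the
# minimum-norm solution once the system becomes under-determined.
for n_features in [10, 50, 90, 100, 110, 150, 300, 1000]:
    W = rng.standard_normal((d, n_features)) / np.sqrt(d)
    Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
    beta, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    train_mse = np.mean((Phi_tr @ beta - y_tr) ** 2)
    test_mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"features {n_features:5d}  train MSE {train_mse:8.4f}  test MSE {test_mse:8.4f}")
```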
Our intuition is that, for models at the interpolation threshold, there is effectively only one model that fits the train data, and forcing it to fit even slightly noisy or misspecified labels will destroy its global structure. That is, there are no “good models” which both interpolate the train set and perform well on the test set. However, in the over-parameterized regime, there are many models that fit the train set and there exist such good models. Moreover, the implicit bias of stochastic gradient descent (SGD) leads it to such good models, for reasons we don’t yet understand.
We leave fully understanding the mechanisms behind double descent in deep neural networks as an important open question.
OpenAI's takeaway: this phenomenon tells us that "There is a regime where training longer reverses overfitting." That regime is one way we may be able to exploit the phenomenon in practice.
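For the epoch-wise direction, here is a similarly hedged sketch of how one might look for that "training longer reverses overfitting" regime: keep training an over-parameterized MLP on label-noised data well past the first test-error minimum and log the test error as training continues. This is a toy scikit-learn setup of my own, not OpenAI's; on such a small problem a clear second descent is not guaranteed.

```python
import warnings
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Small synthetic classification task; flip 15% of the training labels
# so the over-parameterized network has noise it can eventually memorize.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
flip = rng.random(len(y_tr)) < 0.15
y_tr = np.where(flip, 1 - y_tr, y_tr)

# warm_start=True + max_iter=1: each call to fit() runs one more epoch of SGD.
clf = MLPClassifier(hidden_layer_sizes=(512, 512), solver="sgd",
                    learning_rate_init=0.01, max_iter=1, warm_start=True,
                    n_iter_no_change=1000, random_state=0)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the per-call ConvergenceWarning
    for epoch in range(1, 201):
        clf.fit(X_tr, y_tr)
        if epoch % 10 == 0:
            print(f"epoch {epoch:3d}  "
                  f"train err {1 - clf.score(X_tr, y_tr):.3f}  "
                  f"test err {1 - clf.score(X_te, y_te):.3f}")
```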
I found a few papers online that analyze the cause behind this, but at a glance they lean heavily on convex-analysis machinery, and I cannot follow them...