NTU ML2023Spring Part3.5 transformer

Hurdle one: fairseq fails to install along with the other dependencies. Fix: downgrade pip to a version below 24.1: pip install pip==24.0.

Hurdle two: ImportError: cannot import name 'utils' from 'fairseq' (unknown location).

~~So I gave up.~~

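It turns out ML2025Spring has a similar assignment (hw4). It can't be submitted, but then neither can this ML2023 one. ~~Surely the latest version won't have any dependency problems, right?~~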

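Then came mindless hyperparameter tuning, which basically boils down to making everything bigger. epoch? Double it! nhead? Double it! Just crank it up and be done.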

The first round of tuning got the loss to 0.98, and the output looked like this:

At first glance it looks sort of right, but only at first glance. Look a second time and the flaws become obvious.

Then I tuned again and the loss reached 0.60. But I forgot to save the settings, so now I have no idea how I got there.
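One way to avoid losing a good run like that is to append each run's hyperparameters and final loss to a small log file. The sketch below is only an illustration: the helper names `log_run` and `best_run` and the file name `runs.jsonl` are made up here and are not part of the course notebook.

```python
import json
import time
from pathlib import Path


def log_run(config, loss, path="runs.jsonl"):
    """Append one run's hyperparameters and final loss as a JSON line.

    `path` is a hypothetical file name chosen for this sketch.
    """
    record = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "loss": loss,
        "config": config,  # must be JSON-serializable (None becomes null)
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_run(path="runs.jsonl"):
    """Return the logged record with the lowest loss."""
    with Path(path).open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return min(records, key=lambda r: r["loss"])
```

Calling `log_run(gpt2_config, final_loss)` at the end of each training run (with `final_loss` being whatever the notebook reports) would make it possible to look up the best settings later with `best_run()`.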

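Then I tried making the parameters even bigger and hit CUDA out of memory. I first blamed the batch size, but it later turned out to be the model size itself. So some actual optimization is needed rather than blindly scaling everything up.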
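To get a feel for how fast the model grows, here is a rough estimate of the parameter count of a GPT-2 style model. It is a sketch under assumptions: the standard GPT-2 block layout (4x-wide MLP, biases everywhere, output head tied to the token embedding) and a naive attention implementation that materializes the full score matrix. The function names and the `vocab_size=1000` value in the example are placeholders, not values from the notebook.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Approximate parameter count of a GPT-2 style model with tied output head."""
    d = n_embd
    # Token embedding table plus learned position embedding table.
    embeddings = vocab_size * d + n_positions * d
    # Per block: attention qkv (3d^2 + 3d), attention output (d^2 + d),
    # MLP up (4d^2 + 4d), MLP down (4d^2 + d), two layer norms (4d).
    per_layer = 12 * d * d + 13 * d
    final_layer_norm = 2 * d
    return embeddings + n_layer * per_layer + final_layer_norm


def attention_score_elements(batch_size, n_head, seq_len, n_layer):
    """Elements in the (batch, head, seq, seq) attention score tensors, summed over layers.

    Only relevant when attention scores are materialized (assumption).
    """
    return batch_size * n_head * seq_len * seq_len * n_layer


if __name__ == "__main__":
    # vocab_size=1000 is a placeholder; the real value depends on the dataset.
    print(gpt2_param_count(vocab_size=1000, n_positions=800, n_embd=256, n_layer=16))
    print(attention_score_elements(batch_size=16, n_head=16, seq_len=128, n_layer=16))
```

Note that `n_head` does not appear in the parameter formula: heads only split `n_embd` into slices. Doubling `n_embd` roughly quadruples the per-layer weights, while `n_head`, batch size and sequence length mainly affect activation memory.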

I tried this configuration:

gpt2_config = {
    "activation_function": "gelu_new",    # Activation function used in the model
    "architectures": ["GPT2LMHeadModel"],  # Specifies the model type
    "attn_pdrop": 0.2,            # Dropout rate for attention layers
    "embd_pdrop": 0.2,            # Dropout rate for embeddings
    "initializer_range": 0.05,        # Standard deviation for weight initialization
    "layer_norm_epsilon": 1e-05,       # Small constant to improve numerical stability in layer norm
    "model_type": "gpt2",           # Type of model
    "n_ctx": 128,               # Context size (maximum sequence length)
    "n_embd": 256,              # Embedding size
    "n_head": 16,               # Number of attention heads
    "n_layer": 16,              # Number of transformer layers
    "n_positions": 800,           # Maximum number of token positions
    "resid_pdrop": 0.2,           # Dropout rate for residual connections
    "vocab_size": num_classes,       # Number of unique tokens in vocabulary
    "pad_token_id": None,          # Padding token ID (None means no padding token)
    "eos_token_id": None,          # End-of-sequence token ID (None means not explicitly defined)
}
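When changing `n_embd` and `n_head` together, two constraints are easy to trip over: the embedding size has to be divisible by the number of heads, and no input sequence may be longer than `n_positions`. The sketch below checks both on a plain dict before any model is built. The helper names and the `max_seq_len` argument are hypothetical and only cover the fields shown here.

```python
def head_dim(cfg):
    """Size of each attention head: n_embd is split evenly across n_head heads."""
    return cfg["n_embd"] // cfg["n_head"]


def check_gpt2_config(cfg, max_seq_len):
    """Return a list of human-readable problems; an empty list means the checks passed.

    `max_seq_len` is the longest token sequence you plan to feed the model
    (an assumption of this sketch, not a field of the config).
    """
    problems = []
    if cfg["n_embd"] % cfg["n_head"] != 0:
        problems.append(
            f"n_embd={cfg['n_embd']} is not divisible by n_head={cfg['n_head']}"
        )
    if max_seq_len > cfg["n_positions"]:
        problems.append(
            f"sequence length {max_seq_len} exceeds n_positions={cfg['n_positions']}"
        )
    return problems


if __name__ == "__main__":
    # Only the fields relevant to the checks, copied from the config above.
    cfg = {"n_embd": 256, "n_head": 16, "n_positions": 800}
    print(head_dim(cfg))                 # each head works on a 16-dim slice
    print(check_gpt2_config(cfg, 128))   # no problems expected for these values
```

With `n_embd` at 256 and `n_head` at 16, each head only sees 16 dimensions, so raising `n_head` further without raising `n_embd` makes every head narrower.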

That sent the loss straight up to 1.3, and the output was painful to look at.

Increasing nhead turned out to work better: the loss came to 0.79, and the final output looked like this:

It still looks quite discontinuous to the human eye, but at least it's better than before.

While I was tuning on Colab, it suddenly threw an incomprehensible error:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

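I couldn't find the cause, so I had to run on Kaggle instead. But when I restarted Colab a few minutes later, it worked again. Verdict: black magic.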
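The cause was never pinned down here, so the following is only a generic debugging sketch. The error message itself suggests setting `CUDA_LAUNCH_BLOCKING=1` so that the stack trace points at the failing call. A frequent (but here unconfirmed) trigger for a device-side assert is an embedding lookup with a token id outside `[0, vocab_size)`, which can be checked on the CPU before anything reaches the GPU. The helper name `find_bad_token_ids` is made up for this example.

```python
import os

# Must be set before CUDA is initialized (i.e. before the first GPU call),
# so put it at the very top of the notebook or script.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def find_bad_token_ids(token_ids, vocab_size):
    """Return (position, token_id) pairs whose id falls outside [0, vocab_size)."""
    return [
        (pos, tok)
        for pos, tok in enumerate(token_ids)
        if not 0 <= tok < vocab_size
    ]


if __name__ == "__main__":
    # Hypothetical sequence and vocabulary size, for illustration only.
    print(find_bad_token_ids([0, 5, 12, 7], vocab_size=10))
```

Once a device-side assert has fired, the CUDA context in that process is generally unusable, so the runtime has to be restarted before trying again.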

After some more tuning, with epoch raised to 200, the loss dropped to 0.44, and the output looked like this:

Feels pretty good; a pity it can't be submitted.

posted @ 2025-04-05 08:19 383494
