Train Your Own Small Model
ChatGPT is already at GPT-4o, so why bother training a small model yourself?
- If you can train a small model yourself, it means you have genuinely mastered the Transformer.
- Some scenarios simply call for a small model. Small can be beautiful, just as microcontrollers still have a market of their own.
- Training a small model is not the goal in itself; the goal is to walk through the basic workflow of training a large model.
To read or write the code for training a small model, you first need some background: the Transformer architecture, machine learning, linear algebra, and related topics.
Even if you cannot follow all of it, that is fine; just run it and see the result.
Here is the code.
!pip install numpy requests torch tiktoken matplotlib pandas
import os
import requests
import math
import tiktoken
import torch
import torch.nn as nn
from torch.nn import functional as F
# Hyperparameters
batch_size = 4 # Number of sequences processed in parallel per training step
context_length = 16 # Number of tokens in each training chunk (the context window)
d_model = 64 # The size of our model token embeddings
num_blocks = 8 # Number of transformer blocks
num_heads = 4 # Number of heads in Multi-head attention
learning_rate = 1e-3 # 0.001
dropout = 0.1 # Dropout rate
max_iters = 5000 # Total number of training iterations <- Change this to a smaller number for testing
eval_interval = 50 # How often to evaluate
eval_iters = 20 # Number of iterations to average for evaluation
device = 'cuda' if torch.cuda.is_available() else 'cpu' # Use GPU if it's available.
TORCH_SEED = 1337
torch.manual_seed(TORCH_SEED)
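# With these settings, each training step sees batch_size * context_length = 4 * 16 = 64 tokens.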
# Load training data
if not os.path.exists('data/sales_textbook.txt'):
    os.makedirs('data', exist_ok=True)  # make sure the data directory exists before writing
    url = 'https://huggingface.co/datasets/goendalf666/sales-textbook_for_convincing_and_selling/raw/main/sales_textbook.txt'
    with open('data/sales_textbook.txt', 'w', encoding='utf-8') as f:
        f.write(requests.get(url).text)
with open('data/sales_textbook.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Using tiktoken's cl100k_base encoding (the tokenizer used by GPT-3.5/GPT-4) to tokenize the source text
encoding = tiktoken.get_encoding("cl100k_base")
tokenized_text = encoding.encode(text)
max_token_value = max(tokenized_text) + 1 # largest token id plus one, used as the vocabulary size
tokenized_text = torch.tensor(tokenized_text, dtype=torch.long, device=device) # put tokenized text into tensor
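# Optional sanity check: round-trip a short, arbitrary sentence through the tokenizer
# to see what the integer ids look like (purely illustrative, not needed for training).
sample_ids = encoding.encode("Selling is about listening.")
print(sample_ids)                   # a short list of integer token ids
print(encoding.decode(sample_ids))  # decodes back to the original string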
# Split train and validation
split_idx = int(len(tokenized_text) * 0.9)
train_data = tokenized_text[:split_idx]
val_data = tokenized_text[split_idx:]
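# Illustrative sketch of how batches are drawn from these splits (the standard recipe
# for GPT-style training): sample random offsets, take context_length tokens as the
# input x, and the same window shifted one token to the right as the target y.
def get_batch(split: str):
    data = train_data if split == 'train' else val_data
    idxs = torch.randint(low=0, high=len(data) - context_length, size=(batch_size,))
    x = torch.stack([data[i:i + context_length] for i in idxs]).to(device)
    y = torch.stack([data[i + 1:i + context_length + 1] for i in idxs]).to(device)
    return x, y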
# Define Feed Forward Network
class FeedForward(nn.Module):
    def __init__(self):
        super().__init__()
        self.d_model = d_model
        self.dropout = dropout
        self.ffn = nn.Sequential(
            nn.Linear(in_features=self.d_model,