Hung-yi Lee's Machine Learning course notes - 2022 HW11 (Domain Adaptation) Strong Baseline
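This is not a hard assignment. Tuning the number of epochs and the loss weight is enough to pass the medium baseline. Changing the weight over training with the schedule from the paper is then enough to pass the strong baseline.
The task: the training set contains labeled real-world images from 10 classes. The goal is to classify test images of the same 10 classes that are drawn in a different style.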
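We use the DANN (Domain-Adversarial Neural Network) architecture. The feature extractor "extracts" what the different domains have in common, and the domain classifier checks the result of that extraction. If the domain classifier can no longer tell which domain a feature came from, the feature extractor is doing a good job. That in turn makes it more likely that predictions on the images of the unlabeled domain are correct.
For training, you can add a GRL (gradient reversal layer) as the paper suggests. In the backward pass it negates the gradient (scaled by λ) and then keeps propagating it backwards, so the components no longer have to be trained in separate steps. Alternatively, you can train it like a GAN: update the feature extractor & label predictor once, then update the domain classifier once, and repeat.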
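A minimal sketch of a gradient reversal layer in PyTorch follows. It assumes the usual formulation: identity in the forward pass, gradient multiplied by -λ in the backward pass. `GradReverse` and `grad_reverse` are hypothetical names; they are not part of PyTorch or of the post's code.

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient by -lamb in the backward pass."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One return value per forward input: reversed gradient for x, None for lamb.
        return -ctx.lamb * grad_output, None


def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)


# Quick check: values pass through unchanged, and the gradient is reversed and scaled.
x = torch.ones(3, requires_grad=True)
grad_reverse(x, 0.5).sum().backward()
print(x.grad)  # tensor([-0.5000, -0.5000, -0.5000])
```

With a GRL between the feature extractor and the domain classifier, a single loss such as `class_loss + domain_criterion(domain_classifier(grad_reverse(feature, lamb)), domain_label)` trains the domain classifier to minimize the domain loss. In the same backward pass, it pushes the feature extractor to maximize that loss. This is a sketch of the single-step alternative, not the code used in the post.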
Code: https://www.kaggle.com/code/skyrainwind/hw11-domain-adaptation
Task analysis
medium: raise the number of epochs to 1000, then tune λ in loss = (label_predictor loss) - λ * (domain_classifier loss). λ = 0.7 worked fairly well.
strong: schedule λ with the adjustment function from the DANN paper.
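For reference, the loss below combines the classification loss L_y with the domain loss L_d as in the medium setting. The λ schedule is the one from the DANN paper, and `adaptive_lambda` in the training code implements the same formula. Here p ∈ [0, 1] is the training progress (`epoch / total_epochs` in the code) and γ = 10:

```latex
L = L_y - \lambda_p L_d, \qquad
\lambda_p = \frac{2}{1 + \exp(-\gamma p)} - 1 = \tanh\left(\frac{\gamma p}{2}\right), \qquad \gamma = 10
```

With γ = 10 this gives λ = 0 at p = 0, about 0.462 at p = 0.1, about 0.987 at p = 0.5, and about 0.9999 at p = 1. Early training therefore focuses on classification. With 1000 epochs, λ only exceeds the fixed value 0.7 from the medium setting from roughly epoch 174 onward.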
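Code analysis
Use Canny edge detection to turn the RGB images into images whose style is close to the test set (black-and-white line drawings).
This step is unrelated to the main topic, so its code is omitted.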
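The feature extractor uses a CNN with pooling layers to map each image to a 512-dimensional vector.
The label predictor is a stack of fully connected layers (FCN) that outputs a 10-dimensional vector of class logits. It is trained with cross entropy to predict the class.
The domain classifier uses several fully connected layers to output a single scalar (a logit). This scalar indicates whether the feature comes from the training-set (source) domain or not.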
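Why does the feature extractor end in a 512-dimensional vector? Each `Conv2d(in, out, 3, 1, 1)` keeps the spatial size, and each `MaxPool2d(2)` halves it. Five blocks therefore shrink the side length by 2^5 = 32. The sketch below works through that arithmetic. It assumes 32×32 inputs (the post does not state the input size), and `conv_out`, `pool_out` and `feature_map_side` are hypothetical helpers.

```python
def conv_out(n, kernel=3, stride=1, padding=1):
    # Standard output-size formula for a convolution along one dimension.
    return (n + 2 * padding - kernel) // stride + 1


def pool_out(n, kernel=2):
    # MaxPool2d(2): kernel 2, stride 2, no padding (floor division).
    return (n - kernel) // kernel + 1


def feature_map_side(side, num_blocks=5):
    """Side length of the feature map after `num_blocks` Conv(3, 1, 1) + MaxPool(2) blocks."""
    for _ in range(num_blocks):
        side = pool_out(conv_out(side))
    return side


for side in (32, 64):
    s = feature_map_side(side)
    print(f"{side}x{side} input -> 512 x {s} x {s} feature map")
# 32x32 input -> 512 x 1 x 1 feature map
# 64x64 input -> 512 x 2 x 2 feature map
```

Flattening the conv output yields the 512-d input that `LabelPredictor` and `DomainClassifier` expect only when the map is 1×1. A 64×64 input would give 2048 values per sample instead.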
```python
import torch
import torch.nn as nn


class FeatureExtractor(nn.Module):
    def __init__(self):
        super(FeatureExtractor, self).__init__()
        # Input: 1-channel (Canny edge) images. Each block keeps the spatial size
        # with a 3x3 conv (stride 1, padding 1) and halves it with MaxPool2d(2).
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 3, 1, 1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(64, 128, 3, 1, 1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(128, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(256, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(256, 512, 3, 1, 1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

    def forward(self, x):
        # Assuming 32x32 inputs, the conv output is (B, 512, 1, 1). flatten(1) turns it
        # into (B, 512); unlike squeeze(), it also keeps the batch dim when B == 1.
        x = self.conv(x).flatten(1)
        return x


class LabelPredictor(nn.Module):
    def __init__(self):
        super(LabelPredictor, self).__init__()
        # 512-d feature -> 10 class logits (softmax is applied inside CrossEntropyLoss).
        self.layer = nn.Sequential(
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, h):
        c = self.layer(h)
        return c


class DomainClassifier(nn.Module):
    def __init__(self):
        super(DomainClassifier, self).__init__()
        # 512-d feature -> one logit: source domain (label 1) vs. target domain (label 0).
        self.layer = nn.Sequential(
            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),

            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),

            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),

            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),

            nn.Linear(512, 1),
        )

    def forward(self, h):
        y = self.layer(h)
        return y
```
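Training works like a GAN and proceeds in two steps: first train the domain classifier, then train the other two components.
Here the loss is defined as one loss minus the other. I don't fully understand yet how backpropagation works through this; to be filled in.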
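One way to see what happens with the subtracted loss L_y - λ·L_d is to look at the gradient that reaches the feature extractor. That gradient is ∂L_y/∂θ_f - λ·∂L_d/∂θ_f, which pushes the classification loss down and the domain loss up. This is the same gradient a GRL produces. On the domain classifier, the subtraction leaves -λ times its proper gradient, which is the wrong direction for training it. That is consistent with the training code below: in step 2 it only steps optimizer_F and optimizer_C, and it then zeroes optimizer_D's gradients. The sketch below checks both claims numerically on toy linear layers. Layer sizes, λ = 0.7 and all names are illustrative assumptions, not the post's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lamb = 0.7  # illustrative weight

# Toy stand-ins for the three components (sizes are arbitrary assumptions).
extractor = nn.Linear(4, 3)
predictor = nn.Linear(3, 2)
domain_clf = nn.Linear(3, 1)

x = torch.randn(5, 4)
y = torch.randint(0, 2, (5,))
d = torch.randint(0, 2, (5, 1)).float()
ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()


class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, h, lamb):
        ctx.lamb = lamb
        return h.view_as(h)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None


def grads(use_grl):
    """Return (extractor weight grad, domain classifier weight grad) for one loss variant."""
    for m in (extractor, predictor, domain_clf):
        m.zero_grad()
    h = extractor(x)
    if use_grl:
        # GRL variant: add the domain loss, reverse the gradient at the features.
        loss = ce(predictor(h), y) + bce(domain_clf(GradReverse.apply(h, lamb)), d)
    else:
        # Subtraction variant, as in the post's step 2.
        loss = ce(predictor(h), y) - lamb * bce(domain_clf(h), d)
    loss.backward()
    return extractor.weight.grad.clone(), domain_clf.weight.grad.clone()


f_sub, d_sub = grads(use_grl=False)
f_grl, d_grl = grads(use_grl=True)
print(torch.allclose(f_sub, f_grl, atol=1e-6))  # True: same update for the extractor
print(torch.allclose(d_sub, -lamb * d_grl, atol=1e-6))  # True: D gets -lamb times its own gradient
```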
```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# Setup assumed by train_epoch (not shown in the original post). The domain
# classifier returns raw logits and the domain labels are float 0/1, so
# BCEWithLogitsLoss is the matching criterion; Adam is an assumed choice.
feature_extractor = FeatureExtractor().cuda()
label_predictor = LabelPredictor().cuda()
domain_classifier = DomainClassifier().cuda()
class_criterion = nn.CrossEntropyLoss()
domain_criterion = nn.BCEWithLogitsLoss()
optimizer_F = optim.Adam(feature_extractor.parameters())
optimizer_C = optim.Adam(label_predictor.parameters())
optimizer_D = optim.Adam(domain_classifier.parameters())
# source_dataloader and target_dataloader are assumed to be defined elsewhere.


def adaptive_lambda(epoch, total_epochs):
    # Schedule from the DANN paper: lambda rises from 0 towards 1 over training.
    p = epoch / total_epochs
    lambda_ = (2 / (1 + np.exp(-10 * p))) - 1
    return lambda_


def train_epoch(source_dataloader, target_dataloader, lamb):
    '''
    Args:
      source_dataloader: dataloader for the source data
      target_dataloader: dataloader for the target data
      lamb: controls the balance between domain adaptation and classification.
    '''
    # D loss: loss of the Domain Classifier
    # F loss: loss of the Feature Extractor & Label Predictor
    running_D_loss, running_F_loss = 0.0, 0.0
    total_hit, total_num = 0.0, 0.0

    for i, ((source_data, source_label), (target_data, _)) in enumerate(zip(source_dataloader, target_dataloader)):
        source_data = source_data.cuda()
        source_label = source_label.cuda()
        target_data = target_data.cuda()

        # Mix the source and target data; otherwise the running statistics of
        # batch_norm are misled (source and target running mean/var differ).
        mixed_data = torch.cat([source_data, target_data], dim=0)
        domain_label = torch.zeros([source_data.shape[0] + target_data.shape[0], 1]).cuda()
        # Set the domain label of the source data to 1.
        domain_label[:source_data.shape[0]] = 1

        # Step 1: train the domain classifier
        feature = feature_extractor(mixed_data)
        # The feature extractor is not trained in step 1, so detach the
        # features to stop backpropagation into it.
        domain_logits = domain_classifier(feature.detach())
        loss = domain_criterion(domain_logits, domain_label)
        running_D_loss += loss.item()
        loss.backward()
        optimizer_D.step()

        # Step 2: train the feature extractor and the label predictor
        class_logits = label_predictor(feature[:source_data.shape[0]])
        domain_logits = domain_classifier(feature)
        # loss = classification cross entropy - lamb * domain binary cross entropy.
        # The subtraction plays the role of the generator loss in a GAN: the
        # feature extractor is rewarded for fooling the domain classifier.
        loss = class_criterion(class_logits, source_label) - lamb * domain_criterion(domain_logits, domain_label)
        running_F_loss += loss.item()
        loss.backward()
        optimizer_F.step()
        optimizer_C.step()

        # Zeroing optimizer_D also discards the gradients that step 2 left on
        # the domain classifier; they are never applied.
        optimizer_D.zero_grad()
        optimizer_F.zero_grad()
        optimizer_C.zero_grad()

        total_hit += torch.sum(torch.argmax(class_logits, dim=1) == source_label).item()
        total_num += source_data.shape[0]
        print(i, end='\r')

    return running_D_loss / (i + 1), running_F_loss / (i + 1), total_hit / total_num


# Train for total_epochs (1000) epochs, updating lambda with the schedule every epoch.
total_epochs = 1000
for epoch in range(total_epochs):
    train_D_loss, train_F_loss, train_acc = train_epoch(source_dataloader, target_dataloader, lamb=adaptive_lambda(epoch, total_epochs))

    torch.save(feature_extractor.state_dict(), 'extractor_model.bin')
    torch.save(label_predictor.state_dict(), 'predictor_model.bin')

    print('epoch {:>3d}: train D loss: {:6.4f}, train F loss: {:6.4f}, acc {:6.4f}'.format(epoch, train_D_loss, train_F_loss, train_acc))
```