debug

 



Notes on inspecting networks in gluoncv
from gluoncv.model_zoo import get_model
import mxnet as mx
from mxnet import autograd
#
net = get_model('center_net_resnet50_v1b_coco',pretrained=False, pretrained_base=False)
# net = get_model('center_net_resnet50_v1b_dcnv2_coco',pretrained=False, pretrained_base=False)  # dcnv2 is only supported starting from mxnet 1.7
# print(net)
# net.initialize()

#1. Inspect the network structure
# x = mx.nd.uniform(shape=(1, 3, 512, 512))
# ids, scores, bboxes = net(x)
# print(net)

#2. Check the output feature sizes of intermediate layers
# resnet = net.base_network.base_network(mx.sym.var(name="data1"))
# decov = net.base_network.deconv(mx.sym.var(name="data2"))
# fshape1 = resnet.infer_shape(data1=(1, 3, 512, 512))[1][0]   #(1, 2048, 16, 16)
# fshape2 = decov.infer_shape(data2=(1, 2048, 16, 16))[1][0]   #(1, 64, 128, 128)
# # print(fshape1)
# print(len(fshape2[0]), fshape2[0])
# print(fshape2[1])

#3. Check the final output shapes in training mode
net.initialize()   # parameters must be initialized before the forward pass
x = mx.nd.uniform(shape=(1, 3, 512, 512))
ids, scores, bboxes = net(x)                         # inference mode: decoded detections
with autograd.train_mode():
    heatmap_pred, wh_pred, center_reg_pred = net(x)  # training mode: raw head outputs
print(heatmap_pred.shape, wh_pred.shape, center_reg_pred.shape)

 

During convolution and pooling:

Output size calculation: N = (image_h + 2*pad_h - kernel_h)/stride_h + 1

When N is not an integer: convolution rounds down, pooling rounds up (implementations differ across frameworks; MXNet by default rounds down for both convolution and pooling).
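As a quick sanity check, here is a small helper (hypothetical, for illustration only) that applies this rule with either rounding mode:

import math

def out_size(in_size, kernel, stride=1, pad=0, op='conv'):
    # N = (in + 2*pad - kernel)/stride + 1; conv floors, pooling may ceil depending on the framework.
    n = (in_size + 2 * pad - kernel) / stride + 1
    return math.floor(n) if op == 'conv' else math.ceil(n)

print(out_size(512, kernel=7, stride=2, pad=3))   # 256, e.g. the ResNet stem conv (7x7, stride 2, pad 3) on a 512x512 input
print(out_size(28, kernel=5))                     # 24, a 5x5 conv with no padding on a 28x28 MNIST image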

  

 

 

Debugging a deep learning program

In this tutorial we walk through some common issues in deep learning application development and methods to resolve them.

We use a handwritten digit recognition application with a small convolutional network (defined below) as the example.

In [ ]:
from __future__ import print_function
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn, data
from mxnet import autograd as ag

class Net(gluon.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        with self.name_scope():
            self.conv1 = nn.Conv2D(20, kernel_size=(5,5))
            self.pool1 = nn.MaxPool2D(pool_size=(2,2), strides = (2,2))
            self.conv2 = nn.Conv2D(50, kernel_size=(5,5))
            self.pool2 = nn.MaxPool2D(pool_size=(2,2), strides = (2,2))
            self.fc1 = nn.Dense(500)
            self.fc2 = nn.Dense(10)

    def hybrid_forward(self, F, x):
        x = self.pool1(F.tanh(self.conv1(x)))
        print("pool1 output: %s" % str(x))
        x = self.pool2(F.tanh(self.conv2(x)))
        x = x.reshape((0, -1))
        x = F.tanh(self.fc1(x))
        x = F.tanh(self.fc2(x))
        return x
 

I. Check Data IO Issues

 

1. Use standard dataset

Standard datasets, such as MNIST and CIFAR-10, are ideal starting points and help minimize issues in the input data itself. In this tutorial we use MNIST as the input data.

In [ ]:
mnist = mx.test_utils.get_mnist()
 

2. Check data loader

MXNet Gluon uses the DataLoader class for data I/O. We need to create a Dataset object that wraps the input data and a Sampler object that defines how to draw data samples. Gluon has built-in classes for the most common use cases; in this tutorial we use the built-in ArrayDataset and BatchSampler. If you need to implement a customized Dataset or Sampler, add unit tests to ensure these customized data loading modules behave as expected (an explicit sampler sketch follows the cell below).

In [ ]:
batch_size = 1
train_dataset = data.dataset.ArrayDataset(mnist['train_data'], mnist['train_label'])
val_dataset = data.dataset.ArrayDataset(mnist['test_data'], mnist['test_label'])
train_dataloader = data.dataloader.DataLoader(train_dataset, batch_size=batch_size)
val_dataloader = data.dataloader.DataLoader(val_dataset, batch_size=batch_size)
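The cell above lets DataLoader build its sampler implicitly from batch_size. The sketch below (illustrative only, reusing the same train_dataset) wires up the SequentialSampler/BatchSampler pair explicitly, which is where a customized Sampler would plug in and be unit-tested:

    from mxnet.gluon.data import SequentialSampler, BatchSampler

    # Draw indices sequentially, then group them into batches.
    sampler = SequentialSampler(len(train_dataset))
    batch_sampler = BatchSampler(sampler, batch_size=batch_size, last_batch='keep')
    explicit_loader = data.dataloader.DataLoader(train_dataset, batch_sampler=batch_sampler)

    # Quick sanity check: every sample should be covered exactly once.
    assert sum(len(b) for b in batch_sampler) == len(train_dataset)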
 

3. Check data preprocessing

Input data usually needs to be preprocessed and standardized. In this tutorial, the pixels in the input MNIST images are divided by 255 and bounded between 0 and 1.0:

   image = image.reshape(image.shape[0], 1, 28, 28).astype(np.float32)/255

Some networks require input pixels to lie between -1.0 and 1.0. Don't forget to preprocess the input data.
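For example (a minimal sketch, assuming raw uint8 pixels in 0..255), scaling to [-1.0, 1.0] is just a shift and divide applied before batching:

    import numpy as np

    def scale_to_unit_range(image):
        # Assumes raw pixel values in [0, 255]; maps them to [-1.0, 1.0].
        image = image.reshape(image.shape[0], 1, 28, 28).astype(np.float32)
        return image / 127.5 - 1.0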

 

II. Check Implementation Issues

 

1. Check the correctness of the loss function

We use SoftmaxCrossEntropyLoss as the loss function. Similar to the data loader, you can create a customized loss function; add unit tests to ensure its correctness. A common issue when implementing a customized loss function is numerical instability. Many loss functions use the logistic function, and we need to make sure the input is never small enough to return 'nan'. Clipping the input is a common way to deal with such situations:

    eps = 10e-8
    input = mx.nd.clip(input, a_min=eps, a_max=1.0 - eps)
    output = mx.nd.log(input)
In [ ]:
loss_func = gluon.loss.SoftmaxCrossEntropyLoss()
 

Don't forget to set from_logits=True if the input is already a log probability:

    loss_func = gluon.loss.SoftmaxCrossEntropyLoss(from_logits=True)
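For instance (a small illustrative sketch, not one of the tutorial cells), if the network output has already gone through log_softmax, the loss should skip its internal softmax:

    # log_probs are already log-probabilities, so use from_logits=True.
    logits = mx.nd.random.uniform(shape=(4, 10))
    log_probs = mx.nd.log_softmax(logits)
    labels = mx.nd.array([1, 3, 5, 7])
    loss_from_logits = gluon.loss.SoftmaxCrossEntropyLoss(from_logits=True)
    print(loss_from_logits(log_probs, labels))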
 

2. Check parameter initialization

If you are not sure, Xavier is a good choice to start with. Try different initializers if the current initialization leads your model to a bad local minimum.

In [ ]:
model = Net()
ctx = [mx.cpu()]
model.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
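If Xavier turns out to be a poor fit, re-initializing with another scheme is a one-liner (MSRAPrelu here is just an example choice):

    # force_reinit is required because the parameters were already initialized above.
    model.collect_params().initialize(mx.init.MSRAPrelu(), ctx=ctx, force_reinit=True)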
 

3. Check trainer

Gluon defines the optimizer in the Trainer module. You need to set up the parameters to be updated: call collect_params() to specify the model parameters, and make sure you collect the parameters of the correct model.

Also try different hyperparameters or a different optimizer if training isn't making much progress.

In [ ]:
trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': 0.1})
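For example (illustrative only), switching to Adam with a smaller learning rate is a one-line change:

    # Alternative optimizer and hyperparameters.
    trainer = gluon.Trainer(model.collect_params(), 'adam', {'learning_rate': 0.001})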
 

III. Debug during training

One big advantage of Gluon is that you can easily switch between imperative and symbolic execution. Imperative mode is well suited to debugging: you can add print statements in the forward function. Once debugging is finished, you can call hybridize() to switch to hybrid mode and accelerate training.
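For example, once the print statements have served their purpose (remove them first, since after hybridization they only fire once, during graph tracing):

    # Compile the network into a static graph for faster training.
    model.hybridize()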

 

1. Watch layer output

Insert print statements into the forward function to monitor layer outputs. In this tutorial, we print the output of pool1.

In [ ]:
iter_num = 1
for i, batch in enumerate(train_dataloader):
    if i >= iter_num:
        break
    output = model(batch[0])
 

2. Watch parameter values

We can also print parameter values. Call block.params to get the ParameterDict of each layer.

In [ ]:
fc1_params = model.fc1.params
print(fc1_params)
 

Print the values of the fc1 layer's parameters.

In [ ]:
for key, val in fc1_params.items():
    print("%s:" % key)
    print(str(val.data()) + '\n')
 

3. Watch gradients

We can also print gradients to check whether they vanish or explode.

In [ ]:
iter_num = 1
for i, batch in enumerate(train_dataloader):
    if i >= iter_num:
        break
    data = batch[0]
    label = batch[1]
    with ag.record():
        output = model(data)
        loss = loss_func(output, label)
        loss.backward() 

for key, val in fc1_params.items():
    print("%s grad:" % key)
    print(str(val.grad()) + '\n')
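Beyond eyeballing individual gradients, a quick aggregate check (an illustrative helper, not part of Gluon) is the global gradient norm; values near zero suggest vanishing gradients, very large values suggest exploding gradients:

    # Global L2 norm over all parameter gradients.
    total = 0.0
    for param in model.collect_params().values():
        if param.grad_req != 'null':
            g = param.grad()
            total += float((g * g).sum().asscalar())
    print("global grad norm: %.6f" % (total ** 0.5))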

