CV Baseline: GoogLeNet v1

Assignment:

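1: Written answer: How many auxiliary losses does GoogLeNet use? What is the weight of the auxiliary loss functions? Why are auxiliary loss functions used?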

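GoogLeNet adds two auxiliary classifiers for computing auxiliary losses. The paper attaches them to the outputs of Inception (4a) and Inception (4d), that is, just before Inception (4b) and Inception (4e). Each auxiliary loss is added to the total loss with a weight of 0.3, and the auxiliary classifiers are discarded at inference time. Purpose: they strengthen the gradient signal propagated back to earlier layers, and they act as a regularizer that forces intermediate-layer features to be discriminative for classification as well.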
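The weighting can be written out as a small calculation. This is a minimal sketch, assuming the main loss and the two auxiliary losses have already been computed as plain floats; the name `total_loss` and the example numbers are hypothetical and do not come from the paper or any library.

```python
AUX_WEIGHT = 0.3  # discount weight applied to each auxiliary loss (value from the paper)


def total_loss(main_loss, aux_losses, aux_weight=AUX_WEIGHT):
    """Combine the main loss with the discounted auxiliary losses (training only)."""
    return main_loss + aux_weight * sum(aux_losses)


if __name__ == "__main__":
    # Illustrative numbers only, not measured values.
    print(total_loss(1.0, [2.0, 3.0]))
```

With two auxiliary heads, each one contributes 30% of its own loss on top of the main loss.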

2: Written answer: How many branches does an Inception module have? What operation does each perform? How are the feature maps fused at the module output?

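There are 4 branches. The filter counts differ from module to module; taking Inception (3b) from Table 1 of the paper as an example: 1) a 1x1 convolution with 128 filters; 2) a 1x1 convolution with 128 filters, followed by a 3x3 convolution with 192 filters; 3) a 1x1 convolution with 32 filters, followed by a 5x5 convolution with 96 filters; 4) a 3x3 max-pooling layer, followed by a 1x1 convolution with 64 filters.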

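Feature fusion: GoogLeNet draws on the Hebbian principle (neurons that fire together, wire together). An Inception module concatenates the feature maps produced by its branches along the channel dimension, in left-to-right branch order, and passes the result on as its output; the channels are stacked, not summed element-wise. The feature maps of all branches sit side by side in the output, which also reflects the Hebbian principle.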
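Channel-wise fusion can be illustrated without any deep learning library. The sketch below uses the branch output widths quoted above (128, 192, 96, 64) and represents each channel by a label instead of a real feature map; `concat_channels` is a hypothetical helper written for this note.

```python
def concat_channels(branches):
    """Concatenate per-branch channel lists in branch order (left to right)."""
    fused = []
    for channels in branches:
        fused.extend(channels)
    return fused


# Output channel counts of the four branches, using the numbers quoted above.
branch_widths = [128, 192, 96, 64]

# Stand-in feature maps: each channel is represented by a (branch, index) label.
branches = [[(b, i) for i in range(n)] for b, n in enumerate(branch_widths)]
fused = concat_channels(branches)

# The channel counts add up; the channel values themselves are never summed.
assert len(fused) == sum(branch_widths)
```

In PyTorch the same fusion is typically written as `torch.cat(branch_outputs, dim=1)` on tensors laid out as (batch, channels, height, width).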

3: Written answer: What insights did you take away from this paper?

1) Pooling loses spatial resolution, yet the same architecture is still used successfully for localization, detection, and human pose estimation, all of which depend heavily on spatial information.

Despite concerns that max-pooling layers result in loss of accurate spatial information, the same convolutional network architecture has also been successfully employed for localization, object detection and human pose estimation.

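2) Increasing model depth and width effectively improves performance, but it has two drawbacks: the network overfits more easily, and the computational cost grows sharply.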
The most straightforward way of improving the performance of deep neural networks is by increasing their size. Bigger size typically means a larger number of parameters, which makes the enlarged network more prone to overfitting.
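The cost of widening can be checked with a quick parameter count. The paper notes that uniformly increasing the number of filters in chained convolution layers increases computation quadratically; the sketch below shows the same effect on parameters. The helper `conv_params` and the channel counts (64 and 128) are chosen for illustration and are not taken from the paper.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a single 2D convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


base = conv_params(64, 64, 3)     # a 3x3 layer with 64 input and 64 output channels
wide = conv_params(128, 128, 3)   # the same layer with both widths doubled
ratio = wide / base               # close to 4: doubling the width roughly quadruples the size
```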

3) To save memory, the network first reduces the spatial resolution with conventional convolution and pooling layers and only then stacks Inception modules.

For technical reasons (memory efficiency during training), it seemed beneficial to start using Inception modules only at higher layers while keeping the lower layers in traditional convolutional fashion.

4) The final fully connected layer makes fine-tuning and transfer learning more convenient.

We use an extra linear layer. This enables adapting and fine-tuning our networks for other label sets easily.

5) Features from the middle layers of the network are also discriminative for classification.

One interesting insight is that the strong performance of relatively shallower networks on this task suggests that the features produced by the layers in the middle of the network should be very discriminative.

6) The learning rate schedule decreases the learning rate by 4% every 8 epochs (the loss curve is very smooth).

fixed learning rate schedule (decreasing the learning rate by 4% every 8 epochs)
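The schedule can be sketched as a step function. This assumes "decreasing by 4%" means multiplying the current rate by 0.96 at every eighth epoch; the base learning rate is a free parameter here because the quoted passage does not give one, and `learning_rate` is a hypothetical helper.

```python
def learning_rate(epoch, base_lr, decay=0.96, step=8):
    """Step schedule: multiply the rate by `decay` once every `step` epochs."""
    return base_lr * decay ** (epoch // step)


if __name__ == "__main__":
    # base_lr=0.1 is an arbitrary illustrative value.
    for epoch in (0, 8, 16, 80):
        print(epoch, learning_rate(epoch, base_lr=0.1))
```

In PyTorch, the same shape of schedule is available as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=8, gamma=0.96)`.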

7) Data augmentation guidelines: 1. patch size between 8% and 100% of the image area; 2. aspect ratio in [3/4, 4/3]; 3. photometric distortions help.

Still, one prescription that was verified to work very well after competition includes sampling of various sized patches of the image whose size is distributed evenly between 8% and 100% of the image area and whose aspect ratio is chosen randomly between 3/4 and 4/3. Also, we found that the photometric distortions were useful to combat overfitting to some extent.
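The patch sampling rule can be sketched in plain Python. This is one possible reading, assuming the area fraction and the aspect ratio are both drawn uniformly and that a patch that does not fit is simply resampled; the retry count and the full-image fallback are assumptions of this sketch, and `sample_crop` is a hypothetical name.

```python
import math
import random


def sample_crop(width, height, rng, max_tries=10):
    """Sample (left, top, crop_w, crop_h) with area in [8%, 100%] of the image
    and aspect ratio (w / h) in [3/4, 4/3]; fall back to the full image."""
    for _ in range(max_tries):
        area = rng.uniform(0.08, 1.0) * width * height
        ratio = rng.uniform(3 / 4, 4 / 3)
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            left = rng.randint(0, width - crop_w)
            top = rng.randint(0, height - crop_h)
            return left, top, crop_w, crop_h
    return 0, 0, width, height  # fallback (assumption): no valid patch was found


if __name__ == "__main__":
    print(sample_crop(640, 480, random.Random(0)))
```

torchvision's `RandomResizedCrop` uses the same default ranges, `scale=(0.08, 1.0)` and `ratio=(3/4, 4/3)`, although its sampling details differ from this sketch.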

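8) Randomly choosing the interpolation method when resizing may improve performance; the paper adopted it late, together with other hyperparameter changes, so its effect is not isolated.

We started to use random interpolation methods (bilinear, area, nearest neighbor and cubic, with equal probability) for resizing relatively late and in conjunction with other hyperparameter changes.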
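Picking the resize method with equal probability takes only a few lines. In this sketch the method names are plain string labels; mapping them to the interpolation flags of a particular image library is left out, and `pick_interpolation` is a hypothetical helper.

```python
import random

# The four resize methods named in the paper; the strings are just labels here.
INTERPOLATIONS = ("bilinear", "area", "nearest", "cubic")


def pick_interpolation(rng):
    """Choose one resize method uniformly at random."""
    return rng.choice(INTERPOLATIONS)


if __name__ == "__main__":
    rng = random.Random(0)
    print([pick_interpolation(rng) for _ in range(5)])
```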

9) In real applications there is no need to use 144 crops.

We note that such aggressive cropping may not be necessary in real applications.
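The figure of 144 follows from the test-time procedure described in the paper: 4 scales, 3 squares per scale, 6 crops per square (four corners, center, and the whole square resized to 224x224), each with its mirrored version. The enumeration below only counts the views; the string labels are descriptive names chosen for this sketch.

```python
from itertools import product

scales = (256, 288, 320, 352)            # shorter side after resizing
squares = ("left", "center", "right")    # top/center/bottom for portrait images
crops = ("top-left", "top-right", "bottom-left", "bottom-right",
         "center", "whole-square-resized")
mirrors = (False, True)

views = list(product(scales, squares, crops, mirrors))
assert len(views) == 4 * 3 * 6 * 2 == 144
```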

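4: Code implementation: Find an image online, run GoogLeNet on it, inspect the top-5 predicted classes, and post a screenshot of the output as the check-in.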

img: Golden Retriever from baidu.jpg is: golden retriever
207 n02099601 dog, golden retriever
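The top-5 step itself is a softmax followed by a sort. The sketch below assumes the network has already produced one logit per class; the logits and the short label list are made-up values for illustration (a real ImageNet run has 1000 classes, with index 207 being golden retriever as in the output above), and `top_k` is a hypothetical helper.

```python
import math


def top_k(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs from raw logits."""
    peak = max(logits)                                # subtract the max for stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(labels[i], probs[i]) for i in order[:k]]


# Hypothetical logits and labels for illustration only.
labels = ["golden retriever", "Labrador retriever", "cocker spaniel",
          "tennis ball", "kuvasz", "tabby"]
logits = [9.1, 6.3, 4.0, 2.2, 3.5, -1.0]

if __name__ == "__main__":
    for name, prob in top_k(logits, labels):
        print(f"{name}: {prob:.3f}")
```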

5: Written answer: Study notes and summary of this paper

posted @ 2020-08-05 15:38 sariel_sakura
