HRNet Study Notes

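For position-sensitive vision problems such as human pose estimation, semantic segmentation, and object detection, a high-resolution representation is very important. Earlier approaches first lower the resolution and then raise it again: encoder-decoder designs such as SegNet and UNet reduce the resolution with a backbone and then recover it through upsampling or transposed convolution (deconvolution), while other methods use dilated convolutions to avoid some of the resolution-reducing downsampling. The paper proposes a new architecture, HRNet, which maintains a high-resolution representation throughout the whole process.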

Overall structure: First, the input passes through a stem that reduces the resolution to 1/4: "We input the image into a stem, which consists of two stride-2 3 × 3 convolutions decreasing the resolution to 1/4."

Then, starting from a high-resolution convolution stream, the network gradually adds high-to-low resolution convolution streams one at a time. Each time a new stream is added, multi-resolution fusion is performed so that the streams exchange, or share, information across scales: "We start from a high resolution convolution stream, gradually add high-to-low resolution convolution streams one by one, and connect the multi-resolution streams in parallel. The resulting network consists of several (4 in this paper) stages as depicted in Figure 2, and the nth stage contains n streams corresponding to n resolutions. We conduct repeated multi-resolution fusions by exchanging the information across the parallel streams over and over."
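
To make the resolution bookkeeping concrete, here is a minimal sketch that computes the spatial size after the stem and the sizes of the parallel streams in each stage. It is only an illustration, not the authors' code: the function names are hypothetical, padding of 1 for the 3 × 3 convolutions is an assumption, and it assumes each newly added stream has half the resolution of the previous one.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one axis (padding=1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_out(height, width):
    """Apply two stride-2 3x3 convolutions, as in the quoted stem description."""
    for _ in range(2):
        height, width = conv_out(height), conv_out(width)
    return height, width


def stage_resolutions(height, width, num_stages=4):
    """Return, per stage, the (H, W) of each parallel stream.

    Stage n holds n streams; stream r is assumed to be 2**r times
    smaller than the highest-resolution stream.
    """
    h, w = stem_out(height, width)
    return [[(h // 2 ** r, w // 2 ** r) for r in range(n)]
            for n in range(1, num_stages + 1)]


if __name__ == "__main__":
    for n, streams in enumerate(stage_resolutions(256, 192), start=1):
        print("stage", n, streams)
```

For a 256 × 192 input, the stem yields 64 × 48, and under these assumptions the fourth stage holds streams of 64 × 48, 32 × 24, 16 × 12, and 8 × 6.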
Repeated Multi-Resolution Fusions: The goal of the fusion module is to exchange the information across multi-resolution representations.
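
The idea of exchanging information across streams can be sketched in plain Python. This is a toy illustration under loud assumptions, not the paper's implementation: feature maps are single-channel nested lists, 2 × 2 average pooling stands in for learned strided convolutions, nearest-neighbor repetition stands in for learned upsampling, and all function names are hypothetical. What it does show is the general pattern: every output stream is the sum of all input streams after each has been resized to that stream's resolution.

```python
def downsample(fmap):
    """Halve the resolution with 2x2 average pooling (stand-in for a strided conv)."""
    h, w = len(fmap) // 2, len(fmap[0]) // 2
    return [[(fmap[2 * i][2 * j] + fmap[2 * i][2 * j + 1]
              + fmap[2 * i + 1][2 * j] + fmap[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def upsample(fmap):
    """Double the resolution by nearest-neighbor repetition (stand-in for learned upsampling)."""
    return [[fmap[i // 2][j // 2] for j in range(2 * len(fmap[0]))]
            for i in range(2 * len(fmap))]


def resize(fmap, src, dst):
    """Move a map from stream index src to stream index dst.

    A higher index means a lower resolution (each step is a factor of 2).
    """
    for _ in range(dst - src):
        fmap = downsample(fmap)
    for _ in range(src - dst):
        fmap = upsample(fmap)
    return fmap


def fuse(streams):
    """Return one fused map per stream: the sum of all streams resized to it."""
    fused = []
    for dst in range(len(streams)):
        resized = [resize(f, src, dst) for src, f in enumerate(streams)]
        h, w = len(resized[0]), len(resized[0][0])
        fused.append([[sum(r[i][j] for r in resized) for j in range(w)]
                      for i in range(h)])
    return fused


if __name__ == "__main__":
    high = [[1.0] * 4 for _ in range(4)]  # 4x4 high-resolution stream
    low = [[2.0] * 2 for _ in range(2)]   # 2x2 low-resolution stream
    for out in fuse([high, low]):
        print(len(out), "x", len(out[0]), out[0])
```

Each fused output keeps the resolution of its own stream, so the number of streams and their sizes are unchanged by fusion; only their contents are mixed.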
Representation Head

posted on 2020-10-25 18:51 ZhicongHou
