CV Baselines: GoogLeNet v1
Despite concerns that max-pooling layers result in loss of accurate spatial information, the same convolutional network architecture has also been successfully employed for localization, object detection and human pose estimation.
The most straightforward way of improving the performance of deep neural networks is by increasing their size. Bigger size typically means a larger number of parameters, which makes the enlarged network more prone to overfitting.
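To make "bigger means more parameters" concrete, here is a small sketch that counts the weights of a single convolution layer. The layer sizes are illustrative assumptions, not values from the paper. The point is that widening both the input and the output channels by a factor k grows the parameter count by roughly k squared.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Number of learnable parameters in a k x k convolution layer."""
    weights = k * k * c_in * c_out
    return weights + (c_out if bias else 0)

# Hypothetical layer: 3x3 conv, 256 -> 256 channels.
base = conv_params(3, 256, 256)
# Same layer with both input and output width doubled.
wider = conv_params(3, 512, 512)

print(base)   # 590080
print(wider)  # 2359808
print(wider / base)  # close to 4: doubling the width roughly quadruples the parameters
```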
3) To save memory, first reduce the spatial resolution, then stack Inception modules
For technical reasons (memory efficiency during training), it seemed beneficial to start using Inception modules only at higher layers while keeping the lower layers in traditional convolutional fashion.
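A rough sketch of why this saves memory. In the GoogLeNet layer table the traditional stem (7x7/2 conv, 3x3/2 max pool, 3x3 conv, 3x3/2 max pool) shrinks a 224x224 input to 28x28 before inception (3a), whose output has 256 channels. The code assumes "same"-style padding so that each layer simply divides the spatial size by its stride, and it only counts activations of one feature map, not full training memory.

```python
# Strides of the lower, traditional layers before the first Inception module:
# conv 7x7/2, max pool 3x3/2, conv 3x3/1, max pool 3x3/2.
STEM_STRIDES = [2, 2, 1, 2]

def spatial_size(input_size, strides):
    """Spatial side length after a chain of layers (assumes 'same' padding)."""
    size = input_size
    for s in strides:
        size //= s
    return size

def activations(size, channels):
    """Number of values in one size x size x channels feature map."""
    return size * size * channels

size = spatial_size(224, STEM_STRIDES)
print(size)  # 28

# A 256-channel Inception output at full resolution vs. after the stem.
full = activations(224, 256)
reduced = activations(size, 256)
print(full // reduced)  # 64
```

Running the same module 8x lower in each spatial dimension stores 64x fewer activations, which matters because activations are kept for backpropagation during training.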
We use an extra linear layer. This enables adapting and fine-tuning our networks for other label sets easily.
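A minimal pure-Python sketch of the idea: GoogLeNet average-pools to a 1024-dimensional feature vector and then applies a linear layer to the 1000 ImageNet classes, so adapting to another label set only requires swapping that final layer. The `Linear` class, its initialization, and the 10-class target are hypothetical illustrations, not the paper's code.

```python
import random

class Linear:
    """Minimal fully connected layer computing y = W x + b."""

    def __init__(self, in_features, out_features, seed=0):
        rng = random.Random(seed)
        # Small random weights (illustrative init), zero bias.
        self.weight = [[rng.gauss(0.0, 0.01) for _ in range(in_features)]
                       for _ in range(out_features)]
        self.bias = [0.0] * out_features

    def __call__(self, x):
        return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def num_params(self):
        return len(self.weight) * len(self.weight[0]) + len(self.bias)

FEATURE_DIM = 1024  # width of GoogLeNet's average-pooled feature vector

imagenet_head = Linear(FEATURE_DIM, 1000)  # original 1000-class head
new_head = Linear(FEATURE_DIM, 10)         # hypothetical 10-class label set

features = [0.5] * FEATURE_DIM  # stand-in for pooled backbone features
print(len(new_head(features)))     # 10
print(imagenet_head.num_params())  # 1025000
print(new_head.num_params())       # 10250
```

The backbone that produces `features` stays unchanged; only the small head is re-created and trained for the new labels.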
One interesting insight is that the strong performance of relatively shallower networks on this task suggests that the features produced by the layers in the middle of the network should be very discriminative.
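This observation motivates GoogLeNet's auxiliary classifiers, attached to the outputs of Inception (4a) and (4d). During training their losses are added to the total loss with a discount weight of 0.3, and at inference they are discarded. A minimal sketch of the combined loss, using made-up softmax outputs for a 3-class toy problem:

```python
import math

AUX_WEIGHT = 0.3  # discount weight for the auxiliary losses

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])

def total_loss(main_probs, aux_probs_list, target, training=True):
    """Main loss plus discounted auxiliary losses (auxiliary heads used only in training)."""
    loss = cross_entropy(main_probs, target)
    if training:
        for aux_probs in aux_probs_list:
            loss += AUX_WEIGHT * cross_entropy(aux_probs, target)
    return loss

# Hypothetical softmax outputs; the target class is 0.
main = [0.7, 0.2, 0.1]
auxs = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]  # heads on Inception (4a) and (4d)
print(total_loss(main, auxs, 0))
print(total_loss(main, auxs, 0, training=False))
```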
fixed learning rate schedule (decreasing the learning rate by 4% every 8 epochs)
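The schedule is a step decay: the rate is multiplied by 0.96 once every 8 epochs, i.e. lr = base_lr * 0.96^floor(epoch / 8). A small sketch, where the base rate of 0.01 is a hypothetical value, not one given in this passage:

```python
def step_lr(base_lr, epoch, decay=0.04, step=8):
    """Learning rate after `epoch` epochs, shrinking by `decay` every `step` epochs."""
    return base_lr * (1.0 - decay) ** (epoch // step)

base_lr = 0.01  # hypothetical base learning rate
for epoch in (0, 7, 8, 16, 40):
    print(epoch, round(step_lr(base_lr, epoch), 6))
```

Epochs 0 to 7 share the base rate; epoch 8 drops to 0.0096, epoch 16 to 0.009216, and so on.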
Still, one prescription that was verified to work very well after the competition includes sampling of various sized patches of the image whose size is distributed evenly between 8% and 100% of the image area and whose aspect ratio is chosen randomly between 3/4 and 4/3. Also, we found that the photometric distortions were useful to combat overfitting to some extent.
We started to use random interpolation methods (bilinear, area, nearest neighbor and cubic, with equal probability) for resizing relatively late and in conjunction with other hyperparameter changes.
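The two augmentations above can be sketched as sampling a crop box plus an interpolation method. Assumptions: the aspect ratio is drawn uniformly from [3/4, 4/3] (the text only says "randomly"), a sample that does not fit in the image is retried a few times and then falls back to the whole image, and the function names are hypothetical. Only crop parameters are produced; the actual resize to 224x224 would be done by an image library.

```python
import math
import random

INTERPOLATIONS = ["bilinear", "area", "nearest", "cubic"]

def sample_crop(width, height, rng, min_area=0.08, max_area=1.0,
                min_ratio=3 / 4, max_ratio=4 / 3, max_tries=10):
    """Return (x, y, w, h) of a random crop covering 8%-100% of the image area."""
    area = width * height
    for _ in range(max_tries):
        target_area = rng.uniform(min_area, max_area) * area
        ratio = rng.uniform(min_ratio, max_ratio)  # w / h
        w = int(round(math.sqrt(target_area * ratio)))
        h = int(round(math.sqrt(target_area / ratio)))
        if 0 < w <= width and 0 < h <= height:
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return x, y, w, h
    return 0, 0, width, height  # fallback (assumption): use the whole image

def sample_augmentation(width, height, rng):
    """Pick a crop box and an interpolation method, each method with probability 1/4."""
    return sample_crop(width, height, rng), rng.choice(INTERPOLATIONS)

rng = random.Random(42)
for _ in range(3):
    print(sample_augmentation(500, 375, rng))
```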
9) In practice, 144 crops are unnecessary
We note that such aggressive cropping may not be necessary in real applications.
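For reference, the 144 comes from the paper's test-time scheme: resize to 4 scales (shorter side 256, 288, 320, 352), take 3 squares (left/center/right, or top/center/bottom for portrait images), take 6 views of each square (4 corner and 1 center 224x224 crops plus the whole square resized to 224x224), and add mirrored versions, giving 4 x 3 x 6 x 2 = 144. The softmax outputs are then averaged over the crops. A sketch that enumerates the views and averages hypothetical probability vectors:

```python
from itertools import product

SCALES = [256, 288, 320, 352]  # shorter side after resizing
SQUARES = ["left/top", "center", "right/bottom"]
VIEWS = ["top-left", "top-right", "bottom-left", "bottom-right", "center", "whole square"]
MIRROR = [False, True]

crops = list(product(SCALES, SQUARES, VIEWS, MIRROR))
print(len(crops))  # 144

def average_probs(prob_list):
    """Average per-class probabilities over all crops."""
    n = len(prob_list)
    return [sum(p[i] for p in prob_list) / n for i in range(len(prob_list[0]))]

print(average_probs([[0.2, 0.8], [0.4, 0.6]]))  # hypothetical 2-class outputs
```

Dropping scales, squares, or mirroring shrinks the product directly, which is why fewer crops are much cheaper at inference time.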
img: Golden Retriever from baidu.jpg is: golden retriever
207 n02099601 dog, golden retriever
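The label line above follows an `index synset name` layout. A hypothetical sketch of turning a 1000-way output into such a prediction: parse the label line, take the argmax of the scores, and look up the class. The score vector is made up for illustration, and only the one label line shown here is used.

```python
def parse_label_line(line):
    """Split 'index synset name' into (int index, synset, name)."""
    index, synset, name = line.split(" ", 2)
    return int(index), synset, name

def top1(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)

labels = {}
for line in ["207 n02099601 dog, golden retriever"]:
    idx, synset, name = parse_label_line(line)
    labels[idx] = (synset, name)

scores = [0.0] * 1000
scores[207] = 0.9  # hypothetical softmax output peaking at class 207
pred = top1(scores)
print(pred, labels[pred])
```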