Cross-stitch Networks
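After writing up some basics of Multi-task Learning the other day, I recently started reading some of the classic papers in the area. Below are my notes on the cross-stitch unit.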
1. Motivation
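A multi-task network has two parts: task-specific representations and shared representations. Traditionally, the split between these two parts is chosen by trying every candidate architecture and keeping the best one, which is very inefficient. To address this, the paper proposes a network that automatically learns an optimal combination of shared and task-specific representations. It can be understood as a mathematical model of that earlier, manual design process.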
2. Cross-stitch unit
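Given activation maps $x_A$ and $x_B$ taken from the same layer of networks A and B, a cross-stitch unit outputs a learned linear combination of the two at every location $(i, j)$:
$$\begin{bmatrix}\tilde{x}_A^{ij}\\ \tilde{x}_B^{ij}\end{bmatrix} = \begin{bmatrix}\alpha_{AA} & \alpha_{AB}\\ \alpha_{BA} & \alpha_{BB}\end{bmatrix}\begin{bmatrix}x_A^{ij}\\ x_B^{ij}\end{bmatrix}$$
$\alpha_{AB}$ and $\alpha_{BA}$ are called $\alpha_D$, the different-task parameters; $\alpha_{AA}$ and $\alpha_{BB}$ are called $\alpha_S$, the same-task parameters. By changing their values, a layer can be made task-specific ($\alpha_D = 0$, so A and B exchange no information) or increasingly shared (as $\alpha_D$ grows, each network draws more on the other's activations).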
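A minimal sketch of the unit's forward pass, written in plain Python so it runs without a deep-learning framework. It assumes activations are flattened into equal-length lists and that a single 2x2 matrix ((α_AA, α_AB), (α_BA, α_BB)) is shared by every position of the layer; the function name `cross_stitch` is hypothetical, not an API from the paper. Each output position is α_AA·x_A + α_AB·x_B for network A and α_BA·x_A + α_BB·x_B for network B.

```python
from typing import List, Sequence, Tuple

# Assumption: one 2x2 alpha matrix for the whole layer, ((a_AA, a_AB), (a_BA, a_BB)).
Alpha = Tuple[Tuple[float, float], Tuple[float, float]]


def cross_stitch(
    x_a: Sequence[float], x_b: Sequence[float], alpha: Alpha
) -> Tuple[List[float], List[float]]:
    """Linearly mix same-position activations of network A and network B.

    Diagonal entries are the same-task weights (alpha_S),
    off-diagonal entries the different-task weights (alpha_D).
    """
    if len(x_a) != len(x_b):
        raise ValueError("x_a and x_b must have the same length")
    (a_aa, a_ab), (a_ba, a_bb) = alpha
    # Each output position depends only on the same position of both inputs.
    out_a = [a_aa * xa + a_ab * xb for xa, xb in zip(x_a, x_b)]
    out_b = [a_ba * xa + a_bb * xb for xa, xb in zip(x_a, x_b)]
    return out_a, out_b
```

With the identity matrix (α_D = 0) both outputs equal their inputs, so the two networks stay independent at this layer. With every α set to 0.5, both networks receive the same averaged activations, i.e. the layer is fully shared: `cross_stitch([1.0, 2.0], [3.0, 4.0], ((0.5, 0.5), (0.5, 0.5)))` returns `([2.0, 3.0], [2.0, 3.0])`.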
3. Design decisions for cross-stitching
3.1 Cross-stitch unit initialization and learning rates
The α values are initialized in the range [0, 1].
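A small sketch of initializing a unit within [0, 1]. It assumes, purely for illustration, a symmetric matrix built from one same-task value α_S and one different-task value α_D; the helpers `init_alpha` and `is_convex` are hypothetical. When α_S + α_D = 1, each output is a convex combination of the two inputs, so it stays between them and the activation scale seen by the next layer is unchanged.

```python
from typing import Tuple

Alpha = Tuple[Tuple[float, float], Tuple[float, float]]


def init_alpha(alpha_s: float, alpha_d: float) -> Alpha:
    """Build a symmetric cross-stitch matrix ((a_S, a_D), (a_D, a_S)).

    Both values must lie in [0, 1]; the symmetric layout is an illustrative assumption.
    """
    for name, value in (("alpha_s", alpha_s), ("alpha_d", alpha_d)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
    return ((alpha_s, alpha_d), (alpha_d, alpha_s))


def is_convex(alpha: Alpha, tol: float = 1e-9) -> bool:
    """True if every row is non-negative and sums to 1, i.e. each output is a convex mix."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) <= tol for row in alpha)
```

For example, `init_alpha(0.9, 0.1)` gives a convex, mostly task-specific starting point, while `init_alpha(0.5, 0.1)` passes the range check but shrinks activations, since each row sums to 0.6.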
3.2 Network initialization: how should networks A and B be initialized?
(Based on AlexNet)
1. Initialize networks A and B with networks that were trained separately on their respective tasks, or
2. Give networks A and B the same initialization and train them jointly.
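A toy sketch of the two initialization options. It assumes each network is reduced to a list of per-layer scalar gains instead of AlexNet weights; `init_networks` and `forward` are hypothetical names. Both networks receive the same input, and in this sketch a cross-stitch unit mixes their activations after every layer.

```python
import copy
from typing import List, Sequence, Tuple

# Hypothetical toy representation: one scalar gain per layer stands in for real layer weights.
Weights = List[float]
Alpha = Tuple[Tuple[float, float], Tuple[float, float]]


def init_networks(
    strategy: str,
    task_a_weights: Weights,
    task_b_weights: Weights,
    common_init: Weights,
) -> Tuple[Weights, Weights]:
    """Return starting weights for networks A and B under one of the two options."""
    if strategy == "separate":
        # Option 1: start A and B from networks trained separately on their own tasks.
        return copy.deepcopy(task_a_weights), copy.deepcopy(task_b_weights)
    if strategy == "same":
        # Option 2: both start from the same weights (independent copies), then train jointly.
        return copy.deepcopy(common_init), copy.deepcopy(common_init)
    raise ValueError(f"unknown strategy: {strategy!r}")


def forward(
    x: float, net_a: Weights, net_b: Weights, alphas: Sequence[Alpha]
) -> Tuple[float, float]:
    """Run both networks on the same input, applying a cross-stitch unit after each layer."""
    if not len(net_a) == len(net_b) == len(alphas):
        raise ValueError("net_a, net_b and alphas must have the same number of layers")
    h_a, h_b = x, x
    for w_a, w_b, ((a_aa, a_ab), (a_ba, a_bb)) in zip(net_a, net_b, alphas):
        # Toy "layer": scale each network's activation by its own gain.
        h_a, h_b = w_a * h_a, w_b * h_b
        # Cross-stitch: the right-hand side uses the pre-mix values of both networks.
        h_a, h_b = a_aa * h_a + a_ab * h_b, a_ba * h_a + a_bb * h_b
    return h_a, h_b
```

Option 1 copies the separately trained weights into A and B; option 2 gives both networks independent copies of one initialization, so joint training can still move them apart.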
4. Future work
1. Where in the network the cross-stitch units should be placed.
2. How their weights should be constrained.