Learning Multiple Tasks with Multilinear Relationship Networks

After the cross-stitch unit paper, this is the second paper I read on the Feature Transformation Approach.

This paper divides multi-task learning methods into two categories: 1) multi-task feature learning, which learns a shared feature representation;

2) multi-task relationship learning, which models the inherent task relationships.

1. Motivation

The question is how to exploit the task relatedness underlying the parameter tensors and improve feature transferability in the multiple task-specific layers.

2. Multilinear Relationship Network (MRN)

MRN integrates deep neural networks with tensor normal priors over the network parameters of all task-specific layers. These priors model the task relatedness through covariance structures over tasks, classes and features, which enables transfer across related tasks. By jointly learning transferable features and multilinear relationships, MRN is able to circumvent the dilemma of negative transfer in the feature layers and under-transfer in the classifier layer.

2.1 Tensor Normal Distribution

Probability Density Function
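The formula that belongs here did not survive, so for reference this is the standard definition of the tensor normal density, written as a sketch. The symbols (W, M, d_k, Σ_k) are notation assumed for this note and may not match the paper's typesetting exactly. A tensor W ∈ R^{d_1 × ... × d_K} is tensor normal with mean M and mode covariances Σ_1, ..., Σ_K when vec(W) is multivariate normal with a Kronecker-structured covariance:

```latex
% x = vec(W), \mu = vec(M), d = \prod_{k=1}^{K} d_k
p(x) = (2\pi)^{-d/2}
       \Bigl( \prod_{k=1}^{K} |\Sigma_k|^{-d/(2 d_k)} \Bigr)
       \exp\Bigl( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma_{1:K}^{-1} (x - \mu) \Bigr),
\qquad
\Sigma_{1:K} = \Sigma_1 \otimes \cdots \otimes \Sigma_K .
```

The determinant factor follows from the identity |Σ_1 ⊗ ... ⊗ Σ_K| = ∏_k |Σ_k|^{d/d_k}.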

Furthermore,

Maximum Likelihood Estimation
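The maximum likelihood estimate of the mode covariances has no closed form, but it is commonly computed with a "flip-flop" iteration that updates one covariance at a time while the others stay fixed. The NumPy sketch below illustrates that idea. The function names, the fixed iteration count and the row-major unfolding convention are assumptions made for this note, not the paper's implementation.

```python
import numpy as np

def unfold(W, k):
    """Mode-k unfolding: bring axis k to the front, flatten the rest in C order."""
    return np.moveaxis(W, k, 0).reshape(W.shape[k], -1)

def kron_of_others(covs, k):
    """Kronecker product of every covariance except mode k, in axis order."""
    out = np.eye(1)
    for j, S in enumerate(covs):
        if j != k:
            out = np.kron(out, S)
    return out

def flip_flop_mle(samples, n_iter=20):
    """Estimate the mean and mode covariances from samples of shape (n, d1, ..., dK)."""
    n = samples.shape[0]
    dims = samples.shape[1:]
    d = int(np.prod(dims))
    mean = samples.mean(axis=0)
    centered = samples - mean
    covs = [np.eye(dk) for dk in dims]  # start from identity covariances
    for _ in range(n_iter):
        for k, dk in enumerate(dims):
            # Whiten the other modes with their current covariance estimates.
            other_inv = np.linalg.inv(kron_of_others(covs, k))
            acc = np.zeros((dk, dk))
            for Z in centered:
                Zk = unfold(Z, k)
                acc += Zk @ other_inv @ Zk.T
            # Normalize by n times the product of the other mode sizes.
            covs[k] = acc / (n * (d // dk))
    return mean, covs
```

Calling `flip_flop_mle(samples)` on an array of shape `(n, d1, d2, d3)` returns the sample mean and one covariance matrix per mode. Only the Kronecker product of the covariances is identifiable: each individual factor is determined up to a positive scale.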

2.2 Multilinear Relationship Networks

2.3 Model

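The empirical error of the CNN on the training data {X_t, Y_t} of task t is

min_{f_t} Σ_{n=1}^{N_t} J(f_t(x^t_n), y^t_n),

where N_t is the number of training examples of task t, J is the cross-entropy loss function, and f_t(x^t_n) is the conditional probability that the CNN assigns x^t_n to label y^t_n.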

In order to capture the task relationships in the network parameters of all T tasks, we construct the l-th layer parameter tensor as W^l = [W^{1,l}; ...; W^{T,l}] ∈ R^{D_1^l × D_2^l × T}. The parameter tensors of all the task-specific layers are collected in the set indexed by L = {fc7, fc8}.
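As a small illustration of this construction, the per-task weight matrices of one layer can be stacked along a new last axis to form the parameter tensor. The sizes and the helper name `build_parameter_tensor` are hypothetical and chosen only for this sketch.

```python
import numpy as np

def build_parameter_tensor(task_weights):
    """Stack T weight matrices of shape (D1, D2) into a tensor of shape (D1, D2, T)."""
    return np.stack(task_weights, axis=-1)

# Toy sizes (assumed): D1 input features, D2 classes, T tasks.
D1, D2, T = 5, 3, 4
rng = np.random.default_rng(0)
task_weights = [rng.standard_normal((D1, D2)) for _ in range(T)]
W_l = build_parameter_tensor(task_weights)  # W_l[:, :, t] is the weight matrix of task t
```

The three axes of `W_l` then correspond to features, classes and tasks, which are the three modes that receive a covariance matrix.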

MRN defines the prior for the l-th layer parameter tensor by a tensor normal distribution as

The Multilinear Relationship Network (MRN) is then formally written as

(1)
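The formula itself is missing here. The block below is a sketch of the maximum a posteriori objective obtained by adding the negative log of a tensor normal prior to the empirical error. It assumes a zero-mean prior with feature, class and task covariances Σ_1^l, Σ_2^l, Σ_3^l, and it is derived from the density rather than copied from the paper, so the signs and constants should be checked against the original Equation (1).

```latex
% D^l = D_1^l D_2^l T, with D_3^l = T; constants independent of the variables are dropped
\min_{f_t|_{t=1}^{T},\ \Sigma_k^l|_{k=1}^{3}}
  \sum_{t=1}^{T} \sum_{n=1}^{N_t} J\bigl(f_t(x_n^t),\, y_n^t\bigr)
  + \frac{1}{2} \sum_{l \in \mathcal{L}}
    \Bigl(
      \operatorname{vec}(W^l)^{\top}
      \bigl(\Sigma_1^l \otimes \Sigma_2^l \otimes \Sigma_3^l\bigr)^{-1}
      \operatorname{vec}(W^l)
      + \sum_{k=1}^{3} \frac{D^l}{D_k^l} \ln \bigl|\Sigma_k^l\bigr|
    \Bigr)
```

The first term is the empirical error summed over tasks, and the second term is the regularizer contributed by the prior on each task-specific layer.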

3. Algorithm

Equation (1) is jointly non-convex with respect to the parameter tensors W as well as the feature covariance Σ₁^l, class covariance Σ₂^l, and task covariance Σ₃^l. Thus, we alternately optimize one set of variables with the others fixed.
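With the covariances held fixed, the step on W only needs the gradient of the quadratic prior term, which can be computed with mode products instead of forming the full Kronecker matrix. The sketch below assumes a row-major vec convention and a quadratic penalty of the form 0.5 · vec(W)ᵀ (Σ₁ ⊗ Σ₂ ⊗ Σ₃)⁻¹ vec(W). The function names are hypothetical.

```python
import numpy as np

def mode_product(W, A, k):
    """Multiply tensor W by matrix A along axis k."""
    return np.moveaxis(np.tensordot(A, W, axes=(1, k)), 0, k)

def prior_gradient(W, covs):
    """Gradient of the quadratic prior term with respect to W.

    Equals (S1 kron S2 kron S3)^{-1} vec(W), reshaped back to W's shape.
    """
    G = W
    for k, S in enumerate(covs):
        G = mode_product(G, np.linalg.inv(S), k)
    return G

def prior_penalty(W, covs):
    """Value of 0.5 * vec(W)^T (S1 kron S2 kron S3)^{-1} vec(W)."""
    return 0.5 * float(np.sum(W * prior_gradient(W, covs)))
```

In a training loop, `prior_gradient(W, covs)` would be added to the gradient of the task losses before the update of W. The covariance step then keeps W fixed and re-estimates each of the three covariances from the corresponding unfolding of W.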

4. Discussion

1. Learning with feature covariances can be viewed as a representative formulation of feature-based methods.

This can be viewed as a special case of Equation (1) obtained by setting all covariance matrices except the feature covariance to the identity matrix, i.e. Σ_k = I for k = 2, ..., K.

2. Learning with task relations is the representative formulation of parameter-based methods.

This can be viewed as a special case of Equation (1) obtained by setting all covariance matrices except the task covariance to the identity matrix, i.e. Σ_k = I for k = 1, ..., K-1.

The proposed MRN is more general from an architectural perspective, because it handles the parameter tensors in multiple layers of deep neural networks.

This paper centers on how to exploit the task relatedness underlying the parameter tensors and improve feature transferability in the multiple task-specific layers. Its method learns task relationships from the multiple task-specific layers and then uses these task relationships to learn transferable features.

Based on the tensor normal distribution, a maximum a posteriori estimate of the task relationships can be derived, and it is updated using the training data.

posted on 2023-09-19 22:49 wkkh

