Retrofitting Analysis
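To understand where the update rule of the retrofitting objective [1] comes from, we work through the math below. Here \(q_i\) is the retrofitted vector of word \(i\), \(\hat{q}_i\) is its original vector, \(\alpha_i\) and \(\beta_{ij}\) are weights, and sums over \(j\) run over the neighbors of word \(i\) in the lexicon.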
Forward Derivation
\[\psi(Q) = \sum_{i=1}^{n}\left[ \alpha_i\lVert q_i-\hat{q}_i\rVert^2 + \sum_{j}\beta_{ij}\lVert q_i-q_j\rVert^2 \right] \\
\frac{1}{2}\frac{\partial \psi(Q)}{\partial q_i} = \alpha_i(q_i-\hat{q}_i) + \sum_{j}\beta_{ij}(q_i-q_j) = 0 \\
(\alpha_i+\sum_{j}\beta_{ij})q_i -\alpha_i\hat{q}_i -\sum_{j}\beta_{ij}q_j = 0 \\
q_i = \frac{\sum_{j}\beta_{ij}q_j+\alpha_i\hat{q}_i}{\sum_{j}\beta_{ij}+\alpha_i}
\]
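To make the update rule concrete, here is a minimal NumPy sketch that applies it repeatedly to a toy graph. The function name `retrofit`, the dictionary-of-lists `neighbors` layout, and the toy vectors are my own assumptions for illustration. The choices \(\alpha_i = 1\), \(\beta_{ij} = 1/\text{degree}(i)\) and 10 iterations follow the settings I recall from the paper [1], so treat them as assumptions too. This is a sketch of the closed-form update, not the authors' released code.

```python
import numpy as np

def retrofit(q_hat, neighbors, alpha, beta, n_iters=10):
    """Repeatedly apply
        q_i = (sum_j beta_ij * q_j + alpha_i * q_hat_i) / (sum_j beta_ij + alpha_i)
    to every word i that has at least one neighbor.

    q_hat     : (n, d) array of original vectors (left unmodified)
    neighbors : dict mapping word index i to a list of neighbor indices j
    alpha     : (n,) array of weights alpha_i
    beta      : (n, n) array of weights beta_ij
    """
    q = q_hat.copy()  # start from the original vectors
    for _ in range(n_iters):
        for i, nbrs in neighbors.items():
            if not nbrs:
                continue  # no neighbors: q_i stays at q_hat_i
            num = alpha[i] * q_hat[i]
            den = alpha[i]
            for j in nbrs:
                num = num + beta[i, j] * q[j]  # uses the latest q_j
                den += beta[i, j]
            q[i] = num / den
    return q

# Toy graph (hypothetical data): word 0 is linked to words 1 and 2.
q_hat = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
neighbors = {0: [1, 2], 1: [0], 2: [0]}
alpha = np.ones(3)
beta = np.zeros((3, 3))
for i, nbrs in neighbors.items():
    for j in nbrs:
        beta[i, j] = 1.0 / len(nbrs)  # beta_ij = 1 / degree(i)

q = retrofit(q_hat, neighbors, alpha, beta)
print(q)
```

Each update pulls \(q_i\) toward a weighted average of its neighbors while \(\alpha_i\) anchors it to \(\hat{q}_i\). Because the inner loop reads the most recent \(q_j\), the order in which words are visited affects intermediate values but not the fixed point.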
Backward Derivation
This is how I came to understand the update equation, working backward from the result.
The paper [1] states: "We take the first derivative of \(\psi\) with respect to one \(q_i\) vector, and by equating it to zero", which gives the condition:
\[\frac{\partial\psi(Q)}{\partial q_i} = 0
\]
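Starting from the update rule, multiplying both sides by the denominator and collecting terms gives:
\[q_i = \frac{\sum_{j}\beta_{ij}q_j+\alpha_i\hat{q}_i}{\sum_{j}\beta_{ij}+\alpha_i} \\
\alpha_iq_i - \alpha_i\hat{q}_i + \sum_{j}\beta_{ij}q_i - \sum_{j}\beta_{ij}q_j = 0 \\
\alpha_i(q_i-\hat{q}_i)+ \sum_{j}\beta_{ij}(q_i-q_j) = 0
\]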
Up to a constant factor of 2, this is exactly the condition that the derivative vanishes:
\[\frac{1}{2}\frac{\partial\psi(Q)}{\partial q_i} = \alpha_i(q_i-\hat{q}_i)+ \sum_{j}\beta_{ij}(q_i-q_j) = 0
\]
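As a sanity check on the algebra, the following sketch verifies numerically that the closed-form \(q_i\) makes the expression \(\alpha_i(q_i-\hat{q}_i)+\sum_j\beta_{ij}(q_i-q_j)\) vanish, and that this expression is half of the gradient of the \(i\)-th summand of \(\psi\). Everything here is a hypothetical setup of my own: the random vectors, the helper names `psi_i`, `residual` and `numerical_grad`, and the weights \(\alpha_i = 1\), \(\beta_{ij} = 1/3\). The neighbor vectors \(q_j\) are held fixed, as in the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_neighbors = 4, 3
q_hat_i = rng.normal(size=dim)                 # original vector of word i
q_nbrs = rng.normal(size=(n_neighbors, dim))   # fixed neighbor vectors q_j
alpha_i = 1.0
beta_i = np.full(n_neighbors, 1.0 / n_neighbors)  # beta_ij for each neighbor j

def psi_i(q_i):
    """i-th summand: alpha_i*||q_i - q_hat_i||^2 + sum_j beta_ij*||q_i - q_j||^2."""
    return (alpha_i * np.sum((q_i - q_hat_i) ** 2)
            + np.sum(beta_i * np.sum((q_i - q_nbrs) ** 2, axis=1)))

def residual(q_i):
    """alpha_i*(q_i - q_hat_i) + sum_j beta_ij*(q_i - q_j)."""
    return alpha_i * (q_i - q_hat_i) + np.sum(beta_i[:, None] * (q_i - q_nbrs), axis=0)

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of f at x."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        grad[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Closed-form update from the derivation.
q_star = (beta_i @ q_nbrs + alpha_i * q_hat_i) / (beta_i.sum() + alpha_i)

# The residual vanishes at q_star ...
print(np.allclose(residual(q_star), 0.0))
# ... and at an arbitrary point the gradient equals twice the residual.
q_rand = rng.normal(size=dim)
print(np.allclose(numerical_grad(psi_i, q_rand), 2 * residual(q_rand), atol=1e-5))
```

The second check is where the factor of 2 shows up: it scales the gradient but does not move its zero, so dropping it leaves the update rule unchanged.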
Reference
[1] Faruqui M, Dodge J, Jauhar S K, et al. Retrofitting Word Vectors to Semantic Lexicons. NAACL-HLT, 2015.