HOP-Rec: High-Order Proximity for Implicit Recommendation

Overview
Notation
HOP-Rec

Yang J., Chen C., Wang C. and Tsai M. HOP-Rec: high-order proximity for implicit recommendation. In ACM Conference on Recommender Systems (RecSys), 2018.

Overview

A factorization model that exploits high-order proximity information in the user-item graph.

Notation

\(U\), users;
\(I\), items;
\(A \in \{0, 1\}^{|U| \times |I|}\), biadjacency matrix;
\(V = U \cup I\), nodes;
\(e_{ij} \in E \text{ if } a_{ij} = 1\), edges;
\(G = (V, E)\), the bipartite graph;
\(S_u = (u, i_1, u_1, \ldots, u_{k-1}, i_k, \ldots)\), a sequence sampled from \(G\) by a random walk starting at \(u\);
\(p_u^k(i_k)\), the probability that the \(k\)-th item \(I_k\) in \(S_u\) equals \(i_k\) (the corresponding distribution is written \(P_u^k\) below);
\(\theta_U \in \mathbb{R}^{|U| \times d}, \theta_I \in \mathbb{R}^{|I| \times d}\), the user and item embeddings.

HOP-Rec

A typical MF method can be summarized as:

\[\mathcal{L} = \sum_{u, i} c_{ui} (a_{ui} - \theta_u^T \theta_i)^2 + \lambda_{\Theta} \|\Theta\|_2^2; \]
whereas methods that adopt BPR or WARP can be summarized as

\[\mathcal{L}_{rank} = \sum_{u, (i, i')} \mathcal{F}(\theta_u^T\theta_{i'}, \theta_u^T \theta_i) + \lambda_{\Theta} \|\Theta\|_2^2, \]
where \(i\) and \(i'\) denote items that are associated (positive) and not associated (negative) with \(u\), respectively.
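The two losses above can be sketched in plain Python to make the indexing concrete. This is only an illustration under assumptions: the helper names (`dot`, `l2_penalty`, `mf_loss`, `bpr_F`, `rank_loss`) are hypothetical, embeddings are lists of lists, and \(\mathcal{F}\) is instantiated as the BPR-style \(-\log \sigma(\theta_u^T\theta_i - \theta_u^T\theta_{i'})\), which is one common choice rather than the only one.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def l2_penalty(theta_U, theta_I, lam):
    # lambda * ||Theta||_2^2 over all user and item embeddings
    return lam * sum(v * v for row in theta_U + theta_I for v in row)

def mf_loss(A, C, theta_U, theta_I, lam):
    """Weighted squared-error MF loss summed over all (u, i) pairs."""
    total = 0.0
    for u, row in enumerate(A):
        for i, a_ui in enumerate(row):
            total += C[u][i] * (a_ui - dot(theta_U[u], theta_I[i])) ** 2
    return total + l2_penalty(theta_U, theta_I, lam)

def bpr_F(neg_score, pos_score):
    # Assumed instantiation of F: -log sigmoid(pos - neg)
    return math.log(1.0 + math.exp(-(pos_score - neg_score)))

def rank_loss(triples, theta_U, theta_I, lam):
    """Pairwise ranking loss over (u, positive item, negative item) triples."""
    total = 0.0
    for u, i, i_neg in triples:
        total += bpr_F(dot(theta_U[u], theta_I[i_neg]),
                       dot(theta_U[u], theta_I[i]))
    return total + l2_penalty(theta_U, theta_I, lam)
```

The argument order of `bpr_F` (negative score first) mirrors \(\mathcal{F}(\theta_u^T\theta_{i'}, \theta_u^T\theta_i)\) in the formula.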
The idea of this paper is to use the following loss:

\[\mathcal{L}_{HOP} = \sum_u \sum_{1 \le k\le K} \overbrace{C(k) \mathbb{E}_{i \sim P_u^k, i' \sim P_N}}^{\text{graph model}} \overbrace{[\mathcal{F}(\theta_u^T\theta_{i'}, \theta_u^T \theta_i) ]}^{\text{factorization model}} + \lambda_{\Theta} \|\Theta\|_2^2, \]
Here \(P_N\) denotes uniform sampling over all items, \(C(k)\) is a weighting factor, and

\[\mathcal{F}(\theta_u^T\theta_{i'}, \theta_u^T \theta_i) = \mathbb{I}(\theta_u^T \theta_{i'} - \theta^T_u \theta_i \ge \epsilon_k) \log [\sigma(\theta_u^T \theta_{i'} - \theta_u^T \theta_i)], \]
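Typically \(\epsilon_k = \epsilon / k\).
Note that, for each \(u\), this loss amounts to the following (I have not read the code; this is purely my own understanding):
- sample \(S_u\) via a random walk;
- for each \(k\), take \(i_k\) in \(S_u\) as the positive sample, draw one negative sample \(i_k'\) at random from all the other items, and compute the loss as
  \[C(k)\mathcal{F}(\theta_u^T\theta_{i_k'}, \theta_u^T \theta_{i_k}) + \lambda_{\Theta} \|\Theta\|_2^2. \]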
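Note that \(i_k\) is not necessarily a direct neighbor of user \(u\); it may be a neighbor reached only after several hops. In general, the larger \(k\) is, the less likely this 'neighbor' is a true positive, i.e. an item closely related to \(u\). The factor \(C(k)\) exists to account for this, and one typically takes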

\[C(k) = 1 / k \]
so that the contribution of higher-order neighbors decays gradually with \(k\).
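The per-user procedure can be sketched as follows, in the same spirit as the reading above (an interpretation, not the authors' code). Assumptions: the walk picks neighbors uniformly at random, \(C(k) = 1/k\), \(\epsilon_k = \epsilon/k\), the negative item is drawn uniformly from all items as \(P_N\) prescribes, the regularization term is omitted, and every function name is hypothetical. `hop_F` evaluates \(\mathcal{F}\) exactly as written in the formula; how it is optimized is not covered here.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def hop_F(neg_score, pos_score, eps_k):
    """Indicator-gated log-sigmoid: nonzero only when neg - pos >= eps_k."""
    diff = neg_score - pos_score
    if diff < eps_k:
        return 0.0
    return math.log(1.0 / (1.0 + math.exp(-diff)))

def sample_walk_items(A, u, K, rng):
    """Items [i_1, ..., i_K] on a uniform walk u -> i_1 -> u_1 -> ... -> i_K.

    Assumes user u has at least one interaction in A.
    """
    n_users, n_items = len(A), len(A[0])
    items, user = [], u
    for _ in range(K):
        item = rng.choice([i for i in range(n_items) if A[user][i] == 1])
        items.append(item)
        user = rng.choice([v for v in range(n_users) if A[v][item] == 1])
    return items

def hop_loss_for_user(A, u, theta_U, theta_I, K, eps, rng):
    """One sampled estimate of user u's term in L_HOP (no regularizer)."""
    total = 0.0
    for k, i_k in enumerate(sample_walk_items(A, u, K, rng), start=1):
        i_neg = rng.randrange(len(theta_I))   # P_N: uniform over all items
        pos = dot(theta_U[u], theta_I[i_k])
        neg = dot(theta_U[u], theta_I[i_neg])
        total += (1.0 / k) * hop_F(neg, pos, eps / k)  # C(k) = 1/k
    return total
```

For example, `hop_loss_for_user(A, 0, theta_U, theta_I, K=3, eps=1.0, rng=random.Random(0))` draws one walk of three items for user 0 and returns the weighted sum of the three pairwise terms.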
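In addition, the authors argue that node degrees in real-world data usually follow the 80/20 rule, i.e. low-degree nodes make up the majority, so if each node jumps to other nodes uniformly at random, most of the sampled nodes tend to be low-degree ones (I am not fully convinced by this; why?). The authors therefore design the transition probability as: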

\[P(y|x) = \left \{ \begin{array}{ll} \frac{a_{xy} deg(y)}{\sum_{y'} a_{xy'} deg(y')} & \text{if } x \in U \\ \frac{a_{yx} deg(y)}{\sum_{y'} a_{y'x} deg(y')} & \text{if } x \in I \\ \end{array} \right . \]
That is, the transition probability favors high-degree nodes.
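To make the formula concrete, here is a small sketch that builds \(P(y|x)\) from a biadjacency matrix and, as an extra illustration not taken from the paper, propagates it to obtain the \(k\)-step item distribution \(p_u^k\) exactly instead of by sampling. It assumes `A` is a list of 0/1 lists in which every user and item has at least one edge; all function names are hypothetical.

```python
def degrees(A):
    user_deg = [sum(row) for row in A]
    item_deg = [sum(A[u][i] for u in range(len(A))) for i in range(len(A[0]))]
    return user_deg, item_deg

def transition_from_user(A, x):
    """P(y | x) for a user x: proportional to a_xy * deg(y) over items y."""
    _, item_deg = degrees(A)
    weights = [A[x][y] * item_deg[y] for y in range(len(A[0]))]
    z = sum(weights)
    return [w / z for w in weights]

def transition_from_item(A, x):
    """P(y | x) for an item x: proportional to a_yx * deg(y) over users y."""
    user_deg, _ = degrees(A)
    weights = [A[y][x] * user_deg[y] for y in range(len(A))]
    z = sum(weights)
    return [w / z for w in weights]

def k_step_item_distribution(A, u, k):
    """p_u^k: distribution of the k-th item on a walk started at user u."""
    n_users, n_items = len(A), len(A[0])
    p_items = transition_from_user(A, u)
    for _ in range(k - 1):
        # items -> users
        p_users = [0.0] * n_users
        for i, p in enumerate(p_items):
            if p > 0:
                for v, q in enumerate(transition_from_item(A, i)):
                    p_users[v] += p * q
        # users -> items
        p_next = [0.0] * n_items
        for v, p in enumerate(p_users):
            if p > 0:
                for i, q in enumerate(transition_from_user(A, v)):
                    p_next[i] += p * q
        p_items = p_next
    return p_items
```

With `A = [[1, 1, 0], [0, 1, 1], [0, 1, 0]]`, item 1 has degree 3 and item 0 has degree 1, so user 0 moves to item 1 with probability 3/4 and to item 0 with probability 1/4, whereas a uniform walk would give 1/2 each.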

posted @ 2022-07-24 11:16 馒头and花卷
