Learning to Denoise Unreliable Interactions for Graph Collaborative Filtering

Tian C., Xie Y., Li Y., Yang N. and Zhao W. Learning to denoise unreliable interactions for graph collaborative filtering. In ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022.

This paper uses a Graph Denoising Module to remove noisy interactions, a Diversity Preserving Module to maintain diversity, and constrains the two with an InfoNCE loss (this part of the design is quite interesting).

Notation

  • \(\mathcal{U}\), the set of users;
  • \(\mathcal{I}\), the set of items;
  • \(R \in \mathbb{R}^{|\mathcal{U}| \times |\mathcal{I}|}\), the interaction matrix;
  • \(E_U \in \mathbb{R}^{|\mathcal{U}| \times d}, E_I \in \mathbb{R}^{|\mathcal{I}| \times d}\), the user and item embeddings;

Reliability Degree

The reliability degree is computed as follows:

  1. Compute the one-hop features:

    \[H_U^s = RE_I, H_I^s = R^T E_U; \]

  2. For any pair \(u, i\), their reliability degree is

    \[\tag{3} s_{u, i} = (\cos (h_u^s, h_i^s) + 1) / 2, \]

    where

    \[\cos(h_u^s, h_i^s) = \frac{{h_u^s}^T h_i^s}{\|h_u^s\|_2 \cdot \|h_i^s\|_2}. \]

    Note that (3) is used instead of the raw cosine similarity because (3) is non-negative. A minimal sketch of this computation follows the list.
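A minimal NumPy sketch of the reliability computation, assuming dense toy arrays (a real implementation would use sparse matrices; all names here are hypothetical):

```python
import numpy as np

def reliability_degrees(R, E_U, E_I, eps=1e-8):
    """s_{u,i} = (cos(h_u^s, h_i^s) + 1) / 2 for all (u, i) pairs."""
    H_U = R @ E_I                      # one-hop user features, |U| x d
    H_I = R.T @ E_U                    # one-hop item features, |I| x d
    # Row-normalize so the matrix product below gives cosine similarities.
    H_U = H_U / (np.linalg.norm(H_U, axis=1, keepdims=True) + eps)
    H_I = H_I / (np.linalg.norm(H_I, axis=1, keepdims=True) + eps)
    cos = H_U @ H_I.T                  # |U| x |I|, values in [-1, 1]
    return (cos + 1.0) / 2.0           # shift to [0, 1]

# Toy example: 3 users, 4 items, embedding dimension 8.
rng = np.random.default_rng(0)
R = (rng.random((3, 4)) > 0.5).astype(float)
S = reliability_degrees(R, rng.normal(size=(3, 8)), rng.normal(size=(4, 8)))
```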

Denoised Interaction Graph

With the reliability degrees above, the authors prune the less reliable (u, i) interactions to obtain the denoised interaction graph \(\widetilde{R}\):

\[\widetilde{r}_{u, i} = \mathbb{I}(s_{u, i} > \beta) \cdot s_{u, i}, \text{if } r_{u, i} \not = 0, \text{ else } \widetilde{r}_{u, i} = r_{u, i}. \]
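Continuing the sketch above, the pruning is a single masked threshold (the value of \(\beta\) here is a hypothetical hyperparameter):

```python
import numpy as np

def denoise_graph(R, S, beta=0.3):
    """Keep an observed edge only when its reliability exceeds beta,
    re-weighting it by s_{u,i}; unobserved entries stay r_{u,i} = 0."""
    return np.where(R != 0, (S > beta) * S, R)
```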

Diversity Preserving Module

Because \(\widetilde{R}\) removes many (u, i) interactions, the final recommendations may lack diversity (?), so the paper additionally constructs an augmented graph \(\ddot{R}\):

  1. First, uniformly sample from the pairs without observed interactions:

    \[C = \text{Sample}(\{(u, i)| u \in \mathcal{U}, i \in \mathcal{I}, r_{u, i} = 0\}); \]

  2. Then keep the reliable pairs among them (see the sketch after this list):

    \[\ddot{r}_{u, i} = \mathbb{I}(s_{u, i} \in \text{top-M}(S)) \cdot s_{u, i}, \: \text{if } (u, i) \in C, \text{ else } \ddot{r}_{u, i} = r_{u, i}. \]
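A sketch of the two steps above on the same dense toy arrays (`num_samples` and `M` are hypothetical hyperparameters):

```python
import numpy as np

def augment_graph(R, S, num_samples=16, M=4, seed=0):
    """Uniformly sample unobserved (u, i) pairs, then add the M most
    reliable ones as weighted pseudo-edges; observed edges are kept."""
    rng = np.random.default_rng(seed)
    users, items = np.nonzero(R == 0)        # candidate pool: r_{u,i} = 0
    idx = rng.choice(len(users), size=min(num_samples, len(users)), replace=False)
    cand = sorted(zip(users[idx], items[idx]), key=lambda p: S[p], reverse=True)
    R_aug = R.astype(float).copy()
    for u, i in cand[:M]:                    # top-M by reliability degree
        R_aug[u, i] = S[u, i]
    return R_aug
```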

Procedure

  1. Initialize \(H_U^{(0)} = E_U, H_I^{(0)} = E_I\) and compute \(\widetilde{R}, \ddot{R}\);

  2. Compute

    \[\widetilde{H}_U, \widetilde{H}_I = \text{GNN}(H_U^{(0)}, H_I^{(0)}, \widetilde{R}), \\ \ddot{H}_U, \ddot{H}_I = \text{GNN}(H_U^{(0)}, H_I^{(0)}, \ddot{R}), \]

    where the GNN can be any graph-based recommender, e.g., LightGCN;

  3. Compute (\(\cdot\) denotes the inner product)

    \[\hat{y}_{u, i} = \widetilde{h}_u \cdot \widetilde{h}_i, \]

    and use it to compute the BPR loss:

    \[\mathcal{L}_{BPR} = \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{N}_u} \sum_{j \in \mathcal{I}, j \not \in \mathcal{N}_u} -\log \sigma(\hat{y}_{u, i} - \hat{y}_{u, j}); \]

  4. We want the feature distributions induced by \(\widetilde{R}\) and \(\ddot{R}\) to be close; the authors enforce this with a contrastive loss, treating \(\widetilde{R}, \ddot{R}\) as two differently augmented views (a sketch of both losses follows this list):

    \[\mathcal{L}_{DIV} = -\sum_{u \in \mathcal{U}} \log \frac{\exp(\cos(\widetilde{h}_u, \ddot{h}_u) / \tau )}{\exp(\cos(\widetilde{h}_u, \ddot{h}_u) / \tau ) + \sum_{v \not = u} \exp(\cos(\widetilde{h}_u, \ddot{h}_v) / \tau )}; \]

  5. Finally, the total loss is

    \[\mathcal{L} = \mathcal{L}_{BPR} + \lambda_1 \mathcal{L}_{DIV} + \lambda_2 \|\Theta\|_2^2. \]
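A minimal PyTorch sketch of the two losses (batching, the negative sampling for BPR, and the value of \(\tau\) are hypothetical; \(\mathcal{L}_{DIV}\) is the standard InfoNCE written as a cross-entropy over in-batch negatives):

```python
import torch
import torch.nn.functional as F

def bpr_loss(h_u, h_i, h_j):
    """h_u, h_i, h_j: batches of user, positive-item, and sampled
    negative-item embeddings from the denoised view."""
    pos = (h_u * h_i).sum(-1)              # \hat{y}_{u,i}
    neg = (h_u * h_j).sum(-1)              # \hat{y}_{u,j}
    return -F.logsigmoid(pos - neg).mean()

def div_loss(h_tilde, h_ddot, tau=0.2):
    """InfoNCE: the same user across the two views is the positive pair;
    the other users in the batch act as negatives."""
    z1 = F.normalize(h_tilde, dim=-1)      # cosine via normalized dot products
    z2 = F.normalize(h_ddot, dim=-1)
    logits = z1 @ z2.T / tau               # [B, B] similarity matrix
    labels = torch.arange(z1.size(0))      # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# Total objective: L = bpr_loss(...) + lambda_1 * div_loss(...) + lambda_2 * l2_reg
```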
