Neural Factorization Machines for Sparse Predictive Analytics

He X. and Chua T. Neural factorization machines for sparse predictive analytics. In International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2017.

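The paper introduces a Bi-Interaction Layer to capture second-order feature interactions, then uses an MLP to extract high-order information. Is the difference from DeepFM just parallel versus serial composition?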

Main content

  1. Sparse features \(\bm{x}\);
  2. An embedding layer produces:

\[\mathcal{V}_x = \{x_1 \bm{v}_1, x_2 \bm{v}_2, \cdots, x_n \bm{v}_n\}; \]

  3. The Bi-Interaction Layer produces the interaction features:

\[f_{BI}(\mathcal{V}_x) = \sum_{i=1}^n \sum_{j = i + 1}^n x_i \bm{v}_i \odot x_j \bm{v}_j, \]

where \(\odot\) denotes element-wise multiplication;
  4. An MLP extracts high-order information:

\[\bm{z}_1 = \sigma_1(W_1 f_{BI}(\mathcal{V}_x) + \bm{b}_1), \\ \bm{z}_2 = \sigma_2(W_2 \bm{z}_1 + \bm{b}_2), \\ \vdots \\ \bm{z}_L = \sigma_L(W_L \bm{z}_{L-1} + \bm{b}_L). \]

  5. NFM:

\[\hat{y}_{NFM}(\bm{x}) = w_0 + \bm{w}^T\bm{x} + \bm{h}^T \bm{z}_L. \]

  6. For score prediction (regression), the model can be trained with the squared loss

\[L_{reg} = \sum_{\bm{x} \in \mathcal{X}} (\hat{y}(\bm{x}) - y(\bm{x}))^2 \]

For classification, log loss can be used instead ...
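
The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the official implementation: the function names, the parameter shapes, and the choice of ReLU for every \(\sigma_l\) are assumptions made here. It also uses the identity \(\sum_{i<j} \bm{a}_i \odot \bm{a}_j = \frac{1}{2}\big[(\sum_i \bm{a}_i)^2 - \sum_i \bm{a}_i^2\big]\) (squares taken element-wise, with \(\bm{a}_i = x_i \bm{v}_i\)) so that the Bi-Interaction Layer can be computed in time linear in \(n\) rather than with the double sum.

```python
import numpy as np


def bi_interaction_naive(x, V):
    """f_BI as the explicit double sum over pairs i < j (O(n^2 k))."""
    n, k = V.shape
    out = np.zeros(k)
    for i in range(n):
        for j in range(i + 1, n):
            out += (x[i] * V[i]) * (x[j] * V[j])  # element-wise product
    return out


def bi_interaction(x, V):
    """Same quantity in O(n k): 0.5 * ((sum of a_i)^2 - sum of a_i^2)."""
    xv = x[:, None] * V  # row i is x_i * v_i, shape (n, k)
    return 0.5 * (xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0))


def nfm_forward(x, w0, w, V, layers, h):
    """y_hat = w0 + w^T x + h^T z_L for a single feature vector x.

    `layers` is a list of (W, b) pairs; ReLU is an assumed activation.
    """
    z = bi_interaction(x, V)
    for W, b in layers:
        z = np.maximum(W @ z + b, 0.0)
    return w0 + w @ x + h @ z


def squared_loss(y_hat, y):
    """L_reg: sum of squared errors over a set of examples."""
    return np.sum((np.asarray(y_hat) - np.asarray(y)) ** 2)


# Toy example with made-up sizes: n features, k-dim embeddings, 2 hidden layers.
rng = np.random.default_rng(0)
n, k, hidden = 6, 4, 8
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.5])  # sparse input
V = rng.normal(size=(n, k))
w0, w = 0.1, rng.normal(size=n)
layers = [
    (rng.normal(size=(hidden, k)), np.zeros(hidden)),
    (rng.normal(size=(hidden, hidden)), np.zeros(hidden)),
]
h = rng.normal(size=hidden)

y_hat = nfm_forward(x, w0, w, V, layers, h)
print(y_hat, squared_loss([y_hat], [1.0]))
```

With no hidden layers and \(\bm{h} = \bm{1}\), `nfm_forward` reduces to a plain factorization machine, which is a quick sanity check on the sketch.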

Code

[official]
[PyTorch]
[TensorFlow]

posted @ 2022-05-25 11:44  馒头and花卷