Paper reading: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

2020.7.30

Introduction

  • Importance of CTR prediction data
  • Implicit feature interactions
    • hard to derive from prior knowledge
    • can only be captured automatically by machine learning
  • Linear models have weak feature-interaction capacity
  • Factorization Machines (FM)
    • model pairwise feature interactions as the inner product of latent vectors between features, and show very promising results
    • However: only order-2 feature interactions are considered
  • Deep Neural Networks
    • Strength: feature extraction
    • CNN: biased toward interactions between neighboring features
    • RNN: better suited to click data with sequential dependency
  • This paper:
    • it is possible to derive a learning model that learns feature interactions of all orders in an end-to-end manner, without any feature engineering besides raw features
    • Low-order feature interactions: FM
    • High-order feature interactions: DNN

Our Approach

2.1 DeepFM

  • Data: \((\mathcal X ,y)\)
    • \(\mathcal X\): a set of features describing the user and the item
    • \(y\): the label
    • \(x=[x_{field_1},x_{field_2},...,x_{field_m}]\)
      • in general, \(x\) is high-dimensional and very sparse
  • Task: \(\hat y = CTR_{model}(x)\)
  • Model:

  • \(\hat y =sigmoid(y_{FM}+y_{DNN})\) (1)
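Equation (1) simply sums the two components' logits before the sigmoid. A minimal sketch, with hypothetical logit values for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical component outputs (logits), not values from the paper
y_fm = 0.8    # FM component output
y_dnn = -0.3  # deep component output

# Eq. (1): sum the logits, then squash into a click probability
y_hat = sigmoid(y_fm + y_dnn)
print(round(float(y_hat), 4))  # → 0.6225
```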

FM Component

  • Previous approaches can only learn an interaction between two features when they co-occur in the same data record; FM measures interactions via the inner product of the features' latent vectors, so interactions can still be learned for feature pairs that rarely (or never) appear together, and more information can be uncovered

\[y_{FM}=\left \langle w,x\right\rangle+\sum_{j_1=1}^d \sum^d_{j_2 = j_1+1}\left \langle V_{j_1},V_{j_2}\right \rangle x_{j_1} \cdot x_{j_2} \quad\quad\quad\qquad(2) \]

  • \(\langle w,x\rangle\) captures the importance of order-1 features; the inner-product term captures order-2 feature interactions
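The pairwise term in Eq. (2) looks O(d²k), but it can be computed in O(dk) with the standard FM reformulation. A minimal NumPy sketch (all names and sizes here are illustrative) that checks the fast form against the brute-force double sum:

```python
import numpy as np

def fm_forward(x, w, V):
    """FM component of Eq. (2): linear term plus pairwise interactions.

    x: (d,) feature vector, w: (d,) first-order weights,
    V: (d, k) latent vectors, one k-dim row per feature.
    """
    linear = w @ x
    # sum_{j1<j2} <V_j1, V_j2> x_j1 x_j2 via the O(dk) identity:
    # 0.5 * sum_f [ (sum_j v_jf x_j)^2 - sum_j v_jf^2 x_j^2 ]
    xv = x @ V                   # (k,)
    x2v2 = (x ** 2) @ (V ** 2)   # (k,)
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise

rng = np.random.default_rng(0)
d, k = 6, 3
x = rng.random(d)
w = rng.normal(size=d)
V = rng.normal(size=(d, k))

# brute-force O(d^2 k) check of the identity
brute = w @ x + sum(V[i] @ V[j] * x[i] * x[j]
                    for i in range(d) for j in range(i + 1, d))
print(np.isclose(fm_forward(x, w, V), brute))  # → True
```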

Deep Component

  • The output of the embedding layer is denoted \(a^{(0)} = [e_1,e_2,...,e_m]\), where \(e_i\) is the embedding of the \(i\)-th field

  • \(a^{(0)}\) is then fed into the feed-forward network:

    \[a^{(l+1)}=\sigma \left(W^{(l)}a^{(l)}+b^{(l)}\right) \\ y_{DNN}=\sigma\left(W^{|H|+1}\cdot a^{|H|} +b^{|H|+1} \right) \]

It is worth pointing out that the FM component and the deep component share the same feature embedding, which brings two important benefits: 1) it learns both low- and high-order feature interactions from raw features; 2) there is no need for expertise feature engineering of the input, as required in Wide & Deep.
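A minimal NumPy sketch of the shared-embedding idea, with hypothetical field sizes: the same per-field embeddings \(e_i\) serve both as the FM latent vectors and as the deep component's input \(a^{(0)}\):

```python
import numpy as np

rng = np.random.default_rng(1)

m, k, hidden = 3, 4, 8       # fields, embedding dim, hidden units (illustrative)
vocab_sizes = [10, 20, 5]    # hypothetical vocabulary size per field

# one embedding table per field, shared by the FM and deep components
tables = [rng.normal(scale=0.1, size=(v, k)) for v in vocab_sizes]
feat_ids = [2, 7, 0]         # one active (one-hot) feature per field
embeds = [tables[i][feat_ids[i]] for i in range(m)]   # e_1..e_m

# deep component: a^(0) = [e_1, ..., e_m], then one dense layer
a0 = np.concatenate(embeds)                 # (m*k,)
W1 = rng.normal(scale=0.1, size=(hidden, m * k))
b1 = np.zeros(hidden)
a1 = np.maximum(0.0, W1 @ a0 + b1)          # sigma = ReLU here
w_out = rng.normal(scale=0.1, size=hidden)
y_dnn = w_out @ a1

# FM component reuses the *same* embeddings as its latent vectors
y_fm = sum(embeds[i] @ embeds[j]
           for i in range(m) for j in range(i + 1, m))

y_hat = 1.0 / (1.0 + np.exp(-(y_fm + y_dnn)))
print(0.0 < y_hat < 1.0)  # → True
```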

2.2 Relationship with the other Neural Networks

Diagrams of the other models

Model comparison

  • FNN
    • the embedding parameters might be over affected by FM
    • the efficiency is reduced by the overhead introduced by the pre-training stage.
  • PNN
    • it still suffers from high computational complexity
    • Like FNN, all PNNs ignore low-order feature interactions.
  • Wide & Deep
    • there is a need for expertise feature engineering on the input to the “wide” part

DeepFM is the only model that requires no pre-training and no feature engineering, and captures both low- and high-order feature interactions

Experiments

3.1 Experiment Setup

  • Datasets
  • Evaluation metrics
    • AUC (Area Under ROC) and Logloss (cross entropy).
  • Model Comparison
    • LR, FM, FNN, PNN (three variants), Wide & Deep, and DeepFM.
  • Parameter settings

3.2 Performance Evaluation

Efficiency Comparison

\[\frac{t_{train}\ \text{of deep CTR model}}{t_{train}\ \text{of LR}} \]

  • Findings

    1. pre-training of FNN makes it less efficient

    2. Although the speed up of IPNN and PNN on GPU is higher than the other models, they are still computationally expensive because of the inefficient inner product operations;

    3. DeepFM is nearly the most efficient model in both (CPU and GPU) tests.

Effectiveness Comparison

3.3 Hyper-Parameter Study

Hyper-parameters studied

  • 1) activation functions; 2) dropout rate; 3) number of neurons per layer; 4) number of hidden layers; 5) network shape.
  • activation function

    • relu is more appropriate than tanh for all the deep models, except for IPNN
  • Dropout

    • Dropout is a regularization technique that trades off the precision and the complexity of the neural network.
    • all the models reach their own best performance when the dropout (the probability that a neuron is kept) is properly set (from 0.6 to 0.9).
    • The result shows that adding reasonable randomness to the model can strengthen its robustness
  • Number of Neurons per Layer

    • increasing the number of neurons does not always bring benefit.
  • Number of Hidden Layers

    • increasing number of hidden layers improves the performance of the models at the beginning
    • overfitting
  • Network Shape

    • the “constant” network shape is empirically better than the other three options, which is consistent with previous studies
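The dropout observation above can be sketched with inverted dropout, where each activation is kept with the stated probability (0.6 to 0.9) and survivors are rescaled so the expected activation is unchanged. All names here are illustrative:

```python
import numpy as np

def dropout(a, keep_prob, rng, train=True):
    """Inverted dropout: keep each activation with probability keep_prob,
    scaling survivors by 1/keep_prob so the expectation is unchanged."""
    if not train or keep_prob >= 1.0:
        return a
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones(10_000)
out = dropout(a, keep_prob=0.8, rng=rng)  # within the paper's best range 0.6-0.9
print(round(float(out.mean()), 2))        # mean stays close to 1.0
```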
Related Work

  • CTR prediction models
    • linear models and FM
    • tree-based models; tensor-based models; support vector machines; Bayesian models
  • Deep learning in recommender systems
    • improving Collaborative Filtering via deep learning
    • extracting content features with deep learning to improve music recommendation
    • a deep network considering both the image features and the basic features of display advertising
    • a two-stage deep learning framework for YouTube video recommendation

Conclusions

posted @ 2020-10-16 15:29 无证_骑士