Paper Reading: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
2020.7.30
Introduction
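- Importance of CTR (click-through rate) prediction
- Implicit feature interactions
- Hard to identify a priori from expert knowledge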
- can only be captured automatically by machine learning.
- Linear models have weak ability to capture feature interactions
- Factorization Machines (FM)
- model pairwise feature interactions as inner product of latent vectors between features and show very promising results.
- However: in practice only order-2 feature interactions are considered
- Deep Neural Network
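- Strength: feature extraction
- CNN: biased toward interactions between neighboring features
- RNN: suited to click data with sequential dependency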
- This paper:
- it is possible to derive a learning model that is able to learn feature interactions of all orders in an end-to-end manner, without any feature engineering besides raw features.
- Low-order feature interactions: FM
- High-order feature interactions: DNN
Our Approach
2.1 DeepFM
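- Data format: \((\mathcal X, y)\)
- \(\mathcal X\): an m-field data record holding the features of a user and an item
- \(y\): the label, \(y \in \{0,1\}\) (\(y=1\) means the user clicked the item)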
- \(x=[x_{field_1},x_{field_2},...,x_{field_m}]\)
- In general, x is high-dimensional and extremely sparse
- task : \(\hat y = CTR_{model}(x)\)
- Model:
- \(\hat y =sigmoid(y_{FM}+y_{DNN})\) (1)
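FM Component
- Previous approaches can only learn the interaction between two features when both appear in the same data record. FM instead models each interaction as the inner product of the features' latent vectors, so it can also learn interactions that rarely or never appear in the training data.
- \(\langle w,x\rangle\) reflects the importance of order-1 features, and the inner-product term reflects the order-2 feature interactions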
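The two terms of the FM component can be sketched in NumPy as below. This is only a minimal illustration, assuming a dense input vector `x`, order-1 weights `w`, and a latent-vector matrix `V` with one row per feature; the function names and the toy shapes are made up for this note and are not from the paper. The fast version uses the standard identity \(\sum_{i<j}\langle V_i,V_j\rangle x_i x_j = \frac12\sum_f\left[(\sum_i V_{if}x_i)^2-\sum_i V_{if}^2x_i^2\right]\), and the naive double loop is there only to check it.

```python
import numpy as np

def fm_component(x, w, V):
    """y_FM = <w, x> + sum_{i<j} <V_i, V_j> x_i x_j for one input x.

    x: (d,) features, w: (d,) order-1 weights, V: (d, k) latent vectors.
    """
    order1 = w @ x
    xv = x[:, None] * V  # (d, k); row i is x_i * V_i
    # 0.5 * sum over latent dims of [(sum_i xv)^2 - sum_i xv^2]
    order2 = 0.5 * np.sum(xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0))
    return order1 + order2

def fm_component_naive(x, w, V):
    """Same quantity with an explicit O(d^2) loop over feature pairs."""
    d = len(x)
    order2 = sum((V[i] @ V[j]) * x[i] * x[j]
                 for i in range(d) for j in range(i + 1, d))
    return w @ x + order2

# Toy data (hypothetical sizes): d = 6 features, k = 3 latent dimensions.
rng = np.random.default_rng(0)
d, k = 6, 3
x = rng.integers(0, 2, size=d).astype(float)
w = rng.normal(size=d)
V = rng.normal(size=(d, k))

y_fm = fm_component(x, w, V)
```

The fast form costs O(kd) per example instead of O(kd^2), which matters when x is high-dimensional.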
Deep Component
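- The output of the embedding layer is denoted \(a^{(0)} = [e_1,e_2,...,e_m]\), where \(e_i\) is the embedding of the \(i\)-th field and \(m\) is the number of fields.
- \(a^{(0)}\) is then fed into the deep neural network:
\[\begin{aligned} a^{(l+1)} &= \sigma \left(W^{(l)}a^{(l)}+b^{(l)}\right) \\ y_{DNN} &= \sigma\left(W^{|H|+1}\cdot a^H +b^{|H|+1} \right) \end{aligned}\]
where \(l\) is the layer depth, \(\sigma\) is an activation function, and \(|H|\) is the number of hidden layers.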
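The forward pass of the deep component can be sketched as follows. It is a minimal sketch under stated assumptions: ReLU in the hidden layers, a sigmoid on the output exactly as the formula in these notes is written, and randomly initialised toy weights. The helper names and layer sizes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deep_component(a0, hidden, w_out, b_out):
    """Feed a0 through the hidden layers, then through the output unit.

    a0: (m*k,) concatenated field embeddings.
    hidden: list of (W, b) pairs, one per hidden layer.
    w_out: (h,) output weights, b_out: scalar output bias.
    """
    a = a0
    for W, b in hidden:
        a = relu(W @ a + b)            # a^(l+1) = sigma(W^(l) a^(l) + b^(l))
    return sigmoid(w_out @ a + b_out)  # y_DNN, following the notes' formula

# Toy setup (assumed sizes): m = 3 fields, k = 4 dims, two hidden layers of 8 units.
rng = np.random.default_rng(0)
m, k = 3, 4
a0 = rng.normal(size=m * k)
sizes = [m * k, 8, 8]
hidden = [(rng.normal(scale=0.1, size=(sizes[i + 1], sizes[i])),
           np.zeros(sizes[i + 1])) for i in range(len(sizes) - 1)]
w_out, b_out = rng.normal(scale=0.1, size=sizes[-1]), 0.0

y_dnn = deep_component(a0, hidden, w_out, b_out)
```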
It is worth pointing out that the FM component and the deep component share the same feature embedding, which brings two important benefits: 1) it learns both low- and high-order feature interactions from raw features; 2) there is no need for expertise feature engineering of the input, as required in Wide & Deep.
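To make the shared embedding concrete, here is a rough sketch of a complete forward pass for one record, following equation (1). The assumptions are mine, not the paper's: every field is one-hot, so a record is just a list `idx` of one active index per field, the hidden layers use ReLU, and all parameter names, sizes, and initial values are hypothetical. The key point is that the same rows `e` feed both the FM inner products and the deep network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deepfm_forward(idx, w1, E, hidden, w_out, b_out):
    """Predict CTR for one record with m one-hot fields.

    idx: list of m active indices, one per field.
    w1[i]: (n_i,) order-1 weights of field i.
    E[i]: (n_i, k) embedding table of field i, shared by FM and DNN.
    """
    e = np.stack([E[i][j] for i, j in enumerate(idx)])  # (m, k) shared embeddings

    # FM part: order-1 weights plus pairwise inner products of the embeddings.
    order1 = sum(w1[i][j] for i, j in enumerate(idx))
    order2 = 0.5 * np.sum(e.sum(axis=0) ** 2 - (e ** 2).sum(axis=0))
    y_fm = order1 + order2

    # Deep part: the same embeddings, concatenated into a^(0).
    a = e.reshape(-1)
    for W, b in hidden:
        a = np.maximum(W @ a + b, 0.0)
    y_dnn = sigmoid(w_out @ a + b_out)

    return sigmoid(y_fm + y_dnn)  # equation (1)

# Toy setup (assumed): 3 fields with 4, 3 and 5 categories, k = 2.
rng = np.random.default_rng(0)
field_sizes, k = [4, 3, 5], 2
E = [rng.normal(scale=0.1, size=(n, k)) for n in field_sizes]
w1 = [rng.normal(scale=0.1, size=n) for n in field_sizes]
sizes = [len(field_sizes) * k, 8, 8]
hidden = [(rng.normal(scale=0.1, size=(sizes[i + 1], sizes[i])),
           np.zeros(sizes[i + 1])) for i in range(len(sizes) - 1)]
w_out, b_out = rng.normal(scale=0.1, size=sizes[-1]), 0.0

y_hat = deepfm_forward([2, 0, 4], w1, E, hidden, w_out, b_out)
```

Because `E` appears in both branches, a gradient step on the loss would update the embeddings from the low-order and the high-order signal at once, with no separate pre-training stage.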
2.2 Relationship with the other Neural Networks
Figure: architectures of the other models
Model comparison
- FNN
- the embedding parameters might be over affected by FM
- the efficiency is reduced by the overhead introduced by the pre-training stage.
- PNN
- it still suffers from high computational complexity
- Like FNN, all PNNs ignore low-order feature interactions.
- Wide & Deep
- there is a need for expertise feature engineering on the input to the “wide” part
DeepFM is the only model that requires no pre-training and no feature engineering, and captures both low- and high-order feature interactions.
Experiments
3.1 Experiment Setup
- Datasets
- Evaluation metrics
- AUC (Area Under ROC) and Logloss (cross entropy).
- Model Comparison
- LR, FM, FNN, PNN (three variants), Wide & Deep, and DeepFM.
- Parameter settings
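The two evaluation metrics can be computed as in the sketch below. This is a plain NumPy illustration on made-up labels and scores, not the paper's evaluation code. AUC is computed from its pairwise definition (the fraction of positive/negative pairs ranked correctly, ties counted as half), which is quadratic in memory and only suitable for small samples.

```python
import numpy as np

def logloss(y_true, y_prob, eps=1e-15):
    """Mean binary cross entropy; probabilities are clipped away from 0 and 1."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    y = np.asarray(y_true)
    s = np.asarray(y_score, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

# Made-up example: 3 of the 4 positive/negative pairs are ranked correctly.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.4, 0.6]
print(auc(y, p))      # 0.75
print(logloss(y, p))
```

Higher AUC is better, while lower Logloss is better.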
3.2 Performance Evaluation
Efficiency Comparison
- Experimental results:
- Pre-training of FNN makes it less efficient.
- Although the speed-up of IPNN and PNN on GPU is higher than that of the other models, they are still computationally expensive because of the inefficient inner product operations.
- DeepFM is almost the most efficient model in both tests (CPU and GPU).
Effectiveness Comparison
3.3 Hyper-Parameter Study
Hyper-parameters
- 1) activation functions; 2) dropout rate; 3) number of neurons per layer; 4) number of hidden layers; 5) network shape.
- Activation function
  - relu is more appropriate than tanh for all the deep models, except for IPNN
- Dropout
  - Dropout is a regularization technique to compromise the precision and the complexity of the neural network.
  - All the models are able to reach their own best performance when the dropout is properly set (from 0.6 to 0.9). Here dropout refers to the probability that a neuron is kept in the network.
  - The result shows that adding reasonable randomness to the model can strengthen the model’s robustness
- Number of Neurons per Layer
  - Increasing the number of neurons does not always bring benefit.
- Number of Hidden Layers
  - Increasing the number of hidden layers improves the performance of the models at the beginning
  - Performance degrades if the number of hidden layers keeps increasing, because of overfitting
- Network Shape
  - The “constant” network shape is empirically better than the other three options (increasing, decreasing, diamond), which is consistent with previous studies
Related Work
- The important role of CTR prediction models
- linear models and FM
- tree-based model ; tensor based model ; support vector machine ; bayesian model
- Applications of deep learning in recommender systems
- improve Collaborative Filtering via deep learning
- extract content feature by deep learning to improve the performance of music recommendation
- a deep learning network that considers both image features and basic features for display advertising
- two-stage deep learning framework for YouTube video recommendation