Recommender Systems
Recommender systems: movie ratings
Example: predicting movie ratings
Suppose we have the following ratings:
Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) |
Love at last | 5 | 5 | 0 | 0 |
Romance forever | 5 | ? | ? | 0 |
Cute puppies of love | ? | 4 | 0 | ? |
Nonstop car chases | 0 | 0 | 5 | 4 |
Swords vs. karate | 0 | 0 | 5 | ? |
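Definitions
n_u = number of users
n_m = number of movies
r(i, j) = 1 if user j has rated movie i (0 otherwise)
y(i, j) = the rating (0 to 5) that user j gave to movie i, defined only when r(i, j) = 1
Goal: predict the values marked ? (the ratings a user would give to movies they have not rated yet)
Now suppose each movie has two features: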
Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action) |
Love at last | 5 | 5 | 0 | 0 | 0.9 | 0 |
Romance forever | 5 | ? | ? | 0 | 1.0 | 0.01 |
Cute puppies of love | ? | 4 | 0 | ? | 0.99 | 0 |
Nonstop car chases | 0 | 0 | 5 | 4 | 0.1 | 1.0 |
Swords vs. karate | 0 | 0 | 5 | ? | 0 | 0.9 |
This gives every movie a feature vector. For example, with the intercept term x0 = 1 added, the feature vector of the movie Love at last is
\[{x^{\left( 1 \right)}} = \left[ {\begin{array}{*{20}{c}}
{{x_0}}\\
{{x_1}}\\
{{x_2}}
\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}
1\\
{0.9}\\
0
\end{array}} \right]\]
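For each user j, learn a parameter vector θ^(j) ∈ R^3, then predict user j's rating of movie i as (θ^(j))^T x^(i).
Let m^(j) denote the number of movies rated by user j. The learning objective for user j is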
\[\underbrace {\min }_{{\theta ^{\left( j \right)}}}\frac{1}{{2{m^{\left( j \right)}}}}\sum\limits_{i:{r^{\left( {i,j} \right)}} = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}} + \frac{\lambda }{{2{m^{\left( j \right)}}}}\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} \]
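In recommender systems the factor 1/m^(j) is usually dropped: it is a positive constant, so removing it does not change the minimizing θ^(j).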
\[\underbrace {\min }_{{\theta ^{\left( j \right)}}}\frac{1}{2}\sum\limits_{i:{r^{\left( {i,j} \right)}} = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}} + \frac{\lambda }{2}\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} \]
Summing over all users gives the overall learning objective
\[\underbrace {\min }_{{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}}\frac{1}{2}\sum\limits_{j = 1}^{{n_u}} {\left[ {\sum\limits_{i:{r^{\left( {i,j} \right)}} = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}} + \lambda \sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} } \right]} \]
Gradient descent is then used to find the optimal θ, with the updates
\[\begin{array}{l}
\theta _k^{\left( j \right)}: = \theta _k^{\left( j \right)} - \alpha \sum\limits_{i:{r^{\left( {i,j} \right)}} = 1} {\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)x_k^{\left( i \right)}} \quad \text{for } k = 0\\
\theta _k^{\left( j \right)}: = \theta _k^{\left( j \right)} - \alpha \left( {\sum\limits_{i:{r^{\left( {i,j} \right)}} = 1} {\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)x_k^{\left( i \right)}} + \lambda \theta _k^{\left( j \right)}} \right) \quad \text{for } k \ne 0
\end{array}\]
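The update rule for a single user can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `fit_user`, the values chosen for `lam`, `alpha` and `n_iters`, and the use of `np.nan` to mark unrated movies are all assumptions made for this example. The features come from the table above and the ratings are Alice's.

```python
import numpy as np

# Feature matrix from the table: columns are x0 (intercept), x1 (romance), x2 (action).
X = np.array([
    [1.0, 0.90, 0.00],   # Love at last
    [1.0, 1.00, 0.01],   # Romance forever
    [1.0, 0.99, 0.00],   # Cute puppies of love
    [1.0, 0.10, 1.00],   # Nonstop car chases
    [1.0, 0.00, 0.90],   # Swords vs. karate
])

# Alice's ratings; np.nan marks a movie she has not rated (r(i, 1) = 0).
y_alice = np.array([5.0, 5.0, np.nan, 0.0, 0.0])


def fit_user(X, y, lam=1.0, alpha=0.1, n_iters=2000):
    """Gradient descent on one user's regularized squared-error objective."""
    rated = ~np.isnan(y)              # boolean mask for r(i, j) = 1
    Xr, yr = X[rated], y[rated]
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        err = Xr @ theta - yr         # theta^T x^(i) - y^(i,j) for each rated movie
        grad = Xr.T @ err             # sum over rated movies of err * x_k^(i)
        grad[1:] += lam * theta[1:]   # regularize every component except k = 0
        theta -= alpha * grad
    return theta


theta_alice = fit_user(X, y_alice)
pred = X @ theta_alice                # predicted ratings for all five movies
```

With these hypothetical settings, `pred[2]` is the predicted rating for the one movie Alice has not rated ("Cute puppies of love"). The step size `alpha` has to be small enough for the iteration to converge, and a different data set may need a different value.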
Recommender systems: collaborative filtering
Consider the data
Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action) |
Love at last | 5 | 5 | 0 | 0 | 0.9 | 0 |
Romance forever | 5 | ? | ? | 0 | 1.0 | 0.01 |
Cute puppies of love | ? | 4 | 0 | ? | 0.99 | 0 |
Nonstop car chases | 0 | 0 | 5 | 4 | 0.1 | 1.0 |
Swords vs. karate | 0 | 0 | 5 | ? | 0 | 0.9 |
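In practice it is hard to know how "romantic" a movie is (for example, a romance score of 0.9) or how much "action" it contains, so the data realistically looks like this:
Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action) |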
Love at last | 5 | 5 | 0 | 0 | ? | ? |
Romance forever | 5 | ? | ? | 0 | ? | ? |
Cute puppies of love | ? | 4 | 0 | ? | ? | ? |
Nonstop car chases | 0 | 0 | 5 | 4 | ? | ? |
Swords vs. karate | 0 | 0 | 5 | ? | ? | ? |
However, we can survey users about how much they like romance movies and how much they like action movies, which gives the following parameters:
\[{\theta ^{\left( 1 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
5\\
0
\end{array}} \right],{\theta ^{\left( 2 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
5\\
0
\end{array}} \right],{\theta ^{\left( 3 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
0\\
5
\end{array}} \right],{\theta ^{\left( 4 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
0\\
5
\end{array}} \right]\]
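Analysis: for the movie "Love at last", we know that Alice and Bob like it while Carol and Dave do not. Alice and Bob both like romance movies, and Carol and Dave both dislike them, so we can infer that this is a romance movie rather than an action movie, that is, x1 = 1.0 and x2 = 0.0.
Written as equations: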
\[\begin{array}{l}
{\left( {{\theta ^{\left( 1 \right)}}} \right)^T}{x^{\left( 1 \right)}} \approx 5\\
{\left( {{\theta ^{\left( 2 \right)}}} \right)^T}{x^{\left( 1 \right)}} \approx 5\\
{\left( {{\theta ^{\left( 3 \right)}}} \right)^T}{x^{\left( 1 \right)}} \approx 0\\
{\left( {{\theta ^{\left( 4 \right)}}} \right)^T}{x^{\left( 1 \right)}} \approx 0
\end{array}\]
So, with θ known, we obtain
\[{x^{\left( 1 \right)}} = \left[ {\begin{array}{*{20}{c}}
1\\
{1.0}\\
{0.0}
\end{array}} \right]\]
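The inference above can be checked numerically by treating the four approximate equations as a small least-squares problem. The sketch below is only an illustration: the variable names are made up for this example, and solving with `np.linalg.lstsq` is one convenient choice, not a method prescribed by these notes.

```python
import numpy as np

# Rows are the user parameter vectors theta^(1) .. theta^(4) given above.
Theta = np.array([
    [0.0, 5.0, 0.0],   # Alice
    [0.0, 5.0, 0.0],   # Bob
    [0.0, 0.0, 5.0],   # Carol
    [0.0, 0.0, 5.0],   # Dave
])
y_movie1 = np.array([5.0, 5.0, 0.0, 0.0])   # ratings of "Love at last"

# x0 is fixed to 1, so move its contribution to the right-hand side and
# solve the least-squares problem for (x1, x2) only.
A = Theta[:, 1:]
b = y_movie1 - Theta[:, 0]
x12, *_ = np.linalg.lstsq(A, b, rcond=None)

x_movie1 = np.concatenate(([1.0], x12))     # full feature vector [x0, x1, x2]
```

Here the system is consistent, so the least-squares solution reproduces x1 = 1.0 and x2 = 0.0 up to floating-point error.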
-------------------------------------------------------------------------
The problem now becomes: given θ^(1), ..., θ^(n_u), learn x^(i).
For a single movie i, this means solving
\[\underbrace {\min }_{{x^{\left( i \right)}}}\frac{1}{2}\sum\limits_{j:{r^{\left( {i,j} \right)}} = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}} + \frac{\lambda }{2}\sum\limits_{k = 1}^n {{{\left( {x_k^{\left( i \right)}} \right)}^2}} \]
For all movies x^(1), ..., x^(n_m), the problem is
\[\underbrace {\min }_{{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}}}\frac{1}{2}\sum\limits_{i = 1}^{{n_m}} {\left[ {\sum\limits_{j:{r^{\left( {i,j} \right)}} = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}} + \lambda \sum\limits_{k = 1}^n {{{\left( {x_k^{\left( i \right)}} \right)}^2}} } \right]} \]
Therefore, given x^(1), ..., x^(n_m) and the movie ratings, we can estimate θ^(1), ..., θ^(n_u);
given θ^(1), ..., θ^(n_u) and the movie ratings, we can estimate x^(1), ..., x^(n_m).
In practice, one can start from a random guess of θ and alternate between the two steps:
θ --> x --> θ --> x --> θ --> x --> ...
-------------------------------------------------------------------------
Collaborative filtering does not actually alternate θ --> x --> θ --> x --> ...; instead, it combines the two objectives into a single cost function and minimizes over x and θ simultaneously:
\[J\left( {{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}},{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}} \right) = \frac{1}{2}\sum\limits_{\left( {i,j} \right):r\left( {i,j} \right) = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}} + \frac{\lambda }{2}\sum\limits_{i = 1}^{{n_m}} {\sum\limits_{k = 1}^n {{{\left( {x_k^{\left( i \right)}} \right)}^2}} } + \frac{\lambda }{2}\sum\limits_{j = 1}^{{n_u}} {\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} } \]
\[\underbrace {\min }_{{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}},{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}}J\left( {{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}},{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}} \right)\]
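As a concrete reading of J, the following NumPy sketch evaluates the cost for given X and Θ. The function name `cofi_cost` and the tiny 2 x 2 example are assumptions made for illustration. The sketch also assumes that the feature vectors carry no intercept term, so every component of x and θ is regularized, and that unrated entries of Y are marked by R = 0 (their values in Y are ignored).

```python
import numpy as np


def cofi_cost(X, Theta, Y, R, lam):
    """Collaborative filtering cost J.

    X:     (n_m, n) movie features, one row per movie
    Theta: (n_u, n) user parameters, one row per user
    Y:     (n_m, n_u) ratings (entries with R == 0 are ignored)
    R:     (n_m, n_u) indicator, R[i, j] = 1 if user j rated movie i
    """
    # Squared error only over the rated (i, j) pairs.
    err = np.where(R == 1, X @ Theta.T - Y, 0.0)
    data_term = 0.5 * np.sum(err ** 2)
    # Regularization over every x_k^(i) and theta_k^(j).
    reg_term = 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))
    return data_term + reg_term


# Tiny hypothetical example: two movies, two users, one missing rating.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
Theta_demo = np.array([[5.0, 0.0], [0.0, 5.0]])
Y_demo = np.array([[5.0, 0.0], [np.nan, 5.0]])
R_demo = np.array([[1, 1], [0, 1]])

J_no_reg = cofi_cost(X_demo, Theta_demo, Y_demo, R_demo, lam=0.0)
J_reg = cofi_cost(X_demo, Theta_demo, Y_demo, R_demo, lam=1.0)
```

In this example the predictions match every rated entry exactly, so the data term is zero and only the regularization term contributes when `lam` is nonzero.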
-------------------------------------------------------------------------
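Summary: the collaborative filtering algorithm
- Initialize x^(1), ..., x^(n_m), θ^(1), ..., θ^(n_u) to small random values
- Minimize J(x^(1), ..., x^(n_m), θ^(1), ..., θ^(n_u)) using gradient descent or another optimization algorithm
- For a user with parameters θ^(j) and a movie with learned features x^(i), predict the rating as (θ^(j))^T x^(i).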
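The three steps can be sketched with plain batch gradient descent on the ratings table from the start of these notes. Everything specific here is an assumption for illustration: the function name `cofi_fit`, the hyperparameters (`lam`, `alpha`, `n_iters`, `seed`), the choice of two features, and the absence of an intercept term. A real system would more likely use a library optimizer.

```python
import numpy as np


def cofi_fit(Y, R, n_features=2, lam=0.1, alpha=0.01, n_iters=5000, seed=0):
    """Jointly learn movie features X and user parameters Theta."""
    rng = np.random.default_rng(seed)
    n_m, n_u = Y.shape
    # Step 1: small random initialization (breaks the symmetry between features).
    X = 0.1 * rng.standard_normal((n_m, n_features))
    Theta = 0.1 * rng.standard_normal((n_u, n_features))
    Y0 = np.where(R == 1, Y, 0.0)            # replace unknown ratings by 0; R masks them out
    # Step 2: gradient descent on J, updating X and Theta simultaneously.
    for _ in range(n_iters):
        err = (X @ Theta.T - Y0) * R         # errors on rated entries only
        X_grad = err @ Theta + lam * X       # dJ/dX
        Theta_grad = err.T @ X + lam * Theta # dJ/dTheta
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta


nan = np.nan
Y = np.array([
    [5.0, 5.0, 0.0, 0.0],   # Love at last
    [5.0, nan, nan, 0.0],   # Romance forever
    [nan, 4.0, 0.0, nan],   # Cute puppies of love
    [0.0, 0.0, 5.0, 4.0],   # Nonstop car chases
    [0.0, 0.0, 5.0, nan],   # Swords vs. karate
])
R = (~np.isnan(Y)).astype(float)

X, Theta = cofi_fit(Y, R)
# Step 3: predicted rating of movie i by user j is pred[i, j].
pred = X @ Theta.T
```

The learned features are not guaranteed to correspond to "romance" and "action"; only the products (θ^(j))^T x^(i) are meaningful. The step size and iteration count were picked for this small table and would need tuning on other data.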
-------------------------------------------------------------------------
Matrix (vectorized) implementation of collaborative filtering
Consider the data
Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) |
Love at last | 5 | 5 | 0 | 0 |
Romance forever | 5 | ? | ? | 0 |
Cute puppies of love | ? | 4 | 0 | ? |
Nonstop car chases | 0 | 0 | 5 | 4 |
Swords vs. karate | 0 | 0 | 5 | ? |
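Define the rating matrix Y, the matrix of predicted ratings, and the matrices X and Θ as follows; all predictions can then be computed as a single matrix product: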
\[Y = \left[ {\begin{array}{*{20}{c}}
5&5&0&0\\
5&?&?&0\\
?&4&0&?\\
0&0&5&4\\
0&0&5&0
\end{array}} \right]\]
\[\text{Predicted ratings} = \left[ {\begin{array}{*{20}{c}}
{{{\left( {{\theta ^{\left( 1 \right)}}} \right)}^T}\left( {{x^{\left( 1 \right)}}} \right)}&{{{\left( {{\theta ^{\left( 2 \right)}}} \right)}^T}\left( {{x^{\left( 1 \right)}}} \right)}&.&{{{\left( {{\theta ^{\left( {{n_u}} \right)}}} \right)}^T}\left( {{x^{\left( 1 \right)}}} \right)}\\
{{{\left( {{\theta ^{\left( 1 \right)}}} \right)}^T}\left( {{x^{\left( 2 \right)}}} \right)}&{{{\left( {{\theta ^{\left( 2 \right)}}} \right)}^T}\left( {{x^{\left( 2 \right)}}} \right)}&.&{{{\left( {{\theta ^{\left( {{n_u}} \right)}}} \right)}^T}\left( {{x^{\left( 2 \right)}}} \right)}\\
.&.&.&.\\
{{{\left( {{\theta ^{\left( 1 \right)}}} \right)}^T}\left( {{x^{\left( {{n_m}} \right)}}} \right)}&{{{\left( {{\theta ^{\left( 2 \right)}}} \right)}^T}\left( {{x^{\left( {{n_m}} \right)}}} \right)}&.&{{{\left( {{\theta ^{\left( {{n_u}} \right)}}} \right)}^T}\left( {{x^{\left( {{n_m}} \right)}}} \right)}
\end{array}} \right]\]
\[X = \left[ {\begin{array}{*{20}{c}}
{ - {{\left( {{x^{\left( 1 \right)}}} \right)}^T} - }\\
{ - {{\left( {{x^{\left( 2 \right)}}} \right)}^T} - }\\
.\\
{ - {{\left( {{x^{\left( {{n_m}} \right)}}} \right)}^T} - }
\end{array}} \right],\Theta = \left[ {\begin{array}{*{20}{c}}
{ - {{\left( {{\theta ^{\left( 1 \right)}}} \right)}^T} - }\\
{ - {{\left( {{\theta ^{\left( 2 \right)}}} \right)}^T} - }\\
.\\
{ - {{\left( {{\theta ^{\left( {{n_u}} \right)}}} \right)}^T} - }
\end{array}} \right]\]
\[\text{Predicted ratings} = X{\Theta ^T}\]
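In NumPy the whole matrix of predictions is one matrix product. The sketch below reuses, purely as illustrative inputs, the hand-assigned features (with intercept x0 = 1) and the user parameter vectors that appeared earlier in these notes; in practice X and Θ would be the learned values.

```python
import numpy as np

# One row per movie: [x0, x1 (romance), x2 (action)].
X = np.array([
    [1.0, 0.90, 0.00],   # Love at last
    [1.0, 1.00, 0.01],   # Romance forever
    [1.0, 0.99, 0.00],   # Cute puppies of love
    [1.0, 0.10, 1.00],   # Nonstop car chases
    [1.0, 0.00, 0.90],   # Swords vs. karate
])

# One row per user: theta^(j) transposed.
Theta = np.array([
    [0.0, 5.0, 0.0],     # Alice
    [0.0, 5.0, 0.0],     # Bob
    [0.0, 0.0, 5.0],     # Carol
    [0.0, 0.0, 5.0],     # Dave
])

# pred[i, j] = (theta^(j))^T x^(i): rows are movies, columns are users.
pred = X @ Theta.T

# The same entry computed directly from the two vectors, for comparison.
single = Theta[1] @ X[2]   # Bob's predicted rating of "Cute puppies of love"
```

The result has shape (n_m, n_u), matching the layout of Y, so predictions and known ratings can be compared entry by entry.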
-------------------------------------------------------------------------
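Finding related movies
Suppose the algorithm above has learned a feature vector x^(i) for movie i.
Given 5 other movies and their learned features, how do we decide which of them is most closely related to movie i?
Compute the distance between movie i and each of the five movies; the movie j with the smallest distance is the one most related to movie i: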
\[\left\| {{x^{\left( i \right)}} - {x^{\left( j \right)}}} \right\|\]
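A minimal sketch of this nearest-neighbor lookup, assuming Euclidean distance and using the two hand-assigned features from the earlier table as stand-ins for learned features. The helper name `most_similar` and its parameter `k` are made up for this example.

```python
import numpy as np

titles = [
    "Love at last",
    "Romance forever",
    "Cute puppies of love",
    "Nonstop car chases",
    "Swords vs. karate",
]

# Stand-in feature vectors [x1 (romance), x2 (action)], one row per movie.
X = np.array([
    [0.90, 0.00],
    [1.00, 0.01],
    [0.99, 0.00],
    [0.10, 1.00],
    [0.00, 0.90],
])


def most_similar(X, i, k=1):
    """Return the indices of the k movies closest to movie i."""
    dist = np.linalg.norm(X - X[i], axis=1)   # ||x^(i) - x^(j)|| for every j
    dist[i] = np.inf                          # exclude the movie itself
    return np.argsort(dist)[:k]


nearest_to_love = most_similar(X, 0)[0]
nearest_to_chases = most_similar(X, 3)[0]
```

With these stand-in features, the movie closest to "Love at last" is `titles[nearest_to_love]` and the one closest to "Nonstop car chases" is `titles[nearest_to_chases]`.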
-------------------------------------------------------------------------
Mean normalization in collaborative filtering
Consider data in which one user (Eve) has not rated any movie:
Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) | Eve (5) |
Love at last | 5 | 5 | 0 | 0 | ? |
Romance forever | 5 | ? | ? | 0 | ? |
Cute puppies of love | ? | 4 | 0 | ? | ? |
Nonstop car chases | 0 | 0 | 5 | 4 | ? |
Swords vs. karate | 0 | 0 | 5 | ? | ? |
In this case, when minimizing the cost function
\[\underbrace {\min }_{{x^{\left( 1 \right)}},...,{x^{\left( {{n_m}} \right)}},{\theta ^{\left( 1 \right)}},...,{\theta ^{\left( {{n_u}} \right)}}}\frac{1}{2}\sum\limits_{\left( {i,j} \right):r\left( {i,j} \right) = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}} + \frac{\lambda }{2}\sum\limits_{i = 1}^{{n_m}} {\sum\limits_{k = 1}^n {{{\left( {x_k^{\left( i \right)}} \right)}^2}} } + \frac{\lambda }{2}\sum\limits_{j = 1}^{{n_u}} {\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} } \]
consider the first term of the objective
\[\frac{1}{2}\sum\limits_{\left( {i,j} \right):r\left( {i,j} \right) = 1} {{{\left( {{{\left( {{\theta ^{\left( j \right)}}} \right)}^T}\left( {{x^{\left( i \right)}}} \right) - {y^{\left( {i,j} \right)}}} \right)}^2}} \]
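Eve has not rated any movie, so r(i, 5) = 1 never holds and this term does not depend on θ^(5).
That leaves the regularization term
\[\frac{\lambda }{2}\sum\limits_{j = 1}^{{n_u}} {\sum\limits_{k = 1}^n {{{\left( {\theta _k^{\left( j \right)}} \right)}^2}} } \]
Minimizing it with respect to θ^(5) gives (assuming each movie has two features)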
\[{\theta ^{\left( 5 \right)}} = \left[ {\begin{array}{*{20}{c}}
0\\
0
\end{array}} \right]\]
which in turn gives the prediction
\[{\left( {{\theta ^{\left( 5 \right)}}} \right)^T}\left( {{x^{\left( i \right)}}} \right) = 0\]
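In other words, Eve is predicted to rate every movie 0, which is neither correct nor useful.
Mean normalization subtracts from each row of Y the mean of that row's known ratings: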
\[Y = \left[ {\begin{array}{*{20}{c}}
5&5&0&0&?\\
5&?&?&0&?\\
?&4&0&?&?\\
0&0&5&4&?\\
0&0&5&0&?
\end{array}} \right],\mu = \left[ {\begin{array}{*{20}{c}}
{2.5}\\
{2.5}\\
2\\
{2.25}\\
{1.25}
\end{array}} \right]\]
\[Y = Y - \mu = \left[ {\begin{array}{*{20}{c}}
{2.5}&{2.5}&{ - 2.5}&{ - 2.5}&?\\
{2.5}&?&?&{ - 2.5}&?\\
?&2&{ - 2}&?&?\\
{ - 2.25}&{ - 2.25}&{2.75}&{1.75}&?\\
{ - 1.25}&{ - 1.25}&{3.75}&{ - 1.25}&?
\end{array}} \right]\]
Train the model on the new Y to learn the parameters θ and x. When predicting user j's rating of movie i, add the mean back:
\[{\left( {{\theta ^{\left( j \right)}}} \right)^T}\left( {{x^{\left( i \right)}}} \right) + {\mu _i}\]
For Eve, θ^(5) = 0, so her predicted rating for each movie is simply the mean of the other users' ratings of that movie:
\[{\left( {{\theta ^{\left( 5 \right)}}} \right)^T}\left( {{x^{\left( i \right)}}} \right) + {\mu _i} = {\mu _i},\quad \mu = \left[ {\begin{array}{*{20}{c}}
{2.5}\\
{2.5}\\
2\\
{2.25}\\
{1.25}
\end{array}} \right]\]
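The normalization step and the resulting prediction for a user with no ratings can be sketched as follows. The entries of Y are copied from the matrix Y in this section, with `np.nan` standing for "?". The random matrix `X` is only a stand-in for learned movie features, and all variable names are assumptions made for this example.

```python
import numpy as np

nan = np.nan
# Rating matrix Y as written above; the last column is Eve, who has no ratings.
Y = np.array([
    [5.0, 5.0, 0.0, 0.0, nan],
    [5.0, nan, nan, 0.0, nan],
    [nan, 4.0, 0.0, nan, nan],
    [0.0, 0.0, 5.0, 4.0, nan],
    [0.0, 0.0, 5.0, 0.0, nan],
])

# Per-movie mean over the known ratings only (nan entries are skipped).
mu = np.nanmean(Y, axis=1)

# Subtract each row's mean; unknown entries stay nan.
Y_norm = Y - mu[:, np.newaxis]

# With no ratings, regularization drives Eve's parameters to zero.
theta_eve = np.zeros(2)

# Stand-in for learned movie features (5 movies, 2 features).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))

# Prediction = theta^T x + mu_i, which reduces to mu_i for Eve.
pred_eve = X @ theta_eve + mu
```

Whatever the learned features are, `pred_eve` equals `mu` here, because the zero parameter vector removes the θ^T x term and only the added-back mean remains.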