[Reading Notes] Ranking Relevance in Yahoo Search (Part 4, final): recency-sensitive ranking
7. RECENCY-SENSITIVE RANKING
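Purpose:
Improve ranking quality for recency-sensitive queries;
for such queries, users want information that is not only relevant but also up to date;
Method: recency-demoted relevance
1) Grade each doc by its freshness: very fresh, fresh, slightly outdated, stale, or non-time-sensitive;
2) Adjust the relevance label on top of the base relevance according to freshness (each row starts with the base relevance label; columns: VF = very fresh, F = fresh, SO = slightly outdated, S = stale, NT = non-time-sensitive):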
Base relevance | VF | F | SO | S | NT |
Perfect | Perfect | Perfect | Excellent | Good | Perfect |
Excellent | Perfect | Excellent | Good | Fair | Excellent |
Good | Good | Good | Fair | Bad | Good |
Fair | Fair | Fair | Bad | Bad | Fair |
Bad | Bad | Bad | Bad | Bad | Bad |
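The table can be read as a shift on the grade scale: SO drops one grade, S drops two, and VF lifts Excellent to Perfect. A minimal Python sketch of that reading follows; the names GRADES, SHIFT and recency_demoted are my own and do not come from the paper.

```python
# Grades ordered from worst to best; the list index serves as a numeric level.
GRADES = ["Bad", "Fair", "Good", "Excellent", "Perfect"]

# Grade shift per freshness label, read off the table above.
SHIFT = {"VF": 0, "F": 0, "SO": -1, "S": -2, "NT": 0}


def recency_demoted(base: str, freshness: str) -> str:
    """Return the recency-demoted label for a base label and a freshness label."""
    # The only promotion in the table: a very fresh Excellent doc becomes Perfect.
    if base == "Excellent" and freshness == "VF":
        return "Perfect"
    level = GRADES.index(base) + SHIFT[freshness]
    # Demotion never goes below the lowest grade.
    return GRADES[max(level, 0)]


print(recency_demoted("Perfect", "S"))   # two grades down from Perfect
print(recency_demoted("Fair", "S"))      # clipped at Bad
```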
3) Data: collecting training data
- Collecting a large set of recency labels is impractical, because recency labels quickly go out of date;
- so the recency ranker has to be built from a large relevance dataset without recency labels plus a small recency dataset;
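4) Formula (written out from the notes below):
f(x) = frel(x) + rfresh(x), if cts classifies x as time-sensitive
f(x) = frel(x), otherwise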
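Notes:
- The freshness component is trained on the recency dataset; a time-sensitive classifier decides whether this component is added;
- frel(x) is the base ranker; rfresh(x) is the freshness component; cts is the time-sensitivity classifier;
- rfresh(x) is added only when cts indicates that x is a time-sensitive query-url pair;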
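A minimal Python sketch of how the three pieces combine at scoring time. The function combined_score, the feature dictionary, and the decision threshold of 0 are my own assumptions for illustration; the notes only say that rfresh(x) is added when cts flags x as time-sensitive.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Scorer = Callable[[Features], float]


def combined_score(x: Features, f_rel: Scorer, r_fresh: Scorer,
                   c_ts: Scorer, threshold: float = 0.0) -> float:
    """Base relevance, plus the freshness component only for time-sensitive pairs."""
    base = f_rel(x)
    # Assumption: c_ts returns a real-valued score and "time-sensitive" means
    # the score exceeds the threshold.
    if c_ts(x) > threshold:
        return base + r_fresh(x)
    return base


# Toy stand-ins for the three trained models (hypothetical, for illustration only).
toy_f_rel = lambda x: x["relevance"]
toy_r_fresh = lambda x: -0.1 * x["age_days"]
toy_c_ts = lambda x: x["ts_score"]

news = {"relevance": 3.0, "age_days": 5.0, "ts_score": 0.8}
evergreen = {"relevance": 3.0, "age_days": 500.0, "ts_score": -0.6}
print(combined_score(news, toy_f_rel, toy_r_fresh, toy_c_ts))
print(combined_score(evergreen, toy_f_rel, toy_r_fresh, toy_c_ts))
```

The old but non-time-sensitive pair keeps its base score untouched, which is the point of gating the freshness component with the classifier.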
Key points: training the time-sensitive classifier; the freshness component;
1) time-sensitive classifier
use the recency dataset, transform the freshness labels into binary labels (e.g. non-time-sensitive to negative and all other labels to positive), and train a binary classifier;
2) build rfresh(x)
use frel(x) as the base ranker, and add more trees to optimize for recency-demoted relevance;
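A small Python sketch of the data preparation behind both steps. The label abbreviations and function names are my own. For step 2 I assume a squared-error objective, which the notes do not state: under that assumption the extra trees are fitted to the residual between the recency-demoted target and the frozen base score.

```python
from typing import List, Sequence

FRESHNESS_LABELS = ("VF", "F", "SO", "S", "NT")


def to_binary(label: str) -> int:
    """Step 1: NT (non-time-sensitive) becomes 0, every other freshness label 1."""
    if label not in FRESHNESS_LABELS:
        raise ValueError(f"unknown freshness label: {label}")
    return 0 if label == "NT" else 1


def residual_targets(base_scores: Sequence[float],
                     demoted_targets: Sequence[float]) -> List[float]:
    """Step 2 (assuming squared error): what the added trees should predict,
    i.e. recency-demoted target minus the base ranker's score."""
    return [t - s for s, t in zip(base_scores, demoted_targets)]


print([to_binary(l) for l in ["VF", "NT", "S"]])
print(residual_targets([3.0, 2.0], [1.0, 2.0]))
```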
8. LOCATION-SENSITIVE RANKING
location-sensitive query:
For some queries the search results depend heavily on location; these are called location-sensitive queries, and they fall into two kinds:
explicit local query - queries with specific location names (e.g. "restaurants Boston");
implicit local query - queries without a location but with location-sensitive intent (e.g. "restaurant");
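Method: use the distance d(query, url) between the query and the url;
however, in an existing learning-to-rank model the d(query, url) feature has little influence, so the following model is built instead:
Model: location boosting ranking model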
1) Extract locations from the query and from the web page separately:
- explicit local query - directly parse the location in the explicit local query;
- implicit local query - use the user's location;
- web pages - extract locations based on the query-url click graph from search logs, or parse them from the urls directly;
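The query side of this step can be sketched in Python as follows. The tiny GAZETTEER dictionary and the token-matching rule are hypothetical stand-ins of mine; a real system would use a proper location parser, and the web-page side (click graph, url parsing) is not covered here.

```python
from typing import Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical toy gazetteer: lowercase place name -> (latitude, longitude).
GAZETTEER = {
    "boston": (42.3601, -71.0589),
    "seattle": (47.6062, -122.3321),
}


def query_location(query: str,
                   user_location: Optional[LatLon]) -> Tuple[Optional[LatLon], bool]:
    """Return (location, is_explicit) for a query.

    Explicit local query: a known place name appears in the query text.
    Implicit local query: no place name, so fall back to the user's location.
    """
    for token in query.lower().split():
        if token in GAZETTEER:
            return GAZETTEER[token], True
    return user_location, False


user = (47.6062, -122.3321)  # a user sitting in Seattle
print(query_location("restaurants Boston", user))
print(query_location("restaurant", user))
```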
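2) Compute the distance between the query and the web page from their respective locations:
Formula (to be added)
The logistic function above takes two factors into account, base relevance and location distance:
- when the doc's url location is close to the user and the doc's content also matches the query, the doc is boosted;
- if the doc's url location is close to the user but the content is not relevant to the query, the doc is not boosted, and the ranking is determined by the base ranking function alone;
- if the doc's content is highly relevant to the query but its url location is far from the user, the doc is not boosted, and the ranking is determined by the base ranking function alone;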
Notes:
d^(query,url) is d(query,url) normalized to the range [0,1];
fb(x) is the query-url relevance given by the base ranking function;
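Since the formula itself is not in these notes yet, here is a Python sketch of one form that reproduces the three behaviours listed above. It is an assumed form, not the paper's equation: score = fb + w * sigmoid(alpha * fb + beta) * (1 - d^). The haversine distance, the 100 km cap used for normalization, and the example parameter values are also my own choices.

```python
import math
from typing import Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def normalized_distance(d_km: float, max_km: float = 100.0) -> float:
    """Map a distance to [0, 1]; the cap max_km is an assumed constant."""
    return min(d_km, max_km) / max_km


def boosted_score(fb: float, d_hat: float, w: float, alpha: float, beta: float) -> float:
    """Assumed form: base score plus a boost gated by relevance and closeness."""
    gate = 1.0 / (1.0 + math.exp(-(alpha * fb + beta)))  # near 1 only for relevant docs
    return fb + w * gate * (1.0 - d_hat)                 # (1 - d_hat) is 0 for far docs


boston, seattle = (42.3601, -71.0589), (47.6062, -122.3321)
far = normalized_distance(haversine_km(boston, seattle))
print(boosted_score(4.0, 0.0, w=1.0, alpha=4.0, beta=-8.0))  # close and relevant: boosted
print(boosted_score(0.0, 0.0, w=1.0, alpha=4.0, beta=-8.0))  # close, irrelevant: almost no boost
print(boosted_score(4.0, far, w=1.0, alpha=4.0, beta=-8.0))  # relevant but far: no boost
```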
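3) Determining the parameters:
The parameters w, α, β are determined from pairwise data using the following formula:
Formula (to be added)
Notes:
P={(pi, pj)| pi > pj} is a set of url pairs for the same query, where pi > pj means pi is more relevant than pj;
the optimal parameters are obtained with a standard gradient descent approach;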
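A rough Python sketch of fitting w, α, β from preference pairs by gradient descent. Both the scoring form and the loss are assumptions of mine, because the formula is not in these notes: the score reuses an assumed gated boost, and the loss is a squared hinge on the score margin of each pair. Gradients are taken numerically to keep the sketch short.

```python
import math
from typing import List, Sequence, Tuple

Doc = Tuple[float, float]   # (fb, d_hat): base score and normalized distance
Pair = Tuple[Doc, Doc]      # (p_i, p_j) with p_i more relevant than p_j


def score(fb: float, d_hat: float, params: Sequence[float]) -> float:
    """Assumed boosting form: fb + w * sigmoid(alpha * fb + beta) * (1 - d_hat)."""
    w, alpha, beta = params
    gate = 1.0 / (1.0 + math.exp(-(alpha * fb + beta)))
    return fb + w * gate * (1.0 - d_hat)


def pairwise_loss(pairs: List[Pair], params: Sequence[float]) -> float:
    """Assumed squared hinge loss: penalize pairs whose margin is below 1."""
    return sum(max(0.0, 1.0 - score(*xi, params) + score(*xj, params)) ** 2
               for xi, xj in pairs)


def fit(pairs: List[Pair], params: Sequence[float] = (0.0, 1.0, 0.0),
        lr: float = 0.01, steps: int = 200, eps: float = 1e-5) -> Tuple[float, ...]:
    """Plain gradient descent with central-difference gradients."""
    current = list(params)
    for _ in range(steps):
        grad = []
        for k in range(len(current)):
            up, down = current.copy(), current.copy()
            up[k] += eps
            down[k] -= eps
            grad.append((pairwise_loss(pairs, up) - pairwise_loss(pairs, down)) / (2 * eps))
        current = [p - lr * g for p, g in zip(current, grad)]
    return tuple(current)


# Toy pairs: in each pair the nearer document is the preferred one.
pairs = [((2.0, 0.1), (2.0, 0.9)), ((1.5, 0.0), (1.6, 1.0))]
start = (0.0, 1.0, 0.0)
fitted = fit(pairs, start)
print(pairwise_loss(pairs, start), pairwise_loss(pairs, fitted))
```

On these toy pairs the base scores alone cannot separate the documents, so the only way to reduce the loss is to learn a positive boost weight w.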
9. CONCLUSION
In this paper, we introduce the comprehensive relevance solutions of Yahoo search.