Information retrieval + RL
1. Ranking as Sequential Decision Making
Advantage: the ranker can go beyond scoring each document's relevance independently.
2. RL: Learn to make good sequences of decisions
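3. AlphaGo:
Breadth reduction: Policy Network (PN). At a given move, playing in certain regions of the board is simply bad. The PN can recognize these moves so they need not be searched, which reduces the width of the tree.
Depth reduction: Value Network (VN). During tree search, if the position at some node is bound to lose, the VN can recognize this and prune the node, which reduces the depth of the tree.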
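The two reductions above can be illustrated with a toy search that counts visited nodes. This is only a minimal sketch, not AlphaGo's actual algorithm: the game (4 moves per position, 6 moves long), `policy_score`, and `value_estimate` are hypothetical stand-ins for the real networks, and the search is a single-agent maximization rather than a two-player game.

```python
from typing import List, Tuple

State = Tuple[int, ...]  # a position is the tuple of moves played so far

GAME_LENGTH = 6  # assumption: the toy game ends after 6 moves
NUM_MOVES = 4    # assumption: 4 legal moves in every non-terminal position


def legal_moves(state: State) -> List[int]:
    """All moves available in a position (none once the game is over)."""
    return list(range(NUM_MOVES)) if len(state) < GAME_LENGTH else []


def policy_score(state: State, move: int) -> float:
    """Hypothetical stand-in for the policy network: prefers low move ids."""
    return -float(move)


def value_estimate(state: State) -> float:
    """Hypothetical stand-in for the value network: scores a position."""
    return -float(sum(state))


def search(state: State, top_k: int, max_depth: int, counter: List[int]) -> float:
    """Search that expands only the top_k policy moves and stops at max_depth."""
    counter[0] += 1
    moves = legal_moves(state)
    # Depth reduction: at max_depth, trust the value estimate instead of searching on.
    if not moves or len(state) >= max_depth:
        return value_estimate(state)
    # Breadth reduction: only the moves ranked highest by the policy are expanded.
    ranked = sorted(moves, key=lambda m: policy_score(state, m), reverse=True)
    return max(search(state + (m,), top_k, max_depth, counter) for m in ranked[:top_k])


def count_visited(top_k: int, max_depth: int) -> int:
    """Number of nodes visited by a search started from the empty position."""
    counter = [0]
    search((), top_k, max_depth, counter)
    return counter[0]


full = count_visited(top_k=NUM_MOVES, max_depth=GAME_LENGTH)  # exhaustive search
reduced = count_visited(top_k=2, max_depth=3)                 # narrower and shallower
print(full, reduced)
```

With these toy settings the exhaustive search visits 5461 nodes, while keeping the top 2 moves and stopping at depth 3 visits 15.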
4. Ranking evaluation: NDCG (Normalized Discounted Cumulative Gain); MAP (Mean Average Precision)
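As a rough illustration of the two metrics, here is a minimal sketch. It assumes the linear-gain form of DCG with a log2 position discount (other variants use `2**rel - 1` as the gain), and it assumes every relevant document appears somewhere in the ranked list when computing average precision. The function names are chosen for this example only.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k results (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def average_precision(relevant: Sequence[int]) -> float:
    """Mean of precision@i over the positions i that hold a relevant result."""
    hits = 0
    total = 0.0
    for i, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """Average of average_precision over all queries."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


# Graded relevance labels in ranked order for one query.
print(ndcg_at_k([3, 2, 1], k=3))
# Binary relevance labels in ranked order for two queries.
print(mean_average_precision([[1, 0, 1], [0, 1]]))
```

A ranking that is already in descending order of relevance, such as `[3, 2, 1]`, gets an NDCG of 1.0. Both metrics depend on the positions of the relevant results, not just on how many there are.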
5. Monte Carlo search
posted on 2017-11-22 17:36 WegZumHimmel