Information retrieval + RL

1. Ranking as Sequential Decision Making

  Advantage: goes beyond independent relevance; the document chosen for each position can depend on the documents already ranked above it.
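
  A minimal sketch of the idea, not a method from any particular paper: the ranking is built one position at a time, the state is the list ranked so far, and the action is the next document to place. The reward function `marginal_gain`, the `topics` field, and the toy documents are all assumptions made for illustration.

```python
def marginal_gain(doc, ranked):
    """Hypothetical reward: relevance only counts for topics not yet covered."""
    covered = set().union(*(d["topics"] for d in ranked))
    new_topics = doc["topics"] - covered
    return doc["relevance"] * len(new_topics) / len(doc["topics"])

def rank_sequentially(docs):
    """Greedy policy: at each position pick the document with the best gain."""
    ranked, remaining = [], list(docs)
    while remaining:
        best = max(remaining, key=lambda d: marginal_gain(d, ranked))
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Toy data (made up): two documents on the same topic, one on another.
docs = [
    {"id": "a", "relevance": 0.9, "topics": {"jaguar-car"}},
    {"id": "b", "relevance": 0.8, "topics": {"jaguar-car"}},
    {"id": "c", "relevance": 0.6, "topics": {"jaguar-animal"}},
]

independent = [d["id"] for d in sorted(docs, key=lambda d: -d["relevance"])]
sequential = [d["id"] for d in rank_sequentially(docs)]
```

  Sorting by relevance alone scores each document independently, while the sequential version lets an earlier choice lower the value of a redundant later one. A learned RL policy would replace the greedy rule here.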

 

2. RL: Learn to make good sequences of decisions

3. AlphaGo:

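Breadth reduction: Policy Network. At a given move, playing in certain regions of the board is simply bad; the policy network can recognize this, so those moves need not be searched, which reduces the width of the tree.

Depth reduction: Value Network. During tree search, if the game cannot be won from a given node, the value network lets the search prune there, which reduces the depth of the tree.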
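
A rough sketch of the two reductions on a toy game. This is not AlphaGo's actual search (AlphaGo combines its networks with Monte Carlo tree search); it is a plain depth-limited negamax, and `policy` and `value` are hand-written stand-ins for the two networks. The game (a pile of stones, take 1 to 3, whoever takes the last stone wins) and all function names are assumptions for illustration.

```python
def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def apply_move(pile, move):
    return pile - move

def policy(pile, moves):
    """Stand-in for the policy network: a prior score for each move."""
    return {m: 1.0 if (pile - m) % 4 == 0 else 0.1 for m in moves}

def value(pile):
    """Stand-in for the value network: +1 if the player to move wins, else -1."""
    return 1 if pile % 4 != 0 else -1

def search(pile, depth, top_k):
    """Negamax value of the position for the player to move."""
    moves = legal_moves(pile)
    if depth == 0 or not moves:
        # Depth reduction: trust the value estimate instead of searching deeper.
        return value(pile)
    # Breadth reduction: only expand the top_k moves ranked by the policy prior.
    priors = policy(pile, moves)
    candidates = sorted(moves, key=lambda m: priors[m], reverse=True)[:top_k]
    return max(-search(apply_move(pile, m), depth - 1, top_k) for m in candidates)
```

With `top_k=1` and `depth=2` the search looks at a single line of play two moves deep, yet still returns the same value as the wider search, because the stand-in policy and value functions happen to be exact for this toy game. Real networks are only approximate, which is why they guide the search rather than replace it.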

 

4. Ranking evaluation: NDCG (Normalized Discounted Cumulative Gain); MAP (Mean Average Precision)
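
Both metrics are short enough to write out. The sketch below makes two assumptions: DCG uses the linear-gain form (rel / log2(rank + 1); an exponential-gain variant, 2^rel - 1, is also common), and average precision assumes every relevant document appears somewhere in the ranked list. Function names are my own.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    rels = relevances[:k] if k is not None else relevances
    # The item at 0-based position i is discounted by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

def average_precision(binary_relevances):
    """Mean of precision@i over the ranks i that hold a relevant item."""
    hits, total = 0, 0.0
    for i, rel in enumerate(binary_relevances, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """Average of average_precision over several queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

Usage: `ndcg([3, 2, 1])` is 1.0 because the list is already in ideal order, and `average_precision([1, 0, 1])` averages precision 1/1 at rank 1 with 2/3 at rank 3.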

5. Monte Carlo search
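
A minimal sketch of the simplest form, flat Monte Carlo search: score each candidate move by the win rate of many random playouts and pick the best. The toy game (a pile of stones, take 1 to 3, whoever takes the last stone wins) and all names are assumptions for illustration; Monte Carlo tree search builds a tree on top of this idea.

```python
import random

def legal_moves(pile):
    """Moves in the toy game: take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]

def rollout(pile, my_turn, rng):
    """Play uniformly random moves to the end; True if we take the last stone."""
    if pile == 0:
        # The previous mover took the last stone.
        return not my_turn
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return my_turn
        my_turn = not my_turn

def monte_carlo_move(pile, n_rollouts=1000, rng=None):
    """Return the move whose random playouts win most often."""
    rng = rng or random.Random()
    best_move, best_rate = None, -1.0
    for move in legal_moves(pile):
        # After our move it is the opponent's turn, hence my_turn=False.
        wins = sum(rollout(pile - move, False, rng) for _ in range(n_rollouts))
        rate = wins / n_rollouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move
```

Usage: `monte_carlo_move(5, n_rollouts=2000, rng=random.Random(0))`. The estimates are noisy, so more rollouts give a more reliable choice.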

 

posted on 2017-11-22 17:36  WegZumHimmel
