Distributed Sentence Similarity Based on Word Mover's Distance

Algorithm:

Reference: an ICML 2015 paper on the Word Mover's Distance.

1. First, use Google's word2vec tool to obtain distributed word representations, a.k.a. word vectors.

2. Then use the earth mover's distance (EMD) between the two sentences' word vectors as the similarity metric.

3. Solve the EMD problem as a transportation problem using the Hungarian algorithm.
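
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the original implementation: the word vectors are a tiny hand-made dictionary (`toy_vectors`, hypothetical) standing in for real word2vec output, every word in a sentence gets equal weight, and the transportation problem is reduced to an assignment problem by splitting each word into unit-mass copies so that the Hungarian algorithm applies directly. The function names `hungarian` and `word_movers_distance` are my own.

```python
import math

def hungarian(cost):
    """Minimum total cost of a perfect matching on a square cost matrix (O(n^3))."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (n + 1)          # column potentials
    p = [0] * (n + 1)            # p[j] = row matched to column j (1-based, 0 = none)
    way = [0] * (n + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, n + 1))

def word_movers_distance(sent_a, sent_b, vectors):
    """EMD between two tokenized sentences, with uniform weight per word."""
    a = [w for w in sent_a if w in vectors]   # drop out-of-vocabulary words
    b = [w for w in sent_b if w in vectors]
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sentences need at least one known word")
    # Each word of a carries mass 1/n and each word of b carries 1/m.
    # Scaling by n*m makes the masses integers: m unit copies per word of a,
    # n unit copies per word of b, giving a square assignment problem.
    rows = [w for w in a for _ in range(m)]
    cols = [w for w in b for _ in range(n)]
    cost = [[math.dist(vectors[r], vectors[c]) for c in cols] for r in rows]
    return hungarian(cost) / (n * m)

# Hypothetical 2-D vectors, used only to keep the example self-contained.
toy_vectors = {
    "king": (0.0, 0.0),
    "queen": (0.0, 1.0),
    "apple": (3.0, 4.0),
}

if __name__ == "__main__":
    print(word_movers_distance(["king", "queen"], ["queen", "king"], toy_vectors))
    print(word_movers_distance(["king"], ["king", "apple"], toy_vectors))
```

The unit-copy expansion produces a matrix of size (n*m) x (n*m), so it is only practical for short sentences; a dedicated transportation or min-cost-flow solver would scale better for longer texts.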


 

Outcome:

The results look reasonable, but there is still room to improve precision.

For example, n-grams could be used to preserve some of the sentence structure.
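
One possible way to try the n-gram idea, sketched here purely as an assumption rather than something the post has tested: extract contiguous n-grams from each sentence and represent each one by concatenating its word vectors, so that word order affects the distance. The helper names `ngrams` and `ngram_vector` are hypothetical.

```python
def ngrams(tokens, n=2):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_vector(gram, vectors):
    """Concatenate the word vectors of an n-gram, preserving word order."""
    out = []
    for word in gram:
        out.extend(vectors[word])
    return tuple(out)

if __name__ == "__main__":
    # Hypothetical 2-D vectors, only for illustration.
    vecs = {"new": (1.0, 0.0), "york": (0.0, 1.0)}
    for gram in ngrams(["new", "york", "new"], 2):
        print(gram, ngram_vector(gram, vecs))
```

The resulting n-gram vectors could then take the place of single-word vectors in the EMD step; whether this actually improves precision would need to be measured.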

 

posted on 2015-08-12 22:44  amojry