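WordNet Study 7: Semantic Similarity with JWS (Java WordNet Similarity)
JWS (Java WordNet Similarity) is an open-source project, developed by David Hope and colleagues at the University of Sussex, that computes semantic similarity using Java and WordNet. It implements many of the classic semantic similarity measures and is a tool well worth studying.
JWS is the Java implementation of WordNet::Similarity (a Perl package for WordNet-based similarity), which is good news for anyone who wants to compare word similarity with WordNet from Java! The steps for using it, in brief: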
1. Download WordNet (Windows, version 2.1): http://wordnet.princeton.edu/wordnet/download/;
2. Download WordNet-InfoContent (version 2.1): http://wn-similarity.sourceforge.net/ or http://www.d.umn.edu/~tpederse/Data/;
3. Download JWS (current version: beta.11.01): http://www.cogs.susx.ac.uk/users/drh21/;
4. Install WordNet;
5. Unpack WordNet-InfoContent-2.1 and copy the folder into the WordNet directory D:/Program Files/WordNet/2.1;
6. Copy the two JWS jar files, edu.mit.jwi_2.1.4.jar and edu.sussex.nlp.jws.beta.11.jar, into Java's lib directory and set the environment variables so that both jars are on the classpath;
7. Run the JWS example program, TestExamples, in Eclipse.
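Note: because the WordNet downloaded above is version 2.1, a few places in the example program need to be changed:
String dir = "C:/Program Files/WordNet"; // the WordNet installation path; change it to match where you actually installed it
JWS ws = new JWS(dir, "3.0"); // change "3.0" to "2.1"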
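Most start-up failures I would expect here come from the folder layout, so a quick check before constructing JWS can save some head-scratching. The sketch below only assumes the layout described in the comments of the example program (dir/version/dict and dir/version/WordNet-InfoContent-version). The class WordNetLayoutCheck and its method missingFolders are hypothetical helpers written for this post; they are not part of JWS.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JWS): verifies the folder layout
// that the TestExamples comments describe for a given WordNet version.
final class WordNetLayoutCheck {

    // Returns the expected sub-folders that do not exist under dir/version.
    static List<Path> missingFolders(String dir, String version) {
        Path base = Paths.get(dir, version);
        List<Path> expected = new ArrayList<>();
        expected.add(base.resolve("dict"));                            // WordNet database files
        expected.add(base.resolve("WordNet-InfoContent-" + version));  // IC files folder

        List<Path> missing = new ArrayList<>();
        for (Path p : expected) {
            if (!Files.isDirectory(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults mirror the path and version used in this post; pass your own as arguments.
        String dir = args.length > 0 ? args[0] : "C:/Program Files/WordNet";
        String version = args.length > 1 ? args[1] : "2.1";

        List<Path> missing = missingFolders(dir, version);
        if (missing.isEmpty()) {
            System.out.println("Layout looks complete for WordNet " + version);
        } else {
            for (Path p : missing) {
                System.out.println("Missing folder: " + p);
            }
        }
    }
}
```

If both folders are reported as present, the dir and version strings passed here should be the same two values to pass to new JWS(dir, version).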
Example program:
import java.util.TreeMap;
import java.text.*;
import edu.sussex.nlp.jws.*;


// 'TestExamples': how to use Java WordNet::Similarity
// David Hope, 2008
public class TestExamples
{
    public static void main(String[] args)
    {

        // 1. SET UP:
        // Let's make it easy for the user. So, rather than set pointers in 'Environment Variables' etc. let's allow the user to define exactly where they have put WordNet(s)
        String dir = "E:/Commonly Application/WordNet/";
        // That is, you may have version 3.0 sitting in the above directory e.g. C:/Program Files/WordNet/3.0/dict
        // The corresponding IC files folder should be in this same directory e.g. C:/Program Files/WordNet/3.0/WordNet-InfoContent-3.0

        // Option 1 (Perl default): specify the version of WordNet you want to use (assuming that you have a copy of it) and use the default IC file [ic-semcor.dat]
        JWS ws = new JWS(dir, "2.1");
        // Option 2 : specify the version of WordNet you want to use and the particular IC file that you wish to apply
        //JWS ws = new JWS(dir, "3.0", "ic-bnc-resnik-add1.dat");


        // 2. EXAMPLES OF USE:

        // 2.1 [JIANG & CONRATH MEASURE]
        JiangAndConrath jcn = ws.getJiangAndConrath();
        //System.out.println("Jiang & Conrath\n");
        // all senses
        TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", "n"); // all senses
        //TreeMap<String, Double> scores1 = jcn.jcn("apple", 1, "banana", "n"); // fixed;all
        //TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", 2, "n"); // all;fixed
        for(String s : scores1.keySet())
            System.out.println(s + "\t" + scores1.get(s));
        // specific senses
        //System.out.println("\nspecific pair\t=\t" + jcn.jcn("apple", 1, "banana", 1, "n") + "\n");
        // max.
        //System.out.println("\nhighest score\t=\t" + jcn.max("java", "best", "n") + "\n\n\n");

        // 2.2 [LIN MEASURE]
        Lin lin = ws.getLin();
        //System.out.println("Lin\n");
        // all senses
        TreeMap<String, Double> scores2 = lin.lin("like", "love", "n"); // all senses
        //TreeMap<String, Double> scores2 = lin.lin("kid", "child", "n"); // fixed;all
        //TreeMap<String, Double> scores2 = lin.lin("apple", "banana", 2, "n"); // all;fixed
        //for(String s : scores2.keySet())
            //System.out.println(s + "\t" + scores2.get(s));
        // specific senses
        System.out.println("\nspecific pair\t=\t" + lin.lin("like", 1, "love", 1, "n") + "\n");
        // max.
        System.out.println("\nhighest score\t=\t" + lin.max("From","date","n") + "\n\n\n");

        // ... and so on for any other measure
    }
} // eof
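Both measures used in the example are based on information content (IC), which is why the WordNet-InfoContent files are needed at all. As a rough guide to what the numbers mean, here is a minimal sketch of the textbook formulas, with the IC values passed in directly. It is not the JWS source code: the class IcMeasures and its methods are hypothetical names, and the handling of the zero cases is a simplification of my own, so real packages may treat those edge cases differently.

```java
// Illustrative only: textbook formulas for the two IC-based measures.
// IcMeasures is a hypothetical class, not part of JWS.
final class IcMeasures {

    // Information content of a concept whose corpus probability is p: IC = -ln(p).
    static double informationContent(double probability) {
        return -Math.log(probability);
    }

    // Lin: 2 * IC(lcs) / (IC(c1) + IC(c2)), where lcs is the most informative
    // common subsumer of the two concepts. Returns 0 if the denominator is 0.
    static double lin(double ic1, double ic2, double icLcs) {
        double denominator = ic1 + ic2;
        return denominator == 0.0 ? 0.0 : 2.0 * icLcs / denominator;
    }

    // Jiang & Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs).
    static double jcnDistance(double ic1, double ic2, double icLcs) {
        return ic1 + ic2 - 2.0 * icLcs;
    }

    // Jiang & Conrath similarity taken as the inverse of the distance.
    // Simplification: a zero distance is reported as positive infinity here.
    static double jcnSimilarity(double ic1, double ic2, double icLcs) {
        double distance = jcnDistance(ic1, ic2, icLcs);
        return distance == 0.0 ? Double.POSITIVE_INFINITY : 1.0 / distance;
    }

    public static void main(String[] args) {
        // Made-up probabilities, purely for illustration.
        double ic1 = informationContent(0.001);
        double ic2 = informationContent(0.002);
        double icLcs = informationContent(0.05);

        System.out.println("Lin similarity: " + lin(ic1, ic2, icLcs));
        System.out.println("JCN distance:   " + jcnDistance(ic1, ic2, icLcs));
        System.out.println("JCN similarity: " + jcnSimilarity(ic1, ic2, icLcs));
    }
}
```

Read this way, a Lin score lies between 0 and 1, while a Jiang & Conrath similarity has no fixed upper bound, so scores from the two measures should not be compared with each other directly.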
A simple semantic similarity program built on JWS, for example:
import edu.sussex.nlp.jws.JWS;
import edu.sussex.nlp.jws.Lin;


public class Similar {

    private String str1;
    private String str2;
    // WordNet installation path; change it to match your own setup
    private String dir = "E:/Commonly Application/WordNet/";
    private JWS ws = new JWS(dir, "2.1");

    public Similar(String str1, String str2) {
        this.str1 = str1;
        this.str2 = str2;
    }

    // Average of the Lin scores over every word pair taken from the two strings
    public double getSimilarity() {
        String[] strs1 = splitString(str1);
        String[] strs2 = splitString(str2);
        double sum = 0.0;
        for (String s1 : strs1) {
            for (String s2 : strs2) {
                double sc = maxScoreOfLin(s1, s2);
                sum += sc;
                System.out.println("Current pair: " + s1 + " VS " + s2 + ", similarity: " + sc);
            }
        }
        return sum / (strs1.length * strs2.length);
    }

    // Splits a phrase into words on single spaces
    private String[] splitString(String str) {
        return str.split(" ");
    }

    // Highest Lin score over the noun senses; falls back to the verb senses if that score is 0
    private double maxScoreOfLin(String str1, String str2) {
        Lin lin = ws.getLin();
        double sc = lin.max(str1, str2, "n");
        if (sc == 0) {
            sc = lin.max(str1, str2, "v");
        }
        return sc;
    }

    public static void main(String args[]) {
        String s1 = "departure";
        String s2 = "leaving from";
        Similar sm = new Similar(s1, s2);
        System.out.println(sm.getSimilarity());
    }
}
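The averaging step in getSimilarity can be tried out without WordNet installed by separating it from the word-level scorer. The following is a minimal sketch under that assumption: PairwiseAverage is a hypothetical class, and the exact-match scorer in main is only a toy stand-in for lin.max, not a semantic measure. It also splits on any run of whitespace rather than on single spaces.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper: the pairwise-average step with a pluggable word scorer.
final class PairwiseAverage {

    // Mean of wordScore over every (word of phraseA, word of phraseB) pair.
    static double average(String phraseA, String phraseB,
                          ToDoubleBiFunction<String, String> wordScore) {
        String[] wordsA = phraseA.trim().split("\\s+");
        String[] wordsB = phraseB.trim().split("\\s+");
        double sum = 0.0;
        for (String a : wordsA) {
            for (String b : wordsB) {
                sum += wordScore.applyAsDouble(a, b);
            }
        }
        return sum / (wordsA.length * wordsB.length);
    }

    public static void main(String[] args) {
        // Toy scorer: 1.0 for identical words (ignoring case), otherwise 0.0.
        // With JWS available, a call such as lin.max(a, b, "n") would go here instead.
        ToDoubleBiFunction<String, String> exactMatch =
                (a, b) -> a.equalsIgnoreCase(b) ? 1.0 : 0.0;

        System.out.println(average("leaving from", "leaving soon", exactMatch));
    }
}
```

One thing this makes visible: every word pair carries the same weight, so a word that scores 0 against everything (a preposition such as "from" may well do so under a noun or verb measure) pulls the whole average down. Filtering such words out before averaging might be worth trying.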
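I came across JWS while trying to do semantic analysis with Protégé plus WordNet, but I did not have much time to study it in depth, which is a real pity. If you have looked into it further, please post your blog URL so that everyone can learn from it!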