April 1, 2013
Summary: Introduction. The following example shows what edit distance is. Target: "you are cute and I love u." Source: "I am cute I don't love u." To turn the source into the target, you can edit it with insertions, deletions, and substitutions; the edit distance is the minimum number of such operations required. You will substitute "you" …
posted @ 2013-04-01 18:43 MrMission
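The edit-distance idea in the post above can be sketched with the standard dynamic-programming (Levenshtein) table. This is a minimal sketch, assuming unit cost for insertion, deletion, and substitution and treating whitespace-separated words as tokens; the post's own cost settings are not visible in this summary, and `edit_distance` is a hypothetical helper name.

```python
from typing import Sequence


def edit_distance(source: Sequence[str], target: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions (each cost 1)."""
    n, m = len(source), len(target)
    # d[i][j] = distance between the first i source tokens and the first j target tokens
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i source tokens
    for j in range(m + 1):
        d[0][j] = j  # insert all j target tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution (or free match)
            )
    return d[n][m]


source = "I am cute I don't love u.".split()
target = "you are cute and I love u.".split()
print(edit_distance(source, target))  # 4
```

Because the function only indexes and compares elements, it also works on plain strings at the character level, e.g. `edit_distance("kitten", "sitting")`.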
Summary: Our final topic in basic text processing is segmenting text into sentences. We use a decision tree to solve this problem. But a simple tree alone is not enough; we should give the decision tree more sophisticated features to build a better classifier. For example, you can get the probability of a word occurring at the end of …
posted @ 2013-04-01 14:05 MrMission
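As a rough illustration of the decision-tree approach described above, here is a hand-built tree that decides whether a ".", "?" or "!" ends a sentence. The features (the next character and a tiny abbreviation list) and the names `ABBREVIATIONS`, `is_end_of_sentence` and `segment` are assumptions for this sketch; a real classifier would learn its tree from labeled data and add richer features, such as the word-ending probability mentioned in the post.

```python
import re
from typing import List

# Hypothetical, tiny abbreviation list; a real system needs a much larger one.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "etc", "inc", "e.g", "i.e"}


def is_end_of_sentence(text: str, i: int) -> bool:
    """Hand-built decision tree: does the punctuation mark at text[i] end a sentence?"""
    ch = text[i]
    # Node 1: '?' and '!' are treated as unambiguous boundaries.
    if ch in "?!":
        return True
    # Node 2: anything other than '.' is not a boundary candidate.
    if ch != ".":
        return False
    # Node 3: a period followed directly by a letter or digit ("3.14", "e.g.") is not a boundary.
    if i + 1 < len(text) and text[i + 1].isalnum():
        return False
    # Node 4: a period that closes a known abbreviation ("Dr.") is not a boundary.
    m = re.search(r"[A-Za-z.]+$", text[:i])
    if m and m.group(0).lower() in ABBREVIATIONS:
        return False
    # Leaf: otherwise, classify the period as end-of-sentence.
    return True


def segment(text: str) -> List[str]:
    """Split text at every position the decision tree labels as end-of-sentence."""
    sentences, start = [], 0
    for i, ch in enumerate(text):
        if ch in ".?!" and is_end_of_sentence(text, i):
            sentence = text[start:i + 1].strip()
            if sentence:
                sentences.append(sentence)
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(segment("Dr. Smith paid $3.50 for it. Was it cheap? Yes!"))
# ['Dr. Smith paid $3.50 for it.', 'Was it cheap?', 'Yes!']
```

Hand-coding the tree keeps each question readable; the drawback, as the post notes, is that richer features quickly make manual tuning impractical, which is why such trees are usually learned.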
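Summary: Well, today I learned about word normalization and stemming. After word tokenization, we should normalize the words by mapping them to a normal form. For example, you should map "are" and "is" to "be", and map "windows" to "window", and so on. (Strictly speaking, the first mapping is lemmatization, while stripping a suffix as in the second is stemming.) Afterwards, we can use Linux tools to impl …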
posted @ 2013-04-01 10:22 MrMission
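A toy normalizer for the mappings described in the post above. It assumes a small hand-written lookup table for irregular forms (the "are"/"is" → "be" case) and the four suffix rules of step 1a of the Porter stemmer for plural-style endings ("windows" → "window"). `IRREGULAR`, `SUFFIX_RULES` and `normalize` are hypothetical names; this is a sketch, not the full Porter algorithm.

```python
# Hypothetical lookup table for a few irregular forms (lemmatization-style mapping).
IRREGULAR = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"}

# Ordered (suffix, replacement) rules, taken from Porter step 1a; the first match wins.
SUFFIX_RULES = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]


def normalize(token: str) -> str:
    """Lowercase a token, then apply the irregular-form table or one suffix rule."""
    word = token.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            # "ss" -> "ss" leaves words like "caress" unchanged before the bare "s" rule fires.
            return word[: len(word) - len(suffix)] + replacement
    return word


print([normalize(w) for w in "Are is Windows caresses ponies cats".split()])
# ['be', 'be', 'window', 'caress', 'poni', 'cat']
```

Note that stems such as "poni" are not dictionary words; stemming only needs related forms to collapse to the same string.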