cynorr

Learn what I touched.


What is Mr.LDA?


Mr.LDA is a topic model that can be used to classify articles by topic and for face recognition. There are many versions of LDA, and Mr.LDA is one of the most complete and mature. More importantly, it can handle a bilingual corpus, which is the task my teacher assigned me this month.

Missing compile command

Recently, to my frustration, I couldn't compile Mr.LDA with Maven. Some errors I didn't recognize appeared when I typed the compile command:

$ mvn clean package

At the top of the output, some of the errors implied that a compile command was missing. After googling it, I understood. The solution is to add the following line after "" in pom.xml.

<defaultGoal>compile</defaultGoal>

Wrong path in JAVA_HOME

Even though I had added the compile line to pom.xml, the build still failed with a long list of errors. When I typed

$ mvn -version

I found the root of the problem: the path Maven was using was WRONG!!! It was such a small thing that I had overlooked it.
After correcting it and rebooting, it worked. Type

$ mvn clean package

After waiting a minute, it compiled successfully!!!
The *.jar I need is in the folder called "target".
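
Looking back, a small check like the one below might have caught the path problem before Maven printed its long list of errors. It is only a sketch: check_java_home is a helper name I made up, and it assumes that a valid Java home is simply a directory containing an executable bin/java.

```shell
#!/bin/sh
# check_java_home DIR: succeed if DIR looks like a Java home,
# i.e. it contains an executable bin/java. (Hypothetical helper.)
check_java_home() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "JAVA_HOME looks wrong: no executable at $dir/bin/java"
        return 1
    fi
    echo "JAVA_HOME looks fine: $dir"
    return 0
}

# Check the current environment before calling mvn.
check_java_home "${JAVA_HOME:-}" || echo "Fix JAVA_HOME before running mvn"
```

If the check passes, `mvn -version` should report the same Java home.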

Build Hadoop

Mr.LDA has to run on the Hadoop platform, so I had to set up a Hadoop environment before executing it. With only one PC, single-node (standalone) mode is enough. The steps are simple.
First, add the Java path to maven_env.sh. Open the sh file, find the line for the Java path, and type your own Java path, remembering to delete the '$' before the path. If you have forgotten your Java path, you can open the configuration file and check it.

$ sudo gedit /etc/profile
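
Instead of opening the file in an editor, the Java path can also be looked up with grep. This is a sketch under the assumption that the variable is named JAVA_HOME in /etc/profile; find_java_home is just an illustrative helper name.

```shell
#!/bin/sh
# find_java_home FILE: print the lines of FILE that mention JAVA_HOME,
# prefixed with their line numbers. Assumes the variable is called JAVA_HOME.
find_java_home() {
    grep -n 'JAVA_HOME' "$1"
}

find_java_home /etc/profile || echo "no JAVA_HOME line found in /etc/profile"
```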

Then we can copy the *.jar and the data to the hadoop/bin folder and run Hadoop in single-node mode.

$ ./hadoop jar target/*fatjar.jar -input ap.txt -output ap-pursed

It would waste a lot of time to type this command every time, so we can create a new sh file and put the command in it. Then it runs just by typing

$ sh run.sh
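
A minimal sketch of how such a run.sh could be created, assuming it lives in hadoop/bin next to the hadoop launcher and the copied files. The jar pattern, input name and output name are taken from the command above; the HADOOP, INPUT and OUTPUT variables and the missing-file check are my own additions for illustration, not part of Mr.LDA.

```shell
#!/bin/sh
# Write run.sh into the current directory (assumed to be hadoop/bin).
cat > run.sh <<'EOF'
#!/bin/sh
# Defaults mirror the command typed by hand; override via environment.
HADOOP="${HADOOP:-./hadoop}"
INPUT="${INPUT:-ap.txt}"
OUTPUT="${OUTPUT:-ap-pursed}"

# Stop early with a readable message if the input file is missing.
if [ ! -f "$INPUT" ]; then
    echo "input file not found: $INPUT" >&2
    exit 1
fi

"$HADOOP" jar target/*fatjar.jar -input "$INPUT" -output "$OUTPUT"
EOF
chmod +x run.sh
```

The early check on the input file is there because a missing or misnamed data file otherwise only shows up as a long Hadoop stack trace.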

Can't execute 😦

Oh shit! The long list of errors appeared again!!! Some were IOExceptions, and some were about Hadoop. The reasons may be the following:

  • The sample data is ap.dat rather than ap.txt.
  • Hadoop is not completely configured.

After checking for a whole day, I concluded that the format of the corpus must be wrong. I read the instructions again and found that this LDA needs a related sample data set called "Mr.LDA-data", which can be downloaded from GitHub.
With that specific data, executing it is a piece of cake.

posted on 2014-12-22 23:33 by cynorr