HarrySun


1. install software

Cygwin, JDK, Ant, Nutch

 

 

2. configure

  • environment variables

JAVA_HOME = C:\PROGRA~1\Java\jdk1.7.0_45

ANT_HOME =  C:\PROGRA~1\Ant\apache-ant-1.9.3

PATH = ...

 

  • copy source file

copy the apache-nutch-2.2.1-src folder into the Cygwin home folder

  • build

enter home/apache-nutch-2.2.1-src, then run the build:

ant

It takes about half an hour to download the dependencies.
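
Since the build runs for a long time, it may be worth checking the environment before starting it. The following is a minimal sketch, not part of Nutch or Ant: the function names `require_var`, `require_cmd` and `preflight` are made up for illustration, and it assumes that `java` and `ant` are meant to be reachable through PATH (the PATH value itself is not spelled out above).

```shell
# Preflight helpers for the Cygwin shell (hypothetical names, plain POSIX sh).

# require_var NAME: succeed only if the variable called NAME is set and non-empty.
require_var() {
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "missing: $1 is not set" >&2
    return 1
  fi
  echo "ok: $1=$value"
}

# require_cmd CMD: succeed only if CMD can be found through PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing: $1 is not on PATH" >&2
    return 1
  fi
  echo "ok: $1 found"
}

# preflight: run every check and report all problems, not only the first one.
preflight() {
  status=0
  require_var JAVA_HOME || status=1
  require_var ANT_HOME || status=1
  require_cmd java || status=1
  require_cmd ant || status=1
  return "$status"
}
```

After sourcing these functions, `preflight && ant` starts the build only when all four checks pass.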

 

3. test

Stan@Stan-PC ~/nutch/runtime/local
$ ls
bin  conf  lib  plugins  test

Stan@Stan-PC ~/nutch/runtime/local
$ bin/nutch
Usage: nutch COMMAND
where COMMAND is one of:
 inject         inject new urls into the database
 hostinject     creates or updates an existing host table from a text file
 generate       generate new batches to fetch from crawl db
 fetch          fetch URLs marked during generate
 parse          parse URLs marked during fetch
 updatedb       update web table after parsing
 updatehostdb   update host table after parsing
 readdb         read/dump records from page database
 readhostdb     display entries from the hostDB
 elasticindex   run the elasticsearch indexer
 solrindex      run the solr indexer on parsed batches
 solrdedup      remove duplicates from solr
 parsechecker   check the parser for a given url
 indexchecker   check the indexing filters for a given url
 plugin         load a plugin and run one of its classes main()
 nutchserver    run a (local) Nutch server on a user defined port
 junit          runs the given JUnit test
 or
 CLASSNAME      run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

Stan@Stan-PC ~/nutch/runtime/local
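
The same check can be scripted. This is only a sketch under assumptions: `check_runtime` is a hypothetical helper, and it treats the `bin`, `conf`, `lib` and `plugins` folders from the listing above, plus an executable `bin/nutch`, as the sign of a complete build.

```shell
# check_runtime DIR: verify that DIR looks like a built runtime/local folder.
check_runtime() {
  dir="$1"
  # These folder names are taken from the `ls` output of runtime/local.
  for sub in bin conf lib plugins; do
    if [ ! -d "$dir/$sub" ]; then
      echo "missing directory: $dir/$sub" >&2
      return 1
    fi
  done
  # The launcher script must exist and be executable.
  if [ ! -x "$dir/bin/nutch" ]; then
    echo "not executable: $dir/bin/nutch" >&2
    return 1
  fi
  echo "runtime looks complete: $dir"
}
```

For the session shown above it would be called as `check_runtime ~/nutch/runtime/local`.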

 

To be continued...

posted on 2014-01-13 00:43  HarrySun