nutch 采集效率--设置采集间隔
fetcher.max.crawl.delay 默认是30秒,这里改为 5秒
修改nutch-default.xml
<property> <name>fetcher.max.crawl.delay</name> <value>5</value> <description> If the Crawl-Delay in robots.txt is set to greater than this value (in seconds) then the fetcher will skip this page, generating an error report. If set to -1 the fetcher will never skip such pages and will wait the amount of time retrieved from robots.txt Crawl-Delay, however long that might be. </description> </property>