nutch 采集效率问题
http://hi.baidu.com/jacklin/item/a8fbccf479f6a1d042c36a7c
再附一篇:http://blog.csdn.net/laigood/article/details/6233561
fetcher.threads.per.host
<property>
<name>fetcher.threads.per.queue</name>
<value>1</value>
<description>This number is the maximum number of threads that
should be allowed to access a queue at one time.
</description>
</property>
自1.6似乎改成fetcher.threads.per.queue 这个属性了
fetcher.server.delay
1 增加同一个host的线程数(如上图所示,不建议,增加对方网站负担)
2 减少延迟(interval) 从5降为 0 即可
3 当然最主要的还是 增加 mapred数~