Nutch crawl efficiency: setting the crawl interval

fetcher.max.crawl.delay defaults to 30 seconds; here it is changed to 5 seconds.
Edit nutch-default.xml:
<property>
 <name>fetcher.max.crawl.delay</name>
 <value>5</value>
 <description>
 If the Crawl-Delay in robots.txt is set to greater than this value (in
 seconds) then the fetcher will skip this page, generating an error report.
 If set to -1 the fetcher will never skip such pages and will wait the
 amount of time retrieved from robots.txt Crawl-Delay, however long that
 might be.
 </description>
</property>
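
Nutch also reads conf/nutch-site.xml, and values set there take precedence over nutch-default.xml, so the same change can be made without touching the defaults file. A minimal sketch, assuming a standard Nutch conf directory; the description element is optional and omitted here:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Override of the 30-second default from nutch-default.xml.
       Hosts whose robots.txt Crawl-Delay exceeds 5 seconds are skipped. -->
  <property>
    <name>fetcher.max.crawl.delay</name>
    <value>5</value>
  </property>
</configuration>
```

Note that this value is an upper limit on the Crawl-Delay the fetcher will honor, not the delay itself: lowering it causes more pages to be skipped instead of waited for.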

posted on 2014-09-05 11:20 雨渐渐
