Summary:
1. The common Nutch command is 'bin/nutch crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]'. Nutch generates a segment for each depth level, and topN means that each level collects at most topN URLs. Generally each level produces a single segment; this depends on maxNumSegments (default 1) in Generat
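To make the relation between -depth and -topN concrete, here is a rough back-of-the-envelope sketch. It is not Nutch code: the function name crawl_upper_bound is hypothetical, and it assumes the default case described above (one segment per depth level, each level limited to topN URLs). A real crawl may well fetch fewer URLs than this bound.

```python
def crawl_upper_bound(depth, top_n):
    """Toy model (not part of Nutch): return (segments, max_urls).

    Assumes the default case of one segment per depth level,
    with each level selecting at most top_n URLs.
    """
    segments = depth          # one segment per depth level
    max_urls = depth * top_n  # each level contributes at most top_n URLs
    return segments, max_urls


# Example values corresponding to '-depth 3 -topN 50'
segments, max_urls = crawl_upper_bound(depth=3, top_n=50)
print(segments, max_urls)
```

With -depth 3 and -topN 50, this gives 3 segments and an upper bound of 150 URLs in total.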