Nutch 1.6 Study Notes
Updated every evening or night during the summer break
Nisekoi is the best
The Nutch 1.6 source can be downloaded here:
apache-nutch-1.6-src.zip
115 Netdisk gift code: 5lbcymlo6u76
http://115.com/lb/5lbcymlo6u76
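If you don't want to compile the source yourself, you can download my compiled files directly. They include the standalone local version and the Hadoop-dependent deploy version (64-bit):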
apache-nutch1.6-runtime.zip
115 Netdisk gift code: 5lbcy4rl8e4l
http://115.com/lb/5lbcy4rl8e4l
Or download only the official deploy version:
apache-nutch-1.6-bin.tar.gz
115 Netdisk gift code: 5lbbtpwwpbq2
http://115.com/lb/5lbbtpwwpbq2
Edit, 7/20: ---------------------
Today I came across the download location for every Nutch version: http://archive.apache.org/dist/nutch/
Every version of every Apache project can be found here: http://archive.apache.org/dist/
-----------------------------------
http://wxweven.blog.163.com/blog/static/1974791152014127115626958/
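In the runtime folder, the local folder holds Nutch running without a Hadoop cluster; MapReduce runs on a single machine there.
Local Nutch is generally used for testing and debugging. Enter the local folder.
The conf folder contains many files that configure Nutch. nutch-default.xml holds the default configuration and includes descriptions of many settings.
nutch-site.xml is the main configuration file; its values override those in nutch-default.xml.
Before running Nutch, add the http.agent.name property to nutch-site.xml.
The http.agent.name entry in nutch-default.xml looks like this: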
<property>
  <name>http.agent.name</name>
  <value></value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version

  and set their values appropriately.

  </description>
</property>
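Copy it into nutch-site.xml and put a request header in the value tag. The header has to be taken from a browser;
for example, the User-Agent header sent by Chrome is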
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
Here is the complete content of my nutch-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36</value>
  </property>

</configuration>
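To make the override rule concrete, here is a minimal sketch that looks a property up in nutch-site.xml first and falls back to nutch-default.xml. The function names conf_value and effective_value are my own, not part of Nutch. The lookup is line-based and assumes <name> and <value> sit on adjacent lines, as in the files shown above; it is not a real XML parser, and it treats an empty value in nutch-site.xml the same as a missing one.

```shell
#!/bin/sh
# conf_value FILE NAME
# Print the <value> found on the line right after <name>NAME</name>.
# Assumption: <name> and <value> are on adjacent lines (line-based sketch).
conf_value() {
    awk -v name="$2" '
        found {
            # "<value>" is 7 characters and "</value>" is 8, 15 in total
            if (match($0, /<value>.*<\/value>/)) {
                print substr($0, RSTART + 7, RLENGTH - 15)
            }
            exit
        }
        index($0, "<name>" name "</name>") { found = 1 }
    ' "$1"
}

# effective_value CONF_DIR NAME
# Prefer the value from nutch-site.xml; otherwise use nutch-default.xml.
effective_value() {
    site=$(conf_value "$1/nutch-site.xml" "$2")
    if [ -n "$site" ]; then
        echo "$site"
    else
        conf_value "$1/nutch-default.xml" "$2"
    fi
}

# Usage (run inside the local folder):
#   effective_value conf http.agent.name
```

If the command prints an empty line for http.agent.name, the property has not been set yet.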
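Once the configuration is changed, we can start experimenting.
Run the nutch script in bin; it prompts you to enter a command.
Part of the following is reposted from: http://www.blogjava.net/kxx129/archive/2009/09/05/294000.html
Crawl: crawl is an alias for "org.apache.nutch.crawl.Crawl"; it is a command that runs the complete crawl and indexing process.
Usage: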
bin/nutch crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]
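Parameters:
<urlDir>: an existing folder containing text files that list the URLs to crawl.
[-dir <d>]: the working directory where Nutch stores the crawl data. The default is ./crawl-[date], where [date] is the current date.
[-threads <n>]: the number of Fetcher threads; overrides the fetcher.threads.fetch value from the default configuration file (default 10).
[-depth <i>]: the number of crawl iterations (the depth). The default is 5.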
[-topN <num>]: limits each iteration to the top N records. The default is Long.MAX_VALUE, which is effectively no limit.
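As a small illustration of how the optional flags combine, the sketch below only builds and prints a crawl command line; it does not run Nutch. The helper name build_crawl_cmd is hypothetical. Any option passed as an empty string is left off, so Nutch would fall back to the defaults listed above.

```shell
#!/bin/sh
# build_crawl_cmd URLDIR [DIR] [THREADS] [DEPTH] [TOPN]
# Print a "bin/nutch crawl" command line; pass "" to omit an option.
# Nothing is executed here, the command is only echoed.
build_crawl_cmd() {
    cmd="./bin/nutch crawl $1"
    if [ -n "${2:-}" ]; then cmd="$cmd -dir $2"; fi
    if [ -n "${3:-}" ]; then cmd="$cmd -threads $3"; fi
    if [ -n "${4:-}" ]; then cmd="$cmd -depth $4"; fi
    if [ -n "${5:-}" ]; then cmd="$cmd -topN $5"; fi
    echo "$cmd"
}

# All options given explicitly
build_crawl_cmd urls data 50 2 2
# Only the depth given; -dir, -threads and -topN keep their defaults
build_crawl_cmd urls "" "" 3
```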
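Example 1: ./bin/nutch crawl urls -dir data -threads 50 -depth 2 -topN 2 (don't run this command yet)
The URLs to crawl are stored in the urls folder (Nutch reads them from the files in urls),
the crawled data goes into data,
50 threads are used for fetching, the iteration depth is 2, and each iteration fetches the top 2 records.
Note that, to optimize efficiency, Nutch does not strictly follow depth-first or breadth-first search.
Example 2: nohup ./bin/nutch crawl urls -dir data -threads 50 -depth 2 -topN 2 &
With nohup in front, Nutch's output is written to nohup.out in the current directory
(a more detailed log is in logs/hadoop.log).
The trailing & is the Linux way of running a command in the background.
If something goes wrong, nohup.out will contain log entries that look like Java exceptions.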
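Since the crawl runs in the background, a quick way to see whether it went wrong is to search the log for exception lines. The following is a minimal sketch; the function name check_crawl_log is my own, and matching on the word "Exception" is an assumption about how errors show up, not a complete error check.

```shell
#!/bin/sh
# check_crawl_log [FILE]
# Search a crawl log (default: nohup.out) for lines containing "Exception".
# Returns 0 if none are found, 1 if at least one is found, 2 if the file is missing.
check_crawl_log() {
    log="${1:-nohup.out}"
    if [ ! -f "$log" ]; then
        echo "no log file: $log"
        return 2
    fi
    # grep -n prints each matching line with its line number
    if grep -n "Exception" "$log"; then
        echo "exceptions found in $log"
        return 1
    fi
    echo "no exceptions in $log"
    return 0
}

# Usage (after the crawl has finished):
#   check_crawl_log nohup.out
#   check_crawl_log logs/hadoop.log
```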
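For the test, first create the urls folder, then create url.txt inside it
and write http://blog.tianya.cn/ into it, which means we crawl the Tianya blog.
The name url.txt is not fixed; you can change it to anything. Every file in urls is read as a file containing URLs.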
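The setup just described can be done from the command line. This is a minimal sketch that creates the folder and the seed file in the current directory, with one URL per line:

```shell
#!/bin/sh
# Create the seed folder and a seed file holding the start URL.
mkdir -p urls
printf '%s\n' "http://blog.tianya.cn/" > urls/url.txt

# Show what Nutch will read
cat urls/url.txt
```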
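You can drop -topN, in which case every URL in the first two levels is crawled. I tried it, and some sites take a very long time, so it is better to keep it.
Run example 2. After a while the crawl finishes. Below is my nohup.out log; no exceptions are shown: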
solrUrl is not set, indexing will be skipped...
crawl started in: data
rootUrlDir = urls
threads = 50
depth = 2
solrUrl=null
topN = 2
Injector: starting at 2014-07-13 20:37:26
Injector: crawlDb: data/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 1
Injector: total number of urls injected after normalization and filtering: 2
Injector: Merging injected urls into crawl db.
Injector: finished at 2014-07-13 20:37:57, elapsed: 00:00:30
Generator: starting at 2014-07-13 20:37:57
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 2
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data/segments/20140713203805
Generator: finished at 2014-07-13 20:38:13, elapsed: 00:00:15
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-07-13 20:38:13
Fetcher: segment: data/segments/20140713203805
Using queue mode : byHost
Fetcher: threads: 50
Fetcher: time-out divisor: 2
QueueFeeder finished: total 1 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
fetching http://blog.tianya.cn/
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-07-13 20:38:26, elapsed: 00:00:13
ParseSegment: starting at 2014-07-13 20:38:26
ParseSegment: segment: data/segments/20140713203805
Parsed (64ms):http://blog.tianya.cn/
ParseSegment: finished at 2014-07-13 20:38:33, elapsed: 00:00:07
CrawlDb update: starting at 2014-07-13 20:38:33
CrawlDb update: db: data/crawldb
CrawlDb update: segments: [data/segments/20140713203805]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-07-13 20:38:47, elapsed: 00:00:13
Generator: starting at 2014-07-13 20:38:47
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 2
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data/segments/20140713203855
Generator: finished at 2014-07-13 20:39:02, elapsed: 00:00:15
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-07-13 20:39:02
Fetcher: segment: data/segments/20140713203855
Using queue mode : byHost
Fetcher: threads: 50
Fetcher: time-out divisor: 2
QueueFeeder finished: total 2 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
fetching http://blog.tianya.cn/blog/culture
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
  maxThreads = 1
  inProgress = 1
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1405255142707
  now = 1405255143834
  0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
  maxThreads = 1
  inProgress = 1
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1405255142707
  now = 1405255144838
  0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
  maxThreads = 1
  inProgress = 1
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1405255142707
  now = 1405255145841
  0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1405255151041
  now = 1405255146844
  0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1405255151041
  now = 1405255147847
  0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1405255151041
  now = 1405255148852
  0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1405255151041
  now = 1405255149855
  0. http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1
* queue: http://blog.tianya.cn
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1405255151041
  now = 1405255150858
  0. http://blog.tianya.cn/blog/daren
fetching http://blog.tianya.cn/blog/daren
-finishing thread FetcherThread, activeThreads=49
-finishing thread FetcherThread, activeThreads=48
-finishing thread FetcherThread, activeThreads=46
-finishing thread FetcherThread, activeThreads=47
-finishing thread FetcherThread, activeThreads=45
-finishing thread FetcherThread, activeThreads=44
-finishing thread FetcherThread, activeThreads=43
-finishing thread FetcherThread, activeThreads=41
-finishing thread FetcherThread, activeThreads=42
-finishing thread FetcherThread, activeThreads=40
-finishing thread FetcherThread, activeThreads=39
-finishing thread FetcherThread, activeThreads=38
-finishing thread FetcherThread, activeThreads=37
-finishing thread FetcherThread, activeThreads=36
-finishing thread FetcherThread, activeThreads=35
-finishing thread FetcherThread, activeThreads=34
-finishing thread FetcherThread, activeThreads=33
-finishing thread FetcherThread, activeThreads=32
-finishing thread FetcherThread, activeThreads=31
-finishing thread FetcherThread, activeThreads=30
-finishing thread FetcherThread, activeThreads=29
-finishing thread FetcherThread, activeThreads=28
-finishing thread FetcherThread, activeThreads=27
-finishing thread FetcherThread, activeThreads=26
-finishing thread FetcherThread, activeThreads=25
-finishing thread FetcherThread, activeThreads=24
-finishing thread FetcherThread, activeThreads=23
-finishing thread FetcherThread, activeThreads=22
-finishing thread FetcherThread, activeThreads=21
-finishing thread FetcherThread, activeThreads=20
-finishing thread FetcherThread, activeThreads=19
-finishing thread FetcherThread, activeThreads=18
-finishing thread FetcherThread, activeThreads=17
-finishing thread FetcherThread, activeThreads=16
-finishing thread FetcherThread, activeThreads=15
-finishing thread FetcherThread, activeThreads=14
-finishing thread FetcherThread, activeThreads=13
-finishing thread FetcherThread, activeThreads=12
-finishing thread FetcherThread, activeThreads=11
-finishing thread FetcherThread, activeThreads=10
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-07-13 20:39:18, elapsed: 00:00:16
ParseSegment: starting at 2014-07-13 20:39:18
ParseSegment: segment: data/segments/20140713203855
Parsed (13ms):http://blog.tianya.cn/blog/culture
Parsed (3ms):http://blog.tianya.cn/blog/daren
ParseSegment: finished at 2014-07-13 20:39:25, elapsed: 00:00:07
CrawlDb update: starting at 2014-07-13 20:39:25
CrawlDb update: db: data/crawldb
CrawlDb update: segments: [data/segments/20140713203855]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-07-13 20:39:38, elapsed: 00:00:13
LinkDb: starting at 2014-07-13 20:39:38
LinkDb: linkdb: data/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: internal links will be ignored.
LinkDb: adding segment: file:/home/lan/nutch/local/data/segments/20140713203805
LinkDb: adding segment: file:/home/lan/nutch/local/data/segments/20140713203855
LinkDb: finished at 2014-07-13 20:39:48, elapsed: 00:00:10
crawl finished: data
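The stages named in the log (Injector, Generator, Fetcher, ParseSegment, CrawlDb update, LinkDb) line up with separate bin/nutch sub-commands. The sketch below only prints the commands that would roughly correspond to the crawl in example 2; it does not run anything. The sub-command names and arguments (inject, generate, fetch, parse, updatedb, invertlinks) are written as I understand the Nutch 1.x command line, so check them against the usage output of bin/nutch. Picking the newest segment with ls is my own assumption, and the function name print_crawl_steps is hypothetical.

```shell
#!/bin/sh
# print_crawl_steps URLDIR DIR THREADS DEPTH TOPN
# Dry run: print the per-stage commands, one generate/fetch/parse/updatedb
# round per depth level, followed by a single invertlinks at the end.
print_crawl_steps() {
    urls="$1"; dir="$2"; threads="$3"; depth="$4"; topn="$5"
    echo "bin/nutch inject $dir/crawldb $urls"
    i=1
    while [ "$i" -le "$depth" ]; do
        echo "bin/nutch generate $dir/crawldb $dir/segments -topN $topn"
        # Assumption: the newest segment is the last entry in the listing
        echo "segment=\$(ls -d $dir/segments/2* | tail -n 1)"
        echo "bin/nutch fetch \$segment -threads $threads"
        echo "bin/nutch parse \$segment"
        echo "bin/nutch updatedb $dir/crawldb \$segment"
        i=$((i + 1))
    done
    echo "bin/nutch invertlinks $dir/linkdb -dir $dir/segments"
}

print_crawl_steps urls data 50 2 2
```

With depth 2 this prints one inject line, two rounds of five lines each, and one invertlinks line, which matches the order of the stages in the log.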
The log without -topN:
solrUrl is not set, indexing will be skipped...
crawl started in: data
rootUrlDir = urls
threads = 50
depth = 2
solrUrl=null
Injector: starting at 2014-07-14 20:52:04
Injector: crawlDb: data/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 1
Injector: total number of urls injected after normalization and filtering: 2
Injector: Merging injected urls into crawl db.
Injector: finished at 2014-07-14 20:52:23, elapsed: 00:00:19
Generator: starting at 2014-07-14 20:52:23
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data/segments/20140714205231
Generator: finished at 2014-07-14 20:52:39, elapsed: 00:00:15
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-07-14 20:52:39
Fetcher: segment: data/segments/20140714205231
Using queue mode : byHost
Fetcher: threads: 50
Fetcher: time-out divisor: 2
QueueFeeder finished: total 1 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
fetching http://blog.tianya.cn/
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Using queue mode : byHost
-finishing thread FetcherThread, activeThreads=1
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-07-14 20:53:01, elapsed: 00:00:22
ParseSegment: starting at 2014-07-14 20:53:01
ParseSegment: segment: data/segments/20140714205231
Parsed (47ms):http://blog.tianya.cn/
ParseSegment: finished at 2014-07-14 20:53:08, elapsed: 00:00:07
CrawlDb update: starting at 2014-07-14 20:53:08
CrawlDb update: db: data/crawldb
CrawlDb update: segments: [data/segments/20140714205231]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-07-14 20:53:22, elapsed: 00:00:13
Generator: starting at 2014-07-14 20:53:22
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data/segments/20140714205330
Generator: finished at 2014-07-14 20:53:37, elapsed: 00:00:15
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-07-14 20:53:37
Fetcher: segment: data/segments/20140714205330
Using queue mode : byHost
Fetcher: threads: 50
Fetcher: time-out divisor: 2
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
fetching http://www.tianya.cn/mobile
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
fetching http://blog.tianya.cn/post-5010184-62889385-1.shtml
QueueFeeder finished: total 100 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Using queue mode : byHost
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=48, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=98
fetching http://blog.tianya.cn/blog/culture
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=97
fetching http://blog.tianya.cn/post-4487705-62917227-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=96
fetching http://blog.tianya.cn/post-1119083-62403495-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=95
fetching http://blog.tianya.cn/blog/ent
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=94
fetching http://blog.tianya.cn/post-4598537-62971598-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=93
fetching http://blog.tianya.cn/post-5010184-62834903-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=92
fetching http://blog.tianya.cn/post-4877164-61406732-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=91
fetching http://blog.tianya.cn/post-78180-59109533-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=90
fetching http://blog.tianya.cn/post-4362114-63792588-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=89
fetching http://blog.tianya.cn/post-3961685-62977022-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=88
fetching http://blog.tianya.cn/post-5010184-62890806-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=87
fetching http://blog.tianya.cn/blog/mingbo
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=86
fetching http://blog.tianya.cn/post-959477-62971507-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=85
fetching http://blog.tianya.cn/post-4562315-62807399-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=84
fetching http://blog.tianya.cn/post-3941055-62934113-1.shtml
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=83
fetching http://blog.tianya.cn/blog/daren
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=82
fetching http://blog.tianya.cn/post-1196211-63799917-1.shtml
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=81 380 fetching http://blog.tianya.cn/post-196238-62376389-1.shtml 381 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=80 382 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=80 383 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80 384 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80 385 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80 386 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80 387 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=80 388 fetching http://blog.tianya.cn/post-4700528-62898660-1.shtml 389 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79 390 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79 391 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79 392 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79 393 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=79 394 fetching http://blog.tianya.cn/post-1119083-62958234-1.shtml 395 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=78 396 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78 397 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78 398 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78 399 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78 400 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=78 401 fetching http://blog.tianya.cn/post-1671874-62898829-1.shtml 402 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=77 403 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77 404 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77 405 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77 406 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77 407 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=77 408 fetching http://blog.tianya.cn/post-5010184-62313586-1.shtml 409 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 410 
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 411 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 412 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 413 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 414 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 415 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 416 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 417 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 418 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 419 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 420 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 421 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=76 422 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76 423 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76 424 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76 425 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76 426 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=76 427 fetching http://blog.tianya.cn/post-4598537-62379563-1.shtml 428 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=75 429 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=75 430 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=75 431 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75 432 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75 433 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75 434 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75 435 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=75 436 fetching http://blog.tianya.cn/post-236764-59417277-1.shtml 437 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74 438 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74 439 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74 440 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74 441 
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=74 442 fetching http://blog.tianya.cn/post-4360774-62845782-1.shtml 443 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=73 444 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=73 445 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=73 446 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=73 447 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73 448 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73 449 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73 450 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73 451 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=73 452 fetching http://blog.tianya.cn/post-196238-61158698-1.shtml 453 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=72 454 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72 455 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72 456 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72 457 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72 458 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=72 459 fetching http://blog.tianya.cn/post-3340761-62357537-1.shtml 460 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71 461 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71 462 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71 463 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71 464 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=71 465 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71 466 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71 467 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71 468 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71 469 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=71 470 fetching http://blog.tianya.cn/post-4562315-62367801-1.shtml 471 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=70 472 
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70 473 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70 474 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70 475 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70 476 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=70 477 fetching http://blog.tianya.cn/post-38484-61144592-1.shtml 478 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=69 479 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=69 480 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69 481 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69 482 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69 483 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69 484 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=69 485 fetching http://blog.tianya.cn/post-4487705-63000074-1.shtml 486 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=68 487 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=68 488 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68 489 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68 490 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68 491 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68 492 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=68 493 fetching http://blog.tianya.cn/post-3941055-62972581-1.shtml 494 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=67 495 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67 496 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67 497 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67 498 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67 499 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=67 500 fetching http://blog.tianya.cn/post-2066284-62926321-1.shtml 501 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=66 502 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=66 503 
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=66 504 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66 505 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66 506 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66 507 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66 508 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=66 509 fetching http://blog.tianya.cn/post-4608093-62651701-1.shtml 510 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65 511 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65 512 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65 513 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65 514 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=65 515 fetching http://blog.tianya.cn/post-236764-60248116-1.shtml 516 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=64 517 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64 518 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64 519 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64 520 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64 521 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=64 522 fetching http://blog.tianya.cn/post-5010184-62718271-1.shtml 523 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=63 524 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63 525 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63 526 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63 527 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63 528 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=63 529 fetching http://blog.tianya.cn/post-234213-62960519-1.shtml 530 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=62 531 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=62 532 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62 533 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62 534 
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62 535 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62 536 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=62 537 fetching http://blog.tianya.cn/post-4600300-62374308-1.shtml 538 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61 539 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61 540 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61 541 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61 542 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61 543 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61 544 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=61 545 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61 546 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61 547 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61 548 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61 549 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=61 550 fetching http://blog.tianya.cn/post-3739914-62875218-1.shtml 551 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=60 552 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60 553 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60 554 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60 555 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60 556 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=60 557 fetching http://blog.tianya.cn/post-1119083-62979540-1.shtml 558 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59 559 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59 560 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59 561 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59 562 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=59 563 fetching http://blog.tianya.cn/post-3773157-62890053-1.shtml 564 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=58 565 
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58 566 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58 567 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58 568 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58 569 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=58 570 fetching http://blog.tianya.cn/post-4562315-62899385-1.shtml 571 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=57 572 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57 573 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57 574 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57 575 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57 576 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=57 577 fetching http://blog.tianya.cn/post-2513619-62970447-1.shtml 578 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=56 579 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56 580 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56 581 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56 582 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56 583 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=56 584 fetching http://blog.tianya.cn/post-4482611-62820517-1.shtml 585 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=55 586 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=55 587 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=55 588 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=55 589 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55 590 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55 591 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55 592 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55 593 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=55 594 fetching http://blog.tianya.cn/post-236764-58766442-1.shtml 595 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=54 596 
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=54 597 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54 598 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54 599 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54 600 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54 601 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=54 602 fetching http://blog.tianya.cn/post-351212-59432160-1.shtml 603 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53 604 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53 605 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53 606 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53 607 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=53 608 fetching http://blog.tianya.cn/post-174091-62981677-1.shtml 609 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=52 610 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52 611 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52 612 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52 613 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52 614 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=52 615 fetching http://blog.tianya.cn/post-78180-62903890-1.shtml 616 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=51 617 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51 618 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51 619 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51 620 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51 621 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=51 622 fetching http://blog.tianya.cn/blog/history 623 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=50 624 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=50 625 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=50 626 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=50 627 -activeThreads=50, 
spinWaiting=50, fetchQueues.totalSize=50 628 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=50 629 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=50 630 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=50 631 fetching http://blog.tianya.cn/post-1578250-62896383-1.shtml 632 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=49 633 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49 634 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49 635 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49 636 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49 637 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=49 638 fetching http://blog.tianya.cn/post-196238-62190438-1.shtml 639 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=48 640 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48 641 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48 642 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48 643 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48 644 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=48 645 fetching http://blog.tianya.cn/post-196238-61974722-1.shtml 646 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=47 647 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47 648 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47 649 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47 650 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47 651 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=47 652 fetching http://blog.tianya.cn/post-4700528-62898663-1.shtml 653 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=46 654 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=46 655 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=46 656 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=46 657 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=46 658 -activeThreads=50, 
spinWaiting=50, fetchQueues.totalSize=46 659 fetching http://blog.tianya.cn/post-5010184-62837336-1.shtml 660 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=45 661 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=45 662 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=45 663 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45 664 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45 665 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45 666 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45 667 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=45 668 fetching http://blog.tianya.cn/blog/finance 669 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=44 670 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44 671 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44 672 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44 673 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44 674 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=44 675 fetching http://blog.tianya.cn/post-145340-62426203-1.shtml 676 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=43 677 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=43 678 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43 679 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43 680 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43 681 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43 682 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=43 683 fetching http://blog.tianya.cn/post-1870300-63794004-1.shtml 684 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42 685 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42 686 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42 687 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42 688 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=42 689 fetching 
http://blog.tianya.cn/post-863996-62974859-1.shtml 690 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=41 691 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41 692 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41 693 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41 694 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41 695 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=41 696 fetching http://blog.tianya.cn/post-3727390-62972109-1.shtml 697 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40 698 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40 699 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40 700 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40 701 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=40 702 fetching http://blog.tianya.cn/blog/emotion 703 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 704 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 705 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 706 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 707 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 708 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 709 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 710 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 711 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 712 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=39 713 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39 714 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39 715 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39 716 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39 717 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=39 718 fetching http://blog.tianya.cn/post-336487-63732130-1.shtml 719 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=38 720 -activeThreads=50, spinWaiting=50, 
fetchQueues.totalSize=38 721 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=38 722 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=38 723 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=38 724 fetching http://blog.tianya.cn/post-4025452-63785440-1.shtml 725 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=37 726 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37 727 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37 728 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37 729 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37 730 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=37 731 fetching http://blog.tianya.cn/post-137239-63797690-1.shtml 732 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36 733 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36 734 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36 735 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36 736 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=36 737 fetching http://blog.tianya.cn/post-1838543-62970839-1.shtml 738 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=35 739 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=35 740 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35 741 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35 742 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35 743 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35 744 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=35 745 fetching http://blog.tianya.cn/blog/society 746 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34 747 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34 748 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34 749 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34 750 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=34 751 fetching http://blog.tianya.cn/post-542686-63799203-1.shtml 752 
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 753 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 754 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 755 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 756 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 757 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 758 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 759 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 760 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 761 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 762 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 763 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 764 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 765 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 766 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=33 767 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33 768 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33 769 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33 770 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33 771 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=33 772 fetching http://blog.tianya.cn/post-1438407-62987507-1.shtml 773 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=32 774 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32 775 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32 776 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32 777 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32 778 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=32 779 fetching http://blog.tianya.cn/post-3773157-62390018-1.shtml 780 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31 781 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31 782 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31 783 
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31 784 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=31 785 fetching http://blog.tianya.cn/post-78180-58859246-1.shtml 786 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=30 787 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30 788 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30 789 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30 790 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30 791 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=30 792 fetching http://blog.tianya.cn/post-236764-62962675-1.shtml 793 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29 794 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29 795 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29 796 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29 797 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=29 798 fetching http://blog.tianya.cn/blog/life 799 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28 800 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28 801 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28 802 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28 803 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=28 804 fetching http://blog.tianya.cn/post-1883179-62390915-1.shtml 805 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 806 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 807 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 808 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 809 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 810 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 811 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 812 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 813 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 814 -activeThreads=50, spinWaiting=49, 
fetchQueues.totalSize=27 815 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 816 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=27 817 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27 818 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27 819 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27 820 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27 821 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=27 822 fetching http://blog.tianya.cn/post-4009947-62401775-1.shtml 823 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=26 824 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26 825 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26 826 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26 827 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26 828 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=26 829 fetching http://blog.tianya.cn/post-4047683-63794167-1.shtml 830 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25 831 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25 832 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25 833 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25 834 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=25 835 fetching http://blog.tianya.cn/post-1755624-62987935-1.shtml 836 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=24 837 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=24 838 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24 839 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24 840 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24 841 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24 842 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=24 843 fetching http://blog.tianya.cn/post-5010184-62690266-1.shtml 844 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=23 845 -activeThreads=50, spinWaiting=50, 
fetchQueues.totalSize=23 846 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=23 847 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=23 848 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=23 849 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=23 850 fetching http://blog.tianya.cn/post-4353581-62972558-1.shtml 851 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 852 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 853 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 854 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 855 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 856 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 857 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 858 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 859 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 860 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 861 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 862 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 863 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=22 864 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22 865 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22 866 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22 867 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22 868 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=22 869 fetching http://blog.tianya.cn/blog/international 870 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21 871 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21 872 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21 873 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21 874 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=21 875 fetching http://blog.tianya.cn/post-196238-61768175-1.shtml 876 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=20 
877 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20 878 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20 879 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20 880 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20 881 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=20 882 fetching http://blog.tianya.cn/post-4877164-61415979-1.shtml 883 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19 884 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19 885 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19 886 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19 887 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=19 888 fetching http://blog.tianya.cn/post-544588-62883194-1.shtml 889 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=18 890 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=18 891 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18 892 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18 893 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18 894 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18 895 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=18 896 fetching http://blog.tianya.cn/post-4250142-62927024-1.shtml 897 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=17 898 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17 899 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17 900 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17 901 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17 902 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17 903 fetching http://blog.tianya.cn/post-78180-62980961-1.shtml 904 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16 905 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16 906 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16 907 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16 908 
-activeThreads=50, spinWaiting=49, fetchQueues.totalSize=16 909 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16 910 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16 911 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16 912 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16 913 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16 914 fetching http://blog.tianya.cn/post-4353581-62972544-1.shtml 915 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15 916 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15 917 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15 918 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15 919 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15 920 fetching http://blog.tianya.cn/blog/stock 921 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14 922 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14 923 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14 924 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14 925 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=14 926 fetching http://blog.tianya.cn/post-4482611-62391796-1.shtml 927 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=13 928 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=13 929 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=13 930 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=13 931 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=13 932 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=13 933 fetching http://blog.tianya.cn/post-4482611-62900444-1.shtml 934 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=12 935 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=12 936 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=12 937 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=12 938 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=12 939 -activeThreads=50, 
spinWaiting=50, fetchQueues.totalSize=12 940 fetching http://blog.tianya.cn/blog/sports 941 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11 942 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11 943 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11 944 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11 945 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=11 946 fetching http://blog.tianya.cn/post-1882702-63776337-1.shtml 947 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=10 948 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10 949 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10 950 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10 951 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10 952 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=10 953 fetching http://blog.tianya.cn/post-3773157-62958131-1.shtml 954 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=9 955 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9 956 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9 957 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9 958 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9 959 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=9 960 fetching http://blog.tianya.cn/post-4101233-62653750-1.shtml 961 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=8 962 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8 963 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8 964 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8 965 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8 966 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=8 967 fetching http://blog.tianya.cn/blog/newPush 968 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7 969 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7 970 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7 971 
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7 972 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=7 973 fetching http://blog.tianya.cn/post-2111189-62899907-1.shtml 974 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=6 975 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6 976 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6 977 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6 978 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6 979 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=6 980 fetching http://blog.tianya.cn/blog/food 981 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=5 982 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5 983 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5 984 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5 985 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5 986 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=5 987 fetching http://blog.tianya.cn/post-1515015-63779836-1.shtml 988 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=4 989 * queue: http://blog.tianya.cn 990 maxThreads = 1 991 inProgress = 1 992 crawlDelay = 5000 993 minCrawlDelay = 0 994 nextFetchTime = 1405343081023 995 now = 1405343081793 996 0. http://blog.tianya.cn/post-142905-62961160-1.shtml 997 1. http://blog.tianya.cn/blog/travel 998 2. http://blog.tianya.cn/post-4598537-62971461-1.shtml 999 3. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1000 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4 1001 * queue: http://blog.tianya.cn 1002 maxThreads = 1 1003 inProgress = 0 1004 crawlDelay = 5000 1005 minCrawlDelay = 0 1006 nextFetchTime = 1405343087577 1007 now = 1405343082796 1008 0. http://blog.tianya.cn/post-142905-62961160-1.shtml 1009 1. http://blog.tianya.cn/blog/travel 1010 2. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1011 3. 
http://blog.tianya.cn/post-4598537-62971498-1.shtml 1012 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4 1013 * queue: http://blog.tianya.cn 1014 maxThreads = 1 1015 inProgress = 0 1016 crawlDelay = 5000 1017 minCrawlDelay = 0 1018 nextFetchTime = 1405343087577 1019 now = 1405343083799 1020 0. http://blog.tianya.cn/post-142905-62961160-1.shtml 1021 1. http://blog.tianya.cn/blog/travel 1022 2. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1023 3. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1024 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4 1025 * queue: http://blog.tianya.cn 1026 maxThreads = 1 1027 inProgress = 0 1028 crawlDelay = 5000 1029 minCrawlDelay = 0 1030 nextFetchTime = 1405343087577 1031 now = 1405343084804 1032 0. http://blog.tianya.cn/post-142905-62961160-1.shtml 1033 1. http://blog.tianya.cn/blog/travel 1034 2. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1035 3. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1036 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4 1037 * queue: http://blog.tianya.cn 1038 maxThreads = 1 1039 inProgress = 0 1040 crawlDelay = 5000 1041 minCrawlDelay = 0 1042 nextFetchTime = 1405343087577 1043 now = 1405343085806 1044 0. http://blog.tianya.cn/post-142905-62961160-1.shtml 1045 1. http://blog.tianya.cn/blog/travel 1046 2. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1047 3. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1048 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=4 1049 * queue: http://blog.tianya.cn 1050 maxThreads = 1 1051 inProgress = 0 1052 crawlDelay = 5000 1053 minCrawlDelay = 0 1054 nextFetchTime = 1405343087577 1055 now = 1405343086809 1056 0. http://blog.tianya.cn/post-142905-62961160-1.shtml 1057 1. http://blog.tianya.cn/blog/travel 1058 2. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1059 3. 
http://blog.tianya.cn/post-4598537-62971498-1.shtml 1060 fetching http://blog.tianya.cn/post-142905-62961160-1.shtml 1061 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3 1062 * queue: http://blog.tianya.cn 1063 maxThreads = 1 1064 inProgress = 0 1065 crawlDelay = 5000 1066 minCrawlDelay = 0 1067 nextFetchTime = 1405343092743 1068 now = 1405343087813 1069 0. http://blog.tianya.cn/blog/travel 1070 1. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1071 2. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1072 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3 1073 * queue: http://blog.tianya.cn 1074 maxThreads = 1 1075 inProgress = 0 1076 crawlDelay = 5000 1077 minCrawlDelay = 0 1078 nextFetchTime = 1405343092743 1079 now = 1405343088816 1080 0. http://blog.tianya.cn/blog/travel 1081 1. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1082 2. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1083 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3 1084 * queue: http://blog.tianya.cn 1085 maxThreads = 1 1086 inProgress = 0 1087 crawlDelay = 5000 1088 minCrawlDelay = 0 1089 nextFetchTime = 1405343092743 1090 now = 1405343089819 1091 0. http://blog.tianya.cn/blog/travel 1092 1. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1093 2. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1094 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3 1095 * queue: http://blog.tianya.cn 1096 maxThreads = 1 1097 inProgress = 0 1098 crawlDelay = 5000 1099 minCrawlDelay = 0 1100 nextFetchTime = 1405343092743 1101 now = 1405343090821 1102 0. http://blog.tianya.cn/blog/travel 1103 1. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1104 2. 
http://blog.tianya.cn/post-4598537-62971498-1.shtml 1105 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=3 1106 * queue: http://blog.tianya.cn 1107 maxThreads = 1 1108 inProgress = 0 1109 crawlDelay = 5000 1110 minCrawlDelay = 0 1111 nextFetchTime = 1405343092743 1112 now = 1405343091824 1113 0. http://blog.tianya.cn/blog/travel 1114 1. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1115 2. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1116 fetching http://blog.tianya.cn/blog/travel 1117 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2 1118 * queue: http://blog.tianya.cn 1119 maxThreads = 1 1120 inProgress = 1 1121 crawlDelay = 5000 1122 minCrawlDelay = 0 1123 nextFetchTime = 1405343092743 1124 now = 1405343092826 1125 0. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1126 1. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1127 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2 1128 * queue: http://blog.tianya.cn 1129 maxThreads = 1 1130 inProgress = 0 1131 crawlDelay = 5000 1132 minCrawlDelay = 0 1133 nextFetchTime = 1405343098775 1134 now = 1405343093829 1135 0. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1136 1. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1137 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2 1138 * queue: http://blog.tianya.cn 1139 maxThreads = 1 1140 inProgress = 0 1141 crawlDelay = 5000 1142 minCrawlDelay = 0 1143 nextFetchTime = 1405343098775 1144 now = 1405343094833 1145 0. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1146 1. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1147 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2 1148 * queue: http://blog.tianya.cn 1149 maxThreads = 1 1150 inProgress = 0 1151 crawlDelay = 5000 1152 minCrawlDelay = 0 1153 nextFetchTime = 1405343098775 1154 now = 1405343095835 1155 0. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1156 1. 
http://blog.tianya.cn/post-4598537-62971498-1.shtml 1157 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2 1158 * queue: http://blog.tianya.cn 1159 maxThreads = 1 1160 inProgress = 0 1161 crawlDelay = 5000 1162 minCrawlDelay = 0 1163 nextFetchTime = 1405343098775 1164 now = 1405343096838 1165 0. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1166 1. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1167 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2 1168 * queue: http://blog.tianya.cn 1169 maxThreads = 1 1170 inProgress = 0 1171 crawlDelay = 5000 1172 minCrawlDelay = 0 1173 nextFetchTime = 1405343098775 1174 now = 1405343097840 1175 0. http://blog.tianya.cn/post-4598537-62971461-1.shtml 1176 1. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1177 fetching http://blog.tianya.cn/post-4598537-62971461-1.shtml 1178 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1 1179 * queue: http://blog.tianya.cn 1180 maxThreads = 1 1181 inProgress = 1 1182 crawlDelay = 5000 1183 minCrawlDelay = 0 1184 nextFetchTime = 1405343098775 1185 now = 1405343098843 1186 0. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1187 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1 1188 * queue: http://blog.tianya.cn 1189 maxThreads = 1 1190 inProgress = 1 1191 crawlDelay = 5000 1192 minCrawlDelay = 0 1193 nextFetchTime = 1405343098775 1194 now = 1405343099846 1195 0. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1196 -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1 1197 * queue: http://blog.tianya.cn 1198 maxThreads = 1 1199 inProgress = 1 1200 crawlDelay = 5000 1201 minCrawlDelay = 0 1202 nextFetchTime = 1405343098775 1203 now = 1405343100849 1204 0. 
http://blog.tianya.cn/post-4598537-62971498-1.shtml 1205 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1 1206 * queue: http://blog.tianya.cn 1207 maxThreads = 1 1208 inProgress = 0 1209 crawlDelay = 5000 1210 minCrawlDelay = 0 1211 nextFetchTime = 1405343105959 1212 now = 1405343101851 1213 0. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1214 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1 1215 * queue: http://blog.tianya.cn 1216 maxThreads = 1 1217 inProgress = 0 1218 crawlDelay = 5000 1219 minCrawlDelay = 0 1220 nextFetchTime = 1405343105959 1221 now = 1405343102853 1222 0. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1223 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1 1224 * queue: http://blog.tianya.cn 1225 maxThreads = 1 1226 inProgress = 0 1227 crawlDelay = 5000 1228 minCrawlDelay = 0 1229 nextFetchTime = 1405343105959 1230 now = 1405343103855 1231 0. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1232 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1 1233 * queue: http://blog.tianya.cn 1234 maxThreads = 1 1235 inProgress = 0 1236 crawlDelay = 5000 1237 minCrawlDelay = 0 1238 nextFetchTime = 1405343105959 1239 now = 1405343104857 1240 0. http://blog.tianya.cn/post-4598537-62971498-1.shtml 1241 -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1 1242 * queue: http://blog.tianya.cn 1243 maxThreads = 1 1244 inProgress = 0 1245 crawlDelay = 5000 1246 minCrawlDelay = 0 1247 nextFetchTime = 1405343105959 1248 now = 1405343105859 1249 0. 
http://blog.tianya.cn/post-4598537-62971498-1.shtml 1250 fetching http://blog.tianya.cn/post-4598537-62971498-1.shtml 1251 -finishing thread FetcherThread, activeThreads=49 1252 -finishing thread FetcherThread, activeThreads=48 1253 -finishing thread FetcherThread, activeThreads=47 1254 -finishing thread FetcherThread, activeThreads=46 1255 -finishing thread FetcherThread, activeThreads=45 1256 -finishing thread FetcherThread, activeThreads=44 1257 -finishing thread FetcherThread, activeThreads=43 1258 -finishing thread FetcherThread, activeThreads=42 1259 -finishing thread FetcherThread, activeThreads=41 1260 -finishing thread FetcherThread, activeThreads=40 1261 -finishing thread FetcherThread, activeThreads=39 1262 -finishing thread FetcherThread, activeThreads=38 1263 -finishing thread FetcherThread, activeThreads=37 1264 -finishing thread FetcherThread, activeThreads=36 1265 -finishing thread FetcherThread, activeThreads=29 1266 -finishing thread FetcherThread, activeThreads=30 1267 -finishing thread FetcherThread, activeThreads=31 1268 -finishing thread FetcherThread, activeThreads=32 1269 -finishing thread FetcherThread, activeThreads=33 1270 -finishing thread FetcherThread, activeThreads=34 1271 -finishing thread FetcherThread, activeThreads=35 1272 -finishing thread FetcherThread, activeThreads=28 1273 -finishing thread FetcherThread, activeThreads=20 1274 -finishing thread FetcherThread, activeThreads=21 1275 -finishing thread FetcherThread, activeThreads=22 1276 -finishing thread FetcherThread, activeThreads=23 1277 -finishing thread FetcherThread, activeThreads=19 1278 -finishing thread FetcherThread, activeThreads=18 1279 -finishing thread FetcherThread, activeThreads=17 1280 -finishing thread FetcherThread, activeThreads=16 1281 -finishing thread FetcherThread, activeThreads=15 1282 -finishing thread FetcherThread, activeThreads=14 1283 -finishing thread FetcherThread, activeThreads=13 1284 -finishing thread FetcherThread, activeThreads=12 1285 
-finishing thread FetcherThread, activeThreads=11 1286 -finishing thread FetcherThread, activeThreads=10 1287 -finishing thread FetcherThread, activeThreads=9 1288 -finishing thread FetcherThread, activeThreads=8 1289 -finishing thread FetcherThread, activeThreads=7 1290 -finishing thread FetcherThread, activeThreads=6 1291 -finishing thread FetcherThread, activeThreads=24 1292 -finishing thread FetcherThread, activeThreads=25 1293 -finishing thread FetcherThread, activeThreads=26 1294 -finishing thread FetcherThread, activeThreads=27 1295 -finishing thread FetcherThread, activeThreads=1 1296 -finishing thread FetcherThread, activeThreads=2 1297 -finishing thread FetcherThread, activeThreads=3 1298 -finishing thread FetcherThread, activeThreads=4 1299 -finishing thread FetcherThread, activeThreads=5 1300 -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0 1301 -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0 1302 -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0 1303 -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0 1304 -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0 1305 -finishing thread FetcherThread, activeThreads=0 1306 -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0 1307 -activeThreads=0 1308 Fetcher: finished at 2014-07-14 21:05:17, elapsed: 00:11:39 1309 ParseSegment: starting at 2014-07-14 21:05:17 1310 ParseSegment: segment: data/segments/20140714205330 1311 Parsed (13ms):http://blog.tianya.cn/blog/culture 1312 Parsed (2ms):http://blog.tianya.cn/blog/daren 1313 Parsed (15ms):http://blog.tianya.cn/blog/emotion 1314 Parsed (8ms):http://blog.tianya.cn/blog/ent 1315 Parsed (7ms):http://blog.tianya.cn/blog/finance 1316 Parsed (7ms):http://blog.tianya.cn/blog/food 1317 Parsed (12ms):http://blog.tianya.cn/blog/history 1318 Parsed (6ms):http://blog.tianya.cn/blog/international 1319 Parsed (6ms):http://blog.tianya.cn/blog/life 1320 Parsed (3ms):http://blog.tianya.cn/blog/mingbo 1321 Parsed 
(7ms):http://blog.tianya.cn/blog/newPush 1322 Parsed (16ms):http://blog.tianya.cn/blog/society 1323 Parsed (8ms):http://blog.tianya.cn/blog/sports 1324 Parsed (8ms):http://blog.tianya.cn/blog/stock 1325 Parsed (20ms):http://blog.tianya.cn/blog/travel 1326 Parsed (6ms):http://blog.tianya.cn/post-1119083-62403495-1.shtml 1327 Parsed (4ms):http://blog.tianya.cn/post-1119083-62958234-1.shtml 1328 Parsed (5ms):http://blog.tianya.cn/post-1119083-62979540-1.shtml 1329 Parsed (0ms):http://blog.tianya.cn/post-1196211-63799917-1.shtml 1330 Parsed (0ms):http://blog.tianya.cn/post-137239-63797690-1.shtml 1331 Parsed (0ms):http://blog.tianya.cn/post-1438407-62987507-1.shtml 1332 Parsed (0ms):http://blog.tianya.cn/post-145340-62426203-1.shtml 1333 Parsed (0ms):http://blog.tianya.cn/post-1515015-63779836-1.shtml 1334 Parsed (0ms):http://blog.tianya.cn/post-1578250-62896383-1.shtml 1335 Parsed (1ms):http://blog.tianya.cn/post-1671874-62898829-1.shtml 1336 Parsed (1ms):http://blog.tianya.cn/post-174091-62981677-1.shtml 1337 Parsed (0ms):http://blog.tianya.cn/post-1755624-62987935-1.shtml 1338 Parsed (0ms):http://blog.tianya.cn/post-1838543-62970839-1.shtml 1339 Parsed (1ms):http://blog.tianya.cn/post-1870300-63794004-1.shtml 1340 Parsed (1ms):http://blog.tianya.cn/post-1882702-63776337-1.shtml 1341 Parsed (1ms):http://blog.tianya.cn/post-1883179-62390915-1.shtml 1342 Parsed (0ms):http://blog.tianya.cn/post-196238-61158698-1.shtml 1343 Parsed (1ms):http://blog.tianya.cn/post-196238-61768175-1.shtml 1344 Parsed (0ms):http://blog.tianya.cn/post-196238-61974722-1.shtml 1345 Parsed (0ms):http://blog.tianya.cn/post-196238-62190438-1.shtml 1346 Parsed (1ms):http://blog.tianya.cn/post-196238-62376389-1.shtml 1347 Parsed (1ms):http://blog.tianya.cn/post-2066284-62926321-1.shtml 1348 Parsed (0ms):http://blog.tianya.cn/post-2111189-62899907-1.shtml 1349 Parsed (0ms):http://blog.tianya.cn/post-234213-62960519-1.shtml 1350 Parsed (0ms):http://blog.tianya.cn/post-236764-58766442-1.shtml 1351 
Parsed (0ms):http://blog.tianya.cn/post-236764-59417277-1.shtml 1352 http://blog.tianya.cn/post-236764-60248116-1.shtml skipped. Content of size 65778 was truncated to 64957 1353 Parsed (1ms):http://blog.tianya.cn/post-236764-62962675-1.shtml 1354 Parsed (0ms):http://blog.tianya.cn/post-2513619-62970447-1.shtml 1355 Parsed (0ms):http://blog.tianya.cn/post-3340761-62357537-1.shtml 1356 Parsed (0ms):http://blog.tianya.cn/post-3727390-62972109-1.shtml 1357 Parsed (0ms):http://blog.tianya.cn/post-3739914-62875218-1.shtml 1358 Parsed (0ms):http://blog.tianya.cn/post-3773157-62390018-1.shtml 1359 Parsed (0ms):http://blog.tianya.cn/post-3773157-62890053-1.shtml 1360 Parsed (1ms):http://blog.tianya.cn/post-3773157-62958131-1.shtml 1361 http://blog.tianya.cn/post-38484-61144592-1.shtml skipped. Content of size 154978 was truncated to 64956 1362 Parsed (0ms):http://blog.tianya.cn/post-3941055-62934113-1.shtml 1363 Parsed (0ms):http://blog.tianya.cn/post-3941055-62972581-1.shtml 1364 Parsed (0ms):http://blog.tianya.cn/post-3961685-62977022-1.shtml 1365 Parsed (1ms):http://blog.tianya.cn/post-4009947-62401775-1.shtml 1366 Parsed (0ms):http://blog.tianya.cn/post-4025452-63785440-1.shtml 1367 Parsed (0ms):http://blog.tianya.cn/post-4047683-63794167-1.shtml 1368 Parsed (0ms):http://blog.tianya.cn/post-4101233-62653750-1.shtml 1369 Parsed (0ms):http://blog.tianya.cn/post-4250142-62927024-1.shtml 1370 Parsed (0ms):http://blog.tianya.cn/post-4353581-62972544-1.shtml 1371 Parsed (0ms):http://blog.tianya.cn/post-4353581-62972558-1.shtml 1372 Parsed (0ms):http://blog.tianya.cn/post-4360774-62845782-1.shtml 1373 Parsed (0ms):http://blog.tianya.cn/post-4362114-63792588-1.shtml 1374 Parsed (0ms):http://blog.tianya.cn/post-4482611-62391796-1.shtml 1375 Parsed (0ms):http://blog.tianya.cn/post-4482611-62820517-1.shtml 1376 Parsed (0ms):http://blog.tianya.cn/post-4482611-62900444-1.shtml 1377 Parsed (0ms):http://blog.tianya.cn/post-4487705-62917227-1.shtml 1378 Parsed 
(0ms):http://blog.tianya.cn/post-4487705-63000074-1.shtml 1379 Parsed (0ms):http://blog.tianya.cn/post-4562315-62367801-1.shtml 1380 Parsed (0ms):http://blog.tianya.cn/post-4562315-62807399-1.shtml 1381 Parsed (0ms):http://blog.tianya.cn/post-4562315-62899385-1.shtml 1382 Parsed (0ms):http://blog.tianya.cn/post-4598537-62379563-1.shtml 1383 Parsed (0ms):http://blog.tianya.cn/post-4598537-62971461-1.shtml 1384 Parsed (0ms):http://blog.tianya.cn/post-4598537-62971498-1.shtml 1385 Parsed (0ms):http://blog.tianya.cn/post-4598537-62971598-1.shtml 1386 Parsed (1ms):http://blog.tianya.cn/post-4600300-62374308-1.shtml 1387 Parsed (0ms):http://blog.tianya.cn/post-4608093-62651701-1.shtml 1388 Parsed (0ms):http://blog.tianya.cn/post-4700528-62898660-1.shtml 1389 Parsed (0ms):http://blog.tianya.cn/post-4700528-62898663-1.shtml 1390 Parsed (0ms):http://blog.tianya.cn/post-4877164-61406732-1.shtml 1391 Parsed (0ms):http://blog.tianya.cn/post-4877164-61415979-1.shtml 1392 Parsed (0ms):http://blog.tianya.cn/post-5010184-62313586-1.shtml 1393 Parsed (0ms):http://blog.tianya.cn/post-5010184-62690266-1.shtml 1394 Parsed (1ms):http://blog.tianya.cn/post-5010184-62718271-1.shtml 1395 Parsed (0ms):http://blog.tianya.cn/post-5010184-62834903-1.shtml 1396 Parsed (0ms):http://blog.tianya.cn/post-5010184-62837336-1.shtml 1397 Parsed (0ms):http://blog.tianya.cn/post-5010184-62889385-1.shtml 1398 Parsed (0ms):http://blog.tianya.cn/post-5010184-62890806-1.shtml 1399 Parsed (0ms):http://blog.tianya.cn/post-542686-63799203-1.shtml 1400 Parsed (0ms):http://blog.tianya.cn/post-544588-62883194-1.shtml 1401 Parsed (0ms):http://blog.tianya.cn/post-78180-58859246-1.shtml 1402 Parsed (0ms):http://blog.tianya.cn/post-78180-59109533-1.shtml 1403 Parsed (0ms):http://blog.tianya.cn/post-78180-62903890-1.shtml 1404 Parsed (0ms):http://blog.tianya.cn/post-78180-62980961-1.shtml 1405 Parsed (1ms):http://blog.tianya.cn/post-863996-62974859-1.shtml 1406 Parsed 
(1ms):http://blog.tianya.cn/post-959477-62971507-1.shtml
ParseSegment: finished at 2014-07-14 21:05:30, elapsed: 00:00:13
CrawlDb update: starting at 2014-07-14 21:05:30
CrawlDb update: db: data/crawldb
CrawlDb update: segments: [data/segments/20140714205330]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-07-14 21:05:43, elapsed: 00:00:13
LinkDb: starting at 2014-07-14 21:05:43
LinkDb: linkdb: data/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: internal links will be ignored.
LinkDb: adding segment: file:/home/lan/nutch/local/data/segments/20140714205231
LinkDb: adding segment: file:/home/lan/nutch/local/data/segments/20140714205330
LinkDb: finished at 2014-07-14 21:05:53, elapsed: 00:00:10
crawl finished: data
The whole crawl took about 13 minutes. Near the bottom of the log you can see how long it took to parse each page.
Run this command:
cat nohup.out|grep elapsed
The output is:
Injector: finished at 2014-07-14 20:52:23, elapsed: 00:00:19
Generator: finished at 2014-07-14 20:52:39, elapsed: 00:00:15
Fetcher: finished at 2014-07-14 20:53:01, elapsed: 00:00:22
ParseSegment: finished at 2014-07-14 20:53:08, elapsed: 00:00:07
CrawlDb update: finished at 2014-07-14 20:53:22, elapsed: 00:00:13
Generator: finished at 2014-07-14 20:53:37, elapsed: 00:00:15
Fetcher: finished at 2014-07-14 21:05:17, elapsed: 00:11:39
ParseSegment: finished at 2014-07-14 21:05:30, elapsed: 00:00:13
CrawlDb update: finished at 2014-07-14 21:05:43, elapsed: 00:00:13
LinkDb: finished at 2014-07-14 21:05:53, elapsed: 00:00:10
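Every crawl starts with the Injector (injects the URLs). One round then consists of the Generator (builds the fetch list), the Fetcher (fetches the pages), ParseSegment (parses the content) and CrawlDb update (updates the CrawlDb), and the crawl ends with LinkDb. Because I did not pass -topN, the second-round fetch took more than 11 minutes: the log above shows a single queue for blog.tianya.cn with maxThreads = 1 and crawlDelay = 5000, so the pages are fetched one at a time with a 5-second pause between requests.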
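To see where the time goes per stage, the elapsed values can be added up. A minimal sketch, assuming the log lines have the "Stage: finished at ..., elapsed: HH:MM:SS" shape shown above; sum_elapsed is a name made up for this example:

```shell
# Sum the "elapsed: HH:MM:SS" values per stage from a Nutch crawl log.
# Reads the log on stdin and prints "<stage><TAB><seconds>" in order of
# first appearance, followed by a TOTAL line.
sum_elapsed() {
  awk '
    /finished at .*elapsed: / {
      stage = $0
      sub(/: finished at .*/, "", stage)   # keep the text before ": finished at"
      t = $0
      sub(/.*elapsed: /, "", t)            # keep only HH:MM:SS
      split(t, p, ":")
      secs = p[1] * 3600 + p[2] * 60 + p[3]
      if (!(stage in total)) order[++n] = stage
      total[stage] += secs
      all += secs
    }
    END {
      for (i = 1; i <= n; i++) printf "%s\t%d\n", order[i], total[order[i]]
      printf "TOTAL\t%d\n", all
    }
  '
}
```

Run as `sum_elapsed < nohup.out`. For the ten lines above it reports 721 seconds for the Fetcher out of 826 seconds in total, so fetching dominates the run.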
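This article walks through the steps of Nutch's crawl command in detail: http://www.cnblogs.com/huligong1234/p/3515214.html
The commands below are given as examples only; see the link above for full explanations.
readdb: readdb is an alias for "org.apache.nutch.crawl.CrawlDbReader". It prints or exports information from the crawl database (crawldb).
Example 1: ./bin/nutch readdb data/crawldb -stats
data/crawldb is where the crawl stored its data.
-stats prints statistics to standard output, such as the total number of URLs and how many are fetched and unfetched.
The output is: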
CrawlDb statistics start: data/crawldb
Statistics for CrawlDb: data/crawldb
TOTAL urls: 1469
retry 0: 1469
min score: 0.0
avg score: 0.0017549354
max score: 1.032
status 1 (db_unfetched): 1368
status 2 (db_fetched): 97
status 4 (db_redir_temp): 3
status 5 (db_redir_perm): 1
CrawlDb statistics: done
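The status counts are easier to judge as shares of the total. A small sketch that reads the -stats output above; status_share is a name invented for this example, and it assumes the count is the last field on each line:

```shell
# Print each CrawlDb status with its count and its share of all URLs.
# Reads `readdb -stats` output on stdin.
status_share() {
  awk '
    /TOTAL urls:/ { total = $NF }
    /status [0-9]+ \(/ {
      name = $0
      sub(/.*\(/, "", name)    # drop everything up to "("
      sub(/\).*/, "", name)    # drop ")" and what follows
      count[name] = $NF
      order[++n] = name
    }
    END {
      if (total > 0)
        for (i = 1; i <= n; i++)
          printf "%s %d %.1f%%\n", order[i], count[order[i]], 100 * count[order[i]] / total
    }
  '
}
```

One way to use it is `./bin/nutch readdb data/crawldb -stats | status_share`, provided the statistics reach standard output as they did here. For the numbers above, db_fetched comes to 6.6% and db_unfetched to 93.1%.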
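Example 2: ./bin/nutch readdb data/crawldb -dump data/crawldb/crawldb_dump
-dump writes the contents of the whole crawldb (every record, not just statistics) as text to the output path that follows.
Example 3: ./bin/nutch readdb data/crawldb -url http://zxcvbnm20111.blog.tianya.cn/
Prints the details for the URL http://zxcvbnm20111.blog.tianya.cn/
I found this URL in the data/crawldb/crawldb_dump output after running the command from Example 2.
The output is: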
CrawlDb dump: starting
CrawlDb db: data/crawldb
CrawlDb dump: done
lan@Ubuntu1:~/nutch/local$
lan@Ubuntu1:~/nutch/local$ ./bin/nutch readdb data/crawldb -url http://zxcvbnm20111.blog.tianya.cn/
URL: http://zxcvbnm20111.blog.tianya.cn/
Version: 7
Status: 1 (db_unfetched)
Fetch time: Mon Jul 14 21:05:40 CST 2014
Modified time: Thu Jan 01 08:30:00 CST 1970
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 2.9411764E-4
Signature: null
Metadata:
Example 4: ./bin/nutch readdb data/crawldb -topN 10 data/crawldb/crawldb_topN 0.5
Writes the ten highest-scoring URLs with a score of at least 0.5, together with their scores, to data/crawldb/crawldb_topN.
readseg: Example 1: ./bin/nutch readseg -dump data/segments/20140714205330 data/segments/dump -nocontent -nofetch -noparse -noparsedata -noparsetext
Dumps the segment to data/segments/dump. Only -nogenerate is missing from the options, so only the generate information (the fetch list) is written.
To see the fetch information instead, replace -nofetch with -nogenerate.
To see the content, replace -nocontent with -nogenerate.
The same pattern applies to parse, parsedata and parsetext.
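The flag-swapping rule above ("leave out the -no flag of the part you want") can be written down once. A minimal sketch; readseg_only_flags is a hypothetical helper name, and the six part names are the ones listed in this section.

```shell
# Build the -no* flags that leave only one kind of record in a
# "readseg -dump". readseg_only_flags is a hypothetical helper name.
readseg_only_flags() {
  keep="$1"
  flags=""
  for part in content fetch generate parse parsedata parsetext; do
    # Exclude every part except the one we want to keep.
    if [ "$part" != "$keep" ]; then
      flags="$flags -no$part"
    fi
  done
  printf '%s\n' "${flags# }"   # strip the leading space
}

readseg_only_flags generate
# prints: -nocontent -nofetch -noparse -noparsedata -noparsetext
```

It could then be used as: ./bin/nutch readseg -dump data/segments/20140714205330 data/segments/dump $(readseg_only_flags fetch)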
Example 2: ./bin/nutch readseg -list -dir data/segments
Lists the segments produced so far, one per row.
Example 3: ./bin/nutch readseg -get data/segments/20140714205231 http://blog.tianya.cn/
Shows what a segment holds for the given URL. Wow, the output is a big pile of HTML code and content.
readlinkdb: Example 1: ./bin/nutch readlinkdb data/linkdb -dump data/linkdb/dump
Dumps the linkdb to data/linkdb/dump.
Example 2: ./bin/nutch readlinkdb data/linkdb -url http://cnrdn.com/4NJC
Looks up a specific URL. I copied this URL from the dump file above.
The result is the same text as the lines below this URL in the dump file.
generate -> fetch -> parse -> update db
In fact, the crawl command is equivalent to running inject, generate, fetch, parse, updatedb and invertlinks in sequence:
inject: Example: ./bin/nutch inject data/crawldb urls
Injects the URLs to crawl into the crawldb. The URLs are read from every file in the urls folder and injected into data/crawldb.
Make sure the data directory does not exist beforehand.
generate: Example: ./bin/nutch generate data/crawldb data/segments
fetch: Example: ./bin/nutch fetch data/segments/20140716205702 -threads 3
parse: Example: ./bin/nutch parse data/segments/20140716205702
updatedb: Example: ./bin/nutch updatedb data/crawldb -dir data/segments
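When running these steps by hand, the segment name created by generate (a timestamp such as 20140716205702) has to be passed on to fetch and parse. A minimal sketch of picking it up automatically, assuming segment directories are named by timestamp so that the last name in sorted order is the newest; latest_segment is a hypothetical helper name.

```shell
# Return the newest segment directory under a segments root.
# Segment names are timestamps, so the last one in glob (sorted) order
# is the newest. latest_segment is a hypothetical helper name.
latest_segment() {
  newest=""
  for seg in "$1"/*/; do
    if [ -d "$seg" ]; then
      newest="${seg%/}"   # drop the trailing slash
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Intended use (commented out because it needs a Nutch installation):
#   ./bin/nutch generate data/crawldb data/segments
#   seg=$(latest_segment data/segments)
#   ./bin/nutch fetch "$seg" -threads 3
#   ./bin/nutch parse "$seg"
#   ./bin/nutch updatedb data/crawldb "$seg"
```

Passing a single segment to updatedb instead of -dir updates the crawldb from that round only.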
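mergesegs: Example: ./bin/nutch mergesegs data2/segments_all -dir data2/segments
Note: the segments folder and its subfolders must not contain anything you created yourself.
A very useful command: the merged result is smaller than the originals. The more files there are and the larger they are, the more the merge helps and the faster I/O becomes.
The mergedb and mergelinkdb commands work similarly.
invertlinks: Example: ./bin/nutch invertlinks data/linkdb -dir data/segments
Again, the segments folder and its subfolders must not contain anything you created yourself.
invertlinks analyses the newly supplied segment directories and builds a new inverted-link database,
then merges it with the existing one.
Counting how many pages point at a page is what allows that page's score to be computed.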
parsechecker: Example 1: ./bin/nutch parsechecker http://apdplat.org
A convenient way to see which links a page contains.
Example 2: ./bin/nutch parsechecker -dumpText http://apdplat.org
Shows only the text of the page.
./bin/nutch domainstats data/crawldb/current host host
The first host is the output directory; the second host is the aggregation level.
./bin/nutch domainstats data/crawldb/current domain domain
./bin/nutch domainstats data/crawldb/current suffix suffix
./bin/nutch domainstats data/crawldb/current tld tld
From the host level to the tld level the output gets shorter and shorter, because each later level groups together the entries of the level before it.
Take the URL http://www.cnblogs.com.cn/: the host is www.cnblogs.com.cn, the domain is cnblogs.com.cn, the suffix is com.cn, and the tld (top-level domain) is cn. For http://www.cnblogs.com, both the tld and the suffix are com.
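The two ends of that hierarchy, host and tld, can be reproduced with plain shell parameter expansion. This is a naive sketch: splitting out the domain and the suffix correctly needs a table of known suffixes (com.cn versus com), so those two levels are left out. url_host and host_tld are hypothetical helper names.

```shell
# Naive URL splitting with POSIX parameter expansion.
# url_host and host_tld are hypothetical helper names.
url_host() {
  rest="${1#*://}"              # drop the scheme, e.g. "http://"
  rest="${rest%%/*}"            # drop the path
  printf '%s\n' "${rest%%:*}"   # drop an optional port
}

host_tld() {
  printf '%s\n' "${1##*.}"      # keep the text after the last dot
}

host=$(url_host "http://www.cnblogs.com.cn/")
echo "$host"        # www.cnblogs.com.cn
host_tld "$host"    # cn
```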
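./bin/nutch webgraph -segmentDir data/segments -webgraphdb data/webgraphdb
Specifies the segments input path and the webgraphdb output path. Three databases are created under data/webgraphdb: Outlinks, Inlinks and Nodes,
which hold the outlinks and their count, the inlinks and their count, and the URLs with their scores, respectively.
After the first webgraph run every URL in nodes has a score of 0, so the linkrank command has to be run afterwards.
Outlinks are stored in parse_data, so parse_data is the input of the OutLinkDb.
From the outlinks, the inlinks of every page can be derived, and from those each page's score can be computed.
./bin/nutch nodedumper -topn 1 -inlinks -output inlinks_topn_1 -webgraphdb data/webgraphdb
Dumps the contents of data/webgraphdb so that you can see each URL with its number of inlinks.
-asSequenceFile writes the output as a sequence file; sequence files are binary, so it is not used here.
-topn: if several entries have the same links, only the top n are output.
-inlinks: sort by number of inlinks in descending order; -outlinks and -scores work the same way.
-output: the output directory.
-webgraphdb: the path of the webgraphdb.
When sorting by scores at this point, every URL in the generated file has a score of 0,
which confirms that webgraph on its own leaves all scores at 0.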
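./bin/nutch linkrank -webgraphdb data/webgraphdb
Computes the scores and stores them.
Then run:
./bin/nutch nodedumper -topn 1 -scores -output after-inject-scores -webgraphdb data/webgraphdb
The URL scores in the files under after-inject-scores are no longer 0.
./bin/nutch nodedumper -group domain sum -inlinks -output inlink_domain_sum -webgraphdb data/webgraphdb
Produces grouped data.
domain can be replaced with host, and sum with max. These two arguments must come right after -group.
If you add -topn 1 to the command above and change the output path to inlink_domain_sum_1, some of the inlink counts in the output are lower.
This shows that nodedumper groups first and then sums only the top 1 of each group, so the sum equals the largest inlink count in the group.
./bin/nutch scoreupdater -crawldb data/crawldb -webgraphdb data/webgraphdb
Writes the webgraph scores back into the crawldb. The crawl command uses the opic plugin to compute scores by default. The webgraph scoring approach has been available since version 1.0
and is more complete.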
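Put together, the webgraph scoring steps from this section run in a fixed order: build the graph, rank it, then update the crawldb. A minimal sketch with a dry-run switch so the order can be checked without a Nutch installation; run and webgraph_scores are hypothetical helper names, and the paths are the ones used above.

```shell
# run executes a command, or only prints it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# webgraph_scores is a hypothetical wrapper around the commands above.
webgraph_scores() {
  run ./bin/nutch webgraph -segmentDir data/segments -webgraphdb data/webgraphdb
  run ./bin/nutch linkrank -webgraphdb data/webgraphdb
  run ./bin/nutch scoreupdater -crawldb data/crawldb -webgraphdb data/webgraphdb
}

DRY_RUN=1          # remove this line to actually run the commands
webgraph_scores
```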
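./bin/nutch freegen urls2 data3/segments
The urls2 folder holds a newly created URL file containing a single URL: http://apdplat.org
The new segment is written to data3/segments.
This command bypasses the huge crawldb and generates segments directly from the given URLs.
To check whether the indexing plugins are configured correctly:
./bin/nutch indexchecker http://www.163.com
In the output, title and content are the fields that matter most.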
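It took me a long time to find download links for 3.6.2 and 4.2.0; they can no longer be downloaded from the home page.
All Solr versions can be downloaded here: http://archive.apache.org/dist/lucene/solr/
3.6.2 is used here.
Configuring Solr
1. Copy Nutch's conf/schema.xml into Solr's /example/solr/conf, backing up Solr's own schema.xml first.
Search for index- in Nutch's conf/nutch-default.xml and you will find this XML fragment: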
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library.
  </description>
</property>
This is the setting that includes plugins in Nutch; index-basic and index-anchor are two of them.
Open Solr's /example/solr/conf/schema.xml (the Nutch schema.xml copied a moment ago)
and search for index- again. There are two blocks:
<!-- fields for index-basic plugin -->
<field name="host" type="string" stored="false" indexed="true"/>
<field name="url" type="url" stored="true" indexed="true" required="true"/>
<field name="content" type="text" stored="false" indexed="true"/>
<field name="title" type="text" stored="true" indexed="true"/>
<field name="cache" type="string" stored="true" indexed="false"/>
<field name="tstamp" type="date" stored="true" indexed="false"/>
<!-- fields for index-anchor plugin -->
<field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
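The field elements define the fields of these two plugins. Besides them, there is also an id field further up.
2. In Solr's /example/solr/conf/solrconfig.xml, change every <str name="df">text</str> to <str name="df">content</str>.
The default search field should be content.
3. Start Solr: go to example and run start.jar with java -jar start.jar & (runs in the background).
Without step 2, Solr reports an error saying that the text field cannot be found.
4. Open localhost:8983 in a browser.
The Solr admin pages appear. Solr embeds a Jetty server, which is why it can be managed through the browser.
5. Back in Nutch's local directory, run bin/nutch | grep solr
Three commands show up: solrindex (builds the Solr index), solrdedup (removes duplicates) and solrclean (removes 301 permanent redirects and 404 URLs).
6. Submit the index to http://localhost:8983/solr using the crawldb, linkdb and segments:
In the local directory, run bin/nutch solrindex http://localhost:8983/solr data/crawldb -linkdb data/linkdb -dir data/segments
The output includes: Indexing 3 documents
With more than 250 documents, indexing happens in several batches. The batch size is set by solr.commit.size in conf/nutch-default.xml or nutch-site.xml (the default file contains a sample).
A larger value is more efficient but uses more memory.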
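As a rough illustration of the batching rule, the number of batches is the document count divided by solr.commit.size, rounded up. A minimal sketch, assuming documents are sent in groups of exactly that size; commit_batches is a hypothetical helper name and 250 is the batch size mentioned above.

```shell
# Number of batches sent to Solr for a given document count, assuming
# one batch per solr.commit.size documents (250 unless given).
# commit_batches is a hypothetical helper name.
commit_batches() {
  docs="$1"
  size="${2:-250}"
  echo $(( (docs + size - 1) / size ))   # integer ceiling of docs/size
}

commit_batches 3      # 1
commit_batches 600    # 3
```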
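Solr stores the index in example/solr/data/index.
Luke is a toolbox for Lucene indexes that makes it easy to browse and search an index, which helps with debugging.
Download: http://code.google.com/p/luke/downloads/list
The version used here is lukeall-4.0.0-ALPHA.jar
Copy Solr's example/solr/data/index directory to your local machine (I copied the index directory to the Windows desktop and put the Luke jar there too).
Double-click the jar to run Luke. It prompts you for the index folder automatically.
As the screenshot above shows, there are 10 fields. The box at the lower left lists the fields from schema.xml: 1 id, 3 core fields, 5 index-basic fields (I do not know why the cache field is missing) and 1 index-anchor field.
Select a field and click show top term to see its actual terms.
The id field is stored whole and is not tokenized.
Click the Documents tab to browse fields by document number.
Click the green left arrow at the top left once (only so that the box has something in it and the field information shows), then use the green right arrow:
Pick a term from the title field, for example 2014. Click the search tab, enter title:2014 in the box at the top left and click search; the matching documents are found. Note that some titles may not be found: the analyzer used at search time must match the one used at index time. For example, with title=明星娱乐圈, Luke splits the whole title into single characters before searching, so nothing matches. Setting the analyzer is covered later.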
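Solr's built-in tokenizers handle Chinese poorly, which is why Luke cannot find the indexed text, so mmseg4j is used instead.
Download: https://code.google.com/p/mmseg4j/downloads/list
The version used here is mmseg4j-1.8.5.zip
1. Stop Solr. Use jps to find the process ID, then run kill -9 <process ID>.
2. Delete Solr's example/solr/data directory.
3. Create a lib folder under Solr's example/solr.
4. Copy mmseg4j-all-1.8.5-with-dic.jar from mmseg4j into Solr's example/solr/lib so that the Solr server loads the jar.
5. Edit Solr's example/solr/conf/schema.xml:
replace <tokenizer class="solr.WhitespaceTokenizerFactory"/> and <tokenizer class="solr.StandardTokenizerFactory"/>
with <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex"/>
This makes Solr tokenize with the mmseg4j classes; the default tokenizers do a poor job on Chinese.
Then start the Solr server again, submit the index to Solr and open the index folder in Luke. Searches on Chinese fields now return results.
The Number of terms shown in Luke drops noticeably, because with mmseg4j Solr groups several Chinese characters into one term, whereas before each character was probably its own term.
You can also search from the Query String box on the localhost:8983/solr/admin page. The syntax is the same as in Luke: "field name:value".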
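Solr's tokenizer is now configured, but Luke has not yet been set up to use mmseg4j.
Searching for "title:客户端" in Luke splits the query into 3 terms (Luke is not using mmseg4j yet),
whereas with mmseg4j configured, Luke treats "客户端" as a single term.
Although Luke can now search the index, it is best for Solr and Luke to use the same analyzer.
mmseg4j 1.8.5 conflicts with Luke 4.0, so mmseg4j 1.9.1 is used for Luke; it is available from the download page above.
The file used here is mmseg4j-1.9.1.v20130120-SNAPSHOT.zip
Extract the three jars in the dist folder of mmseg4j-1.9.1, and copy the extracted data and com folders into the Luke jar.
Open Luke and point it at the index directory. On the search tab, choose ComplexAnalyzer in the analyzer drop-down on the right, and set the default field to content in the box next to it.
Search for: title:客户端
The Query Details box shows "客户端" as a single term; without this configuration it would be split into 3 terms.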
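All Solr directories mentioned below are based on ...
1. Solr 4.2 has an additional collection1 folder under example/solr/. Copy Nutch's local/conf/schema-solr4.xml into the conf directory of Solr 4.2's example/solr/collection1 and rename it to schema.xml.
2. Solr 4.2 does not need text replaced with content (the change made in solrconfig.xml for Solr 3.6.2). How do you tell which field to use? Open schema.xml and look near the bottom for this tag:
<defaultSearchField>content</defaultSearchField>
With Solr 4.2 this tag is text, so nothing needs changing. With Solr 3.6.2 it is content, which is why every text had to be changed to content.
3. Add a _version_ field inside the fields element of schema.xml; otherwise Solr reports an error on startup:
<field name="_version_" type="long" stored="true" indexed="true"/>
4. Start Solr, again by running start.jar.
5. Copy the mmseg4j-1.9.1 jars. As before, copying is all it takes: put the jars from mmseg4j's dist folder into the lib directory of Solr 4.2's collection1, creating lib with mkdir first if it does not exist.
6. Configure mmseg4j.
Edit Solr's example/solr/collection1/conf/schema.xml:
replace <tokenizer class="solr.WhitespaceTokenizerFactory"/> and <tokenizer class="solr.StandardTokenizerFactory"/>
with <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex"/>
Be careful not to change any tokenizer other than WhitespaceTokenizerFactory and StandardTokenizerFactory.
After that, the index can be submitted to Solr.
Once it is submitted, open core admin at localhost:8983 to see the number of indexed documents and other details:
The Core Selector drop-down at the lower left lets you choose collection1; click query underneath it and you can search the index in the panel on the right: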
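Running Linux in a virtual machine on Windows is fairly heavyweight;
my Ubuntu VM needs 4 GB of memory to run without too much lag.
Cygwin is lighter by comparison, and it makes Windows resources easy to use from within the Cygwin environment.
If the Java path contains a space, the nutch crawl command fails.
For example, with Cygwin, first copy apache-nutch-1.6 into home/Administrator under the Cygwin directory (the folder name depends on your Windows user name),
then open Cygwin, go to Nutch's bin directory and run ./nutch crawl. It reports that the directory does not exist (if your Java is installed under c:/Program Files/).
The fix:
copy the whole Java directory into Cygwin's home/Administrator directory
and set JAVA_HOME to c:/cygwin/home/Administrator/Java/jdk1.6.0_21.
If the machine has several JDKs, set NUTCH_JAVA_HOME for Cygwin instead.
Note that the environment variables in Cygwin hold Windows paths.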
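A quick check for the space problem can be run before starting Nutch. A minimal sketch; path_has_space and check_java_home are hypothetical helper names, and the example path is the one from the fix above.

```shell
# Return success when a path contains a space.
# path_has_space is a hypothetical helper name.
path_has_space() {
  case "$1" in
    *" "*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Warn when a JDK path would trip up the nutch script.
# check_java_home is a hypothetical helper name.
check_java_home() {
  if path_has_space "$1"; then
    echo "warning: '$1' contains a space; copy the JDK to a path without spaces"
    return 1
  fi
  echo "ok: $1"
}

check_java_home "c:/cygwin/home/Administrator/Java/jdk1.6.0_21"
```

In a Cygwin shell the value to test would be "$JAVA_HOME" (or "$NUTCH_JAVA_HOME" when several JDKs are installed).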
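A Hadoop tutorial is available here: http://www.cnblogs.com/lanhj/p/3841709.html
The Nutch used here is the deploy folder produced by the earlier ant build, that is, the version in which Nutch submits its jobs to Hadoop.
Add the http.agent.name property to Nutch's conf/nutch-site.xml (shown earlier) and you are set.
Of course, Hadoop must at least be able to run its bundled WordCount.java successfully, and the HADOOP_HOME environment variable must be set.
As before, create a urls folder inside deploy and put a file with the URLs in it.
Then run the crawl command:
bin/nutch crawl urls -dir data -threads 50 -depth 2 -topN 1
Did it fail with an invalid input error?
The reason: the job is executed by Hadoop, and Hadoop resolves paths against HDFS by default, so urls has to be uploaded to HDFS first: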
hadoop fs -put urls /user/xxx/
hadoop fs -ls /user/xxx
The first command uploads urls to the /user/xxx directory on HDFS (the Nutch job expects the urls for inject under /user/<user name>/); the second lists the files in that directory.
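The upload and the crawl can be chained in one place. A minimal sketch with a dry-run switch, assuming the HDFS home directory is /user/<user name> as described above; run and deploy_crawl are hypothetical helper names, and xxx stands for the user name exactly as in the commands above.

```shell
# run executes a command, or only prints it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Upload the seed folder to HDFS, list it, then start the crawl.
# deploy_crawl is a hypothetical wrapper around the commands above.
deploy_crawl() {
  user="${1:-$(whoami)}"
  run hadoop fs -put urls "/user/$user/"
  run hadoop fs -ls "/user/$user"
  run bin/nutch crawl urls -dir data -threads 50 -depth 2 -topN 1
}

DRY_RUN=1          # remove this line to actually run the commands
deploy_crawl xxx
```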
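You can see that urls is now on HDFS. (In my other post I was too lazy to write up HDFS concepts, commands and APIs. I still have the slides and code and will upload them when I have time.)
Run the crawl command again.
While it is running, open http://localhost:50030 (Hadoop's page for MapReduce and the JobTracker);
the Map and Reduce tasks are visible there.
When the job has finished, look at /user/xxx on HDFS again: a data folder has been created.
If the HDFS shell commands feel cumbersome, open localhost:50070 and click Browse the filesystem to view the files on HDFS.
This page can only browse directories and files; it cannot delete, upload or update them.
localhost:50060 shows the TaskTracker information.
Hadoop also embeds a Jetty server, which is why its status can be viewed through web pages.
Finished updating on July 22.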