Log Parsing (2): Multithreaded HTTP Requests

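After each HTTP request, the script checks whether the returned data contains the signature strings. With several million URLs, the job still had not finished after running for a full day. I tried Ruby multithreading and found that it did not speed things up, so I decided to rewrite it in Java. The Ruby code is posted below: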

#coding:gbk
require 'rubygems'
require "net/http"
require "uri"
require 'zlib'
require 'stringio'

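    threads = []
    lock = Mutex.new   # serializes writes to the shared output file
    urls = IO.readlines "test.txt"
    pFile = File.open("errortest.txt","a")
    urls.each do|page|
      #url="http://www.baidu.com"
      # Strip the trailing newline left by readlines before parsing the URL
      threads << Thread.new(page.chomp) do|url|
        begin
          uri = URI.parse(url)
          res = Net::HTTP.get_response(uri)

          data=res.body
          mes=res.code
          # Decompress the body if it is still gzip-encoded
          if res["content-encoding"]=="gzip"
            body_io=StringIO.new(data)
            data=Zlib::GzipReader.new(body_io).read
          end
          #puts data
          # Treat the body as raw bytes so include? does not depend on the page encoding
          data=data.force_encoding("ASCII-8BIT")

          if mes=="200" or mes=="404" or mes=="500" or mes=="502"
            # Record the URL unless all three signature strings are present
            unless data.include?("_dctc._account") and data.include?("UA-") and data.include?("count/load.min.js")
              #puts url
              lock.synchronize { pFile.puts mes+","+url }
            end
          end
        rescue
          # Print URLs whose request or parsing failed
          puts url
        end
      end
    end
    # Wait only after every thread has been started;
    # joining inside the loop would run the requests one at a time
    threads.each { |t|t.join  }
    pFile.close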
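
The script above starts one thread per URL, which is hard to sustain when the input has millions of lines. One alternative is a fixed-size worker pool fed from a `Queue`; a minimal sketch might look like the following. The helper names `each_in_pool`, `check_url` and `missing_marker?` and the pool size of 20 are assumptions for illustration, not part of the original script. The gzip branch is left out for brevity, and `Queue#close` needs Ruby 2.3 or later.

```ruby
require "net/http"
require "uri"

# Signature strings taken from the script above
MARKERS = ["_dctc._account", "UA-", "count/load.min.js"]

# True when at least one signature string is missing (hypothetical helper)
def missing_marker?(body)
  !MARKERS.all? { |m| body.include?(m) }
end

# Call the block for every item using a fixed number of worker threads
# (hypothetical helper). Returns the non-nil block results in no particular order.
def each_in_pool(items, workers, &block)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close                      # once closed and empty, pop returns nil
  results = Queue.new
  pool = Array.new(workers) do
    Thread.new do
      while (item = queue.pop)
        out = block.call(item)
        results << out unless out.nil?
      end
    end
  end
  pool.each(&:join)
  Array.new(results.size) { results.pop }
end

# Fetch one URL; return "code,url" when a signature string is missing, else nil
def check_url(url)
  res = Net::HTTP.get_response(URI.parse(url))
  return nil unless %w[200 404 500 502].include?(res.code)
  body = res.body.to_s.b           # binary copy, so include? ignores the page encoding
  missing_marker?(body) ? "#{res.code},#{url}" : nil
rescue StandardError
  warn url                         # report URLs whose request or parsing failed
  nil
end

# Possible usage, with the same file names as the script above:
#   urls  = File.readlines("test.txt").map(&:chomp).reject(&:empty?)
#   lines = each_in_pool(urls, 20) { |url| check_url(url) }
#   File.open("errortest.txt", "a") { |f| lines.each { |l| f.puts l } }
```

Because only the main thread writes the output file in this sketch, no lock is needed around the writes. The pool size is a tuning knob: too small leaves the network idle, too large mostly adds scheduling overhead.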

posted @ 2015-03-05 17:46  雨蝶