heartstill

Accessing a cookie-restricted web page with HttpClient

(2010-05-15 13:01:39)
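 1. Without the Cookie request header
 String url = "http://www.drugstore.com/products/prod.asp?"; // the query string is cut off in the original post
 HttpClient client = new HttpClient();
 GetMethod getMethod = new GetMethod(url);

 int status = client.executeMethod(getMethod);
 With this approach, status comes back as 301, or the response is the page reached after the redirect, so the content at url is never actually retrieved.
The request headers therefore have to be set.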
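The behaviour described above can be reproduced without touching the real site. The following is only a sketch under stated assumptions: the local server is a hypothetical stand-in that redirects any request lacking a Cookie header (it is not drugstore.com's actual logic), and the client is the JDK's `java.net.HttpURLConnection` rather than Commons HttpClient, so that the example runs with the JDK alone. The class and method names are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

/** Hypothetical demo: a server that answers 301 unless the request carries a Cookie header. */
final class CookieGateDemo {

    /** Starts the stand-in server on a free local port. */
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            if (exchange.getRequestHeaders().containsKey("Cookie")) {
                exchange.sendResponseHeaders(200, -1);   // cookie present: serve the page (empty body here)
            } else {
                exchange.getResponseHeaders().add("Location", "/default");
                exchange.sendResponseHeaders(301, -1);   // no cookie: redirect elsewhere
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Sends one GET, optionally with a Cookie header, and returns the status code without following redirects. */
    static int fetchStatus(int port, String cookie) throws IOException {
        URL url = new URL("http://127.0.0.1:" + port + "/products/prod.asp");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(false);
        if (cookie != null) {
            conn.setRequestProperty("Cookie", cookie);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            System.out.println("without cookie: " + fetchStatus(port, null));
            System.out.println("with cookie:    " + fetchStatus(port, "foresee.repeatdays=90"));
        } finally {
            server.stop(0);
        }
    }
}
```

Redirect following is switched off so that the 301 itself is visible; with it switched on, the client would instead return the page reached after the redirect, which is the other outcome the post mentions.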
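2. With the Cookie request header
Open the url in the browser with the IEHttpHeaders tool running and look at the request headers it records, shown below.
Request sent by the browser: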
 GET /products/prod.asp?pid=188801&catid=12943 HTTP/1.1
Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, */*
Accept-Language: zh-CN
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; QQPinyin 689; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)
Accept-Encoding: gzip, deflate
Host: www.drugstore.com
Connection: Keep-Alive
Cookie: _br_uid_1=uid%3D6876780775167%3A; STICKY=SEAWEB004P:044086BC26EB4FAE88BFBC531C091EDD:5mwwt52kjx2imf55cou5ue55; drugstore%2Efish=UserID=728708E24FAA4229A58D61AA436133E3; s_vi=[CS]v1|25D1666085011D4F-60000109600D3FBA[CE]; foresee.repeatdays=90

Setting the request headers in code:
Simply set the same headers in code that the browser sent.
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

String url = "http://www.drugstore.com/products/prod.asp?pid=266446&catid=119689&cmbProdBrandFilter=15867&mp=True&trx=GFI-0-MBS&trxp1=119689&trxp2=266446&trxp3=2&trxp4=2";
HttpClient client = new HttpClient();
//client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);

GetMethod getMethod = new GetMethod(url);

// Request headers copied from the browser request
getMethod.setRequestHeader("Accept", "image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, */*");
//getMethod.setRequestHeader("Referer", "http://kyxk.net/wForum/disparticle.php?boardName=LifeScience&ID=2114");
getMethod.setRequestHeader("Accept-Language", "en-US");
// Accept-Encoding is deliberately not sent; with it the response comes back garbled
//getMethod.setRequestHeader("Accept-Encoding", "gzip, deflate");
getMethod.setRequestHeader("If-Modified-Since", "Thu, 29 Jul 2004 02:24:49 GMT");
getMethod.setRequestHeader("If-None-Match", "'3014d-1d31-41085ff1'");
getMethod.setRequestHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; QQPinyin 689; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)");
getMethod.setRequestHeader("Host", "www.drugstore.com");
getMethod.setRequestHeader("Connection", "Keep-Alive");
// The Cookie header is the required one; its value is taken from the browser request
getMethod.setRequestHeader("Cookie", "_br_uid_1=uid%3D6876780775167%3A; BIGipServerdscm_farm=1746184384.0.0000; STICKY=SEAWEB004P:044086BC26EB4FAE88BFBC531C091EDD:bdm5j355thye4445yvus2kis; drugstore%2Efish=UserID=728708E24FAA4229A58D61AA436133E3; s_vi=[CS]v1|25D1666085011D4F-60000109600D3FBA[CE]; foresee.repeatdays=90");

//getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
//        new DefaultHttpMethodRetryHandler());

int status = client.executeMethod(getMethod);
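Copying each header by hand is error-prone, so the captured block can also be turned into name/value pairs mechanically. This is a minimal sketch, not part of HttpClient: `HeaderBlockParser` and its `parse` method are hypothetical names, and the sample capture in `main` is shortened to a few of the headers shown earlier. It uses only the JDK; with Commons HttpClient each resulting pair would be passed to `getMethod.setRequestHeader(name, value)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses a request-header block as captured from the browser. */
final class HeaderBlockParser {

    /** Returns header name -> value in capture order; the request line and blank lines are skipped. */
    static Map<String, String> parse(String rawBlock) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : rawBlock.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.matches("[A-Z]+ \\S+ HTTP/\\d\\.\\d")) {
                continue; // blank line, or a request line such as "GET /path HTTP/1.1"
            }
            int colon = trimmed.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line
            }
            // Split at the first colon only, since values (cookies, for example) may contain colons.
            String name = trimmed.substring(0, colon).trim();
            String value = trimmed.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        String captured =
              "GET /products/prod.asp?pid=188801&catid=12943 HTTP/1.1\n"
            + "Accept-Language: zh-CN\n"
            + "Accept-Encoding: gzip, deflate\n"
            + "Host: www.drugstore.com\n"
            + "Connection: Keep-Alive\n"
            + "Cookie: foresee.repeatdays=90\n";
        Map<String, String> headers = parse(captured);
        // Dropped because the post reports garbled output when this header is sent.
        headers.remove("Accept-Encoding");
        for (Map.Entry<String, String> header : headers.entrySet()) {
            System.out.println(header.getKey() + " -> " + header.getValue());
        }
    }
}
```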

 
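All of the request headers set above are ones seen in IEHttpHeaders.
However, the getMethod.setRequestHeader("Accept-Encoding", "gzip, deflate") call has to be left out; otherwise the response comes back garbled, most likely because the server then sends a compressed body that this code does not decompress.
The Cookie header is also required; its value is the one from the browser request captured above.
With these settings, the page content returned is the content of the page actually requested.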
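An alternative to dropping Accept-Encoding is to keep it and decompress the body when the server answers with `Content-Encoding: gzip`. The sketch below is an assumption about how that could look, not something the original post does: `GzipBodyDecoder` is a hypothetical helper built only on the JDK, and `main` simulates a compressed response locally instead of calling a server. With Commons HttpClient 3.x, the stream would presumably come from `getMethod.getResponseBodyAsStream()` and the encoding from the `Content-Encoding` response header. Only gzip is handled here, not deflate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: reads a response body, gunzipping it when Content-Encoding says gzip. */
final class GzipBodyDecoder {

    /** contentEncoding is the value of the Content-Encoding response header, or null if absent. */
    static String decode(InputStream body, String contentEncoding, Charset charset) throws IOException {
        InputStream in = body;
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            in = new GZIPInputStream(body); // decompress on the fly
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzip-compressed response body locally.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("<html>product page</html>".getBytes(StandardCharsets.UTF_8));
        }
        String html = decode(new ByteArrayInputStream(compressed.toByteArray()),
                "gzip", StandardCharsets.UTF_8);
        System.out.println(html);
    }
}
```

The charset is passed in explicitly because the right value depends on the page; UTF-8 in `main` is just the choice made for the simulated body.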

posted on 2010-12-20 15:24  开始测试