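With this approach the status returned is 301, or what comes back is the page after the redirect, so the actual content of the URL cannot be fetched.
So the request headers have to be set.
2. Setting the Cookie request header
Use the IEHttpHeaders tool: enter the URL in the browser and inspect the request headers, shown below.
Browser request: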
GET /products/prod.asp?pid=188801&catid=12943 HTTP/1.1
Accept: image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, */*
Accept-Language: zh-CN
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; QQPinyin 689; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)
Accept-Encoding: gzip, deflate
Host: www.drugstore.com
Connection: Keep-Alive
Cookie: _br_uid_1=uid%3D6876780775167%3A; STICKY=SEAWEB004P:044086BC26EB4FAE88BFBC531C091EDD:5mwwt52kjx2imf55cou5ue55; drugstore%2Efish=UserID=728708E24FAA4229A58D61AA436133E3; s_vi=[CS]v1|25D1666085011D4F-60000109600D3FBA[CE]; foresee.repeatdays=90
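Copying a capture like the one above into code by hand is easy to get wrong. The following is only a sketch, assuming the capture is pasted as plain text with one "Name: value" pair per line; the class name HeaderDump and the method parse are illustrative and not part of HttpClient.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HeaderDump {
    // Turns a pasted header capture into an ordered name -> value map.
    // Lines without a colon (the request line, blank lines) are skipped.
    static Map<String, String> parse(String dump) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : dump.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // e.g. "GET /products/prod.asp?... HTTP/1.1"
            }
            String name = line.substring(0, colon).trim();
            if (name.indexOf(' ') >= 0) {
                continue; // header names never contain spaces; likely a request line with an absolute URL
            }
            // Split on the FIRST colon only: cookie values may contain ':' themselves.
            headers.put(name, line.substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        String dump = "GET /products/prod.asp?pid=188801&catid=12943 HTTP/1.1\n"
                + "Accept-Language: zh-CN\n"
                + "Host: www.drugstore.com\n"
                + "Connection: Keep-Alive\n";
        for (Map.Entry<String, String> e : parse(dump).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Each entry of the resulting map could then be passed to getMethod.setRequestHeader(name, value), leaving out any header that should not be sent.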
Setting the request headers in code:
Just set the headers in code to match the ones the browser sent.
// Commons HttpClient 3.x classes used by this snippet
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

String url="http://www.drugstore.com/products/prod.asp?pid=266446&catid=119689&cmbProdBrandFilter=15867&mp=True&trx=GFI-0-MBS&trxp1=119689&trxp2=266446&trxp3=2&trxp4=2";
HttpClient client=new HttpClient();
//client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
GetMethod getMethod = new GetMethod(url);
// Copy the headers captured from the browser
getMethod.setRequestHeader("Accept","image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, */*");
// getMethod.setRequestHeader("Referer","http://kyxk.net/wForum/disparticle.php?boardName=LifeScience&ID=2114");
getMethod.setRequestHeader("Accept-Language","en-US");
// Accept-Encoding is deliberately left out; sending it produced garbled output
// getMethod.setRequestHeader("Accept-Encoding"," gzip, deflate");
// Conditional-request headers; drop these two if the server answers 304 Not Modified
getMethod.setRequestHeader("If-Modified-Since","Thu, 29 Jul 2004 02:24:49 GMT");
getMethod.setRequestHeader("If-None-Match","'3014d-1d31-41085ff1'");
getMethod.setRequestHeader("User-Agent","Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; QQPinyin 689; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)");
getMethod.setRequestHeader("Host","www.drugstore.com");
getMethod.setRequestHeader("Connection","Keep-Alive");
// The Cookie header is the one that matters: without it the site redirects
getMethod.setRequestHeader("Cookie","_br_uid_1=uid%3D6876780775167%3A; BIGipServerdscm_farm=1746184384.0.0000; STICKY=SEAWEB004P:044086BC26EB4FAE88BFBC531C091EDD:bdm5j355thye4445yvus2kis; drugstore%2Efish=UserID=728708E24FAA4229A58D61AA436133E3; s_vi=[CS]v1|25D1666085011D4F-60000109600D3FBA[CE]; foresee.repeatdays=90");
//getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
// new DefaultHttpMethodRetryHandler());
// Send the request; status holds the HTTP status code of the response
int status= client.executeMethod(getMethod);
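All of the request headers set above are the ones seen in IEHttpHeaders.
However, the call getMethod.setRequestHeader("Accept-Encoding"," gzip, deflate") has to be removed, otherwise the response body comes back garbled.
In addition, setting the Cookie header is mandatory; its value is the one from the browser request above.
With that, the page content returned is the content of the page that was actually requested.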
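The garbled output when Accept-Encoding is sent is most likely the compressed body being read as text: a server that sees "gzip" in that header may gzip the response, and the body then has to be decompressed before use. If the header has to stay, a sketch along the following lines could decode the body with the JDK's GZIPInputStream. It assumes the response carries Content-Encoding: gzip and that the page text is UTF-8 (an assumption; the real charset should come from the response). GzipBody, decode and gzip are illustrative names, and gzip exists only to produce demo input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipBody {
    // Reads the whole body, decompressing it first when the
    // Content-Encoding value mentions gzip; otherwise reads it as-is.
    static String decode(InputStream body, String contentEncoding) {
        try {
            boolean gzipped = contentEncoding != null
                    && contentEncoding.toLowerCase(Locale.ROOT).contains("gzip");
            InputStream in = gzipped ? new GZIPInputStream(body) : body;
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // Assumption: the page is UTF-8 encoded.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: compresses text the way a server sending gzip would.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("<html>product page</html>");
        System.out.println(decode(new ByteArrayInputStream(compressed), "gzip"));
    }
}
```

With the code from this post, the stream would come from getMethod.getResponseBodyAsStream() and the encoding from the Content-Encoding response header, if the server sends one. Simply not sending Accept-Encoding, as described above, remains the simpler option.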