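The HttpClient package is a client-side programming toolkit that implements the HTTP protocol, so using it well requires a good grasp of HTTP itself. At its core HTTP is simple: the client requests data, and the server responds to the request and returns the result. The main HTTP/1.1 request methods are GET, HEAD, POST, PUT, DELETE, TRACE and OPTIONS, and HttpClient creates each kind of request with a matching class: HttpGet, HttpHead, HttpPost, HttpPut, HttpDelete, HttpTrace and HttpOptions. All of these classes implement the HttpUriRequest interface, so any of them can be passed to HttpClient.execute().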
HTTP Requests
GET and POST are by far the most commonly used of these. Requests are created like this:
HttpUriRequest postRequest = new HttpPost("http://localhost/index.html");
HttpUriRequest getRequest = new HttpGet("http://127.0.0.1:8080/index.html");
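In HttpClient 4.x the HTTP method is fixed by the class you instantiate. For comparison only, the JDK's own client in java.net.http (Java 11 and later, a different library from HttpClient) uses a single HttpRequest type and takes the method name as a string. The sketch below builds, but never sends, one request for each method listed above; the class name MethodSketch is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

// Comparison sketch using the JDK's java.net.http API (Java 11+), not Apache HttpClient.
class MethodSketch {
    // The seven methods listed in the text
    static final String[] METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "OPTIONS"};

    // Builds one request per method; nothing is sent over the network
    static List<HttpRequest> buildAll(String url) {
        List<HttpRequest> requests = new ArrayList<>();
        for (String method : METHODS) {
            requests.add(HttpRequest.newBuilder(URI.create(url))
                    .method(method, HttpRequest.BodyPublishers.noBody())
                    .build());
        }
        return requests;
    }

    public static void main(String[] args) {
        for (HttpRequest request : buildAll("http://localhost/index.html")) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```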
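The HTTP request format gives two ways to supply parameters with a request: in the request line or in the request body.
Request-line parameters are supplied directly in the URI on the request line.
(1) You can pass a URI that already contains the parameters when creating the request object, for example:
HttpUriRequest request = new HttpGet("http://localhost/index.html?param1=value1&param2=value2");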
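(2) HttpClient also provides the URIUtils utility class, which can build a URI with parameters (from HttpClient 4.2 on, createURI is deprecated in favor of URIBuilder), for example:
URI uri = URIUtils.createURI("http", "localhost", -1, "/index.html",
        "param1=value1&param2=value2", null);
HttpUriRequest request = new HttpGet(uri);
System.out.println(request.getURI());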
The example prints:
http://localhost/index.html?param1=value1&param2=value2
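The JDK's java.net.URI class can assemble the same URI from its parts, which is a handy cross-check on URIUtils.createURI. This is a JDK-only sketch (UriSketch is a hypothetical name). One difference is worth knowing: the multi-argument URI constructors always quote the percent character, so they must be given raw, unencoded values, whereas createURI keeps an already-encoded query as-is (the output of the next example shows the %E4... sequence surviving unchanged).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Comparison sketch using only the JDK's java.net.URI, not HttpClient's URIUtils.
class UriSketch {
    // Assemble an http URI from its parts; port -1 means "no port" and is left out
    static URI build(String host, int port, String path, String query) {
        try {
            // Seven-argument constructor: scheme, userInfo, host, port, path, query, fragment
            return new URI("http", null, host, port, path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Same parts as the URIUtils.createURI call above
        System.out.println(build("localhost", -1, "/index.html", "param1=value1&param2=value2"));
        // '%' is quoted by this constructor, so an already-encoded value is encoded twice:
        // prints http://localhost:8080/index.html?q=%2541
        System.out.println(build("localhost", 8080, "/index.html", "q=%41"));
    }
}
```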
(3) Note that if a parameter contains Chinese (or any other non-ASCII) characters, it must be URL-encoded first, for example:
String param = "param1=" + URLEncoder.encode("中国", "UTF-8") + "&param2=value2";
URI uri = URIUtils.createURI("http", "localhost", 8080, "/sshsky/index.html", param, null);
System.out.println(uri);
The example prints:
http://localhost:8080/sshsky/index.html?param1=%E4%B8%AD%E5%9B%BD&param2=value2
(4) For URL-encoding parameters, HttpClient offers another utility class, URLEncodedUtils. With it the URI can be built in a more explicit, if more verbose, way, for example:
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair("param1", "中国"));
params.add(new BasicNameValuePair("param2", "value2"));
String param = URLEncodedUtils.format(params, "UTF-8");
URI uri = URIUtils.createURI("http", "localhost", 8080, "/sshsky/index.html", param, null);
System.out.println(uri);
The example prints:
http://localhost:8080/sshsky/index.html?param1=%E4%B8%AD%E5%9B%BD&param2=value2
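The query string that URLEncodedUtils.format produces here can be reproduced with the JDK's URLEncoder: each name and value is encoded, and the pairs are joined with '&'. The %E4%B8%AD%E5%9B%BD sequence is simply the six UTF-8 bytes of 中国 (E4 B8 AD for 中, E5 9B BD for 国), each written as '%' plus two hex digits. This is a minimal sketch, not HttpClient's implementation; FormEncodeSketch and format are hypothetical names, and a List of entries is used instead of a Map so that order is kept and a name may repeat, as with List<NameValuePair>.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal sketch of form-style encoding with the JDK (URLEncoder.encode(String, Charset)
// needs Java 10+). FormEncodeSketch is a hypothetical helper, not URLEncodedUtils.
class FormEncodeSketch {
    // Encode every name and value as UTF-8, then join the pairs with '&'
    static String format(List<Map.Entry<String, String>> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> p : params) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> params = new ArrayList<>();
        // Unicode escapes for the two Chinese characters used in the article
        params.add(new SimpleEntry<>("param1", "\u4E2D\u56FD"));
        params.add(new SimpleEntry<>("param2", "value2"));
        System.out.println(format(params)); // param1=%E4%B8%AD%E5%9B%BD&param2=value2
    }
}
```

URLEncoder follows HTML form rules, so a space becomes '+' rather than %20.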
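Request-body parameters are supplied in the body of the request.
Unlike the request-line approach, the request-body approach puts the parameters in the request body, so it needs a request that carries an entity; for form submission that means POST (in HttpClient, HttpPut also accepts an entity). HttpClient provides two classes for this, UrlEncodedFormEntity and MultipartEntity, and both implement the HttpEntity interface.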
(1) UrlEncodedFormEntity, as the name suggests, is mainly used for form submission: an object of this class emulates a traditional HTML form sending its parameters in a POST request. Take the following form:
<form action="http://localhost/index.html" method="POST">
  <input type="text" name="param1" value="中国" />
  <input type="text" name="param2" value="value2" />
  <input type="submit" value="submit" />
</form>
The same submission can be made with the following code:
List<NameValuePair> formParams = new ArrayList<NameValuePair>();
formParams.add(new BasicNameValuePair("param1", "中国"));
formParams.add(new BasicNameValuePair("param2", "value2"));
HttpEntity entity = new UrlEncodedFormEntity(formParams, "UTF-8");
HttpPost request = new HttpPost("http://localhost/index.html");
request.setEntity(entity);
To see what the HTTP data will look like, query the HttpEntity object through its various methods, for example:
List<NameValuePair> formParams = new ArrayList<NameValuePair>();
formParams.add(new BasicNameValuePair("param1", "中国"));
formParams.add(new BasicNameValuePair("param2", "value2"));
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formParams, "UTF-8");
System.out.println(entity.getContentType());
System.out.println(entity.getContentLength());
System.out.println(EntityUtils.getContentCharSet(entity));
System.out.println(EntityUtils.toString(entity));
The example prints:
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
39
UTF-8
param1=%E4%B8%AD%E5%9B%BD&param2=value2
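The 39 printed above is the byte length of the encoded body: 7 bytes for param1=, 18 for the six %XX triplets, 1 for &, and 13 for param2=value2. For comparison, the sketch below builds the same POST with the JDK's java.net.http client (Java 11 and later, not HttpClient). Without UrlEncodedFormEntity the Content-Type header has to be set by hand, and the body publisher reports the same length. FormPostSketch is a hypothetical name, and the request is never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Comparison sketch with the JDK's java.net.http client (Java 11+), not HttpClient 4.x.
// The body is assumed to be already form-encoded, as printed by EntityUtils.toString above.
class FormPostSketch {
    static final String FORM_TYPE = "application/x-www-form-urlencoded; charset=UTF-8";

    static HttpRequest buildFormPost(String url, String encodedBody) {
        return HttpRequest.newBuilder(URI.create(url))
                // Set by hand here; UrlEncodedFormEntity supplies this header itself
                .header("Content-Type", FORM_TYPE)
                .POST(HttpRequest.BodyPublishers.ofString(encodedBody, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildFormPost("http://localhost/index.html",
                "param1=%E4%B8%AD%E5%9B%BD&param2=value2");
        System.out.println(request.method());                              // POST
        System.out.println(request.bodyPublisher().get().contentLength()); // 39
        // Sending would be: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```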
(2) Besides the traditional application/x-www-form-urlencoded form, another commonly used form is the file-upload form, whose type is multipart/form-data. HttpClient's extension module, HttpMime, has a class for exactly this case, MultipartEntity, which also implements HttpEntity (since HttpMime 4.3 it is deprecated in favor of MultipartEntityBuilder). Take the following form:
<form action="http://localhost/index.html" method="POST"
      enctype="multipart/form-data">
  <input type="text" name="param1" value="中国" />
  <input type="text" name="param2" value="value2" />
  <input type="file" name="param3" />
  <input type="submit" value="submit" />
</form>
It can be reproduced with the following code:
MultipartEntity entity = new MultipartEntity();
entity.addPart("param1", new StringBody("中国", Charset.forName("UTF-8")));
entity.addPart("param2", new StringBody("value2", Charset.forName("UTF-8")));
entity.addPart("param3", new FileBody(new File("C:\\1.txt")));
HttpPost request = new HttpPost("http://localhost/index.html");
request.setEntity(entity);
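MultipartEntity hides the wire format of a multipart/form-data body. As an illustration only, the sketch below writes such a body by hand with the JDK: each part starts with --boundary, carries a Content-Disposition header naming the field (and, for a file, its file name), then a blank line and the content; the body ends with --boundary--. The boundary value, the class name MultipartSketch, and the per-part Content-Type values are assumptions; the exact headers MultipartEntity writes for each part may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: writes a multipart/form-data body by hand with the JDK.
// MultipartSketch and the per-part Content-Type values are assumptions, not HttpMime's output.
class MultipartSketch {
    private final String boundary;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    MultipartSketch(String boundary) {
        this.boundary = boundary;
    }

    private void write(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }

    // A text field, roughly what StringBody with a UTF-8 charset represents
    MultipartSketch addText(String name, String value) {
        write("--" + boundary + "\r\n");
        write("Content-Disposition: form-data; name=\"" + name + "\"\r\n");
        write("Content-Type: text/plain; charset=UTF-8\r\n\r\n");
        write(value + "\r\n");
        return this;
    }

    // A file field, roughly what FileBody represents; the file content is passed in as bytes
    MultipartSketch addFile(String name, String fileName, byte[] content) {
        write("--" + boundary + "\r\n");
        write("Content-Disposition: form-data; name=\"" + name + "\"; filename=\"" + fileName + "\"\r\n");
        write("Content-Type: application/octet-stream\r\n\r\n");
        out.write(content, 0, content.length);
        write("\r\n");
        return this;
    }

    // Closing delimiter: the boundary followed by "--"
    byte[] finish() {
        write("--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    // Header value to send along with the body
    String contentType() {
        return "multipart/form-data; boundary=" + boundary;
    }
}
```

The bytes returned by finish() would be sent with the header value from contentType(), for example through HttpRequest.BodyPublishers.ofByteArray in the JDK client.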
HTTP Responses
Handling HTTP responses in HttpClient is much simpler than building requests, and it also goes through the HttpEntity interface. From the HttpEntity object you can obtain an InputStream carrying the response data returned by the server. Note that HttpClient does not parse the content of that stream for you. For example:
HttpUriRequest request = ...;
HttpResponse response = httpClient.execute(request);
// Get the HttpEntity from the response; it can be null when the response has no body
HttpEntity entity = response.getEntity();
if (entity != null) {
    System.out.println(entity.getContentType());
    System.out.println(entity.getContentLength());
    System.out.println(EntityUtils.getContentCharSet(entity));
    // Reading (and closing) the raw stream is up to the caller
    InputStream stream = entity.getContent();
}
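Because HttpClient leaves the stream's content to you, the usual next step is to read it fully and decode it with the right charset. Below is a minimal JDK-only sketch in which a ByteArrayInputStream stands in for entity.getContent(); StreamSketch and readAll are hypothetical names, and readAllBytes needs Java 9 or later. Closing the stream matters: in HttpClient 4.x, closing the content stream is what lets the connection be released.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: turn a response stream into text. The ByteArrayInputStream below is only a
// stand-in for entity.getContent(); StreamSketch is a hypothetical name.
class StreamSketch {
    static String readAll(InputStream stream, Charset charset) {
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = stream) {
            return new String(in.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream fake = new ByteArrayInputStream("<html>ok</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(fake, StandardCharsets.UTF_8)); // <html>ok</html>
    }
}
```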
Alternatively, call httpClient.execute(request, responseHandler) with a ResponseHandler<T>. The call then returns whatever the handler's handleResponse method returns, so you get back data of exactly the type you want.
For example, the following method fetches the content of the page at a given URL:
public static String executeStringByGet(String url, final Charset charset) {
    HttpClient client = new DefaultHttpClient();
    HttpGet get = new HttpGet(url);
    String result = null;
    try {
        result = client.execute(get, new ResponseHandler<String>() {
            public String handleResponse(HttpResponse response) throws ClientProtocolException, IOException {
                HttpEntity entity = response.getEntity();
                // Decode the body only for 200 OK; otherwise return null
                if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK && entity != null) {
                    return new String(EntityUtils.toByteArray(entity), charset);
                }
                return null;
            }
        });
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        client.getConnectionManager().shutdown();
    }
    return result;
}
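The point of the ResponseHandler variant is that the client, not the caller, runs the handler and then cleans up, so a method like executeStringByGet cannot forget to release the connection. The sketch below shows that shape with hypothetical stand-in types (FakeResponse and Handler are not HttpClient classes): execute applies the handler and always marks the response as released in a finally block.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch of the ResponseHandler pattern with hypothetical stand-in types;
// FakeResponse and Handler are NOT HttpClient classes.
class HandlerSketch {
    // Hypothetical minimal response: status code, body bytes, and a "released" flag
    static final class FakeResponse {
        final int status;
        final byte[] body;
        boolean released;

        FakeResponse(int status, byte[] body) {
            this.status = status;
            this.body = body;
        }
    }

    // Plays the role of ResponseHandler<T>: turns a response into a value of type T
    interface Handler<T> {
        T handle(FakeResponse response);
    }

    // Plays the role of execute(request, handler): run the handler, then always clean up
    static <T> T execute(FakeResponse response, Handler<T> handler) {
        try {
            return handler.handle(response);
        } finally {
            response.released = true; // a real client would release the connection here
        }
    }

    // Same logic as executeStringByGet's handler: body as text on 200, otherwise null
    static Handler<String> stringIfOk(Charset charset) {
        return r -> r.status == 200 ? new String(r.body, charset) : null;
    }

    public static void main(String[] args) {
        FakeResponse ok = new FakeResponse(200, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(execute(ok, stringIfOk(StandardCharsets.UTF_8))); // hello
        System.out.println(ok.released);                                     // true
    }
}
```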
A fuller example of using the HttpClient API:
package com.wow.common.test;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

/**
 * HttpClientTest.java: TODO class description
 *
 * @author zheng.zhaoz 2012-2-9 07:33:18 PM
 */
public class HttpClientTest {

    public static void main(String[] args) {
        HttpClient httpClient = new DefaultHttpClient();
        HttpGet httpGet = new HttpGet("http://www.cnblogs.com/loveyakamoz/archive/2011/07/21/2113252.html");

        // Emulate the request headers a browser would send
        httpGet.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        httpGet.setHeader("Accept-Charset", "GB2312,utf-8;q=0.7,*;q=0.7");
        // No "Accept-Encoding: gzip, deflate" here: DefaultHttpClient does not decompress
        // responses by itself, so the charset detection below would see compressed bytes
        httpGet.setHeader("Accept-Language", "zh-cn,zh;q=0.5");
        httpGet.setHeader("Connection", "keep-alive");
        httpGet.setHeader("Cookie", "__utma=226521935.73826752.1323672782.1325068020.1328770420.6;");
        httpGet.setHeader("Host", "www.cnblogs.com");
        httpGet.setHeader("Referer", "http://www.baidu.com/s?tn=monline_5_dg&bs=httpclient4+MultiThreadedHttpConnectionManager");
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2");
        System.out.println("Accept-Charset: " + httpGet.getFirstHeader("Accept-Charset"));
        System.out.println("Execute request: " + httpGet.getURI());

        try {
            HttpResponse response = httpClient.execute(httpGet);

            // Print all response headers
            for (Header header : response.getAllHeaders()) {
                System.out.println(header.getName() + ": " + header.getValue());
            }

            if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
                HttpEntity entity = response.getEntity();
                // Keep the page source in a byte array, because it is needed twice
                byte[] bytes = EntityUtils.toByteArray(entity);

                // If the Content-Type header carries a charset, take it from there
                String charSet = EntityUtils.getContentCharSet(entity);
                System.out.println("In header: " + charSet);

                // Otherwise look in the page source. This is not fully reliable,
                // because some pages do not declare their encoding at all
                if (charSet == null) {
                    Pattern p = Pattern.compile("<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9_-]*)",
                            Pattern.CASE_INSENSITIVE);
                    // ISO-8859-1 maps each byte to one char; the pattern only matches ASCII,
                    // so garbled non-ASCII text in this string does not matter
                    Matcher m = p.matcher(new String(bytes, "ISO-8859-1"));
                    if (m.find()) {
                        charSet = m.group(1);
                    }
                }
                System.out.println("Last get: " + charSet);

                // Decode the byte array with the detected encoding (assumed UTF-8 if none was found)
                System.out.println("Encoding string is: " + new String(bytes, charSet != null ? charSet : "UTF-8"));
            }
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            httpClient.getConnectionManager().shutdown();
        }
    }
}
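The charset logic in main (header first, then a <meta> tag in the page source) is easier to check once it is pulled out into a function that needs no network. This is a sketch under assumptions: CharsetSketch is a hypothetical name, the regular expression is a simplified one, and the UTF-8 fallback is an assumption. Reading the bytes as ISO-8859-1 for the search is safe because that charset maps every byte to exactly one character, so ASCII markup survives unchanged.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the detection order: header charset, then <meta>, then a fallback.
// The simplified regex and the UTF-8 fallback are assumptions.
class CharsetSketch {
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9_-]*)", Pattern.CASE_INSENSITIVE);

    static String detect(String headerCharset, byte[] html) {
        if (headerCharset != null) {
            return headerCharset;               // the Content-Type header wins
        }
        // ISO-8859-1 maps each byte to one char, so ASCII markup is preserved
        Matcher m = META_CHARSET.matcher(new String(html, StandardCharsets.ISO_8859_1));
        if (m.find()) {                         // read group(1) only after a successful find()
            return m.group(1);
        }
        return "UTF-8";                         // assumed default when nothing is declared
    }

    static String decode(String headerCharset, byte[] html) {
        String name = detect(headerCharset, html);
        Charset cs = Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
        return new String(html, cs);
    }
}
```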