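crawler_java application roundup 9: common HttpClient 4.2.2 methods, accessing pages after login, downloading files, setting a proxy
At work I need to make network requests on Android, and I plan to use HttpClient for them.
This post summarizes some basic HttpClient usage.
The version is 4.2.2.
While using this version I searched Baidu a lot, but the results all used the package org.apache.commons.httpclient rather than the org.apache.http.client.HttpClient used here. The former belongs to Commons HttpClient 3.x, not the newer HttpClient 4.x.
From the official site: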
Commons HttpClient 3.x codeline is at the end of life. All users of Commons HttpClient 3.x are strongly encouraged to upgrade to HttpClient 4.1.
1. Basic GET
- public void getUrl(String url, String encoding)
-         throws ClientProtocolException, IOException {
-     // The default client implementation.
-     HttpClient client = new DefaultHttpClient();
-     try {
-         // Build a GET request for the URL.
-         HttpGet get = new HttpGet(url);
-         // Execute it and obtain the response.
-         HttpResponse response = client.execute(get);
-         // The entity carries the response body, if there is one.
-         HttpEntity entity = response.getEntity();
-         if (entity != null) {
-             System.out.println("Content encoding: " + entity.getContentEncoding());
-             System.out.println("Content type: " + entity.getContentType());
-             // Read the response body.
-             InputStream instream = entity.getContent();
-             try {
-                 BufferedReader reader = new BufferedReader(
-                         new InputStreamReader(instream, encoding));
-                 // Prints only the first line of the body.
-                 System.out.println(reader.readLine());
-             } finally {
-                 instream.close();
-             }
-         }
-     } finally {
-         // Shut down the connection manager to release its connections.
-         client.getConnectionManager().shutdown();
-     }
- }
2. Basic POST
The params argument below holds the parameters submitted with the form.
- public void postUrlWithParams(String url, Map<String, String> params,
-         String encoding) throws Exception {
-     DefaultHttpClient httpclient = new DefaultHttpClient();
-     try {
-         HttpPost httpost = new HttpPost(url);
-         // Add the form parameters.
-         List<NameValuePair> nvps = new ArrayList<NameValuePair>();
-         if (params != null) {
-             for (Map.Entry<String, String> entry : params.entrySet()) {
-                 nvps.add(new BasicNameValuePair(entry.getKey(),
-                         entry.getValue()));
-             }
-         }
-         httpost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));
-         HttpResponse response = httpclient.execute(httpost);
-         HttpEntity entity = response.getEntity();
-         System.out.println("Login form get: " + response.getStatusLine());
-         if (entity != null) {
-             dump(entity, encoding);
-         }
-         System.out.println("Post logon cookies:");
-         List<Cookie> cookies = httpclient.getCookieStore().getCookies();
-         if (cookies.isEmpty()) {
-             System.out.println("None");
-         } else {
-             for (int i = 0; i < cookies.size(); i++) {
-                 System.out.println("- " + cookies.get(i).toString());
-             }
-         }
-     } finally {
-         // Shut down the connection manager.
-         httpclient.getConnectionManager().shutdown();
-     }
- }
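The UrlEncodedFormEntity call above turns the name/value pairs into a form-encoded request body. To make that encoding concrete, here is a minimal sketch using only the JDK. FormBody and encode are hypothetical names, not part of HttpClient, and the bytes HttpClient actually sends may differ in detail.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper (not an HttpClient class): builds an
// application/x-www-form-urlencoded body from name/value pairs.
final class FormBody {
    private FormBody() {}

    static String encode(Map<String, String> params, String charset) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                // Pairs are joined with '&'; name and value with '='.
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), charset))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), charset));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, ex);
        }
        return body.toString();
    }
}
```

With an insertion-ordered map holding user = "a b" and pwd = "x&y", FormBody.encode(map, "UTF-8") yields user=a+b&pwd=x%26y: the space and the ampersand inside the values are escaped so they cannot be confused with separators.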
3. A small snippet for printing page output
- private static void dump(HttpEntity entity, String encoding)
-         throws IOException {
-     BufferedReader br = new BufferedReader(new InputStreamReader(
-             entity.getContent(), encoding));
-     // Prints only the first line of the body.
-     System.out.println(br.readLine());
- }
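The dump method above prints only the first line of the response. If the whole body is wanted, one option is to drain the stream into a String. The sketch below uses only the JDK; StreamText and readAll are hypothetical names introduced for illustration. HttpCore's EntityUtils.toString(entity, encoding) is an existing alternative that does a similar job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper: reads an entire stream as text in the given encoding.
final class StreamText {
    private StreamText() {}

    static String readAll(InputStream in, String encoding) throws IOException {
        try {
            Reader reader = new InputStreamReader(in, encoding);
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            // read() returns -1 once the stream is exhausted.
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } finally {
            // Always close the stream, even if decoding fails.
            in.close();
        }
    }
}
```

Inside dump this could be used as System.out.println(StreamText.readAll(entity.getContent(), encoding)). Note that it buffers the whole body in memory, so it suits pages rather than large downloads.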
4. A common login session problem. Requirement: after logging in with a username and password, later page requests should succeed.
Note in particular that the same httpclient object must be used throughout; do not create a new one for the second request. There are many ways to keep hold of the httpclient that passed authentication in the first step: a singleton, an application-wide variable (Application on Android), a ThreadLocal variable, and so on.
Also, the httpClient created below must use a ThreadSafeClientConnManager object! (In 4.2 this class is deprecated in favor of PoolingClientConnectionManager.)
- public String getSessionId(String url, Map<String, String> params,
-         String encoding, String url2) throws Exception {
-     DefaultHttpClient httpclient = new DefaultHttpClient(
-             new ThreadSafeClientConnManager());
-     try {
-         HttpPost httpost = new HttpPost(url);
-         // Add the form parameters.
-         List<NameValuePair> nvps = new ArrayList<NameValuePair>();
-         if (params != null) {
-             for (Map.Entry<String, String> entry : params.entrySet()) {
-                 nvps.add(new BasicNameValuePair(entry.getKey(),
-                         entry.getValue()));
-             }
-         }
-         // Encode the form parameters as UTF-8.
-         httpost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));
-         // Log in once, then consume the response to release its connection.
-         HttpResponse loginResponse = httpclient.execute(httpost);
-         EntityUtils.consume(loginResponse.getEntity());
-         // A second request to an ordinary URL now goes through the same client.
-         httpost = new HttpPost(url2);
-         BasicResponseHandler responseHandler = new BasicResponseHandler();
-         System.out.println(httpclient.execute(httpost, responseHandler));
-     } finally {
-         // Shut down the connection manager.
-         httpclient.getConnectionManager().shutdown();
-     }
-     return "";
- }
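The advice above about keeping a single logged-in client can be realized in several ways. As one illustration of the singleton option, here is a sketch of the initialization-on-demand holder idiom. SessionClient is a hypothetical stand-in for the DefaultHttpClient used in this post, so that the example compiles with the JDK alone; SessionClientHolder is likewise an invented name.

```java
// Hypothetical stand-in for the logged-in client. In the code above this
// role is played by a DefaultHttpClient, which holds the session cookies.
final class SessionClient {
}

// Hands out one shared SessionClient for the whole application.
final class SessionClientHolder {
    private SessionClientHolder() {}

    // The JVM initializes this nested class lazily and exactly once,
    // so no explicit locking is needed.
    private static final class Lazy {
        static final SessionClient INSTANCE = new SessionClient();
    }

    static SessionClient get() {
        return Lazy.INSTANCE;
    }
}
```

Every caller of SessionClientHolder.get() receives the same instance, so the login request and the later page requests would share one client. A shared client should not be shut down after each request as the method above does; shutdown belongs at application exit.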
5. Downloading a file, such as an mp3
- // First argument: the URL to fetch; second argument: the local path to save to.
- public void getFile(String url, String fileName) {
-     HttpClient httpClient = new DefaultHttpClient();
-     HttpGet get = new HttpGet(url);
-     try {
-         ResponseHandler<byte[]> handler = new ResponseHandler<byte[]>() {
-             public byte[] handleResponse(HttpResponse response)
-                     throws ClientProtocolException, IOException {
-                 HttpEntity entity = response.getEntity();
-                 // Reads the whole body into memory; null if there is no body.
-                 return entity != null ? EntityUtils.toByteArray(entity) : null;
-             }
-         };
-         byte[] bytes = httpClient.execute(get, handler);
-         if (bytes != null) {
-             FileOutputStream out = new FileOutputStream(fileName);
-             try {
-                 out.write(bytes);
-             } finally {
-                 out.close();
-             }
-         }
-     } catch (Exception e) {
-         e.printStackTrace();
-     } finally {
-         httpClient.getConnectionManager().shutdown();
-     }
- }
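The download above holds the entire file in a byte array before writing it, which can be costly for large files. A possible alternative is to copy the response stream to the file in fixed-size chunks. The sketch below shows only the copy loop, using the JDK alone; StreamCopy and copy are hypothetical names, not HttpClient API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a stream in chunks so that memory use stays
// bounded regardless of how large the download is.
final class StreamCopy {
    private StreamCopy() {}

    // Returns the number of bytes copied. The caller closes both streams.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() returns -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

In a response handler like the one above, this could be called as StreamCopy.copy(entity.getContent(), new FileOutputStream(fileName)), with both streams closed in a finally block afterwards.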
6. Creating an httpClient that can be used in a multithreaded environment
(Original source: http://blog.csdn.net/jiaoshi0531/article/details/6459468)
- HttpParams params = new BasicHttpParams();
- // Maximum total number of connections allowed.
- ConnManagerParams.setMaxTotalConnections(params, 200);
- // Timeout in milliseconds when waiting for a connection from the manager.
- ConnManagerParams.setTimeout(params, 10000);
- // Default maximum of 20 connections per route.
- ConnPerRouteBean connPerRoute = new ConnPerRouteBean(20);
- // Maximum of 50 connections for the route to the given host.
- HttpHost localhost = new HttpHost("127.0.0.1", 80);
- connPerRoute.setMaxForRoute(new HttpRoute(localhost), 50);
- ConnManagerParams.setMaxConnectionsPerRoute(params, connPerRoute);
- // HTTP protocol version to use.
- HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
- // Default charset for content bodies.
- HttpProtocolParams.setContentCharset(params,
-         HTTP.DEFAULT_CONTENT_CHARSET);
- // Whether to use the Expect: 100-continue handshake.
- HttpProtocolParams.setUseExpectContinue(params, true);
- SchemeRegistry schemeRegistry = new SchemeRegistry();
- schemeRegistry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
- schemeRegistry.register(new Scheme("https", SSLSocketFactory.getSocketFactory(), 443));
- ClientConnectionManager cm = new ThreadSafeClientConnManager(params, schemeRegistry);
- // httpClient is assumed to be a field declared elsewhere.
- httpClient = new DefaultHttpClient(cm, params);
7. A useful object: the HTTP context. Information about a single request, such as the request, the response, and the proxy host, can be retrieved from it.
- public static void getUrl(String url, String encoding)
-         throws ClientProtocolException, IOException {
-     // Build a GET request for the URL.
-     HttpGet get = new HttpGet(url);
-     HttpContext localContext = new BasicHttpContext();
-     // Execute the request; the second argument is the context, a very handy parameter!
-     // httpclient is assumed to be a shared static field.
-     httpclient.execute(get, localContext);
-     // Get the HttpConnection object from the context.
-     HttpConnection con = (HttpConnection) localContext
-             .getAttribute(ExecutionContext.HTTP_CONNECTION);
-     System.out.println("Socket timeout: " + con.getSocketTimeout());
-     // Get the HttpHost object from the context.
-     HttpHost target = (HttpHost) localContext
-             .getAttribute(ExecutionContext.HTTP_TARGET_HOST);
-     System.out.println("Final request target: " + target.getHostName() + ":"
-             + target.getPort());
-     // Get the proxy information from the context.
-     HttpHost proxy = (HttpHost) localContext
-             .getAttribute(ExecutionContext.HTTP_PROXY_HOST);
-     if (proxy != null)
-         System.out.println("Proxy host: " + proxy.getHostName() + ":"
-                 + proxy.getPort());
-     System.out.println("Request sent: "
-             + localContext.getAttribute(ExecutionContext.HTTP_REQ_SENT));
-     // Get the HttpRequest object from the context.
-     HttpRequest request = (HttpRequest) localContext
-             .getAttribute(ExecutionContext.HTTP_REQUEST);
-     System.out.println("Request version: " + request.getProtocolVersion());
-     Header[] headers = request.getAllHeaders();
-     System.out.println("Request headers: ");
-     for (Header h : headers) {
-         System.out.println(h.getName() + "--" + h.getValue());
-     }
-     System.out.println("Request URI: " + request.getRequestLine().getUri());
-     // Get the HttpResponse object from the context.
-     HttpResponse response = (HttpResponse) localContext
-             .getAttribute(ExecutionContext.HTTP_RESPONSE);
-     HttpEntity entity = response.getEntity();
-     if (entity != null) {
-         System.out.println("Response content encoding: " + entity.getContentEncoding());
-         System.out.println("Response content type: " + entity.getContentType());
-         dump(entity, encoding);
-     }
- }
The output looks roughly like this:
- Socket timeout: 0
- Final request target: money.finance.sina.com.cn:-1
- Request sent: true
- Request version: HTTP/1.1
- Request headers:
- Host--money.finance.sina.com.cn
- Connection--Keep-Alive
- User-Agent--Apache-HttpClient/4.2.2 (java 1.5)
- Request URI: /corp/go.php/vFD_BalanceSheet/stockid/600031/ctrl/part/displaytype/4.phtml
- Response content encoding: null
- Response content type: Content-Type: text/html
Here a socket timeout of 0 means no timeout is set, and a port of -1 means the URL did not specify a port, so the scheme's default port is used.
8. Setting a proxy
- // String hostIp is the proxy host IP; int port is the proxy port.
- HttpHost proxy = new HttpHost(hostIp, port);
- // Set the proxy host.
- httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY,
-         proxy);
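9. Setting the keep-alive duration
- // Configure on the client how long a persistent connection may be kept alive.
- // HTTP servers are commonly configured to drop connections that have been idle for a while, to save resources.
- httpClient.setKeepAliveStrategy(new ConnectionKeepAliveStrategy() {
-     public long getKeepAliveDuration(HttpResponse response,
-             HttpContext context) {
-         // Honor the timeout given in the server's Keep-Alive header, if any.
-         HeaderElementIterator it = new BasicHeaderElementIterator(
-                 response.headerIterator(HTTP.CONN_KEEP_ALIVE));
-         while (it.hasNext()) {
-             HeaderElement he = it.nextElement();
-             String param = he.getName();
-             String value = he.getValue();
-             if (value != null && param.equalsIgnoreCase("timeout")) {
-                 try {
-                     // The header gives seconds; the strategy returns milliseconds.
-                     return Long.parseLong(value) * 1000;
-                 } catch (NumberFormatException e) {
-                     // Ignore a malformed timeout and fall through to the defaults.
-                 }
-             }
-         }
-         // Otherwise keep alive for 5 seconds for www.baidu.com and 30 seconds for other hosts.
-         HttpHost target = (HttpHost) context.getAttribute(ExecutionContext.HTTP_TARGET_HOST);
-         if ("www.baidu.com".equalsIgnoreCase(target.getHostName())) {
-             return 5 * 1000;
-         } else {
-             return 30 * 1000;
-         }
-     }
- });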
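The header-parsing part of the strategy above can be looked at in isolation. The following sketch reimplements that step with plain string handling so it can be tried without HttpClient; KeepAliveTimeout and toMillis are hypothetical names, and the parsing is deliberately simpler than what BasicHeaderElementIterator does.

```java
// Hypothetical helper: extracts the timeout from a Keep-Alive header value
// such as "timeout=5, max=100" and converts it to milliseconds.
final class KeepAliveTimeout {
    private KeepAliveTimeout() {}

    static long toMillis(String headerValue, long fallbackMillis) {
        if (headerValue == null) {
            return fallbackMillis;
        }
        for (String element : headerValue.split(",")) {
            String[] pair = element.trim().split("=", 2);
            if (pair.length == 2 && pair[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    // The header carries seconds; callers want milliseconds.
                    return Long.parseLong(pair[1].trim()) * 1000L;
                } catch (NumberFormatException ignored) {
                    // Malformed number: fall through to the fallback.
                }
            }
        }
        return fallbackMillis;
    }
}
```

For example, KeepAliveTimeout.toMillis("timeout=5, max=100", 30000) returns 5000, while a missing or malformed timeout gives back the 30000 fallback, mirroring the default branch of the strategy above.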