HttpClient (1) -- HelloWorld
1. Introduction
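HttpClient started as a subproject of Apache Jakarta Commons and is now maintained under Apache HttpComponents. It is an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol, and it tracks the current versions and recommendations of that protocol. This article is based on version 4.5.2. Maven dependency: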
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>
2. HelloWorld implementation
package com.xsjt.chap01;

import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HelloWorld {

    /**
     * Fetches a web page with a GET request.
     *
     * @param args unused
     * @throws IOException if the request fails or the body cannot be read
     */
    public static void main(String[] args) throws IOException {
        // Create the HttpGet instance
        HttpGet httpGet = new HttpGet("http://www.cnblogs.com"); // or http://www.tuicool.com/

        // try-with-resources closes the response and the client even if an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // Read the page content; "UTF-8" is the charset used when the response declares none
                String result = EntityUtils.toString(entity, "UTF-8");
                System.out.println("Page content: " + result);
            }
        }
    }
}
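The code above fetches the page content directly. For some pages the Chinese text comes back garbled; in that case the charset has to be set to match the page's own encoding, for example gb2312. Note that EntityUtils.toString uses its charset argument only as a default: if the response's Content-Type header declares a charset, that one takes precedence.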
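To illustrate the idea of choosing the charset from the page's declared encoding, here is a minimal sketch in plain Java with no HttpClient dependency. The class name CharsetResolver and its two methods are hypothetical helpers written for this illustration, not part of HttpClient; the sketch assumes the Content-Type header value and the raw body bytes have already been obtained from the response.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the HttpClient API.
class CharsetResolver {

    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=gb2312". Falls back to the given charset when
     * no charset is declared or the declared name is not supported.
     */
    static Charset resolve(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Decodes the raw body bytes with the resolved charset. */
    static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, resolve(contentType, fallback));
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is the two-character greeting "ni hao", escaped so the
        // source file's own encoding does not matter
        byte[] body = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);
        String text = decode(body, "text/html; charset=utf-8", StandardCharsets.ISO_8859_1);
        System.out.println(text);
    }
}
```

When a page declares no charset in its header at all, the fallback is what decides the result, so for a gb2312 page that is where Charset.forName("gb2312") would be passed, provided the JVM supports that charset.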
3. Web crawler tutorial
https://www.kancloud.cn/johnnylee/crawler/
4. HttpClient learning resources