HttpClient Study Notes - Part 1 (tutorial)

1. HttpClient uses the facade pattern. Where and how is it applied?

2. HTTP protocol interceptors use the decorator pattern. Where and how is it applied?

 

 

URIBuilder
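URIBuilder assembles a request URI from individual components instead of string concatenation. A minimal sketch (the class name and query parameters below are illustrative, not from the tutorial):

```java
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.URIBuilder;

public class UriBuilderExample {
    public static void main(String[] args) throws URISyntaxException {
        // Assemble the URI from parts; query parameters are encoded automatically.
        URI uri = new URIBuilder()
                .setScheme("http")
                .setHost("localhost")
                .setPath("/search")
                .setParameter("q", "httpclient")
                .build();
        HttpGet httpget = new HttpGet(uri);
        System.out.println(httpget.getURI());
    }
}
```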

HeaderElementIterator it = new BasicHeaderElementIterator(
    response.headerIterator("Set-Cookie"));


If the entity is too large, do not call EntityUtils.toString() directly, as it would waste a large amount of memory. The correct usage pattern is:
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("http://localhost/");
CloseableHttpResponse response = httpclient.execute(httpget);
try {
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        long len = entity.getContentLength();
        if (len != -1 && len < 2048) {
            System.out.println(EntityUtils.toString(entity));
        } else {
            // Stream content out
        }
    }
} finally {
    response.close();
}


To read a response entity multiple times, wrap it in a BufferedHttpEntity, which reads the data out and buffers it in memory.
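A minimal sketch of wrapping an entity in BufferedHttpEntity so it can be consumed repeatedly (the class and helper names are mine, not part of the library):

```java
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.entity.BufferedHttpEntity;

public class BufferedEntityExample {
    // Wrap a possibly streaming entity so it can be read more than once;
    // BufferedHttpEntity reads the content fully and buffers it in memory.
    static HttpEntity makeRepeatable(HttpEntity entity) throws IOException {
        if (entity != null && !entity.isRepeatable()) {
            return new BufferedHttpEntity(entity);
        }
        return entity;
    }
}
```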


Intercepting and processing the response: use the ResponseHandler interface.
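A sketch of the ResponseHandler pattern: the handler runs inside execute(), so the connection is released automatically whether the handler succeeds or throws (the URL and handler logic here are illustrative):

```java
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class ResponseHandlerExample {
    public static void main(String[] args) throws IOException {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {
            // The handler converts the raw response into the desired result type;
            // resource cleanup is handled by execute() itself.
            ResponseHandler<String> handler = response -> {
                HttpEntity entity = response.getEntity();
                return entity != null ? EntityUtils.toString(entity) : null;
            };
            String body = httpclient.execute(new HttpGet("http://localhost/"), handler);
            System.out.println(body);
        } finally {
            httpclient.close();
        }
    }
}
```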

HttpClient is a thread-safe object; a single global instance is recommended.

HttpClientContext enables sharing of state across multiple requests, as well as propagation of configuration:

HttpClientContext localContext = HttpClientContext.create();
httpclient.execute(httpget2, localContext);

HttpRequestRetryHandler implements custom request retries. Note: requests with a self-contained (repeatable) entity can be retried; non-repeatable (streamed) entities cannot.
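A sketch of a custom retry policy via HttpRequestRetryHandler; the retry limit and the exception types skipped below are an assumed policy for illustration, not a recommendation:

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.UnknownHostException;
import org.apache.http.client.HttpRequestRetryHandler;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.protocol.HttpContext;

public class RetryExample {
    public static void main(String[] args) {
        HttpRequestRetryHandler retryHandler =
                (IOException exception, int executionCount, HttpContext context) -> {
            if (executionCount >= 3) {
                return false; // give up after three attempts
            }
            if (exception instanceof InterruptedIOException
                    || exception instanceof UnknownHostException) {
                return false; // timeouts and unknown hosts are not worth retrying
            }
            return true;
        };
        CloseableHttpClient httpclient = HttpClients.custom()
                .setRetryHandler(retryHandler)
                .build();
    }
}
```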

Redirects are handled automatically by HttpClient; the result after redirection is what gets returned.
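The default redirect behavior can also be adjusted; a sketch, assuming HttpClient 4.3+:

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.client.LaxRedirectStrategy;

public class RedirectExample {
    public static void main(String[] args) {
        // LaxRedirectStrategy also follows redirects for POST requests.
        CloseableHttpClient followsPost = HttpClients.custom()
                .setRedirectStrategy(new LaxRedirectStrategy())
                .build();
        // Alternatively, turn automatic redirect handling off entirely.
        CloseableHttpClient noRedirects = HttpClients.custom()
                .disableRedirectHandling()
                .build();
    }
}
```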

 

Chapter 2: Connection pool management

HttpClientConnectionManager connMrg = new BasicHttpClientConnectionManager();
ConnectionRequest connRequest = connMrg.requestConnection(route, null);
HttpClientConnection conn = connRequest.get(10, TimeUnit.SECONDS);

BasicHttpClientConnectionManager maintains only a single connection. Consecutive requests for the same route reuse that connection; a request for a different route first closes the current connection, then opens a new one. It can only be used by a single thread.

PoolingHttpClientConnectionManager is HttpClient's default connection manager and is suited to multi-threaded environments. By default it creates at most 2 concurrent connections per route and no more than 20 connections in total:

PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
cm.setMaxTotal(200);
cm.setDefaultMaxPerRoute(20);
HttpHost localhost = new HttpHost("localhost", 80);
cm.setMaxPerRoute(new HttpRoute(localhost), 50);

The http.conn-manager.timeout parameter sets the timeout for obtaining a connection from the pool.

It is strongly recommended that each thread maintain its own HttpContext.

Because HttpClient's default stale-connection checking cannot fully prevent dead connections from lingering in the pool, the best approach is to run a dedicated monitor thread that periodically calls the connection manager's closeExpiredConnections() or closeIdleConnections() methods:
public static class IdleConnectionMonitorThread extends Thread {
    
    private final HttpClientConnectionManager connMgr;
    private volatile boolean shutdown;
    
    public IdleConnectionMonitorThread(HttpClientConnectionManager connMgr) {
        super();
        this.connMgr = connMgr;
    }

    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (this) {
                    wait(5000);
                    // Close expired connections
                    connMgr.closeExpiredConnections();
                    // Optionally, close connections
                    // that have been idle longer than 30 sec
                    connMgr.closeIdleConnections(30, TimeUnit.SECONDS);
                }
            }
        } catch (InterruptedException ex) {
            // terminate
        }
    }
    
    public void shutdown() {
        shutdown = true;
        synchronized (this) {
            notifyAll();
        }
    }
    
}

To customize how long a connection is kept alive for a particular request, use ConnectionKeepAliveStrategy.
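A sketch of a ConnectionKeepAliveStrategy that honors the server's Keep-Alive: timeout=... header, with an assumed fallback of 5 seconds:

```java
import org.apache.http.HeaderElement;
import org.apache.http.HeaderElementIterator;
import org.apache.http.HttpResponse;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicHeaderElementIterator;
import org.apache.http.protocol.HTTP;
import org.apache.http.protocol.HttpContext;

public class KeepAliveExample {
    public static void main(String[] args) {
        ConnectionKeepAliveStrategy strategy =
                (HttpResponse response, HttpContext context) -> {
            // Look for a "timeout" parameter in the Keep-Alive response header.
            HeaderElementIterator it = new BasicHeaderElementIterator(
                    response.headerIterator(HTTP.CONN_KEEP_ALIVE));
            while (it.hasNext()) {
                HeaderElement he = it.nextElement();
                String value = he.getValue();
                if (value != null && "timeout".equalsIgnoreCase(he.getName())) {
                    try {
                        return Long.parseLong(value) * 1000;
                    } catch (NumberFormatException ignore) {
                    }
                }
            }
            return 5 * 1000; // fallback: keep the connection alive for 5 seconds
        };
        CloseableHttpClient httpclient = HttpClients.custom()
                .setKeepAliveStrategy(strategy)
                .build();
    }
}
```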


At the lowest level, HTTP connections use java.net.Socket for network transport. HttpClient obtains sockets through a ConnectionSocketFactory; the default is PlainConnectionSocketFactory.
HttpClient uses SSLConnectionSocketFactory (an implementation of LayeredConnectionSocketFactory) to create secure socket connections.

The socket factories can be combined with the connection manager through a registry. Example:
ConnectionSocketFactory plainsf = <...>
LayeredConnectionSocketFactory sslsf = <...>
Registry<ConnectionSocketFactory> r = RegistryBuilder.<ConnectionSocketFactory>create()
        .register("http", plainsf)
        .register("https", sslsf)
        .build();

HttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(r);
HttpClients.custom()
        .setConnectionManager(cm)
        .build();
This lets different protocols use different socket factories.

Hostname verification: DefaultHostnameVerifier, NoopHostnameVerifier. I'm not sure yet what hostname verification actually does; something to study later.

Custom proxy configuration in HttpClient:

To customize routing, implement or override HttpRoutePlanner. I'm not sure yet what a custom proxy means in practice.
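One common form of a custom route planner is routing every request through a fixed proxy via DefaultProxyRoutePlanner; the proxy host and port below are hypothetical:

```java
import org.apache.http.HttpHost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.DefaultProxyRoutePlanner;

public class ProxyExample {
    public static void main(String[] args) {
        // All requests made by this client will go through the given proxy.
        HttpHost proxy = new HttpHost("proxy.example.com", 8080);
        DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);
        CloseableHttpClient httpclient = HttpClients.custom()
                .setRoutePlanner(routePlanner)
                .build();
    }
}
```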

Chapter 3: HTTP state management
Cookie interface: the abstract cookie
SetCookie interface: represents the Set-Cookie header
ClientCookie: can be understood as the abstraction of a cookie as used on the client side

Cookies have many specifications:
Standard strict
Standard
Netscape draft (obsolete)
RFC 2965 (obsolete)
RFC 2109 (obsolete)
Browser compatibility (obsolete)
Default
Ignore cookies
Standard and Standard strict are recommended. These specifications correspond to the constants in
CookieSpecs

Custom cookie specification:
PublicSuffixMatcher publicSuffixMatcher = PublicSuffixMatcherLoader.getDefault();

Registry<CookieSpecProvider> r = RegistryBuilder.<CookieSpecProvider>create()
        .register(CookieSpecs.DEFAULT,
                new DefaultCookieSpecProvider(publicSuffixMatcher))
        .register(CookieSpecs.STANDARD,
                new RFC6265CookieSpecProvider(publicSuffixMatcher))
        .register("easy", new EasySpecProvider())
        .build();

RequestConfig requestConfig = RequestConfig.custom()
        .setCookieSpec("easy")
        .build();

CloseableHttpClient httpclient = HttpClients.custom()
        .setDefaultCookieSpecRegistry(r)
        .setDefaultRequestConfig(requestConfig)
        .build();

Use a CookieStore to get browser-like cookie caching.
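A sketch of sharing a pre-populated CookieStore with a client; the cookie name, value, and domain are made up for illustration:

```java
import org.apache.http.client.CookieStore;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.cookie.BasicClientCookie;

public class CookieStoreExample {
    public static void main(String[] args) {
        CookieStore cookieStore = new BasicCookieStore();
        // Pre-populate a cookie; subsequent requests to a matching
        // domain/path will carry it, and Set-Cookie responses update the store.
        BasicClientCookie cookie = new BasicClientCookie("sessionid", "abc123");
        cookie.setDomain("somehost");
        cookie.setPath("/");
        cookieStore.addCookie(cookie);
        CloseableHttpClient httpclient = HttpClients.custom()
                .setDefaultCookieStore(cookieStore)
                .build();
    }
}
```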

HTTP state management per thread or per user can be achieved by giving each its own HttpClientContext.

Chapter 4: HTTP authentication

UsernamePasswordCredentials defines basic username/password credentials (suitable for standard HTTP authentication).

NTCredentials is used for Windows system authentication.

AuthScheme abstracts the challenge-response authentication model. The schemes supported by HttpClient:
  Basic: requires only a username and password, transmitted in plain text. Combining this scheme with SSL/TLS makes it suitable for many scenarios.
  Digest: signs the username and password, so it is more secure than Basic.
  NTLM: the Windows scheme. I don't like Windows and have no interest in digging into it.
  Kerberos: a mutual authentication protocol. As I understand it, it operates at the application level (above HTTP) and implements a secure authentication service.
  SPNEGO: there appear to be two variants. One is an extension of Kerberos used for web-service authentication; the other is Windows-specific.

CredentialsProvider: the credentials provider. It binds credentials to the execution context so that HttpClient's higher-level interfaces can consume them. Default implementation: BasicCredentialsProvider.
  
CredentialsProvider credsProvider = new BasicCredentialsProvider();
credsProvider.setCredentials(
    new AuthScope("somehost", AuthScope.ANY_PORT), 
    new UsernamePasswordCredentials("u1", "p1"));

By attaching the authentication details to the execution context (HttpClientContext), you can query authentication state and enable preemptive authentication:
CloseableHttpClient httpclient = <...>

CredentialsProvider credsProvider = <...>
Lookup<AuthSchemeProvider> authRegistry = <...>
AuthCache authCache = <...>

HttpClientContext context = HttpClientContext.create();
context.setCredentialsProvider(credsProvider);
context.setAuthSchemeRegistry(authRegistry);
context.setAuthCache(authCache);
HttpGet httpget = new HttpGet("http://somehost/");
CloseableHttpResponse response1 = httpclient.execute(httpget, context);
<...>

AuthState proxyAuthState = context.getProxyAuthState();
System.out.println("Proxy auth state: " + proxyAuthState.getState());
System.out.println("Proxy auth scheme: " + proxyAuthState.getAuthScheme());
System.out.println("Proxy auth credentials: " + proxyAuthState.getCredentials());
AuthState targetAuthState = context.getTargetAuthState();
System.out.println("Target auth state: " + targetAuthState.getState());
System.out.println("Target auth scheme: " + targetAuthState.getAuthScheme());
System.out.println("Target auth credentials: " + targetAuthState.getCredentials());


SPNEGO web-service variant execution flow:
  1. Client Web Browser does HTTP GET for resource.

  2. Web server returns HTTP 401 status and a header: WWW-Authenticate: Negotiate

  3. Client generates a NegTokenInit, base64 encodes it, and resubmits the GET with an Authorization header: Authorization: Negotiate <base64 encoding>.

  4. Server decodes the NegTokenInit, extracts the supportedMechTypes (only Kerberos V5 in our case), ensures it is one of the expected ones, and then extracts the MechToken (Kerberos token) and authenticates it.

    If more processing is required, another HTTP 401 is returned to the client with more data in the WWW-Authenticate header. The client takes the info, generates another token, and passes it back in the Authorization header until complete.

  5. When the client has been authenticated, the web server should return the HTTP 200 status, a final WWW-Authenticate header, and the page content.

Chapter 5: Fluent API
The Fluent API targets simple requests: it provides fluent-style code and frees the caller from managing connections and releasing resources. The trade-off is that the Fluent API buffers the response in memory, so a very large response can consume a great deal of memory; for more flexible response processing, supply a custom ResponseHandler.
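A minimal Fluent API sketch (this requires the separate fluent-hc module; the URL and timeouts are illustrative):

```java
import org.apache.http.client.fluent.Request;

public class FluentExample {
    public static void main(String[] args) throws Exception {
        // One-liner GET: connection management and resource release
        // are handled internally by the Fluent facade.
        String content = Request.Get("http://localhost/")
                .connectTimeout(1000)
                .socketTimeout(1000)
                .execute()
                .returnContent()
                .asString();
        System.out.println(content);
    }
}
```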

Chapter 6: HTTP caching
HttpClient provides an HTTP/1.1-compliant caching layer.
When a caching-enabled HttpClient executes a request, it performs the following steps:
1. Check that the request conforms to HTTP/1.1.
2. Flush any cached entries invalidated by the request.
3. Determine whether the request is cacheable; if not, send it to the server and cache the response when appropriate.
4. On a cache hit, read the data from the cache and return it as a constructed BasicHttpResponse.
5. If the cached entry cannot be revalidated, re-request the origin server for a fresh result.

How the caching HttpClient processes a response:
Check that the response protocol is compatible.
Check whether the response is cacheable.
If it is cacheable, read the response and store the data in the cache.
If it is too large to cache, return it directly.
Usage:
CacheConfig cacheConfig = CacheConfig.custom()
        .setMaxCacheEntries(1000)
        .setMaxObjectSize(8192)
        .build();
RequestConfig requestConfig = RequestConfig.custom()
        .setConnectTimeout(30000)
        .setSocketTimeout(30000)
        .build();
CloseableHttpClient cachingClient = CachingHttpClients.custom()
        .setCacheConfig(cacheConfig)
        .setDefaultRequestConfig(requestConfig)
        .build();

HttpCacheContext context = HttpCacheContext.create();
HttpGet httpget = new HttpGet("http://www.mydomain.com/content/");
CloseableHttpResponse response = cachingClient.execute(httpget, context);
try {
    CacheResponseStatus responseStatus = context.getCacheResponseStatus();
    switch (responseStatus) {
        case CACHE_HIT:
            System.out.println("A response was generated from the cache with " +
                    "no requests sent upstream");
            break;
        case CACHE_MODULE_RESPONSE:
            System.out.println("The response was generated directly by the " +
                    "caching module");
            break;
        case CACHE_MISS:
            System.out.println("The response came from an upstream server");
            break;
        case VALIDATED:
            System.out.println("The response was generated from the cache " +
                    "after validating the entry with the origin server");
            break;
    }
} finally {
    response.close();
}
Cache-related parameters are configured through CacheConfig.

The default cache storage is the JVM's memory. The third-party backends EhCache and memcached are supported, which can also persist entries to disk.

If none of these meet your needs, you can implement HttpCacheStorage to build your own storage backend.

A multi-level cache, similar to a computer's cache hierarchy, can be built to improve efficiency.

Chapter 7: Advanced topics
Customizing HTTP request formatting (LineFormatter) and response parsing (LineParser):
HttpConnectionFactory<HttpRoute, ManagedHttpClientConnection> connFactory =
        new ManagedHttpClientConnectionFactory(
            new DefaultHttpRequestWriterFactory(),
            new DefaultHttpResponseParserFactory(
                    new MyLineParser(), new DefaultHttpResponseFactory()));

UserTokenHandler (I don't know what its use case is; the official site gives no example either).

FutureRequestExecutionService is similar to an ExecutorService: construct it by passing in an HttpClient (one that supports concurrent use) and an ExecutorService. The HttpClient's connection limit and the ExecutorService's thread count should ideally match: if the HttpClient allows fewer concurrent connections than there are threads, threads will block waiting for a connection; if it allows more, some connections are wasted.

HttpClient httpClient = HttpClientBuilder.create().setMaxConnPerRoute(5).build();
ExecutorService executorService = Executors.newFixedThreadPool(5);
FutureRequestExecutionService futureRequestExecutionService =
    new FutureRequestExecutionService(httpClient, executorService);

Scheduling a task takes three arguments: an HttpGet, an HttpClientContext, and a ResponseHandler. The HttpGet is mandatory, since without it there is no request to execute; the HttpClientContext, as far as I can tell, is for things like caching and statistics; the ResponseHandler customizes the data the task returns.

private final class OkidokiHandler implements ResponseHandler<Boolean> {
    public Boolean handleResponse(
            final HttpResponse response) throws ClientProtocolException, IOException {
        return response.getStatusLine().getStatusCode() == 200;
    }
}

HttpRequestFutureTask<Boolean> task = futureRequestExecutionService.execute(
    new HttpGet("http://www.google.com"), HttpClientContext.create(),
    new OkidokiHandler());
// blocks until the request completes, then returns true if Google was reachable
boolean ok = task.get();

Above, the result is obtained via task.get(); alternatively, a callback class can handle the result:
private final class MyCallback implements FutureCallback<Boolean> {

    public void failed(final Exception ex) {
        // do something
    }

    public void completed(final Boolean result) {
        // do something
    }

    public void cancelled() {
        // do something
    }
}

HttpRequestFutureTask<Boolean> task = futureRequestExecutionService.execute(
    new HttpGet("http://www.google.com"), HttpClientContext.create(),
    new OkidokiHandler(), new MyCallback());

Some statistics about task execution can be inspected through the metrics:
task.scheduledTime() // returns the timestamp the task was scheduled
task.startedTime() // returns the timestamp when the task was started
task.endedTime() // returns the timestamp when the task was done executing
task.requestDuration() // returns the duration of the http request
task.taskDuration() // returns the duration of the task from the moment it was scheduled

FutureRequestExecutionMetrics metrics = futureRequestExecutionService.metrics();
metrics.getActiveConnectionCount() // currently active connections
metrics.getScheduledConnectionCount(); // currently scheduled connections
metrics.getSuccessfulConnectionCount(); // total number of successful requests
metrics.getSuccessfulConnectionAverageDuration(); // average request duration
metrics.getFailedConnectionCount(); // total number of failed tasks
metrics.getFailedConnectionAverageDuration(); // average duration of failed tasks
metrics.getTaskCount(); // total number of tasks scheduled
metrics.getRequestCount(); // total number of requests
metrics.getRequestAverageDuration(); // average request duration
metrics.getTaskAverageDuration(); // average task duration