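This article continues the analysis of the Heritrix 3.1.0 source code. The questions ahead cannot be settled in one or two articles, and I would rather not compress them at the cost of their structure, so the analysis of how Heritrix 3.1.0 wraps the HttpClient component will probably be spread over several articles.
As we know, Heritrix 3.1.0 communicates with servers through a wrapped HttpClient component, which in turn wraps a Socket: data is written to the socket's output stream and received from its input stream.
So how does Heritrix 3.1.0 wrap the HttpClient component? (Heritrix 3.1.0 uses the older Apache HttpClient, the one in the org.apache.commons.httpclient package.)
In the FetchHTTP processor there is a static initializer block that registers the socket factories, one for HTTP and one for HTTPS (both run over TCP; the relationship between the two is not covered here, and readers unfamiliar with it can consult a tutorial on network communication).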
    /**
     * Register the http and https protocols.
     */
    static {
        Protocol.registerProtocol("http", new Protocol("http",
                new HeritrixProtocolSocketFactory(), 80));
        try {
            ProtocolSocketFactory psf = new HeritrixSSLProtocolSocketFactory();
            Protocol p = new Protocol("https", psf, 443);
            Protocol.registerProtocol("https", p);
        } catch (KeyManagementException e) {
            e.printStackTrace();
        } catch (KeyStoreException e) {
            e.printStackTrace();
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
    }
Both classes above, HeritrixProtocolSocketFactory and HeritrixSSLProtocolSocketFactory, implement HttpClient's ProtocolSocketFactory interface and are used to create client Socket objects (HeritrixSSLProtocolSocketFactory implements it indirectly).
The ProtocolSocketFactory interface (package org.apache.commons.httpclient.protocol) defines the methods for creating Socket objects:
    /**
     * A factory for creating Sockets.
     *
     * <p>Both {@link java.lang.Object#equals(java.lang.Object) Object.equals()} and
     * {@link java.lang.Object#hashCode() Object.hashCode()} should be overridden appropriately.
     * Protocol socket factories are used to uniquely identify <code>Protocol</code>s and
     * <code>HostConfiguration</code>s, and <code>equals()</code> and <code>hashCode()</code> are
     * required for the correct operation of some connection managers.</p>
     *
     * @see Protocol
     *
     * @author Michael Becke
     * @author <a href="mailto:mbowler@GargoyleSoftware.com">Mike Bowler</a>
     *
     * @since 2.0
     */
    public interface ProtocolSocketFactory {

        /**
         * Gets a new socket connection to the given host.
         *
         * @param host the host name/IP
         * @param port the port on the host
         * @param localAddress the local host name/IP to bind the socket to
         * @param localPort the port on the local machine
         *
         * @return Socket a new socket
         *
         * @throws IOException if an I/O error occurs while creating the socket
         * @throws UnknownHostException if the IP address of the host cannot be
         * determined
         */
        Socket createSocket(
            String host,
            int port,
            InetAddress localAddress,
            int localPort
        ) throws IOException, UnknownHostException;

        /**
         * Gets a new socket connection to the given host.
         *
         * @param host the host name/IP
         * @param port the port on the host
         * @param localAddress the local host name/IP to bind the socket to
         * @param localPort the port on the local machine
         * @param params {@link HttpConnectionParams Http connection parameters}
         *
         * @return Socket a new socket
         *
         * @throws IOException if an I/O error occurs while creating the socket
         * @throws UnknownHostException if the IP address of the host cannot be
         * determined
         * @throws ConnectTimeoutException if socket cannot be connected within the
         * given time limit
         *
         * @since 3.0
         */
        Socket createSocket(
            String host,
            int port,
            InetAddress localAddress,
            int localPort,
            HttpConnectionParams params
        ) throws IOException, UnknownHostException, ConnectTimeoutException;

        /**
         * Gets a new socket connection to the given host.
         *
         * @param host the host name/IP
         * @param port the port on the host
         *
         * @return Socket a new socket
         *
         * @throws IOException if an I/O error occurs while creating the socket
         * @throws UnknownHostException if the IP address of the host cannot be
         * determined
         */
        Socket createSocket(
            String host,
            int port
        ) throws IOException, UnknownHostException;
    }
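The Javadoc above asks implementations to override equals() and hashCode(), because factories are used to identify protocols and host configurations. The following is a minimal sketch of why that matters, using only the JDK. StatelessFactory and FactoryEqualityDemo are hypothetical names invented for illustration and are not part of HttpClient or Heritrix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a stateless socket factory (not a real HttpClient class).
final class StatelessFactory {
    // Any two instances of this class are considered equal.
    @Override
    public boolean equals(Object obj) {
        return obj != null && obj.getClass().equals(StatelessFactory.class);
    }

    // Equal objects must share a hash code, so derive it from the class.
    @Override
    public int hashCode() {
        return StatelessFactory.class.hashCode();
    }
}

final class FactoryEqualityDemo {
    // Inserts two separately constructed factories as keys and returns the key count.
    static int distinctKeys() {
        Map<Object, String> pools = new HashMap<>();
        pools.put(new StatelessFactory(), "first pool");
        pools.put(new StatelessFactory(), "second pool");
        return pools.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctKeys());
    }
}
```

With class-based equality the second put() overwrites the first, so anything keyed by factory (a connection pool, for example) is shared rather than duplicated. With the default identity-based equals() the map would hold two entries.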
HeritrixProtocolSocketFactory implements the ProtocolSocketFactory interface above (used for HTTP communication):
    public class HeritrixProtocolSocketFactory implements ProtocolSocketFactory {

        /**
         * Constructor.
         */
        public HeritrixProtocolSocketFactory() {
            super();
        }

        @Override
        public Socket createSocket(String host, int port, InetAddress localAddress,
                int localPort) throws IOException, UnknownHostException {
            // TODO Auto-generated method stub
            return new Socket(host, port, localAddress, localPort);
        }

        @Override
        public Socket createSocket(String host, int port, InetAddress localAddress,
                int localPort, HttpConnectionParams params) throws IOException,
                UnknownHostException, ConnectTimeoutException {
            // TODO Auto-generated method stub
            // Below code is from the DefaultSSLProtocolSocketFactory#createSocket
            // method only it has workarounds to deal with pre-1.4 JVMs. I've
            // cut these out.
            if (params == null) {
                throw new IllegalArgumentException("Parameters may not be null");
            }
            Socket socket = null;
            int timeout = params.getConnectionTimeout();
            if (timeout == 0) {
                socket = createSocket(host, port, localAddress, localPort);
            } else {
                socket = new Socket();

                InetAddress hostAddress;
                Thread current = Thread.currentThread();
                if (current instanceof HostResolver) {
                    HostResolver resolver = (HostResolver)current;
                    hostAddress = resolver.resolve(host);
                } else {
                    hostAddress = null;
                }
                InetSocketAddress address = (hostAddress != null)?
                        new InetSocketAddress(hostAddress, port):
                        new InetSocketAddress(host, port);
                socket.bind(new InetSocketAddress(localAddress, localPort));
                try {
                    socket.connect(address, timeout);
                } catch (SocketTimeoutException e) {
                    // Add timeout info. to the exception.
                    throw new SocketTimeoutException(e.getMessage() +
                            ": timeout set at " + Integer.toString(timeout) + "ms.");
                }
                assert socket.isConnected(): "Socket not connected " + host;
            }
            return socket;
        }

        @Override
        public Socket createSocket(String host, int port) throws IOException,
                UnknownHostException {
            // TODO Auto-generated method stub
            return new Socket(host, port);
        }

        /**
         * All instances of DefaultProtocolSocketFactory are the same.
         * @param obj Object to compare.
         * @return True if equal
         */
        public boolean equals(Object obj) {
            return ((obj != null) &&
                    obj.getClass().equals(HeritrixProtocolSocketFactory.class));
        }

        /**
         * All instances of DefaultProtocolSocketFactory have the same hash code.
         * @return Hash code for this object.
         */
        public int hashCode() {
            return HeritrixProtocolSocketFactory.class.hashCode();
        }
    }
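The timeout branch of createSocket above relies on a plain JDK idiom: create an unconnected Socket, then call connect() with a time limit. The sketch below isolates that idiom so it can be tried without HttpClient on the classpath. TimedConnectSketch and its methods are hypothetical names for illustration; the local listener exists only so the example has something to connect to.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class TimedConnectSketch {
    // Connects to host:port, giving up after timeoutMillis (0 means no time limit).
    static Socket connect(InetAddress host, int port, int timeoutMillis) throws IOException {
        Socket socket = new Socket(); // unconnected, so a timeout can be applied
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket;
        } catch (SocketTimeoutException e) {
            socket.close();
            // Add the configured limit to the message, as the excerpt above does.
            throw new SocketTimeoutException(
                    e.getMessage() + ": timeout set at " + timeoutMillis + "ms.");
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    // Connects to a throwaway listener on the loopback interface.
    static boolean connectsToLocalListener() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = connect(server.getInetAddress(), server.getLocalPort(), 2000)) {
            return client.isConnected();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(connectsToLocalListener());
    }
}
```

Unlike the excerpt, this sketch closes the socket when the connection attempt fails; that is a choice made here, not something taken from the Heritrix source.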
HeritrixSSLProtocolSocketFactory implements the SecureProtocolSocketFactory interface, and through it indirectly implements ProtocolSocketFactory; it is used for HTTPS communication.
The SecureProtocolSocketFactory interface is as follows:
    /**
     * A ProtocolSocketFactory that is secure.
     *
     * @see org.apache.commons.httpclient.protocol.ProtocolSocketFactory
     *
     * @author Michael Becke
     * @author <a href="mailto:mbowler@GargoyleSoftware.com">Mike Bowler</a>
     * @since 2.0
     */
    public interface SecureProtocolSocketFactory extends ProtocolSocketFactory {

        /**
         * Returns a socket connected to the given host that is layered over an
         * existing socket. Used primarily for creating secure sockets through
         * proxies.
         *
         * @param socket the existing socket
         * @param host the host name/IP
         * @param port the port on the host
         * @param autoClose a flag for closing the underling socket when the created
         * socket is closed
         *
         * @return Socket a new socket
         *
         * @throws IOException if an I/O error occurs while creating the socket
         * @throws UnknownHostException if the IP address of the host cannot be
         * determined
         */
        Socket createSocket(
            Socket socket,
            String host,
            int port,
            boolean autoClose
        ) throws IOException, UnknownHostException;
    }
HeritrixSSLProtocolSocketFactory implements the SecureProtocolSocketFactory interface above:
    /**
     * Implementation of the commons-httpclient SSLProtocolSocketFactory so we
     * can return SSLSockets whose trust manager is
     * {@link org.archive.httpclient.ConfigurableX509TrustManager}.
     *
     * We also go to the heritrix cache to get IPs to use making connection.
     * To this, we have dependency on {@link HeritrixProtocolSocketFactory};
     * its assumed this class and it are used together.
     * See {@link HeritrixProtocolSocketFactory#getHostAddress(ServerCache,String)}.
     *
     * @author stack
     * @version $Id: HeritrixSSLProtocolSocketFactory.java 6637 2009-11-10 21:03:27Z gojomo $
     * @see org.archive.httpclient.ConfigurableX509TrustManager
     */
    public class HeritrixSSLProtocolSocketFactory implements SecureProtocolSocketFactory {
        // static final String SERVER_CACHE_KEY = "heritrix.server.cache";
        static final String SSL_FACTORY_KEY = "heritrix.ssl.factory";

        /***
         * Socket factory with default trust manager installed.
         */
        private SSLSocketFactory sslDefaultFactory = null;

        /**
         * Shutdown constructor.
         * @throws KeyManagementException
         * @throws KeyStoreException
         * @throws NoSuchAlgorithmException
         */
        public HeritrixSSLProtocolSocketFactory()
        throws KeyManagementException, KeyStoreException, NoSuchAlgorithmException{
            // Get an SSL context and initialize it.
            SSLContext context = SSLContext.getInstance("SSL");

            // I tried to get the default KeyManagers but doesn't work unless you
            // point at a physical keystore. Passing null seems to do the right
            // thing so we'll go w/ that.
            context.init(null, new TrustManager[] {
                new ConfigurableX509TrustManager(
                    ConfigurableX509TrustManager.DEFAULT)}, null);
            this.sslDefaultFactory = context.getSocketFactory();
        }

        @Override
        public Socket createSocket(String host, int port, InetAddress clientHost,
                int clientPort) throws IOException, UnknownHostException {
            return this.sslDefaultFactory.createSocket(host, port,
                    clientHost, clientPort);
        }

        @Override
        public Socket createSocket(String host, int port)
        throws IOException, UnknownHostException {
            return this.sslDefaultFactory.createSocket(host, port);
        }

        @Override
        public synchronized Socket createSocket(String host, int port,
                InetAddress localAddress, int localPort, HttpConnectionParams params)
        throws IOException, UnknownHostException {
            // Below code is from the DefaultSSLProtocolSocketFactory#createSocket
            // method only it has workarounds to deal with pre-1.4 JVMs. I've
            // cut these out.
            if (params == null) {
                throw new IllegalArgumentException("Parameters may not be null");
            }
            Socket socket = null;
            int timeout = params.getConnectionTimeout();
            if (timeout == 0) {
                socket = createSocket(host, port, localAddress, localPort);
            } else {
                SSLSocketFactory factory = (SSLSocketFactory)params.
                    getParameter(SSL_FACTORY_KEY); // SSL_FACTORY_KEY
                SSLSocketFactory f = (factory != null)? factory: this.sslDefaultFactory;
                socket = f.createSocket();

                Thread current = Thread.currentThread();
                InetAddress hostAddress;
                if (current instanceof HostResolver) {
                    HostResolver resolver = (HostResolver)current;
                    hostAddress = resolver.resolve(host);
                } else {
                    hostAddress = null;
                }
                InetSocketAddress address = (hostAddress != null)?
                        new InetSocketAddress(hostAddress, port):
                        new InetSocketAddress(host, port);
                socket.bind(new InetSocketAddress(localAddress, localPort));
                try {
                    socket.connect(address, timeout);
                } catch (SocketTimeoutException e) {
                    // Add timeout info. to the exception.
                    throw new SocketTimeoutException(e.getMessage() +
                            ": timeout set at " + Integer.toString(timeout) + "ms.");
                }
                assert socket.isConnected(): "Socket not connected " + host;
            }
            return socket;
        }

        @Override
        public Socket createSocket(Socket socket, String host, int port,
                boolean autoClose) throws IOException, UnknownHostException {
            return this.sslDefaultFactory.createSocket(socket, host,
                    port, autoClose);
        }

        public boolean equals(Object obj) {
            return ((obj != null) && obj.getClass().
                equals(HeritrixSSLProtocolSocketFactory.class));
        }

        public int hashCode() {
            return HeritrixSSLProtocolSocketFactory.class.hashCode();
        }
    }
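Both factories check whether the calling thread implements HostResolver and, if so, ask it for the address instead of letting the socket resolve the host name itself. The sketch below shows that "ask the current thread" pattern with the JDK alone. Resolver, ResolvingThread and ThreadResolverSketch are hypothetical stand-ins written for this illustration; the real HostResolver interface is only assumed to expose resolve(String), as the excerpt suggests.

```java
import java.net.InetAddress;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the resolver interface seen in the excerpt.
interface Resolver {
    InetAddress resolve(String host);
}

// A worker thread that can answer lookups itself (here with one fixed address).
final class ResolvingThread extends Thread implements Resolver {
    private final InetAddress cached;

    ResolvingThread(Runnable task, InetAddress cached) {
        super(task);
        this.cached = cached;
    }

    @Override
    public InetAddress resolve(String host) {
        return cached;
    }
}

final class ThreadResolverSketch {
    // Returns the current thread's answer if it is a Resolver, otherwise null.
    static InetAddress lookup(String host) {
        Thread current = Thread.currentThread();
        return (current instanceof Resolver) ? ((Resolver) current).resolve(host) : null;
    }

    // Runs lookup() on a ResolvingThread and returns what that thread saw.
    static InetAddress lookupOnResolvingThread(String host, InetAddress cached)
            throws InterruptedException {
        AtomicReference<InetAddress> seen = new AtomicReference<>();
        Thread worker = new ResolvingThread(() -> seen.set(lookup(host)), cached);
        worker.start();
        worker.join();
        return seen.get();
    }
}
```

A null result corresponds to the fallback in the excerpt, where the factory builds the InetSocketAddress from the host name and lets the JDK resolve it.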
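The Socket objects for HTTPS are created by the SSLSocketFactory field sslDefaultFactory. To build that SSLSocketFactory, Heritrix 3.1.0 defines ConfigurableX509TrustManager, an implementation of the X509TrustManager interface (used for SSL communication; at its default OPEN trust level it accepts any certificate automatically).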
    /**
     * A configurable trust manager built on X509TrustManager.
     *
     * If set to 'open' trust, the default, will get us into sites for whom we do
     * not have the CA or any of intermediary CAs that go to make up the cert chain
     * of trust. Will also get us past selfsigned and expired certs. 'loose'
     * trust will get us into sites w/ valid certs even if they are just
     * selfsigned. 'normal' is any valid cert not including selfsigned. 'strict'
     * means cert must be valid and the cert DN must match server name.
     *
     * <p>Based on pointers in
     * <a href="http://jakarta.apache.org/commons/httpclient/sslguide.html">SSL
     * Guide</a>,
     * and readings done in <a
     * href="http://java.sun.com/j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction">JSSE
     * Guide</a>.
     *
     * <p>TODO: Move to an ssl subpackage when we have other classes other than
     * just this one.
     *
     * @author stack
     * @version $Id: ConfigurableX509TrustManager.java 6637 2009-11-10 21:03:27Z gojomo $
     */
    public class ConfigurableX509TrustManager implements X509TrustManager {
        /**
         * Logging instance.
         */
        protected static Logger logger = Logger.getLogger(
            "org.archive.httpclient.ConfigurableX509TrustManager");

        public static enum TrustLevel {
            /**
             * Trust anything given us.
             *
             * Default setting.
             *
             * <p>See <a href="http://javaalmanac.com/egs/javax.net.ssl/TrustAll.html">
             * e502. Disabling Certificate Validation in an HTTPS Connection</a> from
             * the java almanac for how to trust all.
             */
            OPEN,

            /**
             * Trust any valid cert including self-signed certificates.
             */
            LOOSE,

            /**
             * Normal jsse behavior.
             *
             * Seemingly any certificate that supplies valid chain of trust.
             */
            NORMAL,

            /**
             * Strict trust.
             *
             * Ensure server has same name as cert DN.
             */
            STRICT,
        }

        /**
         * Default setting for trust level.
         */
        public final static TrustLevel DEFAULT = TrustLevel.OPEN;

        /**
         * Trust level.
         */
        private TrustLevel trustLevel = DEFAULT;

        /**
         * An instance of the SUNX509TrustManager that we adapt variously
         * depending upon passed configuration.
         *
         * We have it do all the work we don't want to.
         */
        private X509TrustManager standardTrustManager = null;

        public ConfigurableX509TrustManager()
        throws NoSuchAlgorithmException, KeyStoreException {
            this(DEFAULT);
        }

        /**
         * Constructor.
         *
         * @param level Level of trust to effect.
         *
         * @throws NoSuchAlgorithmException
         * @throws KeyStoreException
         */
        public ConfigurableX509TrustManager(TrustLevel level)
        throws NoSuchAlgorithmException, KeyStoreException {
            super();
            TrustManagerFactory factory = TrustManagerFactory.
                getInstance(TrustManagerFactory.getDefaultAlgorithm());

            // Pass in a null (Trust) KeyStore. Null says use the 'default'
            // 'trust' keystore (KeyStore class is used to hold keys and to hold
            // 'trusts' (certs)). See 'X509TrustManager Interface' in this doc:
            // http://java.sun.com
            // /j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction
            factory.init((KeyStore)null);
            TrustManager[] trustmanagers = factory.getTrustManagers();
            if (trustmanagers.length == 0) {
                throw new NoSuchAlgorithmException(TrustManagerFactory.
                    getDefaultAlgorithm() + " trust manager not supported");
            }
            this.standardTrustManager = (X509TrustManager)trustmanagers[0];

            this.trustLevel = level;
        }

        @Override
        public void checkClientTrusted(X509Certificate[] certificates, String type)
        throws CertificateException {
            if (this.trustLevel.equals(TrustLevel.OPEN)) {
                return;
            }

            this.standardTrustManager.checkClientTrusted(certificates, type);
        }

        @Override
        public void checkServerTrusted(X509Certificate[] certificates, String type)
        throws CertificateException {
            if (this.trustLevel.equals(TrustLevel.OPEN)) {
                return;
            }

            try {
                this.standardTrustManager.checkServerTrusted(certificates, type);
                if (this.trustLevel.equals(TrustLevel.STRICT)) {
                    logger.severe(TrustLevel.STRICT + " not implemented.");
                }
            } catch (CertificateException e) {
                if (this.trustLevel.equals(TrustLevel.LOOSE) &&
                        certificates != null && certificates.length == 1) {
                    // If only one cert and its valid and it caused a
                    // CertificateException, assume its selfsigned.
                    X509Certificate certificate = certificates[0];
                    certificate.checkValidity();
                } else {
                    // If we got to here, then we're probably NORMAL. Rethrow.
                    throw e;
                }
            }
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return this.standardTrustManager.getAcceptedIssuers();
        }
    }
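To make the link between the trust manager and the socket factory concrete, here is a minimal JDK-only sketch of the OPEN case: a trust manager that accepts every chain is handed to an SSLContext, which then produces the SSLSocketFactory. OpenTrustSketch and AcceptAll are hypothetical names for illustration. The sketch requests the "TLS" protocol rather than the "SSL" name used in the Heritrix constructor; that substitution is an assumption made here.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

final class OpenTrustSketch {
    // Accepts every certificate chain without inspection (analogue of TrustLevel.OPEN).
    static final class AcceptAll implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: nothing is ever rejected.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: nothing is ever rejected.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

    // Builds a socket factory whose handshakes consult only the AcceptAll manager.
    static SSLSocketFactory openFactory() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        // Null key managers and null SecureRandom select the JDK defaults.
        context.init(null, new TrustManager[] { new AcceptAll() }, null);
        return context.getSocketFactory();
    }
}
```

Accepting every certificate suits a crawler that wants to fetch as many sites as possible, but it removes protection against impersonation, so this pattern should not be reused where the identity of the server matters.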
---------------------------------------------------------------------------
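This Heritrix 3.1.0 source code analysis series is my own original work.
When reposting, please credit the source: cnblogs (博客园), 刺猬的温驯
Link to this article: http://www.cnblogs.com/chenying99/archive/2013/04/25/3042207.html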