Handling the protocol_version error in the webmagic framework
Exception:
javax.net.ssl.SSLException: Received fatal alert: protocol_version
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
    at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2020)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1127)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at us.codecraft.webmagic.downloader.HttpClientDownloader.download(HttpClientDownloader.java:85)
    at us.codecraft.webmagic.Spider.processRequest(Spider.java:404)
    at us.codecraft.webmagic.Spider.access$000(Spider.java:61)
    at us.codecraft.webmagic.Spider$1.run(Spider.java:320)
    at us.codecraft.webmagic.thread.CountableThreadPool$1.run(CountableThreadPool.java:74)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
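Cause: webmagic's default HttpClient only uses TLSv1 when making requests, so a site that accepts only TLS 1.2 (for example https://juejin.im/) rejects the handshake and the request fails with this error.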
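Before changing the client, it can help to confirm that the JVM itself is able to speak TLSv1.2. The following is a minimal sketch that uses only the JDK; the class name TlsProtocolCheck and its method names are made up for illustration and are not part of webmagic or HttpClient. It lists the protocols the default SSLContext supports and the ones it enables by default.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

// Hypothetical helper (not part of webmagic): inspects the JVM's default SSLContext.
final class TlsProtocolCheck {

    private static SSLContext defaultContext() {
        try {
            return SSLContext.getDefault();
        } catch (NoSuchAlgorithmException e) {
            // Wrapped so that callers do not have to handle a checked exception.
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    // Protocols the JVM could use if they were explicitly enabled.
    static List<String> supportedProtocols() {
        return Arrays.asList(defaultContext().getSupportedSSLParameters().getProtocols());
    }

    // Protocols a socket uses when nothing is configured explicitly.
    static List<String> defaultProtocols() {
        return Arrays.asList(defaultContext().getDefaultSSLParameters().getProtocols());
    }

    public static void main(String[] args) {
        System.out.println("Supported:          " + supportedProtocols());
        System.out.println("Enabled by default: " + defaultProtocols());
    }
}
```

If TLSv1.2 appears in the supported list, the JVM is capable of it, and the failure comes from the protocol list the HTTP client passes to the socket rather than from the JVM.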
Solution (from https://github.com/code4craft/webmagic/issues/701): build the HttpClient with an SSLConnectionSocketFactory that enables TLSv1.2:
// `url` is the address to fetch and is defined by the caller.
try {
    SSLContext sslContext = SSLContext.getDefault();
    // Enable only TLSv1.2. Note that NoopHostnameVerifier turns off hostname verification.
    SSLConnectionSocketFactory sslConnectionFactory = new SSLConnectionSocketFactory(
            sslContext, new String[]{"TLSv1.2"}, null, NoopHostnameVerifier.INSTANCE);
    Registry<ConnectionSocketFactory> registry = RegistryBuilder.<ConnectionSocketFactory>create()
            .register("https", sslConnectionFactory)
            .register("http", PlainConnectionSocketFactory.INSTANCE)
            .build();
    HttpClientConnectionManager ccm = new BasicHttpClientConnectionManager(registry);
    HttpGet httpGet = new HttpGet(url);
    // try-with-resources closes the response and the client when the request is done.
    try (CloseableHttpClient httpClient = HttpClientBuilder.create()
                 .setSSLSocketFactory(sslConnectionFactory)
                 .setConnectionManager(ccm)
                 .build();
         CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
        StatusLine statusLine = httpResponse.getStatusLine();
        if (statusLine.getStatusCode() == 200) {
            HttpEntity httpEntity = httpResponse.getEntity();
            String result = EntityUtils.toString(httpEntity, "utf-8");
        }
    }
} catch (NoSuchAlgorithmException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
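The new String[]{"TLSv1.2"} argument ends up as the list of enabled protocols on the underlying SSLSocket. As a rough illustration of that step using only the JDK, the sketch below creates an unconnected socket from the default SSLContext and restricts it in the same way. The class name Tls12Socket and its method name are hypothetical, and no network connection is made.

```java
import java.io.IOException;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;

// Hypothetical demo class: shows the effect of restricting enabled protocols on a socket.
final class Tls12Socket {

    // Creates an unconnected SSLSocket, enables only TLSv1.2,
    // and returns the protocols that are enabled afterwards.
    static String[] enabledAfterRestriction() {
        try (SSLSocket socket =
                     (SSLSocket) SSLContext.getDefault().getSocketFactory().createSocket()) {
            socket.setEnabledProtocols(new String[]{"TLSv1.2"});
            return socket.getEnabledProtocols();
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException("Could not create SSL socket", e);
        }
    }

    public static void main(String[] args) {
        for (String protocol : enabledAfterRestriction()) {
            System.out.println(protocol);
        }
    }
}
```

With only TLSv1.2 enabled, the handshake can no longer fall back to TLSv1, so a server that also rejects TLSv1.2 would still fail with a similar alert.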