Simulating a login with HttpClient
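Step 1: build an HttpClient that skips HTTPS certificate verification
// Imports assume Apache HttpClient 4.4 or later.
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import org.apache.http.client.CookieStore;
import org.apache.http.conn.ssl.NoopHostnameVerifier;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.ssl.SSLContextBuilder;
import org.apache.http.ssl.TrustStrategy;

// Shared by every client returned from getHttpClient(), so cookies survive between requests.
private static final CookieStore cookieStore = new BasicCookieStore();

public static CloseableHttpClient getHttpClient() throws Exception {
    SSLContextBuilder builder = new SSLContextBuilder();
    // Trust every certificate; no identity check is performed
    builder.loadTrustMaterial(null, new TrustStrategy() {
        @Override
        public boolean isTrusted(X509Certificate[] x509Certificates, String s) throws CertificateException {
            return true;
        }
    });
    SSLConnectionSocketFactory sslsf = new SSLConnectionSocketFactory(
            builder.build(),
            new String[]{"SSLv2Hello", "SSLv3", "TLSv1", "TLSv1.2"},
            null,
            NoopHostnameVerifier.INSTANCE);
    // No explicit connection manager is set, so the builder creates its own.
    CloseableHttpClient httpClient = HttpClients.custom()
            .setSSLSocketFactory(sslsf)
            .setConnectionManagerShared(true)
            .setDefaultCookieStore(cookieStore)
            .build();
    return httpClient;
}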
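To make clear what "trust everything" means, here is a minimal sketch of the same idea using only the JDK. It is not the code from this post: `TrustAllContext` is a hypothetical helper, and it stands in for the `TrustStrategy` that always returns true. Like the original, it disables server authentication and is only suitable for testing.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, not part of the original post.
class TrustAllContext {

    // Builds an SSLContext whose trust manager accepts every certificate chain.
    static SSLContext create() {
        TrustManager trustAll = new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: never rejects a client certificate.
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: never rejects a server certificate.
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        };
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] {trustAll}, new SecureRandom());
            return context;
        } catch (GeneralSecurityException e) {
            // Wrapped so callers do not have to declare checked exceptions.
            throw new IllegalStateException("Could not create TLS context", e);
        }
    }
}
```

A context built this way could be passed to the `SSLConnectionSocketFactory` constructor in place of `builder.build()`.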
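Step 2: make a first request to obtain the cookie
public static void getCookie() throws Exception {
    HttpClient httpClient = getHttpClient();
    // HTTP request
    HttpUriRequest request = new HttpGet("http://888.by3322.com:8088/Login");
    setHeader(request);
    HttpResponse response = httpClient.execute(request);
    HttpEntity entity = response.getEntity();
    // The body is not needed; consume it so the connection is released.
    // Cookies from the response are kept in the shared cookieStore.
    EntityUtils.consume(entity);
}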
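The point of this first request is that the session cookie set by the login page is stored and sent back automatically on later requests. The sketch below illustrates that round trip with the JDK's `java.net.CookieManager` instead of HttpClient's cookie store, so it runs without a network. `CookieRoundTrip` and the `JSESSIONID` cookie name are assumptions made for illustration; the real site may use a different cookie.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original post.
class CookieRoundTrip {

    // Stores the Set-Cookie values received from firstUrl, then returns the
    // Cookie header value that would be sent with a later request to nextUrl.
    static String cookieHeaderFor(String firstUrl, List<String> setCookieValues, String nextUrl) {
        CookieManager manager = new CookieManager();
        try {
            Map<String, List<String>> responseHeaders = new HashMap<>();
            responseHeaders.put("Set-Cookie", setCookieValues);
            manager.put(URI.create(firstUrl), responseHeaders);

            Map<String, List<String>> requestHeaders =
                    manager.get(URI.create(nextUrl), new HashMap<String, List<String>>());
            List<String> cookies =
                    requestHeaders.getOrDefault("Cookie", Collections.<String>emptyList());
            return String.join("; ", cookies);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A cookie received from the login page is returned for the captcha URL on the same host, but not for an unrelated host.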
Step 3: add browser-like request headers so the target site does not filter the request
public static void setHeader(HttpUriRequest request) {
    request.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0");
    request.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    request.setHeader("Accept-Encoding", "gzip, deflate");
    request.setHeader("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
    request.setHeader("Cache-Control", "max-age=0");
    request.setHeader("Connection", "keep-alive");
    request.setHeader("Host", "888.by3322.com:8088");
    request.setHeader("Upgrade-Insecure-Requests", "1");
}
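The `Host` value above is hard-coded, so it goes stale if the URL changes. One possible variation, sketched here with JDK classes only, derives `Host` from the request URL and adds `Referer` only when one is supplied. `BrowserHeaders`, `hostHeader` and `forRequest` are hypothetical names, not part of the original code.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original post.
class BrowserHeaders {

    // Returns "host" or "host:port", the form used in the Host header.
    static String hostHeader(String url) {
        URI uri = URI.create(url);
        return uri.getPort() == -1 ? uri.getHost() : uri.getHost() + ":" + uri.getPort();
    }

    // Builds the header set for one request; referer may be null.
    static Map<String, String> forRequest(String url, String referer) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0");
        headers.put("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        headers.put("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
        headers.put("Host", hostHeader(url));
        if (referer != null) {
            headers.put("Referer", referer);
        }
        return headers;
    }
}
```

The resulting map can be applied to an HttpClient request with `headers.forEach(request::setHeader)`.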
Step 4: download the captcha image
public static String getAuthNum() throws Exception {
    String authnum = "";
    HttpGet httpGet = new HttpGet("http://888.by3322.com:8088/Login/authnum");
    setHeader(httpGet);
    httpGet.setHeader("Referer", "http://888.by3322.com:8088/Login");
    HttpResponse response = getHttpClient().execute(httpGet);
    if (response.getStatusLine().getStatusCode() == 200) {
        // Download the image
        String filepath = "E:" + File.separator + "authNum.png";
        // try-with-resources closes the file before OCR reads it
        try (OutputStream outputStream = new FileOutputStream(filepath)) {
            response.getEntity().writeTo(outputStream);
        }
        EntityUtils.consume(response.getEntity());
        // Recognize the letters and digits in the image
        authnum = ImageUtil.readImgText(new File(filepath));
    }
    return authnum;
}
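The fixed path on drive `E:` only works on a machine that has that drive. As a possible alternative, the sketch below writes the response body to a temporary file using JDK classes only. `CaptchaFiles.saveToTempFile` is a hypothetical helper, not part of the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original post.
class CaptchaFiles {

    // Copies the stream into a new temporary .png file, closes the stream,
    // and returns the path of the file that was written.
    static Path saveToTempFile(InputStream body) {
        try (InputStream in = body) {
            Path target = Files.createTempFile("authNum", ".png");
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With HttpClient, the stream would come from `response.getEntity().getContent()`, and the OCR step would receive `path.toFile()`.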
Step 5: recognize the text in the captcha image with tess4j
// Method of the ImageUtil class called in step 4
public static String readImgText(File file) {
    String result = "";
    ITesseract instance = new Tesseract();
    File tessDataFolder = LoadLibs.extractTessResources("tessdata");
    instance.setLanguage("eng"); // the English data recognizes digits fairly accurately
    instance.setDatapath(tessDataFolder.getAbsolutePath());
    try {
        result = instance.doOCR(file);
        System.out.println(result);
    } catch (TesseractException e) {
        System.err.println(e.getMessage());
    }
    return result;
}
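OCR output often carries whitespace, line breaks or stray punctuation around the recognized characters, which would make the submitted captcha wrong. A minimal clean-up sketch, assuming the captcha contains only ASCII letters and digits (as the comment in step 4 suggests); `CaptchaText.clean` is a hypothetical helper, not part of the original code.

```java
// Hypothetical helper, not part of the original post.
class CaptchaText {

    // Keeps only ASCII letters and digits; returns "" for null input.
    static String clean(String ocrOutput) {
        if (ocrOutput == null) {
            return "";
        }
        return ocrOutput.replaceAll("[^A-Za-z0-9]", "");
    }
}
```

It would be applied to the return value of `readImgText` before the captcha is sent with the login request.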
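Notes:
When httpclient requests a URL, it may fail with a no-response error (such as NoHttpResponseException) even though the same URL responds in a browser. The cause may be the request headers, or the value of the Referer header.
Image text recognition uses tess4j. One problem encountered was the error "The specified module could not be found". The main cause is that on Windows the three files gsdll64.dll, liblept170.dll and libtesseract304.dll are compiled with VC2013, so they need the matching runtime libraries (the Visual C++ 2013 redistributable). Download it from https://www.microsoft.com/zh-cn/download/default.aspx and install it.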