Recognizing CAPTCHAs in Java
Download links for the required resources (all free, shared for the sake of sharing)
Tesseract: http://download.csdn.net/detail/chenyangqi/9190667
jai_imageio-1.1-alpha, swingx-1.0: http://download.csdn.net/detail/chenyangqi/9190683
HttpWatch Professional: http://download.csdn.net/detail/chenyangqi/9208339
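Project overview:
Our school gets online through the campus Ruijie client. I had gone so long without typing my password that I forgot it; all I remembered was that it is six digits. I didn't want to go to the service center to reset it, so I decided to simulate a login to the Ruijie web portal over HTTP and run through the passwords from 000000 to 999999, and learn something along the way. Enough preamble, on to the topic. The login page is a conventional one: username, password, and CAPTCHA.
There are two hard parts to simulating the login: managing cookies across HTTP requests, and recognizing the CAPTCHA. This post explains how to recognize the CAPTCHA; for the simulated login itself, see my follow-up post (link: http://www.cnblogs.com/chenyangqi/p/4906376.html)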
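1: Download and install Tesseract
I have provided a download link (at the top of this post); download it and run the installer. The archive also includes the Chinese language pack. Tesseract is an OCR engine that supports Chinese recognition, which is why I chose it. For installation details and how to verify that the installation succeeded, a quick web search will do, so I won't go into them here.
I installed it at D:\Program Files (x86)\Tesseract-OCR. The tessdata folder in that directory is where the language packs are stored.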
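One way to check the installation from Java is to launch the executable with its version flag (-v) and look at the exit code. The following is only a minimal sketch: the class and method names are hypothetical, the path in main is the install location mentioned above and will differ on other machines, and it assumes the version command exits with code 0.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class TesseractCheck {

    /**
     * Returns true if "<executable> -v" can be started and exits with code 0.
     * Hypothetical helper; the executable name or path is supplied by the caller.
     */
    static boolean isTesseractAvailable(String executable) {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList(executable, "-v"));
        pb.redirectErrorStream(true); // merge stderr into stdout so one read drains both
        try {
            Process process = pb.start();
            try (InputStream out = process.getInputStream()) {
                byte[] buffer = new byte[1024];
                while (out.read(buffer) != -1) {
                    // discard the version banner; draining keeps the child from blocking
                }
            }
            return process.waitFor() == 0;
        } catch (IOException e) {
            return false; // executable not found or not runnable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed install location from this post; adjust to your own path.
        String exe = "D:\\Program Files (x86)\\Tesseract-OCR\\tesseract";
        System.out.println("Tesseract available: " + isTesseractAvailable(exe));
    }
}
```

If this prints false, the path is the first thing to check.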
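2: Fetching the CAPTCHA
Download HttpWatch (link at the top of this post), install it, and use it in IE to capture the traffic of the login page and find the URL of the CAPTCHA image (for how to install and use HttpWatch, search online).
Once you have the URL, download the CAPTCHA image to local disk, for example to D:\verifycode.jpg
The code for downloading the image is as follows: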
// Requires Apache Commons HttpClient 3.x (HttpClient, GetMethod) and java.io.*.
// `client` is assumed to be an HttpClient field of the enclosing class.
private static void getImage(String name_get, String password_get) {
    GetMethod get = new GetMethod("your CAPTCHA URL");
    try {
        client.executeMethod(get);
        File storeFile = new File("D:/verifycode.jpg"); // local path to save to
        InputStream is = get.getResponseBodyAsStream();
        FileOutputStream fos = new FileOutputStream(storeFile);
        byte[] b = new byte[1024];
        int n;
        while ((n = is.read(b)) != -1) {
            fos.write(b, 0, n); // write only the bytes actually read
        }
        is.close();
        fos.close();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        get.releaseConnection();
    }
}
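When a response body is saved to disk, the copy loop should write only as many bytes as each read returned. Writing the whole buffer on every pass can append stale bytes to the file and corrupt the image. Here is a minimal, self-contained sketch of such a loop; StreamCopy and copy are hypothetical names, and in-memory streams stand in for the HTTP response and the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class StreamCopy {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[1024];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // only the n bytes read in this pass
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] source = new byte[1500]; // deliberately not a multiple of 1024
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(source), sink);
        System.out.println(copied + " bytes copied, sink holds " + sink.size());
    }
}
```

With a 1500-byte source the second read returns fewer bytes than the buffer holds, which is exactly the case a full-buffer write gets wrong.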
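3: Recognizing the CAPTCHA in Java
Create a new Java project in Eclipse and add the two jars jai_imageio-1.1-alpha.jar and swingx-1.0.jar (links at the top of this post) to it. Now for the code: there are two classes, OCR and ImageIOHelper. You only need to change the Tesseract install path, and both can be dropped straight into your own project.
The code of OCR.class is below. The imageFile argument of recognizeText(File imageFile, String imageFormat) is the local location of the CAPTCHA image downloaded in the previous step, i.e. D:\verifycode.jpg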
package com.cyq.request;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

import org.jdesktop.swingx.util.OS;

public class OCR {
    // Tesseract's language flag: lowercase letter l, not the digit 1.
    // It is not added to the command below, so Tesseract uses its default language.
    private final String LANG_OPTION = "-l";
    private final String EOL = System.getProperty("line.separator");
    // Tesseract install path; change this to match your machine
    private String tessPath = "D:\\Program Files (x86)\\Tesseract-OCR";

    public String recognizeText(File imageFile, String imageFormat) throws Exception {
        // Convert the image to a temporary TIFF before handing it to Tesseract
        File tempImage = ImageIOHelper.createImage(imageFile, imageFormat);
        File outputFile = new File(imageFile.getParentFile(), "output");
        StringBuffer strB = new StringBuffer();

        // Command line: tesseract <image> <outputBase>
        List<String> cmd = new ArrayList<String>();
        if (OS.isLinux()) {
            cmd.add("tesseract");
        } else {
            cmd.add(tessPath + "\\tesseract");
        }
        cmd.add(tempImage.getName());
        cmd.add(outputFile.getName());

        ProcessBuilder pb = new ProcessBuilder();
        pb.directory(imageFile.getParentFile());
        pb.command(cmd);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        int w = process.waitFor();

        // Delete the temporary working file
        tempImage.delete();

        if (w == 0) {
            // Tesseract writes its result to <outputBase>.txt
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream(outputFile.getAbsolutePath() + ".txt"), "UTF-8"));
            String str;
            while ((str = in.readLine()) != null) {
                strB.append(str).append(EOL);
            }
            in.close();
        } else {
            String msg;
            switch (w) {
            case 1:
                msg = "Errors accessing files. There may be spaces in your image's filename.";
                break;
            case 29:
                msg = "Cannot recognize the image or its selected region.";
                break;
            case 31:
                msg = "Unsupported image format.";
                break;
            default:
                msg = "Errors occurred.";
            }
            System.err.println(msg); // report the failure; an empty string is returned
        }
        new File(outputFile.getAbsolutePath() + ".txt").delete();
        return strB.toString();
    }
}
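The OCR class declares the -l language flag, and the Tesseract archive mentioned earlier ships a Chinese language pack. As a hedged illustration of how a language could be selected, the sketch below builds the argument list in the form `tesseract <image> <outputBase> [-l <lang>]`. TesseractCommand and build are hypothetical names, not part of the original project, and "eng" is only an example language code.

```java
import java.util.ArrayList;
import java.util.List;

final class TesseractCommand {

    /**
     * Builds: <executable> <image> <outputBase> [-l <lang>]
     * Tesseract writes its result to <outputBase>.txt.
     */
    static List<String> build(String executable, String image, String outputBase, String lang) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(executable);
        cmd.add(image);
        cmd.add(outputBase);
        if (lang != null && !lang.isEmpty()) {
            cmd.add("-l");  // lowercase letter l, not the digit 1
            cmd.add(lang);  // e.g. "eng", or "chi_sim" if that language pack is installed
        }
        return cmd;
    }

    public static void main(String[] args) {
        // prints [tesseract, verifycode0.tif, output, -l, eng]
        System.out.println(build("tesseract", "verifycode0.tif", "output", "eng"));
    }
}
```

The resulting list can be passed to ProcessBuilder.command(...) in the same way as the cmd list in recognizeText.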
The code of ImageIOHelper.class is as follows:
package com.cyq.request;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.Locale;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

import com.sun.media.imageio.plugins.tiff.TIFFImageWriteParam;

public class ImageIOHelper {
    /**
     * Converts an image file to TIFF format.
     *
     * @param imageFile
     *            the image file
     * @param imageFormat
     *            the file extension (format name), e.g. "jpg"
     * @return the converted temporary file
     */
    public static File createImage(File imageFile, String imageFormat) {
        File tempFile = null;
        try {
            // Read the source image with a reader for its format
            Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(imageFormat);
            ImageReader reader = readers.next();
            ImageInputStream iis = ImageIO.createImageInputStream(imageFile);
            reader.setInput(iis);
            IIOMetadata streamMetadata = reader.getStreamMetadata();

            // Write uncompressed TIFF (the TIFF writer comes from jai_imageio)
            TIFFImageWriteParam tiffWriteParam = new TIFFImageWriteParam(Locale.CHINESE);
            tiffWriteParam.setCompressionMode(ImageWriteParam.MODE_DISABLED);
            Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
            ImageWriter writer = writers.next();

            BufferedImage bi = reader.read(0);
            IIOImage image = new IIOImage(bi, null, reader.getImageMetadata(0));
            tempFile = tempImageFile(imageFile);
            ImageOutputStream ios = ImageIO.createImageOutputStream(tempFile);
            writer.setOutput(ios);
            writer.write(streamMetadata, image, tiffWriteParam);

            ios.close();
            iis.close();
            writer.dispose();
            reader.dispose();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return tempFile;
    }

    // Derives the temporary file name, e.g. verifycode.jpg -> verifycode0.tif
    private static File tempImageFile(File imageFile) {
        String path = imageFile.getPath();
        StringBuffer strB = new StringBuffer(path);
        strB.insert(path.lastIndexOf('.'), 0);
        return new File(strB.toString().replaceFirst("(?<=\\.)(\\w+)$", "tif"));
    }
}
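The temporary TIFF name is derived from the input path by inserting a 0 before the extension and then swapping the extension for tif. The standalone sketch below shows that derivation under the assumption that the regex is meant to match the trailing extension after the last dot. TempName and tempTiffName are hypothetical names, and the fallback for a path without a dot is my own addition.

```java
final class TempName {

    /** Maps e.g. "D:/verifycode.jpg" to "D:/verifycode0.tif". */
    static String tempTiffName(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0) {
            return path + "0.tif"; // assumption: no extension, so just append one
        }
        StringBuilder sb = new StringBuilder(path);
        sb.insert(dot, 0); // inserts the character "0" before the dot
        // Replace the word characters after the final dot with "tif"
        return sb.toString().replaceFirst("(?<=\\.)(\\w+)$", "tif");
    }

    public static void main(String[] args) {
        System.out.println(tempTiffName("D:/verifycode.jpg")); // prints D:/verifycode0.tif
    }
}
```

Because the temporary file sits next to the original image under a different name, deleting it after recognition leaves the downloaded CAPTCHA untouched.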
Then just call it from your main method:
private static String getCode() {
    String valCode = null;
    String path = "D:/verifycode.jpg";
    try {
        valCode = new OCR().recognizeText(new File(path), "jpg");
        System.out.println("CAPTCHA: " + valCode);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return valCode;
}
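The string returned by recognizeText ends with a line separator, and OCR output can also contain stray spaces. Before the value is submitted in a login form it probably needs a cleanup pass. The sketch below assumes the CAPTCHA consists only of ASCII letters and digits, which this post does not state; CaptchaText and clean are hypothetical names.

```java
final class CaptchaText {

    /**
     * Strips everything except ASCII letters and digits.
     * Assumption: the CAPTCHA alphabet is [A-Za-z0-9]; adjust the pattern otherwise.
     */
    static String clean(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("7 k3P\r\n")); // prints 7k3P
    }
}
```

In getCode this would be applied as clean(valCode) before returning.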
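That's it for CAPTCHA recognition. For the full HTTP login, see my next post (http://www.cnblogs.com/chenyangqi/p/4906376.html).
Notice: this post is the author's original work; please credit the source when reposting.
This simulation is for learning purposes only. Do not use it for illegal activity or brute-force attacks.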