Crawler notes: the teambition login captcha
1. Background
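I had too many things I wanted to do and my plans were a mess, so I went looking for a tool to sort them out. I remembered using teambition a long time ago and decided to take another look, and on the login page I ran into a rather interesting captcha:
This one is fairly interesting. It looks like an imitation of the 12306 captcha. I can't break 12306's (even as a human I need several tries to get it right...), but surely this simplified version can be broken, which got me very keen to crack it.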
2. Analysis
Open the developer tools and inspect the captcha element first:
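The first awkward thing is that the prompt "地球" (earth) is rendered on the page as plain text. That does not actually matter much, and it would not matter if it were an image either. What matters is that the prompt should differ on every response (the more it varies, the safer it is); otherwise it can be used as a stable identifier at very low cost, and simple statistics-based approaches break the captcha easily. Next come the candidate images: each image URL carries a uid and an index. Where do these two values come from? Click the refresh button and watch the network requests, and you will find a few of them: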
Concatenating this uid with an array index into values gives the URL of each icon shown on the page. So far nothing looks wrong.
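Next, how to break it. The icons are drawn in such a distinctive style that they are probably hand-drawn, and if they are hand-drawn their number should be limited, a few hundred at most. Manual labeling would therefore work, but a few hundred is still a lot of labeling, and plain manual labeling is too ordinary to be worth writing about.
There is a way that needs no manual labels. In the response above, the image for "地球" is guaranteed to be somewhere in the values array. So I only need to call the endpoint many times and group the responses by imageName. The "地球" group, for example, then contains many values arrays, each of which holds one image that really is the earth. Which one? The one in the intersection of all those arrays. The pipeline is: group by imageName, intersect the values within each group, and obtain the image feature for each imageName. That mapping is the model. At recognition time, look up the feature for the imageName in the model, then check which image in the freshly returned values matches it. That gives recognition from imageName to image.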
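The group-and-intersect idea can be sketched offline with toy data. This is a minimal illustration, not the article's implementation: the class name `IntersectionSketch`, the `Round` type and the integer "features" are hypothetical stand-ins. In the real code further down, the feature is a hash of the downloaded image's pixels.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IntersectionSketch {

    /** One captcha round: the prompt shown to the user plus one feature per candidate image. */
    static final class Round {
        final String imageName;
        final Set<Integer> features;

        Round(String imageName, Integer... features) {
            this.imageName = imageName;
            this.features = new HashSet<>(Arrays.asList(features));
        }
    }

    /** Intersects the feature sets of all rounds that share a prompt. */
    static Map<String, Integer> derive(List<Round> rounds) {
        Map<String, Set<Integer>> candidates = new HashMap<>();
        for (Round round : rounds) {
            Set<Integer> seen = candidates.get(round.imageName);
            if (seen == null) {
                // First round for this prompt: every image is still a candidate.
                candidates.put(round.imageName, new HashSet<>(round.features));
            } else {
                // Later rounds: keep only the features that appear again.
                seen.retainAll(round.features);
            }
        }
        // Only prompts narrowed down to exactly one feature make it into the model.
        Map<String, Integer> model = new HashMap<>();
        candidates.forEach((name, set) -> {
            if (set.size() == 1) {
                model.put(name, set.iterator().next());
            }
        });
        return model;
    }

    public static void main(String[] args) {
        List<Round> rounds = Arrays.asList(
                new Round("earth", 11, 22, 33, 44, 55),
                new Round("earth", 11, 66, 33, 77, 88),
                new Round("earth", 99, 11, 12, 13, 14),
                new Round("lock", 22, 33, 44, 55, 66));
        System.out.println(derive(rounds)); // prints {earth=11}
    }
}
```

In the toy data, "earth" is resolved after three rounds, while "lock" has only been seen once and still has five candidates, so it stays out of the model until more rounds are collected.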
3. Implementation
First, request the captcha endpoint to collect a batch of images:
    package cc11001100.misc.crawler.captcha.teambition;

    import cc11001100.misc.crawler.utils.HttpUtil;
    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    import org.apache.commons.io.FileUtils;
    import org.jsoup.Connection;

    import java.io.File;
    import java.io.IOException;

    /**
     * Downloads raw captcha images for later analysis.
     * <p>
     * https://account.teambition.com/login
     *
     * @author CC11001100
     */
    public class TeambitionCaptchaCrawler {

        // Downloads every candidate image of one captcha response.
        public static void handleSingleCaptcha(String captchaResponseJsonStr) throws IOException {
            JSONObject responseJson = JSON.parseObject(captchaResponseJsonStr);
            String uid = responseJson.getString("uid");
            JSONArray values = responseJson.getJSONArray("values");
            for (int i = 0; i < values.size(); i++) {
                String url = "https://auth_services.teambition.com/captcha/image?uid=" + uid + "&lang=zh&index=" + i;
                byte[] captchaBytes = HttpUtil.request(url, null, Connection.Response::bodyAsBytes);
                String outputLocation = "data/captcha/teambition/captcha-imgs/" + values.getString(i) + ".png";
                // writeByteArrayToFile closes the stream and creates missing parent directories
                FileUtils.writeByteArrayToFile(new File(outputLocation), captchaBytes);
            }
        }

        // Requests the setup endpoint repeatedly, appending each response to a JSONL file.
        public static void downloadRawData() throws IOException {
            for (int i = 0; i < 10000; i++) {
                String url = "https://auth_services.teambition.com/captcha/setup?num=5&lang=zh&_=" + System.currentTimeMillis();
                String responseBody = HttpUtil.request(url, connection -> {
                    connection.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36");
                }, Connection.Response::body);
                FileUtils.writeStringToFile(new File("data/captcha/teambition/raw-captcha.jsonl"), responseBody + "\n", "UTF-8", true);
                handleSingleCaptcha(responseBody);
            }
        }

        public static void main(String[] args) throws IOException {
            downloadRawData();
        }

    }
This yields a batch of raw captcha images:
raw-captcha.jsonl:
    {"values":["f38d8ea6e916762a57be2108c7c0b29c027650f3","cc6fc6dd8ef8f17b7fa99f44dfaa65df221af4f4","801d66a2673d0ba30ba9d02412d2321e1ba1de94","66cecc1d439a74d10a6a2d628f2a358fa90df1df","5d8029335a0330a1ff9f0c5766f3502dbba1ad1f"],"imageName":"飞机","uid":"27419090-0b57-11ea-8fd8-e31d1dab49d9"}
    {"values":["f54416fb674ae827c35c004e35057bdbc14fc0fe","ca1cc93bdc4236889e154b7542bf5675c7ffedc0","368a7a84132450e43f5020802b397b071dfe7840","0a449565c02a16adbe17b799c30947c8c904ad73","29d6a3b0ff1c42af3a9bb47fd759872a5e8f5931"],"imageName":"锁","uid":"27852940-0b57-11ea-a598-c15471c1be2e"}
    {"values":["84ba343b153e45c4c9aae9b260cfefa297587eda","fffe2e988c105b50c902d6372340a306abda0ce5","04ab1f94a0a73fdbf37d6aa40beb9878ef737c8f","ec2b8e8612ced4cd43f656bce3050dc3ef58c656","e61b104c2e7fbc1c6ab1dfa1b2a23f2c6fde1680"],"imageName":"相机","uid":"27bab830-0b57-11ea-8fd8-e31d1dab49d9"}
    {"values":["295da38531279e1938aa4a87f354f27f62feb159","b3ed0b06f53373ba6d8ffdccea65e807d26b53e6","f6baa4dcea4f4d845f4d13301a39dbbe9bbe9fe9","61ce83c8c500d9cf96b79e34e9147993a0c6b359","93244a67523c11554f3ce8b49950d8fcbdbbf8fd"],"imageName":"锁","uid":"27ecebc0-0b57-11ea-8fd8-e31d1dab49d9"}
    {"values":["b2c5fa8ec472914cbdaff3d790fc0eb0c8a45adf","5ebd5d74430628a34fb189e9efc919c12afa069e","369adc4406bd49e7876e760b3e13dcc2637daa87","d664b3e2334880096f88fd50349e7d5b9e4e0fcd","cef15533490a94c27eeb8ae8dd97efb47ba717ce"],"imageName":"相机","uid":"282587f0-0b57-11ea-8fd8-e31d1dab49d9"}
    ...
Next, build a map from imageName to image feature out of the captcha images just downloaded:
    package cc11001100.misc.crawler.captcha.teambition;

    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONArray;
    import lombok.extern.slf4j.Slf4j;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.lang3.StringUtils;

    import javax.imageio.ImageIO;
    import java.io.File;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    /**
     * @author CC11001100
     */
    @Slf4j
    public class DerivationLabel {

        // Find the image behind each imageName and label it
        private static void derivationLabel() throws IOException {
            Map<String, Integer> imageNameToHashCodeMap = new HashMap<>();
            FileUtils.readLines(new File("data/captcha/teambition/raw-captcha.jsonl"), "UTF-8").stream()
                    .filter(StringUtils::isNotBlank)
                    .collect(Collectors.groupingBy(line -> JSON.parseObject(line).getString("imageName")))
                    .forEach((imageName, lineList) -> {
                        Set<Integer> interceptingSet = new HashSet<>();
                        for (String line : lineList) {
                            JSONArray values = JSON.parseObject(line).getJSONArray("values");
                            Set<Integer> currentSet = new HashSet<>();
                            // The download was interrupted a few times to check progress, so some
                            // rounds may be missing images; incomplete rounds are simply ignored
                            boolean hasError = false;
                            for (int i = 0; i < values.size(); i++) {
                                String f = "data/captcha/teambition/captcha-imgs/" + values.getString(i) + ".png";
                                try {
                                    currentSet.add(ImageUtil.hash(ImageIO.read(new File(f))));
                                } catch (Exception e) {
                                    log.error("Exception, path=" + f, e);
                                    hasError = true;
                                    break;
                                }
                            }
                            if (hasError) {
                                // skip only this round; a bare return here would drop the whole imageName group
                                continue;
                            }

                            if (interceptingSet.isEmpty()) {
                                interceptingSet.addAll(currentSet);
                            } else {
                                interceptingSet.retainAll(currentSet);
                                if (interceptingSet.isEmpty()) {
                                    log.info("not enough data, imageName={}", imageName);
                                    break;
                                }
                            }
                        }
                        if (interceptingSet.size() != 1) {
                            log.info("imageName={}, derivation failed", imageName);
                        } else {
                            log.info("imageName={}, set={}", imageName, interceptingSet);
                            imageNameToHashCodeMap.put(imageName, interceptingSet.iterator().next());
                        }
                    });

            imageNameToHashCodeMap.forEach((k, v) -> System.out.printf("map.put(\"%s\", %d);\n", k, v));
        }

        public static void main(String[] args) throws IOException {
            derivationLabel();
        }

    }
The image feature here is simply a hash of the pixel values. The utility class used is:
    package cc11001100.misc.crawler.captcha.teambition;

    import lombok.extern.slf4j.Slf4j;

    import java.awt.image.BufferedImage;

    /**
     * @author CC11001100
     */
    @Slf4j
    public class ImageUtil {

        // Joins every pixel's RGB value into one string (column by column) and hashes it
        public static int hash(BufferedImage image) {
            StringBuilder msg = new StringBuilder();
            for (int i = 0; i < image.getWidth(); i++) {
                for (int j = 0; j < image.getHeight(); j++) {
                    msg.append(image.getRGB(i, j)).append("|");
                }
            }
            return msg.toString().hashCode();
        }

    }
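The hash above ends in String.hashCode(), which is only 32 bits wide, so two different icons could in principle collide. The following is a sketch of a wider fingerprint, not what the article uses: `PixelDigestSketch` and `sha256OfPixels` are hypothetical names, and the pixels are read row by row instead of column by column, so the values are not comparable with the map generated below.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class PixelDigestSketch {

    /** Returns the SHA-256 of the image size and its ARGB pixels as a hex string. */
    static String sha256OfPixels(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        // Bulk-read all pixels in row-major order.
        int[] argb = image.getRGB(0, 0, width, height, null, 0, width);

        // Include the dimensions so that e.g. a 2x8 and a 4x4 image cannot share a digest input.
        ByteBuffer buffer = ByteBuffer.allocate(8 + argb.length * 4);
        buffer.putInt(width).putInt(height);
        for (int pixel : argb) {
            buffer.putInt(pixel);
        }

        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(buffer.array());
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage a = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        BufferedImage b = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        a.setRGB(1, 2, 0xFF112233);
        b.setRGB(1, 2, 0xFF112233);
        System.out.println(sha256OfPixels(a).equals(sha256OfPixels(b))); // identical pixels
        b.setRGB(0, 0, 0xFF000001);
        System.out.println(sha256OfPixels(a).equals(sha256OfPixels(b))); // one pixel changed
    }
}
```

Either fingerprint only works because the server appears to return byte-identical pixels for the same icon every time; any per-request noise or re-encoding would require a perceptual hash instead.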
The generated map:
    map.put("音符", 182834422);
    map.put("锁", 825168351);
    map.put("机器人", -1714422141);
    map.put("汽车", -769011042);
    map.put("钥匙", 975258806);
    map.put("树叶", -179264444);
    map.put("信封", -702966573);
    map.put("相机", 663652535);
    map.put("文件夹", -1425863546);
    map.put("云朵", 2106631124);
    map.put("飞机", 1640044711);
    map.put("T恤", -258338857);
    map.put("眼睛", 1647675580);
    map.put("树", -2063289315);
    map.put("放大镜", -1715725768);
    map.put("闹钟", 1335715652);
    map.put("回形针", 1654053339);
    map.put("地球", -1592219546);
    map.put("脚印", -1438760947);
    map.put("标签", 761482882);
    map.put("剪刀", 1998833602);
    map.put("灯泡", 418507311);
    map.put("伞", -2104015908);
    map.put("图表", -824773152);
    map.put("气球", 1423728112);
    map.put("太阳眼镜", 1204904862);
    map.put("椅子", 193112560);
    map.put("打印机", -939522792);
    map.put("旗帜", 834329993);
    map.put("猫", 1911236121);
    map.put("女人", 2047088238);
    map.put("男人", 664214693);
    map.put("卡车", -1453025175);
    map.put("电脑", -1970735883);
    map.put("裤子", -337658120);
    map.put("铅笔", 1993614559);
    map.put("房子", -1299209990);
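A rough estimate of how many rounds per prompt the intersection needs. The numbers rest on assumptions that the article does not verify: the icon pool is exactly the 37 labels in the map above, each round shows the correct icon plus 4 decoys drawn uniformly from the other 36, and rounds are independent. Under those assumptions a given decoy shows up in any one round with probability 4/36, so the expected number of wrong icons still left in the intersection after k rounds is:

```latex
\Pr[\text{a given decoy appears in one round}] = \frac{4}{36} = \frac{1}{9}
\qquad
\mathbb{E}[\text{wrong icons surviving } k \text{ rounds}] = 36 \cdot \left(\frac{1}{9}\right)^{k}

k = 2:\ \frac{36}{81} \approx 0.44
\qquad
k = 3:\ \frac{36}{729} \approx 0.049
\qquad
k = 4:\ \frac{36}{6561} \approx 0.0055
```

By Markov's inequality the same values bound the probability that the intersection still holds more than one icon, so if the assumptions are close to reality a handful of complete rounds per prompt should be enough, far fewer than the 10000 requests the crawler loop is set up for.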
Then comes the recognition part. It only prints the answer and does not submit it, because submitting would actually send an SMS:
    package cc11001100.misc.crawler.captcha.teambition;

    import cc11001100.misc.crawler.utils.HttpUtil;
    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    import lombok.Builder;
    import lombok.Data;
    import lombok.extern.slf4j.Slf4j;
    import org.jsoup.Connection;

    import javax.imageio.ImageIO;
    import java.awt.image.BufferedImage;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    /**
     * @author CC11001100
     */
    @Slf4j
    public class TeambitionCaptchaCracker {

        // imageName -> pixel hash of the matching icon, as printed by DerivationLabel
        private static final Map<String, Integer> map = new HashMap<>();

        static {
            map.put("音符", 182834422);
            map.put("锁", 825168351);
            map.put("机器人", -1714422141);
            map.put("汽车", -769011042);
            map.put("钥匙", 975258806);
            map.put("树叶", -179264444);
            map.put("信封", -702966573);
            map.put("相机", 663652535);
            map.put("文件夹", -1425863546);
            map.put("云朵", 2106631124);
            map.put("飞机", 1640044711);
            map.put("T恤", -258338857);
            map.put("眼睛", 1647675580);
            map.put("树", -2063289315);
            map.put("放大镜", -1715725768);
            map.put("闹钟", 1335715652);
            map.put("回形针", 1654053339);
            map.put("地球", -1592219546);
            map.put("脚印", -1438760947);
            map.put("标签", 761482882);
            map.put("剪刀", 1998833602);
            map.put("灯泡", 418507311);
            map.put("伞", -2104015908);
            map.put("图表", -824773152);
            map.put("气球", 1423728112);
            map.put("太阳眼镜", 1204904862);
            map.put("椅子", 193112560);
            map.put("打印机", -939522792);
            map.put("旗帜", 834329993);
            map.put("猫", 1911236121);
            map.put("女人", 2047088238);
            map.put("男人", 664214693);
            map.put("卡车", -1453025175);
            map.put("电脑", -1970735883);
            map.put("裤子", -337658120);
            map.put("铅笔", 1993614559);
            map.put("房子", -1299209990);
        }

        // Returns the candidate whose pixel hash matches the prompt, or null if none does.
        public static Answer getAnswer(JSONObject responseJsonObject) throws IOException {
            String imageName = responseJsonObject.getString("imageName");
            Integer targetHashcode = map.get(imageName);
            if (targetHashcode == null) {
                return null;
            }
            JSONArray values = responseJsonObject.getJSONArray("values");
            String uid = responseJsonObject.getString("uid");
            for (int i = 0; i < values.size(); i++) {
                String url = "https://auth_services.teambition.com/captcha/image?uid=" + uid + "&lang=zh&index=" + i;
                byte[] imageBytes = HttpUtil.request(url, null, Connection.Response::bodyAsBytes);
                if (imageBytes == null) {
                    log.info("image download failed, imageName={}, uid={}, index={}", imageName, uid, i);
                    continue;
                }
                BufferedImage image = ImageIO.read(new ByteArrayInputStream(imageBytes));
                if (image == null) {
                    // ImageIO.read returns null when the bytes are not a decodable image
                    log.info("image decode failed, imageName={}, uid={}, index={}", imageName, uid, i);
                    continue;
                }
                int currentImageHashcode = ImageUtil.hash(image);
                if (currentImageHashcode == targetHashcode) {
                    return Answer.builder().imageName(imageName).imageUrl(url).index(i).build();
                }
            }
            return null;
        }

        public static void test() throws IOException {
            String url = "https://auth_services.teambition.com/captcha/setup?num=5&lang=zh&_=" + System.currentTimeMillis();
            String responseJsonStr = HttpUtil.request(url, connection -> {
                connection.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36");
            }, Connection.Response::body);
            JSONObject responseJsonObject = JSON.parseObject(responseJsonStr);
            Answer answer = getAnswer(responseJsonObject);
            if (answer == null) {
                log.info("answer not found, responseJsonStr={}", responseJsonStr);
            } else {
                System.out.println(JSON.toJSONString(answer, true));
            }
        }

        @Data
        @Builder
        public static class Answer {
            private String imageName;
            private String imageUrl;
            private int index;
        }

        public static void main(String[] args) throws IOException {
            test();
        }

    }
Output:
    {
        "imageName": "气球",
        "imageUrl": "https://auth_services.teambition.com/captcha/image?uid=814b9d80-0b5a-11ea-8fd8-e31d1dab49d9&lang=zh&index=3",
        "index": 3
    }
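Open the link to view the image (use the one from your own console; captcha images expire, so the link shown here stops working after a short while) and it is indeed a balloon ("气球"). Several more runs were all correct as well, so the crack is complete.
Related links: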
1. https://account.teambition.com/login