[Python-WebDriver in Practice] Ways to Handle CAPTCHAs

[Selenium-WebDriver in Practice] Handling CAPTCHAs with Selenium

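I previously wrote up a way to handle CAPTCHAs in Java, but the results were never quite satisfactory and recognition errors still occurred.

For a project over the past couple of days, I looked into using Python to recognize 4-digit numeric CAPTCHAs that contain interference elements. The results:

When the image is simple, recognition succeeds.

When the image contains interference, recognition fails (the result has too few characters or contains English letters).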

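Solution:

When the recognized result is not exactly 4 characters long, or is not purely numeric, refresh the CAPTCHA and recognize it again.

With this retry in place, the success rate becomes very high.

A reference that helped a lot: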
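
The retry rule above can be sketched as a small loop. This is only a minimal sketch, not the code used later in this post: `recognize` and `refresh` are hypothetical callables standing in for the OCR step and for the click that refreshes the CAPTCHA, and `max_attempts` is an assumed cap added so the loop cannot run forever.

```python
from typing import Callable, Optional


def is_valid_code(code: str, length: int = 4) -> bool:
    """Return True only for a string of exactly `length` ASCII digits."""
    return len(code) == length and code.isascii() and code.isdigit()


def recognize_with_retry(
    recognize: Callable[[], str],
    refresh: Callable[[], None],
    max_attempts: int = 5,
) -> Optional[str]:
    """Recognize, and refresh + retry while the result is not a valid code.

    Returns the first valid code, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        code = recognize()
        if is_valid_code(code):
            return code
        # Invalid result: ask for a new CAPTCHA image before the next attempt
        refresh()
    return None
```

Returning `None` after the cap leaves it to the caller to decide whether to abort the login or fall back to manual input.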

https://blog.csdn.net/weixin_58839230/article/details/124243584

1. Install ddddocr
The following command automatically installs the latest ddddocr release compatible with your environment.

pip install ddddocr

If the installation is slow, you can install from a mirror in China instead:

pip install ddddocr -i https://pypi.tuna.tsinghua.edu.cn/simple/
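
After installing, it can be useful to confirm that the package is visible to the interpreter that will run the Selenium script. The following is a small sketch using only the standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints True when `pip install ddddocr` targeted this interpreter
    print("ddddocr installed:", is_installed("ddddocr"))
```

If this prints `False` right after a successful `pip install`, the most likely cause is that `pip` and `python` belong to different environments.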

2. Usage in practice

Core recognition code:

import ddddocr

ocr = ddddocr.DdddOcr()
with open('code.png', 'rb') as f:
    img_bytes = f.read()
res = ocr.classification(img_bytes)
print('Recognized CAPTCHA: ' + res)
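
The core snippet can be wrapped in a small helper so that one recognizer object is created once and reused for every image. This is a sketch under assumptions: `recognize_file` is a hypothetical name, and `ocr` is expected to be any object with a `classification(bytes)` method, such as the `ddddocr.DdddOcr()` instance created above. Stripping whitespace from the result is an added precaution, not something the original code does.

```python
from pathlib import Path


def recognize_file(ocr, image_path) -> str:
    """Read the image at `image_path` and return the recognized text.

    `ocr` is any object exposing classification(bytes) -> str.
    """
    img_bytes = Path(image_path).read_bytes()
    # Strip surrounding whitespace so later length checks are not thrown off
    return ocr.classification(img_bytes).strip()
```

Usage would look like `recognize_file(ocr, 'code.png')`, with `ocr` built once at start-up rather than on every call.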

Detailed code:

Login code:

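    # Log in as a user
    # (needs at the top of the file: import time; from selenium.webdriver.common.by import By)
    def user_login(self):
        # Credentials to enter
        username = 'test'
        password = '1234'

        print("Entering the username/password on the login page")
        # Enter the username
        username_input = self.driver.find_element(By.NAME, value = 'username')
        username_input.send_keys(username)
        print('Entered username: ' + username)
        time.sleep(1)

        # Enter the password
        password_input = self.driver.find_element(By.NAME, value = 'password')
        password_input.send_keys(password)
        print('Entered password: ' + password)
        time.sleep(1)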

        # Get the CAPTCHA text
        verifycode = self.getVerification()
        # Refresh and recognize again until the result is exactly 4 digits
        while len(verifycode) != 4 or not verifycode.isdigit():
            print("CAPTCHA is not a 4-digit number, refreshing and retrying")
            # Locate the CAPTCHA image and click it to load a new one
            code_img_element = self.driver.find_element(By.CLASS_NAME, value = 'el-image__inner')
            code_img_element.click()
            verifycode = self.getVerification()

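        # Enter the CAPTCHA
        code_input = self.driver.find_element(By.NAME, value = "code")
        code_input.send_keys(verifycode)
        time.sleep(1)

        # Click the login button (the XPath matches the button's on-page text '登录', "Log in")
        self.driver.find_element(By.XPATH,value = "//*[text()='登录']").click()
        time.sleep(1)

        # Note: success is assumed here, the page is not checked
        print("Login successful")
        time.sleep(1)

        # Read the cookies; this assumes the token is the first cookie returned
        cookies = self.driver.get_cookies()
        token_value = cookies[0]['value']
        print(token_value)
        input("Check whether the cookie is correct")

        self.driver.quit()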
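
The login code reads the token from `cookies[0]`, which only works if the token happens to be the first cookie returned. Looking the cookie up by name is more robust. The sketch below works on the list of dicts that `get_cookies()` returns; the helper name `cookie_value` and the cookie name `"token"` in the usage note are assumptions, since the post does not say what the site calls its cookie.

```python
from typing import Optional


def cookie_value(cookies: list, name: str) -> Optional[str]:
    """Return the value of the cookie called `name`, or None if it is absent.

    `cookies` is a list of dicts with at least 'name' and 'value' keys,
    which is the shape returned by Selenium's driver.get_cookies().
    """
    for cookie in cookies:
        if cookie.get("name") == name:
            return cookie.get("value")
    return None
```

Inside `user_login` this could be called as `cookie_value(self.driver.get_cookies(), "token")`. Selenium's `driver.get_cookie(name)` offers a similar lookup directly on the driver.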

Code that captures and recognizes the CAPTCHA:

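    # Capture the CAPTCHA image
    # (needs at the top of the file: import time; from PIL import Image;
    #  from selenium.webdriver.common.by import By)
    def getVerification(self):
        # Paths for the full-page screenshot and the cropped CAPTCHA image
        # (YIQING_SCREEN_PATH and YIQING_CODE_PATH are constants defined elsewhere)
        screenshot_path = YIQING_SCREEN_PATH
        code_path = YIQING_CODE_PATH

        time.sleep(1)
        # Screenshot the current page and save it to the configured path; it contains the CAPTCHA we need
        self.driver.save_screenshot(screenshot_path)
        time.sleep(1)
        # Locate the CAPTCHA image
        code_img_element = self.driver.find_element(By.CLASS_NAME, value = 'el-image__inner')
        # Get the x/y coordinates of the CAPTCHA
        location = code_img_element.location
        print('location[x]: ' + str(location['x']) + ', location[y]: ' + str(location['y']))
        # Get the width and height of the CAPTCHA
        size = code_img_element.size
        print('size[width]: ' + str(size['width']) + ', size[height]: ' + str(size['height']))
        # Build the crop box as (left, top, right, bottom)
        box = (int(location['x']),
               int(location['y']),
               int(location['x'] + size['width']),
               int(location['y'] + size['height']))
        # Open the screenshot
        i = Image.open(screenshot_path)
        # Use Image.crop to cut the CAPTCHA region out of the screenshot
        fimg = i.crop(box)
        fimg = fimg.convert('RGB')
        # Save the cropped CAPTCHA image so its content can be read
        fimg.save(code_path)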

        # Recognize the cropped CAPTCHA image with ddddocr (needs: import ddddocr)
        ocr = ddddocr.DdddOcr()
        with open(code_path, 'rb') as f:
            img_bytes = f.read()
        self.res = ocr.classification(img_bytes)
        print('Recognized CAPTCHA: ' + self.res)
        return self.res
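
The crop box above uses the element's `location` and `size` directly. On displays where the browser renders at a device pixel ratio other than 1, the screenshot can be larger than the page's CSS pixel grid, and the crop then misses the CAPTCHA. The sketch below shows the same arithmetic with an optional scale factor. The function name `crop_box` and the `scale` parameter are assumptions for illustration; one way to obtain the ratio would be `driver.execute_script("return window.devicePixelRatio")`.

```python
def crop_box(location: dict, size: dict, scale: float = 1.0) -> tuple:
    """Return (left, top, right, bottom) in screenshot pixels.

    `location` has 'x' and 'y', `size` has 'width' and 'height'
    (the shapes of WebElement.location and WebElement.size).
    `scale` is the assumed ratio of screenshot pixels to CSS pixels.
    """
    left = location["x"] * scale
    top = location["y"] * scale
    right = (location["x"] + size["width"]) * scale
    bottom = (location["y"] + size["height"]) * scale
    # PIL's Image.crop expects integer pixel coordinates
    return (int(left), int(top), int(right), int(bottom))
```

With the default `scale=1.0` this produces the same box as the original code. Selenium can also screenshot a single element via `WebElement.screenshot(filename)`, which avoids the manual crop entirely.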

posted on 2022-08-23 17:51 伊凡Ivan