Python Web Scraping (Simulating a Login to the 12306 Website with Selenium)

I. Logging in to the official 12306 website automatically with Selenium

1.1 Chaojiying (超级鹰) CAPTCHA-solving platform API: create a file named chaojiying.py

#!/usr/bin/env python
# coding:utf-8

import requests
from hashlib import md5


class Chaojiying_Client(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        password = password.encode('utf8')
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: CAPTCHA type, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files,
                          headers=self.headers)
        return r.json()

    def ReportError(self, im_id):
        """
        im_id: image ID of the CAPTCHA being reported as wrongly recognized
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()

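`getCode` in 1.2 reads `['pic_str']` from the `PostPic` response directly, so a failed recognition surfaces only as a bare `KeyError`. The sketch below is a small defensive wrapper. It assumes the JSON response carries `err_no` (0 on success), `err_str` and `pic_str` fields, as in Chaojiying's sample responses. The helper name `extract_pic_str` is hypothetical.

```python
from typing import Any, Dict


def extract_pic_str(resp: Dict[str, Any]) -> str:
    """Return the recognized answer from a Chaojiying PostPic response.

    Assumption: the response contains 'err_no' (0 = success),
    'err_str' (error message) and 'pic_str' (the answer).
    """
    err_no = resp.get("err_no")
    if err_no != 0:
        # Surface the platform's own error message instead of a bare KeyError
        raise RuntimeError(f"Chaojiying error {err_no}: {resp.get('err_str')}")
    return resp["pic_str"]


if __name__ == "__main__":
    # Hypothetical response shaped like the platform's sample output
    sample = {"err_no": 0, "err_str": "OK", "pic_id": "123", "pic_str": "35,72|140,68"}
    print(extract_pic_str(sample))  # 35,72|140,68
```

With this helper, `getCode` could return `extract_pic_str(chaojiying.PostPic(im, imgType))`. If an answer turns out to be wrong, the response's `pic_id` can be passed to `ReportError`.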

1.2 Simulating the 12306 login:

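from selenium import webdriver
from selenium.webdriver import ActionChains
from time import sleep
from chaojiying import Chaojiying_Client  # the chaojiying.py file created in 1.1
from PIL import Image

# Written against the Selenium 3 API (find_element_by_* and executable_path
# were removed in later Selenium 4 releases)


# Send a CAPTCHA image to Chaojiying and return the recognized answer
def getCode(imgPath, imgType):
    chaojiying = Chaojiying_Client('username', 'password', '900925')  # User Center >> Software ID: generate your own ID to replace this one
    with open(imgPath, 'rb') as f:  # local image file path (on Windows, backslashes in the path may need escaping)
        im = f.read()

    return chaojiying.PostPic(im, imgType)['pic_str']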


dri = webdriver.Chrome(executable_path="chromedriver.exe")

# Login page URL on the official 12306 site
url = "https://kyfw.12306.cn/otn/login/init"
# Open the 12306 login page
dri.get(url)
sleep(2)

# Take a screenshot of the current window and save it as main.png
dri.save_screenshot("main.png")

# Locate the CAPTCHA image
code_img = dri.find_element_by_xpath('//*[@id="loginForm"]/div/ul[2]/li[4]/div/div/div[3]/img')
# Top-left corner of the CAPTCHA image
location_position = code_img.location
print('location:', location_position)
# Size of the image
img_size = code_img.size
print('size:', img_size)

# Crop box (left, upper, right, lower); it only matches the screenshot
# when the display scaling is set to 100% (see section II)
rangle = (int(location_position["x"]), int(location_position["y"]),
          int((location_position["x"] + img_size["width"])),
          int((location_position["y"] + img_size["height"])))

# Open main.png
main_img = Image.open("main.png")
# File name for the cropped image
code_img_name = "code.png"
# Crop to the box
frame = main_img.crop(rangle)
# Save the cropped image
frame.save(code_img_name)

# Send the cropped image to the CAPTCHA API; code type 9004 returns click coordinates
result = getCode(code_img_name, 9004)
# Print the coordinate string returned for the CAPTCHA
print("result:", result)

# Build a list of [x, y] points from "x1,y1|x2,y2|..."
all_list = []
if '|' in result:
    list_1 = result.split('|')
    count_1 = len(list_1)
    for i in range(count_1):
        xy_list = []
        x = int(list_1[i].split(',')[0])
        y = int(list_1[i].split(',')[1])
        xy_list.append(x)
        xy_list.append(y)
        all_list.append(xy_list)
else:
    x = int(result.split(',')[0])
    y = int(result.split(',')[1])
    xy_list = []
    xy_list.append(x)
    xy_list.append(y)
    all_list.append(xy_list)

print(all_list)

# Click each CAPTCHA point by moving the cursor with an action chain
for lis in all_list:
    x = lis[0]
    y = lis[1]
    # Instantiate an action chain and perform the move-and-click immediately
    ActionChains(dri).move_to_element_with_offset(code_img, x, y).click().perform()
    sleep(0.5)

# Username input box
dri.find_element_by_id('username').send_keys('150236xxx8')
sleep(1)
# Password input box
dri.find_element_by_id('password').send_keys('xxx')
sleep(1)
# Click the login button
dri.find_element_by_id('loginSub').click()
sleep(3)

# Save the page after login for inspection
page_text = dri.page_source
with open("12306.html", "w", encoding="utf-8") as fp:
    fp.write(page_text)

sleep(3)
# Quit the browser
dri.quit()
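The result string for code type 9004 has the form `x1,y1|x2,y2|...`, and a single point has no `|`. That is what the `if '|' in result` branching in the script handles. `str.split('|')` on a string without `|` returns a one-element list, so both branches can be collapsed into one function. The sketch below shows this. `parse_points` is a hypothetical name.

```python
from typing import List, Tuple


def parse_points(result: str) -> List[Tuple[int, int]]:
    """Turn 'x1,y1|x2,y2|...' into [(x1, y1), (x2, y2), ...].

    A single point such as '35,72' contains no '|', and split('|') then
    returns a one-element list, so no special case is needed.
    """
    points = []
    for pair in result.split("|"):
        x_str, y_str = pair.split(",")
        points.append((int(x_str), int(y_str)))
    return points


if __name__ == "__main__":
    print(parse_points("35,72|140,68"))  # [(35, 72), (140, 68)]
    print(parse_points("35,72"))         # [(35, 72)]
```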

II. Handling coordinate offsets when locating elements with location in Python + Selenium

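When an element is located with XPath and its coordinates are read with .location, the coordinates can be noticeably off. The cause and the fixes are as follows.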

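The problem shows up when cropping a screenshot by element position. The offset comes from the display scaling configured on the computer. The coordinates returned by location are the ones you would get at 100% scaling. The crop coordinates, however, must refer to the screenshot, which is rendered at the scaled size. The two therefore differ by the scaling ratio.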
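The mapping can be written out explicitly. Let s be the display scaling ratio (1.25 at 125%), (x, y) the element's `location` and (w, h) its `size`, all in the 100%-scale coordinates described above. The crop box on the screenshot is the first line below. A point (p_x, p_y) recognized on the cropped image maps back to the click offset in the second line, measured from the image's top-left corner. At 125%, 1/s = 0.8.

```latex
\begin{aligned}
\text{box} &= \bigl(\lfloor s\,x \rfloor,\ \lfloor s\,y \rfloor,\ \lfloor s\,(x+w) \rfloor,\ \lfloor s\,(y+h) \rfloor\bigr) \\
(\Delta x,\ \Delta y) &= \left(\frac{p_x}{s},\ \frac{p_y}{s}\right)
\end{aligned}
```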
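There are three ways to fix this:
① Set the computer's display scaling to 100%. This is the simplest approach.
② Resize the captured page screenshot: divide its width and height by the scaling ratio (converting the results to int).
③ Change the arguments to Image.crop: multiply all four values of the box tuple by the scaling ratio (again converting to int).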
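Method ③ can be sketched as a small function. Rather than hard-coding 1.25, the ratio can usually be read from the browser, since `window.devicePixelRatio` gives the ratio between screenshot pixels and the CSS pixels that `location` and `size` use. In the script that would be `dri.execute_script("return window.devicePixelRatio")`. This is an assumption to verify on your own setup. The name `scaled_crop_box` is hypothetical.

```python
from typing import Dict, Tuple


def scaled_crop_box(location: Dict[str, float], size: Dict[str, float],
                    scale: float) -> Tuple[int, int, int, int]:
    """Crop box (left, upper, right, lower) for PIL's Image.crop.

    location/size are the dicts returned by WebElement.location and
    WebElement.size; scale converts them to screenshot pixels.
    """
    left = location["x"] * scale
    top = location["y"] * scale
    right = (location["x"] + size["width"]) * scale
    bottom = (location["y"] + size["height"]) * scale
    # int() truncates, matching the original int(...) conversions
    return int(left), int(top), int(right), int(bottom)


if __name__ == "__main__":
    loc = {"x": 100, "y": 200}
    sz = {"width": 300, "height": 190}
    print(scaled_crop_box(loc, sz, 1.25))  # (125, 250, 500, 487)
```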

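In the login script from section I, the fix combines the ideas behind ② and ③. Multiply the crop box by the scaling ratio, then divide the recognized click coordinates by the same ratio before passing them to the action chain.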

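# Example with the display scaling at 125%
scale = 1.25
rangle = (int(location_position["x"] * scale), int(location_position["y"] * scale),
          int((location_position["x"] + img_size["width"]) * scale),
          int((location_position["y"] + img_size["height"]) * scale))

# The recognized coordinates come from the scaled screenshot, so divide them
# by the same ratio before moving the action chain (1 / 1.25 = 0.8)
for lis in all_list:
    x = lis[0] / scale
    y = lis[1] / scale
    # Instantiate an action chain and perform the move-and-click immediately
    ActionChains(dri).move_to_element_with_offset(code_img, x, y).click().perform()
    sleep(0.5)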
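The click step in the snippet above can also be factored into a function that converts recognized points into action-chain offsets. There is one more assumption to check against your installed version. According to the Selenium 4 documentation, `move_to_element_with_offset` measures offsets from the element's center rather than its top-left corner. The sketch therefore optionally shifts the origin by half the element size. The helper name `to_click_offsets` is hypothetical.

```python
from typing import Dict, List, Tuple


def to_click_offsets(points: List[Tuple[int, int]], scale: float,
                     size: Dict[str, float],
                     from_center: bool = False) -> List[Tuple[int, int]]:
    """Convert screenshot-pixel points into move_to_element_with_offset offsets.

    points: (x, y) pairs on the cropped CAPTCHA image, in screenshot pixels
    scale: display scaling ratio (e.g. 1.25 at 125%)
    size: WebElement.size of the CAPTCHA image
    from_center: True if offsets are measured from the element's center
                 (assumed Selenium 4 behavior)
    """
    offsets = []
    for x, y in points:
        # Screenshot pixels -> page coordinates
        css_x, css_y = x / scale, y / scale
        if from_center:
            # Move the origin from the top-left corner to the center
            css_x -= size["width"] / 2
            css_y -= size["height"] / 2
        # Round to whole pixels
        offsets.append((round(css_x), round(css_y)))
    return offsets


if __name__ == "__main__":
    size = {"width": 300, "height": 190}
    print(to_click_offsets([(125, 250)], 1.25, size))        # [(100, 200)]
    print(to_click_offsets([(125, 250)], 1.25, size, True))  # [(-50, 105)]
```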

posted @ 2019-08-07 21:55 Amorphous
