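Automatically filling in a web form (the daily epidemic report) with Python's selenium library
With something like a daily report, wherever there is a policy from above there will be a countermeasure from below. Mine was a Python program that logs in and fills in the form automatically. For well-known reasons I am leaving out the institution and the web page and sharing only the code.
First install the selenium dependency, then download a webdriver. I use the Chrome webdriver. Note that the code uses the find_element_by_* methods of Selenium 3; they were removed in Selenium 4.3, where the equivalent call is find_element(By.ID, ...).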
https://chromedriver.chromium.org/downloads
Straight to the code:
 1  from selenium import webdriver
 2  import time
 3
 4  from selenium.common.exceptions import NoSuchElementException
 5
 6  driver = webdriver.Chrome('./chromedriver')
 7  driver.get("http://XXXXXXX.XXXX.XXX.cn")
 8
 9  account = driver.find_element_by_id("username")
10  account.send_keys("1XXXXXXX")
11
12  pwd = driver.find_element_by_id("password")
13  pwd.send_keys("0XXXXXXXXX")
14
15  login = driver.find_element_by_id("login-submit")
16  login.click()
17  time.sleep(3)
18  try:
19      messageNotification = driver.find_element_by_id("layui-layer1")
20  except NoSuchElementException:
21      print ("No new messages")
22  else:
23      confirm = driver.find_element_by_class_name("layui-layer-btn0")
24      confirm.click()
25      try:
26          while True:
27              driver.find_element_by_css_selector('[style="color:red;"]').find_element_by_xpath('..').click()
28              rtn = driver.find_element_by_css_selector('[role="button"]')
29              rtn.click()
30      except NoSuchElementException:
31          print ("No unread message found!")
32
33
34  try:
35      while True:
36          x = driver.find_element_by_id("fineui_1")
37          x.click()
38          reportHis = driver.find_element_by_id("lnkReportHistory")
39          reportHis.click()
40
41          toWrite = driver.find_element_by_css_selector('[href^="/DayReport.aspx"]')
42          toWrite.click()
43          checkBox = driver.find_element_by_id("p1_ChengNuo-inputEl-icon")
44          checkBox.click()
45          temperature = driver.find_element_by_id("p1_TiWen-inputEl")
46          temperature.send_keys("36")
47          submit = driver.find_element_by_id("p1_ctl00_btnSubmit")
48          submit.click()
49          subConf = driver.find_element_by_id("fineui_68")
50          subConf.click()
51          time.sleep(3)
52          subConfConf = driver.find_element_by_id("fineui_73")
53          subConfConf.click()
54  except NoSuchElementException:
55      print("All reported. Exiting...")
56
57  driver.close()
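A rough walkthrough first: lines 9-17 log in, lines 18-31 check the message center and open every unread message, and lines 34-55 go to the daily report and fill in every form that has not been submitted yet. Both loops end when find_element_by_* raises NoSuchElementException, which here means there is nothing left to process.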
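The listing pauses with fixed time.sleep(3) calls after logging in and after submitting. A common alternative is to poll until the page is actually ready, which is the idea behind Selenium's own WebDriverWait. The sketch below shows that idea without any Selenium dependency; wait_until and its parameters are hypothetical names chosen for illustration, not part of the original script or of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5, ignored=(Exception,)):
    """Call `condition` repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored as exc:
            # Remember the failure so it can be attached to the timeout error.
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"condition not met within {timeout} s"
            ) from last_error
        time.sleep(poll_interval)


# Hypothetical use with the script's driver, instead of time.sleep(3):
#   button = wait_until(
#       lambda: driver.find_element_by_id("fineui_73"),
#       ignored=(NoSuchElementException,),
#   )
#   button.click()
```

Compared with a fixed sleep, polling returns as soon as the element shows up and fails with a clear error when it never does. In the script, though, NoSuchElementException is also what ends the loops, so a wait like this only fits elements that are expected to appear.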
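Some technical points worth noting:
- Sometimes the logic is fine, yet an error like stale element reference: element is not attached to the page document appears. It means an element that was located earlier is no longer part of the current page, most likely because the page reloaded or had not finished loading when the program acted on it. Pausing with time.sleep() to let the page settle usually solves it.
- find_element_by_* raises NoSuchElementException when no matching element is found. find_elements_by_* does not raise in that case; it returns an empty list.
- To locate an element's parent node: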
find_element_by_xpath('..')
- To locate an element by attribute with a CSS selector:
driver.find_element_by_css_selector('[style="color:red;"]')
- Attribute substring operators, which resemble regular-expression anchors, allow fuzzy matching on attribute values:
find_element_by_css_selector('[href^="/DayReport.aspx"]')
Reference: https://www.w3.org/TR/selectors/#attribute-substrings
6.2. Substring matching attribute selectors
Three additional attribute selectors are provided for matching substrings in the value of an attribute:
- [att^=val]: Represents an element with the att attribute whose value begins with the prefix "val". If "val" is the empty string then the selector does not represent anything.
- [att$=val]: Represents an element with the att attribute whose value ends with the suffix "val". If "val" is the empty string then the selector does not represent anything.
- [att*=val]: Represents an element with the att attribute whose value contains at least one instance of the substring "val". If "val" is the empty string then the selector does not represent anything.
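To make the three operators concrete, here is a small pure-Python sketch that mimics the matching rules quoted above, including the empty-string case. attr_matches is a hypothetical helper written for illustration, and the sample href values are made up; neither comes from Selenium, the browser, or the original page.

```python
def attr_matches(value, operator, val):
    """Mimic the CSS substring attribute selectors on a single attribute value.

    value    -- the element's attribute value, or None if the attribute is absent
    operator -- one of "^=", "$=", "*="
    val      -- the string written in the selector
    """
    # An absent attribute never matches, and an empty "val" represents nothing.
    if value is None or val == "":
        return False
    if operator == "^=":
        return value.startswith(val)
    if operator == "$=":
        return value.endswith(val)
    if operator == "*=":
        return val in value
    raise ValueError(f"unsupported operator: {operator!r}")


# Made-up href values, filtered the way [href^="/DayReport.aspx"] would.
hrefs = [
    "/DayReport.aspx?day=2020-05-01",
    "/ReportHistory.aspx",
    "/Help/DayReport.aspx",
]
to_fill = [h for h in hrefs if attr_matches(h, "^=", "/DayReport.aspx")]
print(to_fill)
```

Only the first link is kept: the third one contains "DayReport.aspx" but does not begin with "/DayReport.aspx", so it would match [href*=...] or [href$=...] but not [href^=...].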