The selenium Library

selenium

Installation, Configuration, and Connection

selenium

Official website: http://www.seleniumhq.org

GitHub: https://github.com/SeleniumHQ/selenium/tree/master/py

PyPI: https://pypi.python.org/pypi/selenium

Official documentation: http://selenium-python.readthedocs.io

Chinese documentation: http://selenium-python-zh.readthedocs.io

Install: pip3 install selenium

selenium has to be paired with a browser and the driver that matches that browser.

ChromeDriver

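Official website: https://sites.google.com/a/chromium.org/chromedriver (blocked in mainland China)

Download: https://chromedriver.storage.googleapis.com/index.html

Download mirror: http://npm.taobao.org/mirrors/chromedriver/

Version mapping table: https://blog.csdn.net/huilan_same/article/details/51896672

Check the Chrome version in the browser (for example: Version 67.0.3396.99 (Official Build) Built on Ubuntu, running on Ubuntu 16.04 (64-bit)), then install a matching ChromeDriver version (2.38, 2.39, or 2.40 for this Chrome version).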
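The version check above can be sketched in code. This is a minimal illustration, not part of selenium: the function names are hypothetical, and the only mapping entry is the one stated in these notes (Chrome 67 with ChromeDriver 2.38, 2.39, 2.40).

```python
import re

# Mapping taken from the note above; other Chrome versions are deliberately left out.
KNOWN_DRIVERS = {67: ["2.38", "2.39", "2.40"]}


def chrome_major_version(version_text):
    """Extract the major version from text such as 'Version 67.0.3396.99'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError("no Chrome version found in: %r" % version_text)
    return int(match.group(1))


def matching_drivers(version_text):
    """Return the ChromeDriver versions recorded for this Chrome version, if any."""
    return KNOWN_DRIVERS.get(chrome_major_version(version_text), [])
```

For any Chrome version not listed here, the function returns an empty list and the version mapping table remains the place to look.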

Add the executable to the PATH environment variable, or move it into a directory that is already on PATH: sudo mv chromedriver /usr/bin

GeckoDriver

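GitHub: https://github.com/mozilla/geckodriver

Download: https://github.com/mozilla/geckodriver/releases

Download the matching version, then add the executable to PATH or move it into a directory that is already on PATH: sudo mv geckodriver /usr/bin

Test it from the command line by running: geckodriver

Test it from Python: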

from selenium import webdriver
browser = webdriver.Firefox()
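Before launching a browser, it can help to confirm that the driver executables are actually reachable on PATH. The sketch below uses only the standard library; missing_drivers is a hypothetical helper name, not a selenium function.

```python
import shutil


def missing_drivers(names=("chromedriver", "geckodriver")):
    """Return the driver executables that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_drivers()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All drivers found")
```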

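PhantomJS (a headless browser; deprecated in newer versions of selenium)

Official website: http://phantomjs.org

Official documentation: http://phantomjs.org/quick-start.html

Download: http://phantomjs.org/download.html

Command-line API reference: http://phantomjs.org/api/command-line.html

Headless Mode in Chrome and Firefox

Chrome headless mode (swap chrome for firefox to get Firefox headless mode):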

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')  # headless flag
chrome_options.add_argument('--disable-gpu')  # disable GPU acceleration
driver = webdriver.Chrome(options=chrome_options)  # options= replaces the deprecated chrome_options= keyword

Another way:

from selenium import webdriver

options = webdriver.FirefoxOptions()
options.add_argument('--headless')  # replaces the deprecated set_headless()
driver = webdriver.Firefox(options=options)  # options= replaces the deprecated firefox_options= keyword
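The two headless setups above differ only in the options class and the flags passed. A minimal sketch that folds them into one helper follows; headless_arguments and make_headless_driver are hypothetical names, and selenium is imported lazily so the flag lookup can be used without a browser installed.

```python
def headless_arguments(browser_name):
    """Return the command-line flags used for a headless run of the browser."""
    flags = {
        "chrome": ["--headless", "--disable-gpu"],
        "firefox": ["--headless"],
    }
    try:
        return list(flags[browser_name.lower()])
    except KeyError:
        raise ValueError("unsupported browser: %r" % browser_name)


def make_headless_driver(browser_name):
    """Start Chrome or Firefox in headless mode (requires selenium and the driver)."""
    arguments = headless_arguments(browser_name)  # validates the name first
    from selenium import webdriver  # lazy import: only needed when a browser is launched

    if browser_name.lower() == "chrome":
        options, driver_class = webdriver.ChromeOptions(), webdriver.Chrome
    else:
        options, driver_class = webdriver.FirefoxOptions(), webdriver.Firefox
    for argument in arguments:
        options.add_argument(argument)
    return driver_class(options=options)
```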

Basic Usage

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome()  # 1. Create the browser object
try:
    browser.get('https://www.baidu.com')  # 2. Request the page with get()
    search_box = browser.find_element(By.ID, 'kw')  # 3. Find a node
    search_box.send_keys('Python')  # 4. Interact with the node
    search_box.send_keys(Keys.ENTER)
    wait = WebDriverWait(browser, 10)
    wait.until(EC.presence_of_element_located((By.ID, 'content_left')))
    print(browser.current_url)
    print(browser.get_cookies())
    print(browser.page_source)  # 5. Read back information
finally:
    browser.close()  # 6. Close the current tab; quit() closes the whole browser

Declaring the Browser Object

from selenium import webdriver

browser = webdriver.Chrome()  # or any other supported browser

Visiting a Page

browser.get('https://www.taobao.com')

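Finding Nodes

The element methods find a single node; the elements variants find multiple nodes and return a list. The find_element_by_* helpers were deprecated in Selenium 4 and removed in 4.3, so on current versions use find_element() with a By strategy.

find_element()  # General method; takes two arguments, the lookup strategy (By) and the value, e.g. find_element(By.ID, id)
find_element_by_id()
find_element_by_name()
find_element_by_xpath()  # XPath selector
find_element_by_link_text()
find_element_by_partial_link_text()
find_element_by_tag_name()
find_element_by_class_name()
find_element_by_css_selector()  # CSS selector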
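Each helper in the list corresponds to one By strategy, so the general find_element() call can stand in for all of them. The sketch below spells out that correspondence; the string values are the ones defined on selenium's By class (for example By.CSS_SELECTOR is "css selector"), while STRATEGIES and find are hypothetical names chosen for illustration.

```python
# Suffix of the legacy helper name -> value of the matching By constant.
STRATEGIES = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
}


def find(driver, strategy, value, many=False):
    """Look up one node, or a list of nodes when many=True, via the general method."""
    by = STRATEGIES[strategy]
    finder = driver.find_elements if many else driver.find_element
    return finder(by, value)
```

With a real driver, find(browser, "css_selector", ".item", many=True) would behave like browser.find_elements(By.CSS_SELECTOR, ".item").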

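Interacting with Nodes

To type text, use send_keys(); special keys can be entered with the Keys class, which is defined in selenium.webdriver.common.keys
To clear text, use clear()
To click a button, use click()

Interaction reference: http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webelement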
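The three interactions are typically combined when filling in a form. A minimal sketch, assuming a driver that offers find_element(by, value); submit_search and both locator arguments are hypothetical and would need to match the real page.

```python
def submit_search(driver, box_locator, button_locator, text):
    """Clear a text box, type into it, then click the submit button."""
    box = driver.find_element(*box_locator)  # locator is a (by, value) tuple
    box.clear()                              # remove any existing text
    box.send_keys(text)                      # type the query
    driver.find_element(*button_locator).click()
```

A call might look like submit_search(browser, (By.ID, 'q'), (By.CSS_SELECTOR, '.btn-search'), 'iPad'), reusing the locators from the explicit wait example in these notes.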

Action Chains

Drag and drop
Use drag and drop to move an element, or to drop it onto another element:

from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# driver is an existing WebDriver instance
element = driver.find_element(By.NAME, "source")
target = driver.find_element(By.NAME, "target")

action_chains = ActionChains(driver)
action_chains.drag_and_drop(element, target).perform()

Action chains reference: http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.common.action_chains

Executing JavaScript

execute_script() runs JavaScript in the page, which covers functionality the API does not provide.

For example, to scroll to the bottom of the page: browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
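On pages that load more content as the user scrolls, a single scrollTo call may not reach the real bottom. One possible approach is to repeat the scroll until the page height stops growing. In the sketch below, scroll_to_bottom, pause and max_rounds are hypothetical names and values; only execute_script() comes from the driver.

```python
import time


def scroll_to_bottom(driver, pause=0.5, max_rounds=20):
    """Scroll down repeatedly until the page height stops changing.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # nothing new was loaded
        last_height = new_height
    return max_rounds
```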

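Getting Node Information

get_attribute(): gets an attribute of the node
text property: gets the text content
id property: gets the id Selenium assigns to the node (an internal reference, not the HTML id attribute)
location property: gets the node's position relative to the page
tag_name property: gets the tag name
size property: gets the node's size, that is, its width and height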
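The properties listed above can be gathered into a single dictionary, which is convenient for logging or debugging. This is a minimal sketch; describe_element and its default attribute names are hypothetical, and element is assumed to be a node returned by find_element().

```python
def describe_element(element, attributes=("class", "href")):
    """Collect the commonly used pieces of information about a node."""
    return {
        "id": element.id,              # Selenium's internal reference
        "tag_name": element.tag_name,
        "text": element.text,
        "location": element.location,  # e.g. {'x': ..., 'y': ...}
        "size": element.size,          # e.g. {'width': ..., 'height': ...}
        "attributes": {name: element.get_attribute(name) for name in attributes},
    }
```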

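Switching Frames

After Selenium opens a page, it operates in the top-level frame by default, and nodes inside a child frame cannot be found from there.

Use switch_to.frame() to switch into the child frame.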
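Because it is easy to forget to switch back, one option is to wrap the switch in a context manager. The sketch below relies on switch_to.frame() and switch_to.parent_frame() from the driver; inside_frame is a hypothetical helper name.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a child frame, then return to its parent frame afterwards."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.parent_frame()  # runs even if the body raises
```

Typical use would be a with block such as: with inside_frame(browser, 'frame_name'): followed by the lookups that target nodes inside that frame, where 'frame_name' is a placeholder.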

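Waiting

Implicit wait:

When looking up one or more elements that are not immediately available, an implicit wait keeps polling the DOM for the configured time and raises NoSuchElementException if nothing is found. The default is 0.
browser.implicitly_wait(10): sets the wait time in seconds

Explicit wait:

Specify a maximum wait time and a condition. If the condition is met, the located node is returned; otherwise the wait continues until the condition holds or the maximum wait time is exceeded, at which point TimeoutException is raised.
First import WebDriverWait and construct it with the maximum wait time, then call its until() method and pass the condition to wait for, taken from expected_conditions.

For example: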

wait = WebDriverWait(browser, 10)
input = wait.until(EC.presence_of_element_located((By.ID, 'q')))
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))
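To make the behaviour of until() concrete, here is a simplified stand-in written in plain Python. It is not selenium's implementation: wait_until is a hypothetical function, it raises the built-in TimeoutError rather than selenium's TimeoutException, and it does not ignore any exceptions raised by the condition.

```python
import time


def wait_until(driver, condition, timeout=10.0, poll=0.5):
    """Call condition(driver) repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once the timeout has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result  # e.g. the located node
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)
```

The condition is always checked at least once, so a timeout of 0 still succeeds when the condition already holds.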

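Wait conditions:

title_is	The title equals the given text
title_contains	The title contains the given text
presence_of_element_located	The node is present in the DOM; takes a locator tuple such as (By.ID, 'p')
visibility_of_element_located	The node is visible; takes a locator tuple
visibility_of	The node is visible; takes a node object
presence_of_all_elements_located	Matching nodes are present in the DOM; returns them as a list
text_to_be_present_in_element	The node's text contains the given text
text_to_be_present_in_element_value	The node's value contains the given text
frame_to_be_available_and_switch_to_it	The frame is available; switches to it
invisibility_of_element_located	The node is invisible or not present
element_to_be_clickable	The node is clickable
staleness_of	The node is no longer attached to the DOM; useful for detecting that the page has refreshed
element_to_be_selected	The node is selected; takes a node object
element_located_to_be_selected	The node is selected; takes a locator tuple
element_selection_state_to_be	Takes a node object and a state; returns True if they match, otherwise False
element_located_selection_state_to_be	Takes a locator tuple and a state; returns True if they match, otherwise False
alert_is_present	An alert is present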
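When none of the built-in conditions fits, until() also accepts any callable that takes the driver and returns a truthy value on success. The sketch below builds such a condition; attribute_contains is a hypothetical name, not part of expected_conditions.

```python
def attribute_contains(locator, name, fragment):
    """Build a wait condition: the located node's attribute contains fragment."""
    def condition(driver):
        value = driver.find_element(*locator).get_attribute(name)
        # get_attribute() returns None when the attribute is absent
        return value is not None and fragment in value
    return condition
```

It would be passed to a wait in the same way as the built-in conditions, for example wait.until(attribute_contains((By.ID, 'q'), 'class', 'ready')), where the locator and class name are placeholders.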