Scraping Baidu Images with Selenium

1. Introduction

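The script uses the selenium module to drive Firefox. It runs a search on Baidu, opens the image results, and downloads the images one by one.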

2. Script

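# -*- coding:utf-8 -*-

# Automatically scrape Baidu Images.
# Chrome works the same way; set these options instead:
#   download.default_directory: the download path
#   profile.default_content_settings.popups: 0 blocks pop-up windows

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

# Firefox preferences. In Selenium 4 they go on Options
# (passing a FirefoxProfile via firefox_profile= is deprecated).
options = Options()
# Download directory
options.set_preference("browser.download.dir", "D:\\images")
# 2 = use the custom directory above; 0 = desktop; 1 = default Downloads folder
options.set_preference("browser.download.folderList", 2)
# Don't pop up the download manager when a download starts
options.set_preference("browser.download.manager.showWhenStarting", False)
# MIME types to save to disk without asking
options.set_preference("browser.helperApps.neverAsk.saveToDisk", "image/jpeg,image/png")

# Launch Firefox with these preferences
firefox = webdriver.Firefox(options=options)
try:
    # Open Baidu
    firefox.get("https://www.baidu.com")
    # The search box has id "kw" (found with the F12 developer tools).
    # find_element_by_* was removed in Selenium 4.3; use find_element(By..., ...)
    search_box = firefox.find_element(By.ID, "kw")
    search_box.clear()
    # Type the search term
    search_box.send_keys("美女")
    time.sleep(2)
    # Click the "百度一下" (Baidu Search) button
    firefox.find_element(By.ID, "su").click()
    time.sleep(5)
    # Switch to the Images tab. These XPaths match Baidu's page as of 2017
    # and may no longer match the current layout.
    firefox.find_element(By.XPATH, '//*[@id="s_tab"]/a[5]').click()
    time.sleep(3)
    # Open the first image result, which opens in a new window
    firefox.find_element(By.XPATH, '//*[@id="imgid"]/div/ul/li[1]/div[1]/a/img').click()
    time.sleep(3)
    # Point the driver at the new window so its elements can be located
    firefox.switch_to.window(firefox.window_handles[1])
    # Bounded loop (20 is an arbitrary cap) so that quit() is reached
    for _ in range(20):
        # Alternative: locate the download button by absolute XPath
        # firefox.find_element(By.XPATH, "//html/body/div[1]/div[2]/div/div[2]/div/div[1]/span[7]").click()
        # The download button has two classes, so use a CSS selector;
        # By.CLASS_NAME accepts only a single class name
        firefox.find_element(By.CSS_SELECTOR, ".bar-btn.btn-download").click()
        time.sleep(10)
        # Go to the next image
        firefox.find_element(By.XPATH, '//*[@id="container"]/span[2]').click()
        time.sleep(10)
finally:
    firefox.quit()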
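
The loop waits a fixed `time.sleep(10)` after each download click, which is too long on a fast connection and may be too short on a slow one. Below is a minimal sketch of a polling alternative that watches the download directory until a new, finished file appears. `wait_for_new_file` is a hypothetical helper, not part of Selenium, and it assumes Firefox marks in-progress downloads with a `.part` file (Chrome uses `.crdownload`).

```python
import os
import time

# Suffixes assumed to mark unfinished downloads:
# Firefox writes "<name>.part", Chrome writes "<name>.crdownload".
TEMP_SUFFIXES = (".part", ".crdownload")


def wait_for_new_file(download_dir, known, timeout=30.0, poll=0.5):
    """Wait until a new, fully downloaded file appears in download_dir.

    known: file names already present before the download was started.
    Returns the new file name, or None if nothing finished within timeout.
    """
    known = set(known)
    deadline = time.monotonic() + timeout
    while True:
        names = set(os.listdir(download_dir))
        # Any temp file means a download is still being written
        in_progress = any(n.endswith(TEMP_SUFFIXES) for n in names)
        finished = {n for n in names - known if not n.endswith(TEMP_SUFFIXES)}
        if finished and not in_progress:
            return sorted(finished)[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

To use it, record `before = set(os.listdir(r"D:\images"))` just before clicking the download button, then call `wait_for_new_file(r"D:\images", before)` in place of the first `time.sleep(10)`. A `None` result means the download did not finish within the timeout.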

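The header comment of the script notes that Chrome works the same way through two preferences, `download.default_directory` and `profile.default_content_settings.popups`. Below is a minimal sketch of the equivalent Chrome setup, assuming Selenium 4's `ChromeOptions.add_experimental_option("prefs", ...)`. The helper name `chrome_download_prefs` is hypothetical, and the third key, `download.prompt_for_download`, is an addition not mentioned in the original post.

```python
import os


def chrome_download_prefs(download_dir):
    """Build a Chrome preferences dict mirroring the Firefox download setup."""
    return {
        # Where Chrome saves downloads; an absolute path avoids any
        # dependence on the current working directory
        "download.default_directory": os.path.abspath(download_dir),
        # 0 = block pop-up windows (as described in the script's header comment)
        "profile.default_content_settings.popups": 0,
        # Addition (not in the original): don't ask where to save each file
        "download.prompt_for_download": False,
    }


# Usage (requires Selenium and ChromeDriver; not executed here):
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# options.add_experimental_option("prefs", chrome_download_prefs(r"D:\images"))
# driver = webdriver.Chrome(options=options)
```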
posted @ 2017-10-21 02:44 Goun


Goun

GitHub: https://github.com/GounGG
