青春叛逆者

December 26, 2018

Summary: # The browser keeps spinning and the page can't be scraped; the fix: browser.set_page_load_timeout(10) try: browser.get('https://yq.aliyun.com/articles/490268?tdsourcetag=s_pcqq_aiomsg') except TimeoutError: print('time out after 10 seconds ...

posted @ 2018-12-26 10:37 青春叛逆者

Python crawler: switching windows and sleeping

Summary: # Switch to the new window import time from selenium import webdriver from selenium.webdriver.firefox.options import Options as FOptions options=FOptions() browser=webdriver.Firefox(executable_path="/Users/mac126/g...

posted @ 2018-12-26 10:19 青春叛逆者

Python crawler: advanced action chains

Summary: # Advanced action chains import time from selenium import webdriver from selenium.webdriver.firefox.options import Options as FOptions options=FOptions() browser=webdriver.Firefox(executable_path="/Users/mac126/ge...

posted @ 2018-12-26 10:17 青春叛逆者

Python crawler: simulating mouse hover and click

Summary: # Simulate mouse hover and click import time from selenium import webdriver from selenium.webdriver.firefox.options import Options as FOptions options=FOptions() browser=webdriver.Firefox(executable_path="/Users/mac12...

posted @ 2018-12-26 10:16 青春叛逆者

Python crawler: element interaction

Summary: import time from selenium import webdriver from selenium.webdriver.firefox.options import Options as FOptions options=FOptions() browser=webdriver.Firefox(executable_path="/Users/mac126/geckodrive...

posted @ 2018-12-26 10:15 青春叛逆者

December 25, 2018

Python XPath basics 03

Summary: from lxml import etree text = ''' first item second item third item fourth item fifth item ''' html = etree.HTML(text) result = html.xpath(...

posted @ 2018-12-25 20:34 青春叛逆者
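The sample markup in the excerpt shows only text (its tags don't appear to survive in the summary), so the sketch below uses an assumed list of five `<li>` items to show a few further selector forms: position predicates (`[1]`, `last()`, `position()<3`), `contains()` for class attributes, conditions joined with `and`, and the `following-sibling` axis. The values in the comments hold for this assumed markup only.

```python
from lxml import etree

# Assumed sample markup (a stand-in; the original post's tags are not shown).
text = '''
<div>
  <ul>
    <li class="item-0"><a href="link1.html">first item</a></li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-inactive"><a href="link3.html">third item</a></li>
    <li class="item-1"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
  </ul>
</div>
'''
html = etree.HTML(text)

first = html.xpath('//li[1]/a/text()')                 # ['first item']
last = html.xpath('//li[last()]/a/text()')             # ['fifth item']
first_two = html.xpath('//li[position()<3]/a/text()')  # ['first item', 'second item']
# contains() matches a substring of the attribute value.
item1_links = html.xpath('//li[contains(@class, "item-1")]/a/@href')  # ['link2.html', 'link4.html']
# Several conditions joined with "and".
both = html.xpath('//li[@class="item-0" and a/@href="link5.html"]/a/text()')  # ['fifth item']
# following-sibling axis: the class of every <li> after the first one.
siblings = html.xpath('//li[1]/following-sibling::li/@class')
```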

Python XPath basics 02

Summary: from lxml import etree html = etree.parse('./test.html', etree.HTMLParser())## # test.html is the HTML file; etree.HTMLParser() is the parser # result = html.xpath('//li') # selects all li nodes, returned as a list # print(result) # print(resul...

posted @ 2018-12-25 20:28 青春叛逆者
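`etree.parse()` reads a file rather than a string, and passing `etree.HTMLParser()` makes lxml parse it as HTML. A minimal self-contained sketch: since the post's `test.html` is not shown, it first writes assumed contents to a temporary `test.html`; `SAMPLE` and `parse_html_file` are hypothetical names. It also shows `..` for stepping up to a parent node.

```python
import os
import tempfile

from lxml import etree

# Assumed contents for test.html (the real file isn't shown in the post).
SAMPLE = '''<html><body><ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
</ul></body></html>'''


def parse_html_file(path):
    """Parse an HTML file from disk; HTMLParser() tolerates sloppy HTML."""
    return etree.parse(path, etree.HTMLParser())


# Write the sample to a temporary test.html so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.html')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(SAMPLE)
    tree = parse_html_file(path)

all_li = tree.xpath('//li')                 # a list of Element objects
texts = tree.xpath('//li/a/text()')         # child steps: li -> a -> text
parent_class = tree.xpath('//a[@href="link2.html"]/../@class')  # '..' = parent
```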

Interview question: getting IP addresses

Summary: # Method 1 import re from lxml import html import requests def myRequest(url): ''' Wraps my own request for crawling the exam page :param url: the URL :return: ''' response = requests.get(url) cookiejar = respo...

posted @ 2018-12-25 19:30 青春叛逆者
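The page structure behind this question isn't visible in the excerpt, so this is only a generic sketch of the parsing half of such a task: pull the visible text out of the HTML with `lxml.html` and keep the substrings that are valid dotted-quad IPv4 addresses (each octet 0 to 255). The function name `extract_ips` and any sample markup fed to it are hypothetical.

```python
import re

from lxml import html

# One octet: 250-255 | 200-249 | 100-199 | 0-99 (three-digit ranges tried first).
OCTET = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'
IPV4 = re.compile(rf'\b(?:{OCTET}\.){{3}}{OCTET}\b')


def extract_ips(page_source):
    """Return the IPv4 addresses that appear in the visible text of an HTML page."""
    tree = html.fromstring(page_source)
    # itertext() yields every text node; join with spaces so cells don't run together.
    text = ' '.join(tree.itertext())
    return IPV4.findall(text)
```

Because every group in the pattern is non-capturing, `findall()` returns whole matches, and the word boundaries reject out-of-range strings such as `999.1.1.1`.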

Python XPath basics 01

Summary: from lxml import etree text = ''' first item second item third item fourth item fifth item ''' html = etree.HTML(text) # constructs an XPath parsing object, and on the HTML...

posted @ 2018-12-25 18:12 青春叛逆者
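`etree.HTML()` parses a string into an element tree and repairs broken markup on the way in; `etree.tostring()` returns the repaired result as bytes. A minimal sketch with an assumed fragment whose second `<li>` is deliberately left unclosed:

```python
from lxml import etree

# Assumed fragment: the second <li> has no closing tag.
text = ('<ul>'
        '<li class="item-0"><a href="link1.html">first item</a></li>'
        '<li class="item-1"><a href="link2.html">second item</a>'
        '</ul>')

html = etree.HTML(text)                       # parse and repair the markup
fixed = etree.tostring(html).decode('utf-8')  # tostring() returns bytes
items = html.xpath('//li/a/text()')           # XPath works on the repaired tree
```

Printing `fixed` shows the fragment wrapped in `<html><body>` with the missing `</li>` filled in.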

Getting dynamic IPs

Summary: import requests import re import lxml.html class Exam_spider: def __init__(self): self.base_url = 'http://datamining.comratings.com/exam' self.s = requests.session() def do...

posted @ 2018-12-25 10:02 青春叛逆者
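The excerpt keeps one `requests.session()` object on the spider (`requests.session()` returns a `requests.Session`). A minimal sketch of that structure, assuming the data sits in table cells; the class name `ExamSpider`, the `//td/text()` XPath, and the User-Agent value are hypothetical. Reusing one `Session` sends cookies from earlier responses with later requests and reuses connections.

```python
import lxml.html
import requests


class ExamSpider:
    """Session-based spider sketch; the page layout it parses is an assumption."""

    def __init__(self, base_url, timeout=10):
        self.base_url = base_url
        self.timeout = timeout
        # One Session keeps cookies and connections across requests.
        self.session = requests.Session()
        self.session.headers.update({'User-Agent': 'Mozilla/5.0'})

    def fetch(self, path=''):
        """GET base_url + path through the shared session and return the HTML."""
        response = self.session.get(self.base_url + path, timeout=self.timeout)
        response.raise_for_status()
        return response.text

    @staticmethod
    def parse_cells(page_source, xpath='//td/text()'):
        """Return the non-empty, stripped text of the nodes matched by xpath."""
        tree = lxml.html.fromstring(page_source)
        return [t.strip() for t in tree.xpath(xpath) if t.strip()]

    def run(self):
        return self.parse_cells(self.fetch())
```

`ExamSpider('http://datamining.comratings.com/exam').run()` would fetch and parse the page from the post; whether its cells hold the IPs is an assumption.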
