How do I use Scrapy with Selenium to scrape dynamically loaded websites?
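How you combine Scrapy and Selenium depends on the site you are scraping and the data you want from it. The code below is a simple example that pages through the search results on a product site: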
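import scrapy
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['ebay.com']
    start_urls = ['http://www.ebay.com/sch/i.html?_odkw=books&_osacat=0&_trksid=p2045573.m570.l1313.TR0.TRC0.Xpython&_nkw=python&_sacat=0&_from=R40']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Selenium drives a real browser, so JavaScript-rendered content is available
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        while True:
            try:
                # Selenium 4 locator API (find_element_by_xpath was removed)
                next_link = self.driver.find_element(By.XPATH, '//td[@class="pagn-next"]/a')
                next_link.click()
                # get the data and write it to scrapy items
            except WebDriverException:
                # no "next" link, or the click failed: stop paging
                break
        # quit() ends the whole browser session, not just the current window
        self.driver.quit()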
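The loop above leaves the "get the data" step as a comment, and because it clicks before extracting anything, the first results page is easy to skip by accident. One possible way to structure it, as a minimal sketch rather than the original author's method, is to separate paging from extraction: a generator yields the rendered HTML of each page, starting with the first, and stops when there is no next page. The names `iter_rendered_pages`, `FakeBrowser`, and `max_pages` are hypothetical, and `FakeBrowser` is only an in-memory stand-in so the sketch runs without a browser.

```python
from typing import Callable, Iterator


def iter_rendered_pages(
    get_source: Callable[[], str],
    go_next: Callable[[], bool],
    max_pages: int = 50,  # hypothetical safety cap against endless paging
) -> Iterator[str]:
    """Yield the HTML of each page, starting with the current one.

    get_source returns the rendered HTML of the page currently shown.
    go_next tries to move to the next page and returns False if it cannot.
    """
    for _ in range(max_pages):
        yield get_source()
        if not go_next():
            break


class FakeBrowser:
    """In-memory stand-in for a Selenium driver (illustration only)."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def page_source(self) -> str:
        return self.pages[self.index]

    def click_next(self) -> bool:
        if self.index + 1 >= len(self.pages):
            return False  # already on the last page
        self.index += 1
        return True


if __name__ == "__main__":
    browser = FakeBrowser(["<p>page 1</p>", "<p>page 2</p>", "<p>page 3</p>"])
    for html in iter_rendered_pages(browser.page_source, browser.click_next):
        print(html)
```

In the spider, `get_source` could be `lambda: self.driver.page_source`, and `go_next` a small function that clicks the `pagn-next` link and returns False when Selenium raises `NoSuchElementException`. Each yielded HTML string can then be wrapped in `scrapy.Selector(text=html)` and queried with the usual XPath or CSS expressions to build items.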