Java: Crawling Dynamic JavaScript Websites (PhantomJS)
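import java.util.concurrent.TimeUnit;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriverService;
import org.openqa.selenium.remote.DesiredCapabilities;

public static String DTCollection(String url) throws Exception {
    // setAgent("10.1.111.14", "1080"); // proxy setup from the previous post, disabled here
    // Required setting: tell the driver where the PhantomJS executable lives
    DesiredCapabilities dcaps = new DesiredCapabilities();
    dcaps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, "C:\\pachong\\phantomjs.exe");
    PhantomJSDriver driver = new PhantomJSDriver(dcaps);
    try {
        // Wait up to 5 seconds when looking up elements
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
        driver.get(url);
        // Runs a trivial script in the page (the result is not used)
        driver.executeScript("document.getElementsByTagName('html')");
        // Give the page's JavaScript time to render the content
        Thread.sleep(2000);
        // Page source after the scripts have run
        return driver.getPageSource();
    } finally {
        // quit() closes all windows and shuts down the PhantomJS process
        driver.quit();
    }
}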
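The fixed Thread.sleep(2000) either wastes time on fast pages or is too short on slow ones. One alternative, sketched here with a hypothetical PageWait helper that uses only the JDK, is to poll a condition until it holds or a timeout expires. Selenium's own WebDriverWait is built on the same idea.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Selenium or PhantomJS): polls a condition
// instead of sleeping for a fixed amount of time.
final class PageWait {
    private PageWait() {}

    /**
     * Re-evaluates the condition every pollMillis ms until it returns true or
     * timeoutMillis has elapsed. Returns true if the condition was met in time,
     * false on timeout or if the waiting thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;                      // condition met, stop waiting early
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;                     // timed out
            }
            try {
                Thread.sleep(pollMillis);         // pause before checking again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

Inside DTCollection the sleep could then become something like PageWait.waitUntil(() -> driver.getPageSource().contains("some-marker"), 10_000, 200), where "some-marker" is a placeholder for any text you know appears only after the page's JavaScript has rendered.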
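This relies on PhantomJS, a headless browser, which you can download directly from its official website. Point PHANTOMJS_EXECUTABLE_PATH_PROPERTY at the downloaded phantomjs.exe.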
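Because the driver is pointed at a hard-coded path (C:\pachong\phantomjs.exe), a wrong download location only surfaces as an exception when the driver starts. A small JDK-only check, here a hypothetical PhantomJsPath helper that is not part of PhantomJS or Selenium, can fail earlier with a clearer message:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: verifies that the downloaded phantomjs binary exists
// before a PhantomJSDriver is created.
final class PhantomJsPath {
    private PhantomJsPath() {}

    static Path requireBinary(String location) {
        Path path = Paths.get(location);
        if (!Files.isRegularFile(path)) {
            // Report the absolute path so a wrong working directory is easy to spot
            throw new IllegalStateException("PhantomJS not found at " + path.toAbsolutePath());
        }
        return path;
    }
}
```

It could be used as dcaps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, PhantomJsPath.requireBinary("C:\\pachong\\phantomjs.exe").toString()).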
Add the Maven dependencies needed for PhantomJS:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>2.53.1</version>
</dependency>
<dependency>
    <groupId>com.codeborne</groupId>
    <artifactId>phantomjsdriver</artifactId>
    <version>1.3.0</version>
</dependency>
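I commented out the proxy method setAgent() (it sets a proxy server; see the previous post) because PhantomJS could not crawl foreign websites with it, and enabling it causes an error. I'm not yet sure of the exact cause.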
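setAgent() is not shown here. If it works by setting JVM proxy properties (an assumption), those settings would not apply to phantomjs.exe, which PhantomJSDriver launches as a separate process. PhantomJS reads its own proxy switches from the command line (--proxy and --proxy-type), and phantomjsdriver can pass command-line arguments through the PhantomJSDriverService.PHANTOMJS_CLI_ARGS capability. A minimal sketch, untested against the setup above, with a hypothetical PhantomJsProxyArgs helper and assuming the 10.1.111.14:1080 proxy speaks SOCKS5 (1080 is the conventional SOCKS port):

```java
// Hypothetical helper: builds PhantomJS's command-line proxy switches.
final class PhantomJsProxyArgs {
    private PhantomJsProxyArgs() {}

    /** type must be "http" or "socks5", matching PhantomJS's --proxy-type values. */
    static String[] build(String host, int port, String type) {
        if (!type.equals("http") && !type.equals("socks5")) {
            throw new IllegalArgumentException("unsupported proxy type: " + type);
        }
        return new String[] {
            "--proxy=" + host + ":" + port, // proxy address as host:port
            "--proxy-type=" + type          // protocol the proxy speaks
        };
    }
}
```

The result would go into the capabilities before the driver is created, e.g. dcaps.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, PhantomJsProxyArgs.build("10.1.111.14", 1080, "socks5")).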