Solved: a bug hit when using the headless browser HtmlUnit from Python

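Recently I wanted to collect some cookie information from Amazon. I had been using Scrapy for this, but it could not get many usable cookies: they worked at first, and later too many of them turned out to be invalid. Running the same job with Selenium gave cookies with a perfect survival rate. There is one problem, though: Selenium consumes too many resources, and its environment is awkward to set up on a server. So I decided to use HtmlUnit instead, which saves resources and loads pages quickly. Time to get to work.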

First, your machine needs a Java environment (JDK). Once Java is installed, download the Selenium server we need from the official site, https://www.seleniumhq.org
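Before downloading anything, it can help to confirm that Java is actually reachable from the shell that will start the server. The following is a minimal sketch, assuming the `java` executable is expected on `PATH`; the helper names `find_java` and `java_version_text` are made up for illustration.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the `java` executable, or None if it is not on PATH."""
    return shutil.which("java")


def java_version_text(java_path):
    """Return whatever `java -version` prints (it writes to stderr, not stdout)."""
    result = subprocess.run(
        [java_path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("java not found on PATH; install a JDK first")
    else:
        print(java_version_text(java))
```

If `find_java()` returns `None`, the `java -jar` commands later in this post will not run until the JDK is installed and on `PATH`.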

The downloaded file is selenium-server-standalone-3.141.59.jar.

Then run the following command:

java -jar selenium-server-standalone-3.141.59.jar

Then start driving HtmlUnit from Python:

from selenium import webdriver
driver = webdriver.Remote("http://<ip-address>:4444/wd/hub", webdriver.DesiredCapabilities.HTMLUNIT.copy())

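This then cheerfully fails with an error. It makes no difference whether you are on Windows or Linux: I tried both platforms and both have the problem.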

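After a lot of digging, I finally found the relevant information on GitHub: the server downloaded from the official Selenium site cannot be used for this. Use the jar from GitHub instead, which can be downloaded here: https://github.com/sveneisenschmidt/selenium-server-standalone/tree/master/bin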

Now run:

java -jar selenium-server-standalone.jar -port 4448
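If you would rather start the server from a script than from a terminal, the same command can be assembled in Python. This is only a sketch: `build_server_command` and `start_server` are hypothetical helper names, the jar path is whatever file you downloaded, and `-port` is the only flag used because it is the only one shown in the command above.

```python
import subprocess


def build_server_command(jar_path, port=None):
    """Build the argument list for `java -jar <jar> [-port <port>]`."""
    command = ["java", "-jar", jar_path]
    if port is not None:
        command += ["-port", str(port)]
    return command


def start_server(jar_path, port=None):
    """Start the Selenium server as a child process and return the Popen handle.

    The caller is responsible for calling .terminate() on the handle when done.
    """
    return subprocess.Popen(build_server_command(jar_path, port))


# Usage (not executed on import; requires Java and the jar in the working directory):
#   server = start_server("selenium-server-standalone.jar", port=4448)
#   ... run the Selenium client code ...
#   server.terminate()
```

Passing the command as a list rather than a single string avoids shell quoting problems when the jar path contains spaces.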

And then run:

from selenium import webdriver
driver = webdriver.Remote("http://localhost:4448/wd/hub", desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT)
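Since the goal was to collect cookies, a session created this way could be used roughly as follows. This is a sketch under assumptions, not the code from the original run: it assumes the Selenium 3.x Python bindings used above (where `webdriver.Remote` accepts `desired_capabilities`) and a server listening on port 4448. The helper names `fetch_cookies` and `cookies_to_dict` are made up here, and the target URL is a placeholder.

```python
def cookies_to_dict(cookies):
    """Flatten Selenium's list of cookie dicts into a {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def fetch_cookies(page_url, hub_url="http://localhost:4448/wd/hub"):
    """Open page_url in a remote HtmlUnit session and return its cookies as a dict."""
    # Imported here so cookies_to_dict stays usable without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Remote(
        command_executor=hub_url,
        # .copy() avoids mutating the shared class-level capabilities dict.
        desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT.copy(),
    )
    try:
        driver.get(page_url)
        return cookies_to_dict(driver.get_cookies())
    finally:
        driver.quit()  # always release the session on the server


# Example call (needs the Selenium server to be running):
#   print(fetch_cookies("https://www.example.com/"))
```

If a site sets its cookies from JavaScript, the `DesiredCapabilities.HTMLUNITWITHJS` capability set may be worth trying in place of `HTMLUNIT`; whether it helps for a given site is untested here.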

And this time it works.

You can also monitor the server in a browser at: http://localhost:4448/wd/hub/static/resource/hub.html
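For a scripted check instead of the browser page, one option is to poll the server over HTTP before creating a session. Note the assumption: the `/wd/hub/status` path is not mentioned in this post. It is the WebDriver status endpoint that Selenium standalone servers commonly expose, so verify it against your jar. The function names are hypothetical.

```python
import http.client
import json
import time
import urllib.request


def status_url(hub_url):
    """Return the status endpoint for a hub URL such as http://localhost:4448/wd/hub."""
    return hub_url.rstrip("/") + "/status"


def server_is_up(hub_url, timeout=2.0):
    """Return True if the status endpoint answers with HTTP 200 and a JSON body."""
    try:
        with urllib.request.urlopen(status_url(hub_url), timeout=timeout) as response:
            json.loads(response.read().decode("utf-8"))
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False


def wait_for_server(hub_url, attempts=10, delay=1.0):
    """Poll until the server answers or the attempts run out."""
    for _ in range(attempts):
        if server_is_up(hub_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_server("http://localhost:4448/wd/hub")` right after launching the jar gives the Java process time to start listening before `webdriver.Remote` is called.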

Luckily I am the persistent type, and the problem is finally solved!

posted @ 2019-04-11 20:15 WangHello
