Scraping page data with Selenium in Node (a Node crawler)

What is selenium-webdriver

selenium-webdriver is a library that drives a real browser programmatically. This post uses it from Node to scrape data from a web page.

Steps

Open the npm website and search for selenium-webdriver:
https://www.npmjs.com/package/selenium-webdriver

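Pick the browser you want to use and install the driver program for it (for example chromedriver for Chrome or geckodriver for Firefox). The driver version must match the version of the browser installed on your machine.

State the browser explicitly in your code, and preferably keep the driver executable in the same directory as the script that calls it.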
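
One way to act on the same-directory advice is to put that directory at the front of PATH before building the driver, since selenium-webdriver can locate a driver executable through PATH. The sketch below is only an illustration: `prependToPath` is a hypothetical helper, not part of selenium-webdriver, and it assumes the driver file (for example geckodriver) sits next to the script.

```javascript
const path = require('path');

// Hypothetical helper (not part of selenium-webdriver): moves `driverDir`
// to the front of PATH so a driver executable stored there is found first.
function prependToPath(driverDir, env = process.env) {
  const dir = path.resolve(driverDir);
  // Drop empty entries and any earlier copy of `dir`, then put `dir` first.
  const rest = (env.PATH || '')
    .split(path.delimiter)
    .filter((entry) => entry && entry !== dir);
  env.PATH = [dir, ...rest].join(path.delimiter);
  return env.PATH;
}

// Assumed usage, at the top of the scraping script and before Builder#build():
//   prependToPath(__dirname);
```

Calling the helper twice leaves PATH unchanged after the first call, so it is safe to run on every start.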

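Chrome has many versions, so if you cannot find a driver that matches yours, you can use Firefox instead. The result is essentially the same; only the browser differs.
This example uses Firefox. It opens a novel site and reads each chapter's title and link by using CSS selectors and tag names.
Install the module first:
npm i selenium-webdriver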
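
Because Chrome and Firefox are interchangeable here, the browser name can be kept out of the script body. The following is a minimal sketch, not the post's own code: `pickBrowser` and the `SCRAPER_BROWSER` environment variable are names made up for this illustration, and the returned string is meant to be passed to `Builder#forBrowser`.

```javascript
// Browsers discussed in this post; extend the list if you install other drivers.
const SUPPORTED_BROWSERS = ['firefox', 'chrome'];

// Hypothetical helper: reads the browser name from SCRAPER_BROWSER
// (an assumed variable name) and falls back to Firefox when it is unset.
function pickBrowser(env = process.env) {
  const requested = (env.SCRAPER_BROWSER || 'firefox').toLowerCase();
  if (!SUPPORTED_BROWSERS.includes(requested)) {
    throw new Error(
      `Unsupported browser "${requested}", expected one of: ${SUPPORTED_BROWSERS.join(', ')}`
    );
  }
  return requested;
}

// Assumed usage inside the scraping script:
//   const driver = await new Builder().forBrowser(pickBrowser()).build();
```

Failing early on an unknown name keeps a typo from surfacing later as a confusing driver error.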

The complete code follows:

const { Builder, By } = require('selenium-webdriver');

(async function example() {
  // Launch Firefox; geckodriver must be installed and match the browser version.
  const driver = await new Builder().forBrowser('firefox').build();
  try {
    // Open the novel's chapter index page.
    await driver.get('https://m.banzhuchilaohu.com/indexlist/2916/');

    // Every <li> under the .chapter element is one chapter entry.
    const items = await driver.findElements(By.css('.chapter li'));
    const list = [];
    for (const item of items) {
      // Look up the <a> once: its text is the chapter title, its href the link.
      const link = await item.findElement(By.css('a'));
      const title = await link.getText();
      const url = await link.getAttribute('href');
      list.push({ title, url });
    }
    console.log(list);
  } finally {
    // Close the browser even if scraping fails, so no Firefox process is left behind.
    await driver.quit();
  }
})();
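
The extraction loop can also be pulled out into a function of its own, which makes it possible to check the logic with stub objects before launching a browser. This is a sketch under stated assumptions rather than the post's method: `extractChapters` is a hypothetical name, `items` is expected to be what `driver.findElements()` resolves to, and `linkLocator` is a locator such as `By.css('a')`.

```javascript
// Hypothetical helper (not from the original post): turns chapter <li>
// elements into { title, url } records.
async function extractChapters(items, linkLocator) {
  const list = [];
  for (const item of items) {
    // Find the link once and reuse it for both reads.
    const link = await item.findElement(linkLocator);
    const title = (await link.getText()).trim();
    const url = await link.getAttribute('href');
    list.push({ title, url });
  }
  return list;
}

// Assumed usage with a real driver:
//   const { By } = require('selenium-webdriver');
//   const items = await driver.findElements(By.css('.chapter li'));
//   const list = await extractChapters(items, By.css('a'));
```

The function only relies on `findElement`, `getText`, and `getAttribute`, so any object that provides those three methods can stand in for a page element in a test.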

posted @ 2022-02-02 16:35 IT源码猫
