[Node.js] Web Scraping with Pagination and Advanced Selectors

When web scraping, you'll often want to get more than just one page of data. Xray supports pagination by finding the "next" or "more" button on each page and cycling through each new page until it can no longer find that link. This lesson demonstrates how to paginate as well as more advanced selectors for when links are difficult to scrape.

 

复制代码
/**
 * Created by Answer1215 on 8/22/2015.
 */
var Xray = require('x-ray');
var xray = new Xray();

xray('https://news.ycombinator.com/', '.athing', [{
    rank: '.rank',
    title: 'td:nth-child(3) a',
    link: "td:nth-child(3) a@href"
}])
    .paginate('a[rel="nofollow"]:last-child@href')
    .limit(3)
    .write('./results2.json');

///////////////////////////////
//  test
///////////////////////////////

xray('https://news.ycombinator.com/', 'a[rel="nofollow"]', [{
    show: ''
}]).write('./results2.json');
/**
 * [
 {
   "show": "Segment is hiring security engineers to help secure our container fleet"
 },
 {
   "show": "Modafinil for cognitive neuroenhancement: a systematic review"
 },
 {
   "show": "Ports and Power in the Indian Ocean"
 },
 {
   "show": "Natural and Artificial Intelligence (1988) [pdf]"
 },
 {
   "show": "Proofing Spirits with a Homemade Electrobalance"
 },
 {
   "show": "Seth Nickell on Replacing the Aging Init Procedure on Linux (2003)"
 },
 {
   "show": "More"
 }
 ]
 * */

xray('https://news.ycombinator.com/', 'a[rel="nofollow"]:last-child', [{
    show: ''
}]).write('./results2.json');
/*
* [
 {
 "show": "More"
 }
 ]
* */
复制代码

 

posted @   Zhentiw  阅读(575)  评论(0编辑  收藏  举报
(评论功能已被禁用)
编辑推荐:
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
阅读排行:
· 阿里最新开源QwQ-32B,效果媲美deepseek-r1满血版,部署成本又又又降低了!
· Manus重磅发布:全球首款通用AI代理技术深度解析与实战指南
· 开源Multi-agent AI智能体框架aevatar.ai,欢迎大家贡献代码
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· AI技术革命,工作效率10个最佳AI工具
历史上的今天:
2014-08-22 [jQuery] Custom event trigger
2014-08-22 [jQuery] $.map, $.each, detach() , $.getJSOIN()
点击右上角即可分享
微信分享提示