[Node.js] Availability and Zero-downtime Restarts
Our Node server may experience some downtime, whether because of a server update or simply because of a crash in the code. We want to minimize that downtime as much as possible.
1. When a cluster worker crashes, we want the master process to fork a new worker:
    const cluster = require('cluster');
    const os = require('os');

    if (cluster.isMaster) {
      // Master process: fork one worker per CPU core
      const cpus = os.cpus().length;
      console.log(`Forking for ${cpus} CPUs`);
      for (let i = 0; i < cpus; i++) {
        cluster.fork();
      }

      // Replace a worker only if it crashed: non-zero exit code
      // and not disconnected on purpose
      cluster.on('exit', (worker, code, signal) => {
        if (code !== 0 && !worker.exitedAfterDisconnect) {
          console.log(`Worker ${worker.id} crashed. Starting a new worker`);
          cluster.fork();
        }
      });
    } else {
      // Worker process: run the actual server
      require('./server');
    }
It is important to check 'worker.exitedAfterDisconnect' to tell whether the worker exited because of a crash or because we deliberately disconnected it.
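The check can be read as a small decision function. The sketch below is only an illustration: `shouldRespawn` is a hypothetical helper, not part of the cluster API, and its two parameters stand in for the `code` argument and `worker.exitedAfterDisconnect` from the cluster 'exit' event.

```javascript
// Hypothetical helper (not part of the cluster module): decides whether
// the master should fork a replacement for a worker that just exited.
// `code` is the worker's exit code; `exitedAfterDisconnect` mirrors
// worker.exitedAfterDisconnect.
function shouldRespawn(code, exitedAfterDisconnect) {
  return code !== 0 && !exitedAfterDisconnect;
}

// A crash: non-zero exit code, nobody asked the worker to stop.
console.log('crash:', shouldRespawn(1, false));

// A deliberate disconnect followed by a clean exit.
console.log('planned disconnect:', shouldRespawn(0, true));

// A clean exit that was not requested by the master.
console.log('clean exit:', shouldRespawn(0, false));
```

Only the first case leads to a new fork. Without the second condition, a worker that we stopped on purpose could be respawned by the crash handler while we are in the middle of restarting it ourselves.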
2. When upgrading, we want to restart the workers one by one, so that there is zero downtime:
    // Trigger with: kill -SIGUSR2 <MASTER_PID>
    // On upgrade, we want to restart the workers one by one
    process.on('SIGUSR2', () => {
      // Snapshot of the current workers (cluster.workers is keyed by worker id)
      const workers = Object.values(cluster.workers);

      const restartWorker = (workerIndex) => {
        const worker = workers[workerIndex];
        if (!worker) return; // every worker has been restarted

        // When this worker exits, fork its replacement,
        // then continue with the next worker
        worker.on('exit', () => {
          // If it exited because of a crash, we don't continue
          if (!worker.exitedAfterDisconnect) return;
          console.log(`Exited process ${worker.process.pid}`);
          cluster.fork().on('listening', () => {
            restartWorker(workerIndex + 1);
          });
        });

        // Ask the worker to stop accepting connections and exit
        worker.disconnect();
      };

      // restartWorker calls itself for each following index
      restartWorker(0);
    });
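To see the ordering on its own, here is a minimal sketch that runs the same one-by-one loop against simulated workers instead of real cluster workers. Everything in it is an assumption made for illustration: `FakeWorker`, `rollingRestart` and the `startReplacement` callback are hypothetical names, and the fake worker emits 'exit' synchronously to keep the example short, whereas a real worker exits some time after `disconnect()`.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a cluster worker. disconnect() records the
// exit and emits 'exit' right away (a real worker does this later).
class FakeWorker extends EventEmitter {
  constructor(id, log) {
    super();
    this.id = id;
    this.log = log;
  }
  disconnect() {
    this.log.push(`worker ${this.id} exited`);
    this.emit('exit');
  }
}

// Restart the workers strictly one at a time. `startReplacement` plays the
// role of cluster.fork() plus waiting for 'listening': it must call `ready`
// once the new worker can serve requests.
function rollingRestart(workers, startReplacement) {
  const restartWorker = (index) => {
    const worker = workers[index];
    if (!worker) return; // nothing left to restart
    worker.once('exit', () => {
      startReplacement(worker, () => restartWorker(index + 1));
    });
    worker.disconnect();
  };
  restartWorker(0);
}

const log = [];
const workers = [1, 2, 3].map((id) => new FakeWorker(id, log));

rollingRestart(workers, (oldWorker, ready) => {
  log.push(`replacement for worker ${oldWorker.id} listening`);
  ready();
});

console.log(log.join('\n'));
```

The log alternates between an exit and its replacement coming up, so a worker is never taken down before the replacement for the previous one is ready. That is what keeps the remaining workers available to serve requests during the upgrade.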
In real production we don't actually need to write the cluster code ourselves; we can use the PM2 package. But it is important to understand what's happening under the hood.
---
    // Complete listing combining both parts
    const cluster = require('cluster');
    const os = require('os');

    // On the first run the master process starts,
    // then we can fork our workers
    if (cluster.isMaster) {
      const cpus = os.cpus().length;
      console.log(`Forking for ${cpus} CPUs`);
      for (let i = 0; i < cpus; i++) {
        cluster.fork();
      }

      // In case of a crash, we want to start a new worker
      cluster.on('exit', (worker, code, signal) => {
        if (code !== 0 && !worker.exitedAfterDisconnect) {
          console.log(`Worker ${worker.id} crashed. Starting a new worker`);
          cluster.fork();
        }
      });

      // Trigger with: kill -SIGUSR2 <MASTER_PID>
      // On upgrade, we want to restart the workers one by one
      process.on('SIGUSR2', () => {
        // Snapshot of the current workers (cluster.workers is keyed by worker id)
        const workers = Object.values(cluster.workers);

        const restartWorker = (workerIndex) => {
          const worker = workers[workerIndex];
          if (!worker) return; // every worker has been restarted

          // When this worker exits, fork its replacement,
          // then continue with the next worker
          worker.on('exit', () => {
            // If it exited because of a crash, we don't continue
            if (!worker.exitedAfterDisconnect) return;
            console.log(`Exited process ${worker.process.pid}`);
            cluster.fork().on('listening', () => {
              restartWorker(workerIndex + 1);
            });
          });

          // Ask the worker to stop accepting connections and exit
          worker.disconnect();
        };

        // restartWorker calls itself for each following index
        restartWorker(0);
      });
    } else {
      // Worker process: run the actual server
      require('./server');
    }