[Node.js] Availability and Zero-downtime Restarts

It might be possible for our node server has some downtime, no matter it is because server update or simply some crashs in the code. We want to minizie the downtime as much as possible.

 

1. In case of cluster worker crash, we want master worker fork a new worker:

复制代码
const http = require('http');
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
    const cpus = os.cpus().length;

    console.log(`Forking for ${cpus} CPUs`);
    for (let i = 0; i < cpus; i++) {
        cluster.fork();
    }

    cluster.on('exit', (worker, code, signal) => {
        if (code !== 0 && !worker.exitedAfterDisconnect) {
            console.log(`Worker ${worker.id} crashed. Starting a new wroker`);
            cluster.fork();
        }
    })
} else {
    require('./server');
}
复制代码

It is important to check 'worker.exitedAfterDisconnect' to see whether is is because crash or because we want to exit one worker.

 

2. In case of upgrade, we want to restart each worker one by one, to make zero downtime:

复制代码
    // kill -SIGUSR2 <MASTER_PID>
    // In case to upgrade, we want to restart each worker one by one
    process.on('SIGUSR2', () => {
        const workers = Object.values(cluster.workers);
        const restartWorker = (workerIndex) => {
            const worker = cluster.workers[workerIndex];
            if (!worker) return;

            // On worker exit, we want to restart it, then continue 
            // with next worker
            worker.on('exit', () => {
                // If it is because crash, we don't continue
                if (!worker.exitedAfterDisconnect) return;
                console.log(`Exited process ${worker.process.pid}`);
                cluster.fork().on('listening', () => {
                    restartWorker(workerIndex + 1);
                });

                worker.disconnect();
            });
        }
        // Calling restartWorker recursively
        restartWorker(0);
    });
复制代码

 

In really production, we don't actually need to code cluster by ourselve, we can use PM2 package. but it is important to understand what's happening under hood.

 

---

 

复制代码
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// For runing for the first time,
// Master worker will get started
// Then we can fork our new workers
if (cluster.isMaster) {
    const cpus = os.cpus().length;

    console.log(`Forking for ${cpus} CPUs`);
    for (let i = 0; i < cpus; i++) {
        cluster.fork();
    }

    // In case of crash, we want to strat a new worker
    cluster.on('exit', (worker, code, signal) => {
        if (code !== 0 && !worker.exitedAfterDisconnect) {
            console.log(`Worker ${worker.id} crashed. Starting a new wroker`);
            cluster.fork();
        }
    })

    // kill -SIGUSR2 <MASTER_PID>
    // In case to upgrade, we want to restart each worker one by one
    process.on('SIGUSR2', () => {
        const workers = Object.values(cluster.workers);
        const restartWorker = (workerIndex) => {
            const worker = cluster.workers[workerIndex];
            if (!worker) return;

            // On worker exit, we want to restart it, then continue 
            // with next worker
            worker.on('exit', () => {
                // If it is because crash, we don't continue
                if (!worker.exitedAfterDisconnect) return;
                console.log(`Exited process ${worker.process.pid}`);
                cluster.fork().on('listening', () => {
                    restartWorker(workerIndex + 1);
                });

                worker.disconnect();
            });
        }
        // Calling restartWorker recursively
        restartWorker(0);
    });
} else {
    require('./server');
}
复制代码

 

posted @   Zhentiw  阅读(373)  评论(0编辑  收藏  举报
编辑推荐:
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
阅读排行:
· 阿里最新开源QwQ-32B,效果媲美deepseek-r1满血版,部署成本又又又降低了!
· Manus重磅发布:全球首款通用AI代理技术深度解析与实战指南
· 开源Multi-agent AI智能体框架aevatar.ai,欢迎大家贡献代码
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· AI技术革命,工作效率10个最佳AI工具
历史上的今天:
2018-03-09 [GraphQL] Apollo React Query Component
2018-03-09 [HTML5] Handle Offscreen Accessibility
2017-03-09 [Django] The models
2017-03-09 [Postgres] Update and Delete records in Postgres
2017-03-09 [TypeScript] Create a fluent API using TypeScript classes
2016-03-09 [RxJS] Changing Behavior with MapTo
2016-03-09 [RxJS] Displaying Initial Data with StartWith
点击右上角即可分享
微信分享提示