
Problem description

After deployment, the MongoDB cluster test environment ran without problems, but a few days later sh.status() reported the following error:

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
  	"_id" : 1,
  	"minCompatibleVersion" : 5,
  	"currentVersion" : 6,
  	"clusterId" : ObjectId("61e13befe1b0433c21305391")
  }
  shards:
        {  "_id" : "rs_shardsvr0",  "host" : "rs_shardsvr0/10.150.57.13:37031,10.150.57.13:37032,10.150.57.13:37033",  "state" : 1,  "topologyTime" : Timestamp(1642151681, 2) }
        {  "_id" : "rs_shardsvr1",  "host" : "rs_shardsvr1/10.150.57.13:37041,10.150.57.13:37042,10.150.57.13:37043",  "state" : 1,  "topologyTime" : Timestamp(1642151714, 1) }
        {  "_id" : "rs_shardsvr2",  "host" : "rs_shardsvr2/10.150.57.13:37051,10.150.57.13:37052,10.150.57.13:37053",  "state" : 1,  "topologyTime" : Timestamp(1642151722, 2) }
  active mongoses:
        "5.0.5" : 3
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: no
        Failed balancer rounds in last 5 attempts: 5
        Last reported error: Could not find host matching read preference { mode: "primary" } for set rs_shardsvr2
        Time of reported error: Fri Feb 25 2022 22:00:25 GMT+0800 (CST)
        Migration results for the last 24 hours: 
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                rs_shardsvr0	342
                                rs_shardsvr1	341
                                rs_shardsvr2	341
                        too many chunks to print, use verbose if you want to force print

 

The key part of the error:

 Last reported error: Could not find host matching read preference { mode: "primary" } for set rs_shardsvr2
        Time of reported error: Fri Feb 25 2022 22:00:25 GMT+0800 (CST)

Notes:

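       Taken literally, the message says that the PRIMARY node of the rs_shardsvr2 replica set could not be found.

       Each of my replica sets has three data-bearing members, so it can tolerate the failure of one. Electing a PRIMARY requires a majority of the voting members (2 of 3), so one plausible cause is that two members had network problems at the same time, leaving the set unable to vote in a new PRIMARY.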
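To make the majority reasoning concrete, here is a minimal sketch in mongo shell JavaScript that reads an rs.status() document and reports whether a PRIMARY exists and whether enough members are healthy to elect one. It only uses the name, health and stateStr fields of each entry in members. It assumes every member has the default single vote (rs.status() does not show votes; rs.conf() does), and the helper name electionHealth is hypothetical.

```javascript
// Sketch: summarize election health from an rs.status() document.
// Assumes every member has one vote (the default); electionHealth is a
// hypothetical helper name, not part of the MongoDB shell.
function electionHealth(status) {
  const members = status.members || [];
  // health is 1 for members the reporting node can reach, 0 otherwise
  const healthy = members.filter((m) => m.health === 1);
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  // A strict majority of voting members is needed to elect a PRIMARY
  const majority = Math.floor(members.length / 2) + 1;
  return {
    primary: primary ? primary.name : null,
    healthy: healthy.length,
    majority: majority,
    canElect: healthy.length >= majority,
  };
}

// Usage in the mongo shell, connected to a member of rs_shardsvr2:
//   printjson(electionHealth(rs.status()))
```

Because health reflects what the node you are connected to can see, running this from more than one member helps tell a local network problem apart from a member that is actually down.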

Troubleshooting approach

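1. Check whether the network and the keyfile are working correctly.

2. Check whether the rs_shardsvr2 replica set itself is healthy: rs.status().

3. Create a new hashed-sharded collection and verify that its data is spread evenly across all shards: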

1. Create the damocles database (it is created implicitly on the first write)
> use damocles

2. Enable sharding on the damocles database
> sh.enableSharding("damocles")

3. Shard the damocles.order collection on a hashed _id key
> sh.shardCollection("damocles.order", {"_id": "hashed" })

4. Insert 10000 test documents
> use damocles
> for (let i = 1; i <= 10000; i++) { db.order.insertOne({ price: 1 }) }

5. Connect to the PRIMARY of each shard, run use damocles, and count the documents it holds
rs_shardsvr0:PRIMARY> db.order.find().count()
3315

rs_shardsvr1:PRIMARY> db.order.find().count()
3318

rs_shardsvr2:PRIMARY> db.order.find().count()
3367
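The per-shard counts from step 5 can also be checked numerically. The sketch below is plain JavaScript that runs in the mongo shell or under Node.js; it compares each count with the ideal share (total / number of shards) and reports the largest relative deviation. The helper name distributionSkew is hypothetical, and what counts as "even enough" is left to the reader.

```javascript
// Sketch: measure how evenly documents landed on the shards.
// distributionSkew is a hypothetical helper name.
function distributionSkew(countsByShard) {
  const counts = Object.values(countsByShard);
  const total = counts.reduce((sum, c) => sum + c, 0);
  const ideal = total / counts.length;
  // Largest relative deviation from the ideal share (0.01 means 1%)
  const maxDeviation = Math.max(...counts.map((c) => Math.abs(c - ideal) / ideal));
  return { total: total, ideal: ideal, maxDeviation: maxDeviation };
}

// Counts observed in step 5
const result = distributionSkew({
  rs_shardsvr0: 3315,
  rs_shardsvr1: 3318,
  rs_shardsvr2: 3367,
});

// print() exists in the mongo shells; fall back to console.log under Node.js
const log = typeof print === "function" ? print : console.log;
log(JSON.stringify(result));
```

For these numbers the total is 10000 and the largest deviation is rs_shardsvr2, about 1% above the ideal share of roughly 3333, which is consistent with an even hashed distribution.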
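For step 1, a minimal sketch of a network reachability check in mongo shell JavaScript: it splits a shard host string as stored in config.shards (the same strings sh.status() prints) into the replica set name and member addresses, then sends a ping to each member. The helper names parseShardHost and pingMembers are hypothetical. A successful ping only shows that a member answers; it does not validate the keyfile, which still has to be checked separately (its contents must be identical on every member).

```javascript
// Sketch: parse "<replSetName>/<host:port>,<host:port>,..." as stored in
// config.shards. parseShardHost is a hypothetical helper name.
function parseShardHost(hostString) {
  const slash = hostString.indexOf("/");
  if (slash === -1) {
    // No replica set prefix: treat the whole string as a host list
    return { setName: null, hosts: hostString.split(",") };
  }
  return {
    setName: hostString.slice(0, slash),
    hosts: hostString.slice(slash + 1).split(","),
  };
}

// Hypothetical helper; only works inside the mongo shell / mongosh,
// where the Mongo constructor is available.
function pingMembers(hostString) {
  const { setName, hosts } = parseShardHost(hostString);
  return hosts.map((host) => {
    try {
      const conn = new Mongo(host);
      const res = conn.getDB("admin").runCommand({ ping: 1 });
      return { setName: setName, host: host, ok: res.ok === 1 };
    } catch (e) {
      // Connection refused, timeout, etc. end up here
      return { setName: setName, host: host, ok: false, error: String(e) };
    }
  });
}

// Usage from mongos:
//   db.getSiblingDB("config").shards.find().forEach((s) => printjson(pingMembers(s.host)))
```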

 

None of these checks found a problem in my environment, and the cluster is currently healthy.