MongoDB Replica Sets (Part 2)
9 Operations
9.1 Creating the replica set:
[mongo_3 /mongodb]# numactl --interleave=all mongod --replSet sh1 --port 10000 --dbpath=/mongodb/sh1/data \
> --logpath=/mongodb/sh1/logs/sh1.log --logappend --fork
[mongo_3 /mongodb]# mongo --port 10000
> rs.status() --check the replica set status
{
    "startupStatus" : 3,
    "info" : "run rs.initiate(...) if not yet done for the set",
    "ok" : 0,
    "errmsg" : "can't get local.system.replset config from self or any seed (EMPTYCONFIG)"
}
--The set has not been initialized yet; check the mongod log:
Tue Jul 16 20:49:42.533 [initandlisten] waiting for connections on port 10000
Tue Jul 16 20:49:42.539 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
Tue Jul 16 20:49:50.623 [initandlisten] connection accepted from 127.0.0.1:48208 #1 (1 connection now open)
Tue Jul 16 20:49:52.540 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jul 16 20:50:52.542 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
9.2 Initializing the replica set:
> use admin
switched to db admin
> db.runCommand({"replSetInitiate" :
... {
... "_id" : "sh1",
... "members" : [
... {"_id" : 1,"host" : "192.168.69.43:10000",priority:4}
... ]
... }
... }
... )
sh1:PRIMARY> --the prompt switches to PRIMARY immediately
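The rs.initiate() helper that the startup log suggested wraps the same replSetInitiate command; an equivalent one-liner for this set (same set name and member as above):

> rs.initiate({_id : "sh1", members : [{_id : 1, host : "192.168.69.43:10000", priority : 4}]})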
The log is shown below. Note that a first attempt (visible at 21:04:46) failed with "can't find self in the replset config" because the host in the config, 192.168.73.192, did not match this node's own address; rerunning replSetInitiate with the correct 192.168.69.43 succeeded:
Tue Jul 16 21:04:46.397 [conn4] replSet replSetInitiate exception: can't find self in the replset config my port: 10000
Tue Jul 16 21:04:46.397 [conn4] command admin.$cmd command: { replSetInitiate: { _id: "sh1", members: [ { _id: 1.0, host: "192.168.73.192:10000", priority: 4.0 } ] } } ntoreturn:1 keyUpdates:0 locks(micros) W:68 reslen:122 63001ms
Tue Jul 16 21:04:52.580 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jul 16 21:04:57.684 [initandlisten] connection accepted from 127.0.0.1:48213 #5 (2 connections now open)
Tue Jul 16 21:05:02.581 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jul 16 21:05:07.687 [initandlisten] connection accepted from 127.0.0.1:48214 #6 (3 connections now open)
Tue Jul 16 21:05:12.582 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jul 16 21:05:22.582 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
Tue Jul 16 21:05:25.921 [conn6] replSet replSetInitiate admin command received from client
Tue Jul 16 21:05:25.921 [conn6] replSet replSetInitiate config object parses ok, 1 members specified
Tue Jul 16 21:05:25.922 [conn6] replSet replSetInitiate all members seem up
Tue Jul 16 21:05:25.922 [conn6] ******
Tue Jul 16 21:05:25.922 [conn6] creating replication oplog of size: 23441MB... --about 5% of the disk
Tue Jul 16 21:05:26.107 [FileAllocator] done allocating datafile /mongodb/sh1/data/local.11, size: 2047MB, took 0.011 secs
Tue Jul 16 21:05:26.109 [FileAllocator] allocating new datafile /mongodb/sh1/data/local.12, filling with zeroes...
Tue Jul 16 21:05:26.118 [FileAllocator] done allocating datafile /mongodb/sh1/data/local.12, size: 2047MB, took 0.008 secs
Tue Jul 16 21:05:26.118 [conn6] ******
Tue Jul 16 21:05:26.119 [conn6] replSet info saving a newer config version to local.system.replset
Tue Jul 16 21:05:26.119 [conn6] replSet saveConfigLocally done
Tue Jul 16 21:05:26.119 [conn6] replSet replSetInitiate config now saved locally. Should come online in about a minute.
Tue Jul 16 21:05:26.119 [conn6] command admin.$cmd command: { replSetInitiate: { _id: "sh1", members: [ { _id: 1.0, host: "192.168.69.43:10000", priority: 4.0 } ] } } ntoreturn:1 keyUpdates:0 locks(micros) W:197451 reslen:112 198ms
Tue Jul 16 21:05:32.583 [rsStart] replSet I am 192.168.69.43:10000
Tue Jul 16 21:05:32.583 [rsStart] replSet STARTUP2
Tue Jul 16 21:05:33.584 [rsSync] replSet SECONDARY
Tue Jul 16 21:05:33.585 [rsMgr] replSet info electSelf 1
Tue Jul 16 21:05:34.584 [rsMgr] replSet PRIMARY
As the log shows, the oplog is allocated as soon as the set is initiated, here 23441MB, roughly 5% of the disk. During the election the node votes for itself first (electSelf) and then becomes PRIMARY.
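The 5% default can be overridden the first time a node starts, before local.oplog.rs exists, with the standard --oplogSize option (value in MB; the 2048 below is only an illustrative figure):

[mongo_3 /mongodb]# numactl --interleave=all mongod --replSet sh1 --port 10000 --dbpath=/mongodb/sh1/data \
> --logpath=/mongodb/sh1/logs/sh1.log --logappend --fork --oplogSize 2048

Once the oplog has been created, changing its size requires a manual resize procedure, so it is worth deciding before initiating the set.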
9.3 Starting the second member
Start it the same way on the second host and add it to the set with the default priority:
[root@localhost mongodb]# numactl --interleave=all /mongodb/bin/mongod --replSet sh1 \
--port 10000 --dbpath=/mongodb/sh1/data --logpath=/mongodb/sh1/logs/sh1.log --logappend --fork
sh1:PRIMARY> rs.add("192.168.69.44:10000")
{ "ok" : 1 }
sh1:PRIMARY> rs.conf()
{
    "_id" : "sh1",
    "version" : 2,
    "members" : [
        {
            "_id" : 1,
            "host" : "192.168.69.43:10000",
            "priority" : 4
        },
        {
            "_id" : 2,
            "host" : "192.168.69.44:10000"
        }
    ]
}
The log shows the reconfiguration and the new member's state transitions (STARTUP2 → RECOVERING → SECONDARY):
Wed Jul 17 09:48:00.134 [conn8] replSet replSetReconfig config object parses ok, 2 members specified
Wed Jul 17 09:48:00.139 [conn8] replSet replSetReconfig [2]
Wed Jul 17 09:48:00.139 [conn8] replSet info saving a newer config version to local.system.replset
Wed Jul 17 09:48:00.139 [conn8] replSet saveConfigLocally done
Wed Jul 17 09:48:00.139 [conn8] replSet info : additive change to configuration
Wed Jul 17 09:48:00.139 [conn8] replSet replSetReconfig new config saved locally
Wed Jul 17 09:48:00.140 [rsHealthPoll] replSet member 192.168.69.44:10000 is up
Wed Jul 17 09:48:00.140 [rsMgr] replSet total number of votes is even - add arbiter or give one member an extra vote --an odd number of voting members is recommended (see the arbiter in 9.6)
Wed Jul 17 09:48:04.803 [initandlisten] connection accepted from 192.168.69.44:49397 #9 (2 connections now open)
Wed Jul 17 09:48:04.804 [conn9] end connection 192.168.69.44:49397 (1 connection now open)
Wed Jul 17 09:48:04.805 [initandlisten] connection accepted from 192.168.69.44:49398 #10 (2 connections now open)
Wed Jul 17 09:48:06.143 [rsHealthPoll] replset info 192.168.69.44:10000 thinks that we are down
Wed Jul 17 09:48:06.143 [rsHealthPoll] replSet member 192.168.69.44:10000 is now in state STARTUP2
Wed Jul 17 09:48:21.093 [initandlisten] connection accepted from 192.168.69.44:49399 #11 (3 connections now open)
Wed Jul 17 09:48:21.297 [conn11] end connection 192.168.69.44:49399 (2 connections now open)
Wed Jul 17 09:48:21.816 [initandlisten] connection accepted from 192.168.69.44:49400 #12 (3 connections now open)
Wed Jul 17 09:48:22.151 [rsHealthPoll] replSet member 192.168.69.44:10000 is now in state RECOVERING
Wed Jul 17 09:48:22.300 [initandlisten] connection accepted from 192.168.69.44:49401 #13 (4 connections now open)
Wed Jul 17 09:48:23.305 [slaveTracking] build index local.slaves { _id: 1 }
Wed Jul 17 09:48:23.308 [slaveTracking] build index done. scanned 0 total records. 0.001 secs
Wed Jul 17 09:48:24.152 [rsHealthPoll] replSet member 192.168.69.44:10000 is now in state SECONDARY
Wed Jul 17 09:48:34.822 [conn10] end connection 192.168.69.44:49398 (3 connections now open)
Wed Jul 17 09:48:34.823 [initandlisten] connection accepted from 192.168.69.44:49402 #14 (4 connections now open)
Wed Jul 17 09:49:04.838 [conn14] end connection 192.168.69.44:49402 (3 connections now open)
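Once the new member reaches SECONDARY, its replication lag can be checked from the shell with the standard helpers (output omitted here):

sh1:PRIMARY> db.printSlaveReplicationInfo() --per-secondary lag behind the primary's oplog
sh1:PRIMARY> rs.status().members[1].stateStr --should read "SECONDARY"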
9.4 Killing one member
With only two members in the set, the survivor can no longer see a majority, so it relinquishes PRIMARY and no election can take place:
Wed Jul 17 09:54:12.672 [conn12] end connection 192.168.69.44:49400 (1 connection now open)
Wed Jul 17 09:54:14.341 [rsHealthPoll] DBClientCursor::init call() failed
Wed Jul 17 09:54:14.341 [rsHealthPoll] replset info 192.168.69.44:10000 heartbeat failed, retrying
Wed Jul 17 09:54:14.342 [rsHealthPoll] replSet info 192.168.69.44:10000 is down (or slow to respond):
Wed Jul 17 09:54:14.342 [rsHealthPoll] replSet member 192.168.69.44:10000 is now in state DOWN
Wed Jul 17 09:54:14.342 [rsMgr] can't see a majority of the set, relinquishing primary
Wed Jul 17 09:54:14.342 [rsMgr] replSet relinquishing primary state
Wed Jul 17 09:54:14.342 [rsMgr] replSet SECONDARY
Wed Jul 17 09:54:14.342 [rsMgr] replSet closing client sockets after relinquishing primary
Wed Jul 17 09:54:15.343 [conn8] end connection 127.0.0.1:48216 (0 connections now open)
Wed Jul 17 09:54:15.346 [initandlisten] connection accepted from 127.0.0.1:48232 #26 (1 connection now open)
Wed Jul 17 09:54:16.343 [rsHealthPoll] replset info 192.168.69.44:10000 heartbeat failed, retrying
Wed Jul 17 09:54:18.345 [rsHealthPoll] replset info 192.168.69.44:10000 heartbeat failed, retrying
Wed Jul 17 09:54:20.346 [rsHealthPoll] replset info 192.168.69.44:10000 heartbeat failed, retrying
Wed Jul 17 09:54:20.346 [rsMgr] replSet can't see a majority, will not try to elect self
Wed Jul 17 09:54:22.347 [rsHealthPoll] replset info 192.168.69.44:10000 heartbeat failed, retrying
sh1:SECONDARY> rs.status()
{
    "set" : "sh1",
    "date" : ISODate("2013-07-17T01:55:41Z"),
    "myState" : 2,
    "members" : [
        {
            "_id" : 1,
            "name" : "192.168.69.43:10000",
            "health" : 1,
            "state" : 2, --1 = PRIMARY, 2 = SECONDARY, 3 = RECOVERING, 8 = not reachable/healthy, 10 = REMOVED
            "stateStr" : "SECONDARY",
            "uptime" : 47159,
            "optime" : {
                "t" : 1374025680,
                "i" : 1
            },
            "optimeDate" : ISODate("2013-07-17T01:48:00Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "192.168.69.44:10000",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "t" : 1374025680,
                "i" : 1
            },
            "optimeDate" : ISODate("2013-07-17T01:48:00Z"),
            "lastHeartbeat" : ISODate("2013-07-17T01:55:40Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : 0,
            "syncingTo" : "192.168.69.43:10000"
        }
    ],
    "ok" : 1
}
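If the failed member cannot be restored promptly, the survivor can be made writable again by forcing a reconfiguration from the SECONDARY. This uses the standard {force: true} option of rs.reconfig(); a minimal sketch (the member index assumes the survivor is listed first):

sh1:SECONDARY> cfg = rs.conf()
sh1:SECONDARY> cfg.members = [cfg.members[0]] --keep only the surviving member
sh1:SECONDARY> rs.reconfig(cfg, {force: true})

A forced reconfig bumps the config version by a large amount, so use it only when the set truly cannot elect a primary on its own.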
9.5 Starting a third member and adding it to the set
With priority 5, higher than the current primary's 4, it forces a new election and takes over as PRIMARY:
sh1:PRIMARY> rs.add({_id: 3,host:"192.168.69.45:10000",priority:5})
{ "ok" : 1 }
Changing priorities:
Now raise the priority of 192.168.69.44 so that it becomes the primary:
sh1:PRIMARY> rs.conf()
{
    "_id" : "sh1",
    "version" : 13,
    "members" : [
        {
            "_id" : 2,
            "host" : "192.168.69.44:10000"
        },
        {
            "_id" : 3,
            "host" : "192.168.69.45:10000",
            "priority" : 5
        },
        {
            "_id" : 1,
            "host" : "192.168.69.43:10000",
            "priority" : 2
        },
        {
            "_id" : 4,
            "host" : "192.168.69.46:10000",
            "priority" : 7
        }
    ]
}
sh1:PRIMARY> cfg=rs.conf()
sh1:PRIMARY> cfg.members[0].priority = 10
sh1:PRIMARY> rs.reconfig(cfg)
Be careful when picking the index into members[]: count positions in the array starting from 0 (here 0, 1, 2, 3 in listed order); do not use the _id values. In this config, members[0] is _id 2, i.e. 192.168.69.44.
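Rather than counting positions by hand, the entry can be looked up by host in plain shell JavaScript; a small sketch using this example's addresses:

sh1:PRIMARY> cfg = rs.conf()
sh1:PRIMARY> for (var i = 0; i < cfg.members.length; i++) {
...     if (cfg.members[i].host == "192.168.69.44:10000")
...         cfg.members[i].priority = 10; --raise the priority on the matching entry only
... }
sh1:PRIMARY> rs.reconfig(cfg)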
A reconfig like this usually makes the current primary step down and triggers a new election, which typically takes 10-20 seconds, during which client connections are dropped.
If a failed node cannot be brought back quickly, it is best to remove it from the set manually with rs.remove(); otherwise the remaining servers keep trying to reconnect to it.
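For example, to drop the member that was killed above (the address is illustrative):

sh1:PRIMARY> rs.remove("192.168.69.46:10000")

The shell may report the connection being reset, because the reconfig closes client sockets; reconnect and check rs.conf() to confirm the member is gone.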
9.6 Adding an arbiter
An arbiter only votes and stores no data, so it can share a machine with a data-bearing member; here it runs on 192.168.69.45, port 10001:
numactl --interleave=all /mongodb/bin/mongod --replSet sh1 --port 10001 \
--dbpath=/mongodb/arb/data --logpath=/mongodb/arb/logs/arb.log --logappend --fork
sh1:PRIMARY> rs.addArb("192.168.69.45:10001")
{ "ok" : 1 }
sh1:PRIMARY> rs.conf
function () { return db.getSisterDB("local").system.replset.findOne(); } --without the parentheses the shell just prints the helper's source instead of running it
sh1:PRIMARY> rs.conf()
{
    "_id" : "sh1",
    "version" : 24,
    "members" : [
        {
            "_id" : 2,
            "host" : "192.168.69.44:10000",
            "priority" : 20
        },
        {
            "_id" : 3,
            "host" : "192.168.69.45:10000",
            "priority" : 5
        },
        {
            "_id" : 4,
            "host" : "192.168.69.46:10000",
            "priority" : 8
        },
        {
            "_id" : 5,
            "host" : "192.168.69.45:10001",
            "arbiterOnly" : true
        }
    ]
}
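Since the arbiter holds no user data, its footprint can be kept deliberately small. A hedged variant of the start command above, using standard mongod options (--nojournal, --smallfiles, and a token --oplogSize; the exact values are assumptions, not requirements):

numactl --interleave=all /mongodb/bin/mongod --replSet sh1 --port 10001 \
--dbpath=/mongodb/arb/data --logpath=/mongodb/arb/logs/arb.log --logappend --fork \
--nojournal --smallfiles --oplogSize 10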