How to perform MongoDB Sharded Cluster Cross-DC Failover


Imagine the following MongoDB architecture. How do you perform a manual failover when the entire DC1 is unavailable? I couldn’t find any official MongoDB documentation on this, so I decided to write my own.

Note:

  1. MongoDB version 3.2
  2. Config servers deployed as a replica set (CSRS)
  3. DC1 is assumed to have failed completely and to be unrecoverable

[Figure: MongoDB sharded cluster architecture spanning DC1 and DC2]

High Level Steps:

  1. Stop mongos
  2. Update the mongos configuration to point to the new single-node config server replica set
  3. Reconfigure the config server replica set as a single-node replica set
  4. Update the config database shards collection
  5. Reconfigure the DC2 shards as single-node replica sets
  6. Clean up shard caches
  7. Start mongos

Detailed Steps

Stop mongos

systemctl stop mongos

On every mongos host, update the configuration file (the sharding.configDB setting) to point to the new single-node config server replica set

From:
configDB: rs0/node1:27019,node2:27019,node3:27019

To:
configDB: rs0/node3:27019
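If you have several mongos hosts to update, you can derive the new configDB value mechanically from the old one. The sketch below is a hypothetical helper, not part of MongoDB. It is written as ES5-style JavaScript so it runs in the mongo shell or in Node. It assumes the `<setName>/<host:port>,...` format shown above and that you already know which config server members survived in DC2.

```javascript
// Hypothetical helper: rewrite a mongos configDB string so it lists only
// the config server members that survived the DC failure.
// Assumes the "<setName>/<host:port>,<host:port>,..." format used above.
function rewriteConfigDb(configDb, survivingHosts) {
  var slash = configDb.indexOf("/");
  if (slash < 0) {
    throw new Error("expected <setName>/<hosts>, got: " + configDb);
  }
  var setName = configDb.substring(0, slash);
  var hosts = configDb.substring(slash + 1).split(",");

  // Keep only the surviving hosts, preserving their original order.
  var kept = hosts.filter(function (h) {
    return survivingHosts.indexOf(h) !== -1;
  });
  if (kept.length === 0) {
    throw new Error("none of the surviving hosts appear in " + configDb);
  }
  return setName + "/" + kept.join(",");
}

// Example with the values from this post:
var newConfigDb = rewriteConfigDb(
  "rs0/node1:27019,node2:27019,node3:27019",
  ["node3:27019"]
);
// newConfigDb is now "rs0/node3:27019"
```

The helper throws rather than returning an empty host list. An empty list would leave mongos with a configDB value it cannot use.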

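Reconfigure the config server to be a single-node replica set

Run the following in the mongo shell on the surviving member, node3:27019:

cfg = rs.conf()
// keep only the surviving DC2 member
cfg.members = cfg.members.filter(function (m) { return m.host === "node3:27019"; })
rs.reconfig(cfg, {force : true})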
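The same member-filtering logic can be wrapped in a reusable function for the config server and for every shard. `keepOnlyMembers` is a hypothetical name used for illustration. The function returns a new config document and leaves the one from `rs.conf()` untouched.

```javascript
// Hypothetical helper: return a copy of a replica set config document that
// keeps only the members whose host is listed in keepHosts. Intended for a
// forced reconfig after the other members are permanently lost.
function keepOnlyMembers(cfg, keepHosts) {
  // Copy the top-level fields so the caller's object keeps its original
  // members array.
  var next = {};
  for (var key in cfg) {
    if (Object.prototype.hasOwnProperty.call(cfg, key)) {
      next[key] = cfg[key];
    }
  }
  next.members = cfg.members.filter(function (m) {
    return keepHosts.indexOf(m.host) !== -1;
  });
  if (next.members.length === 0) {
    throw new Error("no member matches " + keepHosts.join(", "));
  }
  return next;
}

// In the mongo shell, connected to node3:27019:
//   rs.reconfig(keepOnlyMembers(rs.conf(), ["node3:27019"]), { force: true })
```

Failing loudly when no member matches guards against a typo in the host name. Without that check, the reconfig would receive an empty members list.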

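Update the config database shards collection

Run this against the config server (here, on node3) once for each shard, so that each entry lists only its surviving DC2 member:

mongo localhost:27019
mongo> use config
mongo> db.shards.update(
   { _id: "<replSetName>" },
   { $set: { "host": "<replSetName>/<dc2_shard_hostname>:27017" } }
)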
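With several shards, it can be less error-prone to generate the updates from a mapping of shard name to surviving DC2 host. The helper and the example mapping below are hypothetical. Like the command above, they assume that each shard's `_id` in config.shards equals its replica set name.

```javascript
// Hypothetical helper: build one { filter, update } pair per shard so that
// each config.shards document points only at its surviving DC2 member.
// Assumes each shard's _id is the same as its replica set name.
function buildShardHostUpdates(dc2Hosts) {
  return Object.keys(dc2Hosts).map(function (replSetName) {
    return {
      filter: { _id: replSetName },
      update: { $set: { host: replSetName + "/" + dc2Hosts[replSetName] } }
    };
  });
}

// Example mapping (hypothetical shard names and hosts):
var updates = buildShardHostUpdates({
  shard01: "dc2-shard01:27017",
  shard02: "dc2-shard02:27017"
});

// In the mongo shell, connected to the config server:
//   var cfgDb = db.getSiblingDB("config");
//   updates.forEach(function (u) { cfgDb.shards.update(u.filter, u.update); });
```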

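Reconfigure the DC2 shards to be single-node replica sets

Run the following on the surviving DC2 member of each shard:

cfg = rs.conf()
// keep only the surviving DC2 member of this shard
cfg.members = cfg.members.filter(function (m) { return m.host === "<dc2_shard_hostname>:27017"; })
rs.reconfig(cfg, {force : true})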
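Before moving on, confirm that each reconfigured set is down to one member and that this member has become primary. The hypothetical check below inspects the document returned by `rs.status()`, which reports each member's `name` and `stateStr`. If it returns false right after the reconfig, wait a few seconds for the election and run it again.

```javascript
// Hypothetical check: given the output of rs.status(), confirm the set now
// has exactly one member, that it is the expected host, and that it is PRIMARY.
function isSingleNodePrimary(status, expectedHost) {
  if (!status || !status.members || status.members.length !== 1) {
    return false;
  }
  var m = status.members[0];
  return m.name === expectedHost && m.stateStr === "PRIMARY";
}

// In the mongo shell, on the surviving member:
//   isSingleNodePrimary(rs.status(), "<dc2_shard_hostname>:27017")
```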

Clean up the shard caches on every new single-node shard replica set (Update: this step applies only to MongoDB 3.2)

Add the following to the mongod config file and restart the mongod process:
setParameter:
   recoverShardingState: false

systemctl restart mongod

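Clean up the config database cache. While each shard is running with recoverShardingState: false, remove the minOpTimeRecovery document from its admin.system.version collection: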

mongo> use admin
mongo> db.system.version.remove(
   { _id: "minOpTimeRecovery" },
   { writeConcern: { w: "majority" } }
)
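With several shards, the removal above can be looped over every new single-node shard. Run it while the shards are still running with recoverShardingState: false. The helper below is hypothetical. The caller passes a `connect` function; in the mongo shell, `new Mongo(host)` returns an object with the required `getDB()` method, and passing the function in lets the logic be tested without a live cluster.

```javascript
// Hypothetical helper: remove the cached minOpTimeRecovery document on every
// new single-node shard. connect(host) must return a connection with getDB();
// in the mongo shell, new Mongo(host) has that shape.
function removeMinOpTimeRecovery(shardHosts, connect) {
  return shardHosts.map(function (host) {
    var admin = connect(host).getDB("admin");
    var res = admin.system.version.remove(
      { _id: "minOpTimeRecovery" },
      { writeConcern: { w: "majority" } }
    );
    // In the shell, remove() returns a WriteResult; nRemoved is 0 if the
    // document was already gone.
    return { host: host, nRemoved: res.nRemoved };
  });
}

// In the mongo shell (hypothetical host names):
//   removeMinOpTimeRecovery(
//     ["dc2-shard01:27017", "dc2-shard02:27017"],
//     function (h) { return new Mongo(h); })
```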

Finally, remove recoverShardingState: false from the mongod config file and restart the mongod process.

Note: MongoDB 3.4 removed the recoverShardingState parameter. Instead of setting it, temporarily remove --shardsvr (sharding.clusterRole: shardsvr in the config file) and restart mongod. The shard then starts as if it were not part of a sharded cluster, which allows you to clean up the config database cache. Also, the 3.4 documentation on the MongoDB website is outdated.

Start mongos

systemctl start mongos
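Once mongos is back, you can check that no shard entry still references a DC1 host. The hypothetical check below parses the host strings in config.shards. It assumes the `<replSetName>/<host:port>,...` format used earlier and returns every shard document that lists a host outside the allowed DC2 set.

```javascript
// Hypothetical check: return the config.shards documents whose host string
// still lists any host outside allowedHosts (for example, a DC1 member).
// Assumes the "<replSetName>/<host:port>,..." format used in config.shards.
function shardsWithUnexpectedHosts(shardDocs, allowedHosts) {
  return shardDocs.filter(function (doc) {
    // Strip the "<replSetName>/" prefix; a string without "/" is used whole.
    var hostList = doc.host.substring(doc.host.indexOf("/") + 1).split(",");
    return hostList.some(function (h) {
      return allowedHosts.indexOf(h) === -1;
    });
  });
}

// In the mongo shell, connected through the restarted mongos:
//   shardsWithUnexpectedHosts(
//     db.getSiblingDB("config").shards.find().toArray(),
//     ["dc2-shard01:27017", "dc2-shard02:27017"])
// An empty array means every shard entry points only at the listed DC2 hosts.
```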


Regards,
Wei Shan
