Decreasing downtime during manual failover in MongoDB 3.2 and later

There are many situations in which you need to perform a manual failover on a MongoDB replica set, and you will want to keep the downtime window as small as possible. Typical reasons are version upgrades, migrating from keyfile authentication to x.509, or changing static settings in the configuration file. By default, a manual failover means roughly 10s to 15s of downtime when you trigger rs.stepDown() on the current primary. This might not be acceptable under a heavy workload.

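There is a setting called “electionTimeoutMillis” in the replica set configuration (the document returned by rs.conf(), not the mongod configuration file). It defaults to 10s (10000 ms); the rs.conf() excerpt below comes from a replica set where it is set to 30s. You can change the value dynamically with rs.reconfig(), without restarting any member.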

"settings" : {
 "chainingAllowed" : true,
 "heartbeatIntervalMillis" : 2000,
 "heartbeatTimeoutSecs" : 10,
 "electionTimeoutMillis" : 30000,
 "catchUpTimeoutMillis" : 2000,
 "getLastErrorModes" : {
}

Here is how you can change the value on the primary right before you perform your failover.

replset:PRIMARY> cfg = rs.conf()
replset:PRIMARY> cfg.settings.electionTimeoutMillis=1000
replset:PRIMARY> rs.reconfig(cfg)
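If you do this regularly, it may help to record the old value before overwriting it, so that you can restore it afterwards. Below is a minimal sketch of that idea. The helper name setElectionTimeout is hypothetical (it is not part of the shell); only rs.conf(), rs.reconfig() and rs.stepDown() are real shell helpers. The guarded block only runs inside the mongo shell, connected to the primary.

```javascript
// Hypothetical helper (not a built-in): set electionTimeoutMillis on a
// config document obtained from rs.conf() and return the previous value.
function setElectionTimeout(cfg, timeoutMillis) {
  if (typeof timeoutMillis !== "number" || timeoutMillis <= 0) {
    throw new Error("timeoutMillis must be a positive number");
  }
  cfg.settings = cfg.settings || {};
  var previous = cfg.settings.electionTimeoutMillis;
  cfg.settings.electionTimeoutMillis = timeoutMillis;
  return previous;
}

// The rs helper only exists in the mongo shell, so skip this part elsewhere.
if (typeof rs !== "undefined") {
  var cfg = rs.conf();
  var previous = setElectionTimeout(cfg, 1000);
  rs.reconfig(cfg);
  print("electionTimeoutMillis changed from " + previous + " to 1000");
  // Next steps, done by hand:
  //   1. Run rs.stepDown() on this node.
  //   2. Connect to the NEW primary (reconfig only works on a primary)
  //      and restore the old value:
  //        cfg = rs.conf()
  //        cfg.settings.electionTimeoutMillis = <previous value>
  //        rs.reconfig(cfg)
}
```

Note the value printed by the script before stepping down, since the restore has to be issued from a different connection.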

 

With an electionTimeoutMillis of 1s, the manual failover will happen in about 1-3s. Do remember to change the value back afterwards, because such a short timeout can trigger false elections. Obviously, you will still get dropped requests during the failover. But with the retryable writes introduced in MongoDB 3.6, your client should hardly notice. Of course, if you can’t wait till 3.6, just implement your own retry logic!
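As a rough illustration of what "your own retry logic" could look like in a Node.js client, here is a minimal sketch. Everything in it is an assumption for illustration: withRetry, isFailoverError and sleep are hypothetical helpers, not driver APIs, and the error codes listed are the server's step-down related codes as I understand them, so verify them against your driver and server version.

```javascript
// Server error codes that typically show up while a primary steps down.
// Treat this list as an assumption and extend it for your environment.
const RETRYABLE_CODES = new Set([
  10107, // NotMaster
  189,   // PrimarySteppedDown
  11602, // InterruptedDueToReplStateChange
]);

// Default predicate: retry only on the codes above.
function isFailoverError(err) {
  return Boolean(err) && RETRYABLE_CODES.has(err.code);
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run an async operation, retrying with a fixed delay while the error
// looks like a failover error. Rethrows the last error when giving up.
async function withRetry(operation, options = {}) {
  const { attempts = 5, delayMs = 500, shouldRetry = isFailoverError } = options;
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= attempts || !shouldRetry(err)) {
        throw err;
      }
      await sleep(delayMs);
    }
  }
}
```

You would wrap a single driver call in it, for example withRetry(() => collection.insertOne(doc)), where collection and doc are placeholders for your own objects. Network errors usually carry no server error code, so pass your own shouldRetry if you want to cover them, and only retry operations that are safe to apply twice.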

Regards,
Wei Shan

 
