Archive for category MongoDB 3.2

MongoDB 3.x – How to get more details about a collection in MongoDB

In MongoDB shell

mongo> db.collection.stats()

{{ "ns" : "db.collection", 
"count" : 1474168, 
"size" : 1628955640, 
"avgObjSize" : 1105, 
"storageSize" : 62074880, 
"capped" : false, 
"wiredTiger" : 
     { "metadata" : { "formatVersion" : 1 }, 
     "creationString" : "access_pattern_hint=none,
          allocation_size=4KB,app_metadata=(formatVersion=1),
          block_allocation=best,block_compressor=zlib,cache_resident=false,
          checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),
          exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,
          ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,
          internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,
          key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,
          leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,
          bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,
          chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_max=15,merge_min=0),
          memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,
          prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,
          split_pct=90,type=file,value_format=u",
"type" : "file", 
"uri" : "statistics:table:collection/0-8813564358223606299", 
"nindexes" : 3, 
"totalIndexSize" : 83034112, 
"indexSizes" : { "_id_" : 72835072, "st_1_ih_1_vt_1_ua_1" : 5107712, "ea_1" : 5091328 }, 
"ok" : 1}

From the above, we can gather a few important things:

  1. There are 1,474,168 documents in the collection (count)
  2. On disk, ignoring any filesystem compression, it occupies about 62,074,880 bytes (storageSize)
  3. It is using zlib block compression (block_compressor=zlib in the creationString)
  4. On disk, the data file will be $dbPath/collection/0-8813564358223606299.wt (from the uri field)
  5. There are 3 indexes on this collection (nindexes)
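
If you only need one or two of these values, you can drill into the output directly, or pass a scale factor so that sizes are reported in MB rather than bytes. A quick sketch, using the same generic collection name as above:

mongo> db.collection.stats(1024*1024).storageSize        // sizes scaled to MB
mongo> db.collection.stats().wiredTiger.creationString   // contains block_compressor=zlib
mongo> db.collection.stats().nindexes                    // number of indexes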

Hope this helps!

Regards,
Wei Shan

Migrating Oracle Sample Schema to MongoDB

It is not an easy feat to migrate an application from Oracle, or any relational database, to MongoDB. You will need to remodel your schema to fit MongoDB. Migrating from one relational database to another relational database is a lot easier (still not that easy, though!). In this blog post, I try to migrate the famous Oracle database sample schema to MongoDB. I'm not saying that, from an application point of view (think ACID!), this is a good idea; I'm illustrating the fact that you can't just take a schema from a relational database and dump it onto MongoDB.

Let’s take a look at the ERD diagram of sales history schema.

[ERD diagram of the SH (Sales History) sample schema]

The sales history schema above represents the sales history for a company.

The sample company does a high volume of business, so it runs business statistics reports to aid in decision support. Many of these reports are time-based and nonvolatile; that is, they analyse past data trends. The company loads data into its data warehouse regularly to gather statistics for these reports. These reports include annual, quarterly, monthly, and weekly sales figures by product. The company also runs reports on the distribution channels through which its sales are delivered. When the company runs special promotions on its products, it analyses the impact of the promotions on sales. It also analyses sales by geographical area.

Based on my understanding above, there are a few assumptions to be made here:

  1. This is a data warehouse schema, not an OLTP workload
  2. The application queries the distribution channels through which sales are delivered
  3. The application queries the impact of promotions on sales
  4. The application queries sales history by geographical area

Rules to follow in MongoDB data modelling

MongoDB knows that people who are new to NoSQL may not know how to model data efficiently, so there is an entire chapter of documentation dedicated to it! Some of the limitations and rules you should be aware of include the 16 MB maximum BSON document size, the 100-level limit on document nesting, and the limit on index key size.

MongoDB “Sales History” Schema – Version 1

Based on the official MongoDB documentation on data modelling, the following single-document design might seem good enough.

SALES

{
 "_id": {
 "PROD": {},
  "PROD_ID": {},
  "PROD_NAME": {},
  "PROD_DESC": {},
  "PROD_SUBCATEGORY": {},
  "PROD_SUBCATEGORY_ID": {},
  "PROD_SUBCATEGORY_DESC": {},
  "PROD_CATEGORY": {},
  "PROD_CATEGORY_ID": {},
  "PROD_CATEGORY_DESC": {},
  "PROD_WEIGHT_CLASS": {},
  "PROD_UNIT_OF_MEASURE": {},
  "PROD_PACK_SIZE": {},
  "PROD_SUPPLIER_ID": {},
  "PROD_STATUS": {},
  "PROD_LIST_PRICE": {},
  "PROD_MIN_PRICE": {},
  "PROD_TOTAL": {},
  "PROD_TOTAL_ID": {},
  "PROD_SRC_ID": {},
  "PROD_EFF_FROM": {},
  "PROD_EFF_TO": {},
  "PROD_VALID": {},
  "PROD_COST": {
  "TIME_DESC": {},
  "UNIT_COST": {},
  "UNIT_PRICE": {}
 }
 },
 "CUST": {
  "CUST_ID": {},
  "CUST_FIRST_NAME": {},
  "CUST_LAST_NAME": {},
  "CUST_GENDER": {},
  "CUST_YEAR_OF_BIRTH": {},
  "CUST_MARITAL_STATUS": {},
  "CUST_STREET_ADDRESS": {},
  "CUST_POSTAL_CODE": {},
  "CUST_CITY": {},
  "CUST_CITY_ID": {},
  "CUST_STATE_PROVINCE": {},
  "CUST_STATE_PROVINCE_ID": {},
  "COUNTRY": {
  "COUNTRY_ID": {},
   "COUNTRY_ISO_CODE": {},
   "COUNTRY_NAME": {},
   "COUNTRY_SUBREGION": {},
   "COUNTRY_SUBREGION_ID": {},
   "COUNTRY_REGION": {},
   "COUNTRY_REGION_ID": {},
   "COUNTRY_TOTAL": {},
   "COUNTRY_TOTAL_ID": {},
   "COUNTRY_NAME_HIST": {}
 },
  "CUST_MAIN_PHONE_NUMBER": {},
  "CUST_INCOME_LEVEL": {},
  "CUST_CREDIT_LIMIT": {},
  "CUST_EMAIL": {},
  "CUST_TOTAL": {},
  "CUST_TOTAL_ID": {},
  "CUST_SRC_ID": {},
  "CUST_EFF_FROM": {},
  "CUST_EFF_TO": {},
  "CUST_VALID": {}
 },
 "TIME": {
  "TIME_ID": {},
  "DAY_NAME": {},
  "DAY_NUMBER_IN_WEEK": {},
  "DAY_NUMBER_IN_MONTH": {},
  "CALENDAR_WEEK_NUMBER": {},
  "FISCAL_WEEK_NUMBER": {},
  "WEEK_ENDING_DAY": {},
  "WEEK_ENDING_DAY_ID": {},
  "CALENDAR_MONTH_NUMBER": {},
  "FISCAL_MONTH_NUMBER": {},
  "CALENDAR_MONTH_DESC": {},
  "CALENDAR_MONTH_ID": {},
  "FISCAL_MONTH_DESC": {},
  "FISCAL_MONTH_ID": {},
  "DAYS_IN_CAL_MONTH": {},
  "DAYS_IN_FIS_MONTH": {},
  "END_OF_CAL_MONTH": {},
  "END_OF_FIS_MONTH": {},
  "CALENDAR_MONTH_NAME": {},
  "FISCAL_MONTH_NAME": {},
  "CALENDAR_QUARTER_DESC": {},
  "CALENBDAR_QUARTER_ID": {},
  "FISCAL_QUARTER_DESC": {},
  "FISCAL_QUARTER_ID": {},
  "DAYS_IN_CAL_QUARTER": {},
  "DAYS_IN_FIS_QUARTER": {},
  "END_OF_CAL_QUARTER": {},
  "END_OF_FIS_QUARTER": {},
  "CALENDAR_QUARTER_NUMBER": {},
  "FISCAL_QUARTER_NUMBER": {},
  "CALENDAR_YEAR": {},
  "CALENDAR_YEAR_ID": {},
  "FISCAL_YEAR": {},
  "FISCAL_YEAR_ID": {},
  "DAYS_IN_CAL_YEAR": {},
  "DAYS_IN_FIS_YEAR": {},
  "END_OF_CAL_YEAR": {},
  "END_OF_FIS_YEAR": {}
 },
 "CHANNEL": {
  "CHANNEL_ID": {},
  "CHANNEL_DESC": {},
  "CHANNEL_CLASS": {},
  "CHANNEL_CLASS_ID": {},
  "CHANNEL_TOTAL": {},
  "CHANNEL_TOTAL_ID": {}
 },
 "PROMO": {
  "PROMO_ID": {},
  "PROMO_NAME": {},
  "PROMO_SUBCATEGORY": {},
  "PROMO_SUBCATEGORY_ID": {},
  "PROMO_CATEGORY": {},
  "PROMO_CATEGORY_ID": {},
  "PROMO_COST": {},
  "PROMO_BEGIN_DATE": {},
  "PROMO_END_DATE": {},
  "PROMO_TOTAL": {},
  "PROMO_TOTAL_ID": {}
 },
 "QUANTITY_SOLD": {},
 "AMOUNT_SOLD": {}
}

However, this design pattern results in one massive JSON document that is slow to read and update. It is also much more likely to run into the MongoDB limitations listed earlier. What is a better approach then?

MongoDB “Sales History” Schema – Version 2

Let’s go with the easy bits first.

CUSTOMER

{
 "_id": {},
 "CUST_FIRST_NAME": {},
 "CUST_LAST_NAME": {},
 "CUST_GENDER": {},
 "CUST_YEAR_OF_BIRTH": {},
 "CUST_MARITAL_STATUS": {},
 "CUST_STREET_ADDRESS": {},
 "CUST_POSTAL_CODE": {},
 "CUST_CITY": {},
 "CUST_STATE_PROVINCE": {},
 "COUNTRY": {
  "COUNTRY_NAME": {},
  "COUNTRY_ISO_CODE": {},
  "COUNTRY_SUBREGION": {},
  "COUNTRY_REGION": {},
  "COUNTRY_TOTAL": {},
  "COUNTRY_NAME_HIST": {}
 },
 "CUST_MAIN_PHONE_NUMBER": {},
 "CUST_INCOME_LEVEL": {},
 "CUST_CREDIT_LIMIT": {},
 "CUST_EMAIL": {},
 "CUST_TOTAL": {},
 "CUST_SRC_ID": {},
 "CUST_EFF_FROM": {},
 "CUST_EFF_TO": {},
 "CUST_VALID": {}
}

Simply follow the one-to-one relationship design pattern for the CUSTOMER collection.

PRODUCT

{
 "_id": {},
 "PROD_NAME": {},
 "PROD_DESC": {},
 "PROD_SUBCATEGORY": {
  "PROD_SUBCATEGORY_ID": {},
  "PROD_SUBCATEGORY_DESC": {}
 },
 "PROD_CATEGORY": {
  "PROD_CATEGORY_ID": {},
  "PROD_CATEGORY_DESC": {}
 },
 "PROD_WEIGHT_CLASS": {},
 "PROD_UNIT_OF_MEASURE": {},
 "PROD_PACK_SIZE": {},
 "PROD_SUPPLIER_ID": {},
 "PROD_STATUS": {},
 "PROD_LIST_PRICE": {},
 "PROD_MIN_PRICE": {},
 "PROD_TOTAL": {},
 "PROD_TOTAL_ID": {},
 "PROD_SRC_ID": {},
 "PROD_EFF_FROM": {},
 "PROD_EFF_TO": {},
 "PROD_VALID": {},
 "COST": {
  "TIME_ID": {},
  "PROMO_ID": {},
  "CHANNEL_ID": {},
  "UNIT_COST": {},
  "UNIT_PRICE": {}
 }
}

PROMOTION

{
 "_id": {},
 "PROMO_NAME": {},
 "PROMO_SUBCATEGORY": {},
 "PROMO_CATEGORY": {},
 "PROMO_COST": {},
 "PROMO_BEGIN_DATE": {},
 "PROMO_END_DATE": {},
 "PROMO_TOTAL": {}
}

I have chosen to combine PRODUCTS and COSTS into a single PRODUCT collection; you may find this embedded design pattern in the MongoDB data modelling documentation. I did not put PROMOTION in the same collection; instead, I decided to use references. The reason is that a product might have hundreds or thousands of promotions, and embedding them would not scale for the PRODUCT collection.
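
To illustrate what the reference approach looks like at query time, here is a minimal sketch. It resolves the promotion referenced from a product's COST sub-document with a manual two-step lookup (collection and field names follow the schema sketched in this post):

// fetch one product that carries a promotion reference, then resolve the reference
var prod  = db.PRODUCT.findOne({ "COST.PROMO_ID": { $exists: true } })
var promo = db.PROMOTION.findOne({ _id: prod.COST.PROMO_ID })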

CHANNELS

{
 "_id": {},
 "CHANNEL_DESC": {},
 "CHANNEL_CLASS": {},
 "CHANNEL_CLASS_ID": {},
 "CHANNEL_TOTAL": {},
 "CHANNEL_TOTAL_ID": {}
}

TIMES

{
 "_id": {},
 "DAY_NAME": {},
 "DAY_NUMBER_IN_WEEK": {},
 "DAY_NUMBER_IN_MONTH": {},
 "CALENDAR_WEEK_NUMBER": {},
 "FISCAL_WEEK_NUMBER": {},
 "WEEK_ENDING_DAY": {},
 "WEEK_ENDING_DAY_ID": {},
 "CALENDAR_MONTH_NUMBER": {},
 "FISCAL_MONTH_NUMBER": {},
 "CALENDAR_MONTH_DESC": {},
 "CALENDAR_MONTH_ID": {},
 "FISCAL_MONTH_DESC": {},
 "FISCAL_MONTH_ID": {},
 "DAYS_IN_CAL_MONTH": {},
 "DAYS_IN_FIS_MONTH": {},
 "END_OF_CAL_MONTH": {},
 "END_OF_FIS_MONTH": {},
 "CALENDAR_MONTH_NAME": {},
 "FISCAL_MONTH_NAME": {},
 "CALENDAR_QUARTER_DESC": {},
 "CALENDAR_QUARTER_ID": {},
 "FISCAL_QUARTER_DESC": {},
 "FISCAL_QUARTER_ID": {},
 "DAYS_IN_CAL_QUARTER": {},
 "DAYS_IN_FIS_QUARTER": {},
 "END_OF_CAL_QUARTER": {},
 "END_OF_FIS_QUARTER": {},
 "CALENDAR_QUARTER_NUMBER": {},
 "FISCAL_QUARTER_NUMBER": {},
 "CALENDAR_YEAR": {},
 "CALENDAR_YEAR_ID": {},
 "FISCAL_YEAR": {},
 "FISCAL_YEAR_ID": {},
 "DAYS_IN_CAL_YEAR": {},
 "DAYS_IN_FIS_YEAR": {},
 "END_OF_CAL_YEAR": {},
 "END_OF_FIS_YEAR": {}
}

Honestly, I’m not too happy keeping the TIMES and CHANNELS collections separate. However, if I embedded them in another collection (COST/SALES), I would not be able to create a new TIMES or CHANNELS document without having a COST or SALES record for it.

SALES

{
 "PROD_ID": {},
 "CUST_ID": {},
 "TIME_ID": {},
 "CHANNEL_ID": {},
 "PROMO_ID": {},
 "QUANTITY_SOLD": {},
 "AMOUNT_SOLD": {}
}

I would shard the SALES collection on TIME_ID, CHANNEL_ID and PROMO_ID, since, based on the assumptions listed earlier, the majority of queries will filter on those three fields.
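
A sketch of what that sharding setup could look like in the mongo shell. The database name "sh" is an assumption, and the shard key fields are assumed to be present in every SALES document:

// enable sharding on the database, then shard SALES on the compound key
sh.enableSharding("sh")
sh.shardCollection("sh.SALES", { TIME_ID: 1, CHANNEL_ID: 1, PROMO_ID: 1 })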

Summary

In the example above, I assumed that most of the queries will revolve around the SALES collection. For each collection, I tried to estimate its growth and decided whether to embed it or use an ID reference instead. Obviously, if my assumptions are wrong, the queries will be sub-optimal.

As you might have noticed, just because MongoDB doesn’t enforce a schema doesn’t mean you don’t need data modelling. In fact, schema design in MongoDB is a paradigm shift from relational databases: you need to design your schema around the questions your application will be asking.

Now, if you are a SQL developer, you will notice that some of the collections still need to be joined together for BI reports. If so, you will need to do that at the application layer. If all you need is a single left outer join, use $lookup instead.
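
For example, a report that needs product details alongside each sale could use a $lookup stage (available since MongoDB 3.2, and only against an unsharded "from" collection). A minimal sketch, assuming the collection and field names above:

db.SALES.aggregate([
  { $lookup: {
      from: "PRODUCT",          // left outer join SALES -> PRODUCT
      localField: "PROD_ID",
      foreignField: "_id",
      as: "product"
  } },
  { $unwind: "$product" },
  { $project: { AMOUNT_SOLD: 1, QUANTITY_SOLD: 1, "product.PROD_NAME": 1 } }
])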

At the end of the day, MongoDB is not suitable for every application! Yes, you can now use $graphLookup to find relationships between objects, but you wouldn’t want to base your brand new Uber-for-Facebook (the friend-suggestion feed) on MongoDB. It is clearly not designed for that. In the era of NoSQL, use the right database for the right problem.

P.S: Do let me know if you have any other ideas on modelling the sample schema. I would love to hear your thoughts on this!

Regards,
Wei Shan

Compound Index Vs Multiple Single Field Index in MongoDB 3.x – Part II

After the first blog post was published, it was suggested to me that I should re-run the test with w:1, j:true. That makes sense, since no one would, or should, be running workloads with w:1, j:false.

By definition:

With j: true, MongoDB returns only after the requested number of members, including the primary, have written to the journal. Previously j: true write concern in a replica set only requires the primary to write to the journal, regardless of the w: <value> write concern.

If you are using MMAPv1, then you should be using j:true, or else you risk data loss. If you are using WiredTiger with j:false, you can lose at most the data written since the last checkpoint (checkpoints run every 60 seconds).
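
For reference, this is what an insert with that write concern looks like in the mongo shell (a sketch; the document body is arbitrary and the collection name follows the earlier post):

db.testCol.insert(
  { fld0: NumberLong(1), fld1: new Date() },
  { writeConcern: { w: 1, j: true } }
)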

Insert Performance

No Index

After 295 seconds, 5754368 new records inserted - collection has 5754368 in total
20017 inserts per second since last report 99.62 % in under 50 milliseconds
0 keyqueries per second since last report 100.00 % in under 50 milliseconds
0 updates per second since last report 100.00 % in under 50 milliseconds
0 rangequeries per second since last report 100.00 % in under 50 milliseconds

~19,763.2 inserts/second

Compound Index

After 295 seconds, 4497408 new records inserted - collection has 4497408 in total
14131 inserts per second since last report 98.63 % in under 50 milliseconds
0 keyqueries per second since last report 100.00 % in under 50 milliseconds
0 updates per second since last report 100.00 % in under 50 milliseconds
0 rangequeries per second since last report 100.00 % in under 50 milliseconds

~15,245.4 inserts/second (22% performance reduction)

Single-Field Index

After 295 seconds, 2438144 new records inserted - collection has 2438144 in total
6758 inserts per second since last report 35.55 % in under 50 milliseconds
0 keyqueries per second since last report 100.00 % in under 50 milliseconds
0 updates per second since last report 100.00 % in under 50 milliseconds
0 rangequeries per second since last report 100.00 % in under 50 milliseconds

~8,264.8 inserts/second (58% performance reduction)

Summary

The relative performance drop is smaller than it was with j:false. Previously it was about a 40% drop with the compound index and a 78% drop with single-field indexes. It turns out the compound index isn’t too bad, considering that I created a 9-field compound index on the collection.

Regards,
Wei Shan

Compound Index Vs Multiple Single Field Index in MongoDB 3.x

MongoDB supports both compound indexes and single-field indexes. It’s common knowledge that indexes decrease insert/update performance but increase query performance. However, if you need to index multiple fields, is it better to use one compound index or to create multiple single-field indexes? This post investigates which is the better option in terms of performance.

Insert Performance

Test Setup:

  • Single MongoDB 3.2 instance running on my Mac
  • The POC Driver uses a single thread to perform inserts only
  • Each test runs for 5 minutes
  • The collection is emptied after every run
  • Default readConcern
  • Default writeConcern

Sample Document:

> db.testCol.findOne()
{
 "_id" : {
 "w" : 2,
 "i" : 0
 },
 "fld0" : NumberLong(530752),
 "fld1" : ISODate("2015-06-02T08:14:18.687Z"),
 "fld2" : "tempor invidunt ut labore et",
 "fld3" : "eirmod tempor invidunt ut",
 "fld4" : "Lorem ipsum dolor sit amet,",
 "fld5" : ISODate("2016-10-12T01:58:24.580Z"),
 "fld6" : NumberLong(1712708),
 "fld7" : "sit amet. Lorem ipsum dolor",
 "fld8" : "sit amet, consetetur",
 "fld9" : NumberLong(652363),
 "bin" : BinData(0,"")
}

Phase 1 (No indexes except on _id field)

After 295 seconds, 13340672 new records inserted - collection has 13340672 in total
46498 inserts per second since last report 99.97 % in under 50 milliseconds
0 keyqueries per second since last report 100.00 % in under 50 milliseconds
0 updates per second since last report 100.00 % in under 50 milliseconds
0 rangequeries per second since last report 100.00 % in under 50 milliseconds

~45,222 inserts/second

Phase 2 (Compound index on fld1 to fld9)

After 295 seconds, 7739904 new records inserted - collection has 7739904 in total
20830 inserts per second since last report 99.72 % in under 50 milliseconds
0 keyqueries per second since last report 100.00 % in under 50 milliseconds
0 updates per second since last report 100.00 % in under 50 milliseconds
0 rangequeries per second since last report 100.00 % in under 50 milliseconds

~26,236 inserts/second

Phase 3 (Single field indexes on each fld2, fld6 and fld8)

After 295 seconds, 2998272 new records inserted - collection has 2998272 in total
7012 inserts per second since last report 63.61 % in under 50 milliseconds
0 keyqueries per second since last report 100.00 % in under 50 milliseconds
0 updates per second since last report 100.00 % in under 50 milliseconds
0 rangequeries per second since last report 100.00 % in under 50 milliseconds

~10,163 inserts/second

Storage Utilisation

On the same collection with 2,817,024 documents:

Compound Index

db.testCol.stats().indexSizes
{
 "_id_" : 75079680,
 "fld0_1_fld1_1_fld2_1_fld3_1_fld4_1_fld5_1_fld6_1_fl7_1_fld8_1_fld9_1" : 2617061376
}

~2.44GB

Single-Field Index

> db.testCol.stats().indexSizes
{
 "_id_" : 75079680,
 "fld0_1" : 65343488,
 "fld1_1" : 79781888,
 "fld2_1" : 35471360,
 "fld3_1" : 35475456,
 "fld4_1" : 35475456,
 "fld5_1" : 79781888,
 "fld6_1" : 65339392,
 "fld7_1" : 35475456,
 "fld8_1" : 35475456,
 "fld9_1" : 65347584,
}

~0.53GB
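
For reference, the two index layouts measured above could be created like this (a sketch; the field list is taken from the index names in the indexSizes output):

// one compound index across fld0..fld9
db.testCol.createIndex({ fld0: 1, fld1: 1, fld2: 1, fld3: 1, fld4: 1,
                         fld5: 1, fld6: 1, fld7: 1, fld8: 1, fld9: 1 })

// one single-field index per field
for (var i = 0; i <= 9; i++) {
  var spec = {}
  spec["fld" + i] = 1
  db.testCol.createIndex(spec)
}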

Index Selectivity Usage

For single-field indexes, there’s no such restriction: the MongoDB query optimiser will pick whichever index it deems fit. For a compound index, it’s slightly different.

Queries like these will not use the compound index:

  • db.testCol.find({fld9:1,fld8:1,fld7:1})
  • db.testCol.find({fld9:1,fld3:1,fld6:1})

Queries like these will use the compound index:

  • db.testCol.find({fld0:1,fld1:1})
  • db.testCol.find({fld5:1,fld8:1,fld0:1}) (not very efficient..)
  • db.testCol.find({fld0:1,fld3:1,fld5:1})

The general idea is that MongoDB will use the compound index if the query matches the index prefix. Even if the query doesn’t match the prefix exactly, the optimiser might still use the index if that is more efficient than performing a full collection scan. The only way to find out for sure is to test the queries yourself!
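
The quickest way to check is explain(). A sketch:

// winningPlan shows an IXSCAN stage if the compound index is used,
// or COLLSCAN if the query falls back to a full collection scan
db.testCol.find({ fld0: 1, fld1: 1 }).explain("executionStats")
db.testCol.find({ fld9: 1, fld8: 1, fld7: 1 }).explain("executionStats")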

Summary

A very short summary in key points below:

  1. No index gives the best insert performance (duh!!)
  2. A compound index gives better insert performance than multiple single-field indexes
  3. A compound index uses more storage than multiple single-field indexes

Depending on the situation, choose your poison carefully 🙂

P.S: The above test does not attempt to quantify absolute performance numbers, but rather to provide a rough performance degradation ratio for indexes.

Regards,
Wei Shan

How to perform MongoDB Sharded Cluster Cross-DC Failover

Imagine the following MongoDB architecture: how do you perform a manual failover when the entire DC1 is unavailable? I couldn’t find any official MongoDB documentation on this, so I decided to write my own.

Note:

  1. MongoDB version 3.2
  2. Config Server Replica Set
  3. Assumes a total, unrecoverable failure of DC1

[Architecture diagram: MongoDB sharded cluster spanning DC1 and DC2]

High Level Steps:

  1. Stop mongos
  2. Update mongos configuration to point to the new single node config server replica set
  3. Reconfigure the config server to be a single node replica set
  4. Update config database shards collection
  5. Reconfigure the DC2 shards to be single node replica set
  6. Clean up shard caches
  7. Start mongos

Detailed Steps

Stop mongos

systemctl stop mongos

Update mongos configuration file to point to the new single node config server replica set

From:
configDB: rs0/node1:27019,node2:27019,node3:27019

To:
configDB: rs0/node3:27019
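
In YAML form, that setting lives under the sharding section of the mongos configuration file, roughly like this (a sketch):

sharding:
  configDB: rs0/node3:27019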

Reconfigure the config server to be a single node replica set

cfg = rs.conf()
// keep only the surviving DC2 config server; <n> is its index in cfg.members
cfg.members = [cfg.members[<n>]]
rs.reconfig(cfg, {force : true})

Update config database shards collection

mongo localhost:27019
mongo> use config
mongo> db.shards.update(
   { _id: "<replSetName>" },
   { $set:
      {
         "host": "<replSetName>/<dc2_shard_hostname>:27017"
      }
   }
)

Reconfigure the DC2 shards to be single node replica set

cfg = rs.conf()
// keep only the surviving DC2 shard member; <n> is its index in cfg.members
cfg.members = [cfg.members[<n>]]
rs.reconfig(cfg, {force : true})

Clean up shard caches (on every new single-node shard replica set)

Add the following to the mongod config file and restart the mongod process:
setParameter:
   recoverShardingState: false
# systemctl restart mongod

Clean up the config database cache

mongo> use admin
mongo> db.system.version.remove(
   { _id: "minOpTimeRecovery" },
   { writeConcern: { w: "majority" } }
)

Then remove recoverShardingState: false from the mongod config file and restart the mongod process.

Start mongos

systemctl start mongos

Regards,
Wei Shan

Tips on using MongoDB Shell

I’m currently taking the MongoDB online course M202 right now. It’s an advanced course that covers sharded clusters, performance tuning and other useful topics. I have picked up a couple of mongo shell tricks from the course so far.

show profile

On a database that has profiling enabled, the show profile helper in the mongo shell displays the 5 most recent operations that took at least 1 millisecond to execute. Issue show profile from the mongo shell, as follows:

show profile

This is particularly useful when you don’t want to go through the entire system.profile collection. MongoDB query syntax isn’t the most intuitive in the world, and it helps to be able to do a quick analysis of the slow queries. If you need something more detailed, query the system.profile collection directly.
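
For example, a quick sketch (the database name and the 100 ms threshold are arbitrary):

> use mydb
> db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(5).pretty()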

show log

There are many ways to view MongoDB log files:

  1. cat mongod.log
  2. via Kibana
  3. mongo shell, show log

I find that using #3 is handy, especially during a severity 1 issue. It shows all the log entries that are currently in memory, which also means it only shows recent, relevant entries 🙂
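
For completeness, this is roughly what it looks like in the shell (show logs lists the in-memory log names, show log <name> prints one of them):

> show logs
global
startupWarnings
> show log global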

Hope this helps!

Regards,
Wei Shan

MongoDB 3.x – Maximum Oplog Size

The official MongoDB documentation says that the maximum oplog size is 50GB. With 5 years of Oracle database experience, I had developed the bad habit of trusting official documentation entirely. Following the documented procedure to resize the oplog on my laptop, I was able to resize it to 60GB instead.

To check the current oplog size:

my_rs_1:PRIMARY> rs.printReplicationInfo() 

configured oplog size:   61440MB
log length start to end: 1568secs (0.44hrs)
oplog first event time:  Sun Jan 29 2017 17:55:26 GMT+0000 (GMT)
oplog last event time:   Sun Jan 29 2017 18:21:34 GMT+0000 (GMT)
now:                     Sun Jan 29 2017 18:25:29 GMT+0000 (GMT)

I have resized it to 60GB instead! 🙂
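
You can also read the configured cap straight from the capped collection’s stats as a quick sanity check (a sketch; the value is reported in bytes):

my_rs_1:PRIMARY> use local
my_rs_1:PRIMARY> db.oplog.rs.stats().maxSize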

Working with MongoDB for the last 3 months has made me realise a lot of things that Oracle DBAs take for granted. Things that are supposed to work the way they are documented seem to be a bonus rather than a requirement.

P.S: In hindsight, I believe the 50GB upper bound only applies to the default oplog size (5% of free disk space), not to the maximum configurable size.

Hope this helps!

Regards,
Wei Shan

MongoDB 3.2 Sharded Cluster – rs.reconfig() does not work

Following the documented procedure to reconfigure a replica set in a sharded cluster does not work. The replica set seems to be confused: rs.status() shows the correct output, but db.serverStatus() shows otherwise.

rs:PRIMARY> rs.status()
{
  "set" : "rs",
  "date" : ISODate("2017-01-18T03:05:06.202Z"),
  "myState" : 1,
  "term" : NumberLong(20),
  "configsvr" : true,
  "heartbeatIntervalMillis" : NumberLong(2000),
  "members" : [
    {
      "_id" : 1,
      "name" : "node1:27017",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 439,
      "optime" : {
        "ts" : Timestamp(1484708697, 1),
        "t" : NumberLong(20)
      },
      "optimeDate" : ISODate("2017-01-18T03:04:57Z"),
      "infoMessage" : "could not find member to sync from",
      "electionTime" : Timestamp(1484708696, 1),
      "electionDate" : ISODate("2017-01-18T03:04:56Z"),
      "configVersion" : 83051,
      "self" : true
    }
  ],
  "ok" : 1
}
rs:PRIMARY> db.serverStatus()
{
  "host" : "node1:27017",
  "process" : "mongod",
  "repl" : {
    "hosts" : [
      "node1:27017"
    ],
    "setName" : "rs",
    "setVersion" : 83051,
    "ismaster" : false,
    "secondary" : true,
    "primary" : "node1:27017",
    "me" : "node1:27017",
    "electionId" : ObjectId("7fffffff0000000000000014"),
    "rbid" : 398723451

Error log on mongos

2017-01-17T21:38:40.143Z W NETWORK [ReplicaSetMonitorWatcher] No primary detected for set rs0
2017-01-17T21:38:40.144Z W NETWORK [ReplicaSetMonitorWatcher] No primary detected for set rs1
2017-01-17T21:38:50.440Z W NETWORK [ReplicaSetMonitorWatcher] No primary detected for set rs0
2017-01-17T21:39:00.762Z W NETWORK [ReplicaSetMonitorWatcher] No primary detected for set rs1
2016-12-20T21:18:27.060Z W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: FailedToSatisfyReadPreference: could not find host matching read preference { mode: "primary" } for set rs0(config)
2016-12-20T21:18:29.187Z W NETWORK [mongosMain] Failed to connect to , reason: errno:65 No route to host
2016-12-20T21:18:29.188Z W NETWORK [mongosMain] Failed to connect to , reason: errno:65 No route to host
2016-12-20T21:19:44.217Z W NETWORK [mongosMain] Failed to connect to after 5000 milliseconds, giving up.
2016-12-20T21:19:44.218Z W NETWORK [mongosMain] No primary detected for set rs0(config)
2016-12-20T21:19:44.218Z I SHARDING [mongosMain] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: could not get updated shard list from config server due to could not find host matching read preference { mode: "nearest" } for set rs0(config)

Error log on the single node replica set mongod process

2017-01-17T23:50:36.601Z I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker 
2017-01-17T23:50:36.602Z I NETWORK [initandlisten] waiting for connections on port 27017 ssl 
2017-01-17T23:50:36.602Z I REPL [ReplicationExecutor] New replica set config in use: {...} 
2017-01-17T23:50:36.603Z I REPL [ReplicationExecutor] This node is in the config 
2017-01-17T23:50:36.603Z I REPL [ReplicationExecutor] transition to STARTUP2 
2017-01-17T23:50:36.603Z I REPL [ReplicationExecutor] Starting replication applier threads 
2017-01-17T23:50:36.604Z I REPL [ReplicationExecutor] transition to RECOVERING 
2017-01-17T23:50:36.608Z I REPL [ReplicationExecutor] transition to SECONDARY 
2017-01-17T23:50:36.608Z I REPL [ReplicationExecutor] conducting a dry run election to see if we could be elected 
2017-01-17T23:50:36.608Z I REPL [ReplicationExecutor] dry election run succeeded, running for election 
2017-01-17T23:50:36.608Z I REPL [ReplicationExecutor] election succeeded, assuming primary role in term 38 
2017-01-17T23:50:36.608Z I REPL [ReplicationExecutor] transition to PRIMARY 
2017-01-17T23:50:37.668Z I SHARDING [rsSync] Sharding state recovery process found document { _id: "minOpTimeRecovery", configsvrConnectionString: "rs0/", shardName: "rs0", minOpTime: { ts: Timestamp 1481206005000|6, t: 15 }, minOpTimeUpdaters: 0 } 
2017-01-17T23:50:37.669Z I SHARDING [ShardingState initialization] first cluster operation detected, adding sharding hook to enable versioning and authentication to remote servers 
2017-01-17T23:50:37.690Z I SHARDING [ShardingState initialization] Updating config server connection string to: rs0/ 
2017-01-17T23:50:37.690Z I NETWORK [ShardingState initialization] Starting new replica set monitor for rs0/ 
2017-01-17T23:50:37.690Z I NETWORK [ReplicaSetMonitorWatcher] starting 
2017-01-17T23:50:37.703Z I SHARDING [thread1] creating distributed lock ping thread for process rs0/ (sleeping for 30000ms) 
2017-01-17T23:51:32.666Z I ACCESS [conn2] Successfully authenticated as principal user on admin 
2017-01-17T23:51:52.712Z W NETWORK [replSetDistLockPinger] Failed to connect to after 5000 milliseconds, giving up. 
2017-01-17T23:51:52.712Z W NETWORK [ShardingState initialization] Failed to connect to after 5000 milliseconds, giving up. 
2017-01-17T23:51:52.712Z W NETWORK [ShardingState initialization] No primary detected for set rs0(config) 
2017-01-17T23:51:52.712Z W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: FailedToSatisfyReadPreference: could not find host matching read preference { mode: "primary" } for set rs0(config) 
2017-01-17T23:51:52.712Z I SHARDING [ShardingState initialization] Error initializing sharding state, sleeping for 2 seconds and trying again :: caused by :: FailedToSatisfyReadPreference: could not get updated shard list from config server due to could not find host matching read preference { mode: "nearest" } for rs0(config)

Root Cause:

The config database information isn’t updated in the shard members, so we need to remove the cached data from db.system.version (step #6 of the referenced procedure).
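
That cleanup is the same as the one shown in the cross-DC failover post above:

mongo> use admin
mongo> db.system.version.remove(
   { _id: "minOpTimeRecovery" },
   { writeConcern: { w: "majority" } }
)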

Hope this helps.

Regards,
Wei Shan
