MongoDB 3.2 – Duplicating a collection with or without transformation


If you need to duplicate a collection, there are multiple ways to do it: mongoexport/mongoimport, db.collection.copyTo(), a JavaScript loop, or simply the aggregation framework. The first two methods can't transform the documents along the way. The last two let you choose whether to perform any transformation.

Consider a books collection with the following document:

{
 "_id" : ObjectId("5836d55d500dsa1230f488ab0"),
 "a" : 1,
 "b" : "12345678",
 "c" : "12345678"
}

We want to copy the above document into another collection called books2, adding a new field, e, that holds the value of the original _id. We could write a JavaScript for loop (find().forEach()) that builds an initializeUnorderedBulkOp() job, transforms the fields and inserts the results into the other collection.
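A minimal sketch of that loop approach for the mongo shell, assuming the books/books2 collection names above. The function names transformBook and copyWithBulkOp and the batch size of 1000 are hypothetical choices for illustration. The transformation drops _id (so books2 assigns fresh ones) and keeps the original _id in e.

```javascript
// Sketch of the JavaScript-loop copy (mongo shell, MongoDB 3.2 era).
// Hypothetical names: transformBook, copyWithBulkOp; batch size is arbitrary.

// Per-document transformation: omit _id so the target generates a new one,
// and store the original _id in a new field, e.
function transformBook(doc) {
  return { a: doc.a, b: doc.b, c: doc.c, e: doc._id };
}

// Copy every document from sourceName to targetName with unordered bulk inserts,
// executing the bulk job every batchSize documents.
function copyWithBulkOp(db, sourceName, targetName, batchSize) {
  var bulk = db.getCollection(targetName).initializeUnorderedBulkOp();
  var pending = 0;
  db.getCollection(sourceName).find().forEach(function (doc) {
    bulk.insert(transformBook(doc));
    pending++;
    if (pending === batchSize) {
      bulk.execute();
      // A bulk job can only be executed once, so start a fresh one.
      bulk = db.getCollection(targetName).initializeUnorderedBulkOp();
      pending = 0;
    }
  });
  if (pending > 0) {
    bulk.execute(); // flush the last partial batch (executing an empty job throws)
  }
}

// In the mongo shell: copyWithBulkOp(db, "books", "books2", 1000);
```

Batching keeps each execute() call bounded in size instead of queueing a million inserts in one job.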

We could also use MongoDB aggregation to do it.

db.books.aggregate( [ { $project : { _id: 0, a: 1, b: 1, c: 1, e: "$_id" } }, { $out : "books2" } ] )

One line of code does exactly what you need! Testing on my laptop (a MacBook Pro with 16GB of memory) with a million records, the aggregation ran roughly 12 times faster than the JavaScript loop. Unlike copyTo(), there's no database-level locking, so it's friendlier to a production system (although copying a big collection will still mess up your cache).
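If you want to reproduce a comparison like this on your own hardware, a minimal timing sketch for the mongo shell might look like the following. The helper names timeMs and copyWithAggregation are hypothetical, and wall-clock timing with Date.now() is a rough measure, not a rigorous benchmark.

```javascript
// Rough wall-clock timer; Date.now() exists in both the mongo shell and Node.
function timeMs(fn) {
  var start = Date.now();
  fn();
  return Date.now() - start;
}

// The aggregation copy from this post, wrapped in a function so it can be timed.
// $out writes the pipeline's results into books2, replacing it if it exists.
function copyWithAggregation(db) {
  db.books.aggregate([
    { $project: { _id: 0, a: 1, b: 1, c: 1, e: "$_id" } },
    { $out: "books2" }
  ]);
}

// Usage in the mongo shell:
//   db.books2.drop();
//   print("aggregation: " + timeMs(function () { copyWithAggregation(db); }) + " ms");
// Drop books2 again before timing a loop-based copy, so it doesn't append
// to the previous run's documents.
```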

Personally, I like aggregation a lot. It feels a lot like Hadoop/MapReduce: the throughput is amazing but the latency is crap.

Wei Shan