MongoDB 3.2 – Hardening and User Access Control

MongoDB 3.2 – Hardening and User Access Control

MongoDB has been on the news¬†for being one of the most insecure NoSQL database software. By default, it is insecure with absolutely no security measures in place. Oracle and other various database software usually has a default admin user created and doesn’t allow any non-user to login to the database. However, MongoDB allows all user to login to the system on port 27017. Unless, of course, you lock in downūüôā

The MongoDB documentation¬†has all the information you need to lock it down. Once you start MongoDB with –auth parameter, the system is effectively locked down. Recently, I had the job of securing a MongoDB cluster. One of the more difficult task is applying RBAC policies to your environment as you will need to understand the application/users access your MongoDB cluster.

My tips on securing your MongoDB cluster.

  1. Enable TLS for all MongoDB components (mongos, config servers and shard servers)
  2. Encrypt your drives that are storing MongoDB data
  3. Use x509 authentication mechanism
  4. Disable TLS1.0 and TLS1.1 TLS protocol
  5. Use RequireSSL for “SSLMode” parameter
  6. Use mutual TLS authentication to authenticate your application too
  7. Don’t use a self signed certificate. Get one from a public CA.
  8. Create a superuser with root and backup role. (This will be your “oracle” user in Oracle DB context)
  9. Create a local superuser with root and backup role on each sharded replica set. This user will be required for shard maintenance

Things you should know about TLS

  1. Remove unnecessary parameter (CAFile, ¬†clusterFile) from your configuration file else mongod will hit errors while parsing them. Don’t even put them in the configuration file with empty string.
  2. MongoDB automatically enable authorization when you enable TLS. So remember to create your users before that! (link)
  3. MongoDB does not have a parameter for choosing which ciphersuite to use.

Once you enable TLS, you will need to work around the authentication mechanism in MongoDB. Some tips below will be helpful.

When you are only using TLS but not client authentication

Create a superuser via mongos

use admin
db.createUser( { user: "superuser", pwd: "password", roles: ["root", "backup"] }, { w:"majority", wtimeout: 5000} )

Create a role via mongos

use admin
db.createRole( { role: "readWrite", privileges: [ { resource: { db: "database_name", collection: "collection_name" }, actions: ["find", "insert"] } ], roles:[ ] }, { w: "majority" } )

How to login to the MongoDB cluster via mongos

# mongo 10.1.64.11:27017/admin --authenticationDatabase admin
mongo> db.auth("username","password")

When you are using both TLS and client authentication

Create a superuser via mongos

db.getSiblingDB("$external").runCommand(
 {
 createUser: "CN=superuser,OU=Test ,O=Test,L=UK,ST=London,C=UK",
 roles: [
 { role: 'root', db: 'admin' },
 { role: 'backup', db: 'admin'}
 ],
 writeConcern: { w: "majority" , wtimeout: 5000 }
 }
)

Create a role via mongos

use admin
db.createRole( { role: "readWrite", privileges: [ { resource: { db: "database_name", collection: "collection_name" }, actions: ["find", "insert"] } ], roles:[ ] }, { w: "majority" } )

How to login to the MongoDB cluster via mongos

db.getSiblingDB("$external").auth(
 {
 mechanism: "MONGODB-X509",
 user: "CN=superuser,OU=Test ,O=Test,L=UK,ST=London,C=UK"
 }
)

How to list all your mutual TLS authentication users

db.getSiblingDB("$external").runCommand( { usersInfo: 1 } )

Hope this helps!

 

Regards,
Wei Shan

Leave a comment

MongoDB 3.2 – setShardVersion failed, stale config on lazy receive

MongoDB 3.2 – setShardVersion failed, stale config on lazy receive

While we were hacking the config servers today, we hit multiple errors.

  • MongoDB version 3.2.11
  • 2 shards
  • 3 replica set per shards

Any find() or aggregation query against mongoS

2016-11-24T15:03:31.699+0000 E QUERY [thread1] Error: command failed: {
 "code" : 10429,
 "ok" : 0,
 "errmsg" : "setShardVersion failed shard: s1:s1./hostnames7 { oldVersion: Timestamp 0|0, oldVersionEpoch: ObjectId('000000000000000000000000'), ns: \"collections\", version: Timestamp 0|0, versionEpoch: ObjectId('000000000000000000000000'), globalVersion: Timestamp 1000|0, globalVersionEpoch: ObjectId('5836fa3761234567600eebbf'), reloadConfig: true, ok: 0.0, errmsg:
\"could not refresh metadata for collections with requested shard version 0|0||000000000000000000000000, stored shard version is 1|0||1234fa37631...\" }"
} : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:16:14
assert.commandWorked@src/mongo/shell/assert.js:290:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1312:5
@(shell):1:1

db.printCollectionStats() against mongoS

error: {
 "$err" : "stale config on lazy receive :: caused by :: $err: \"[myzips.zips] shard version not ok in Client::Context: this shard contains versioned chunks for myzips.zips, but no version set in request ( ns : myzips.zip...\" ( ns : myzips.zips, received : 0|0||000000000000000000000000, wanted : 1|1||50fdd55b14faa2aa46422a7a, recv )",
 "code" : 9996
} at src/mongo/shell/query.js:128

Root cause:

  • config servers have new updated metadata about the sharded cluster that is not updated in both mongoSs and mongoDs (Yes, despite MongoDB official documentation, mongoDs does have a cache copy of your config servers metadata

Solution as below:

  1. Restart mongoSs
  2. Restart all your shards mongoDs

Regards,
Wei Shan

Leave a comment

MongoDB 3.2 – server returned error on SASL authentication step: Authentication failed.

MongoDB 3.2 – server returned error on SASL authentication step: Authentication failed.

This issue will happen to all MongoDB tools (mongo, mongodump, mongotop, mongostat). However, I encounter the issue while using mongotop.

#mongotop --host=192.168.56.104:27017 -u username -p "THIS1sR3tard3d!!" --authenticationDatabase=admin 5
mongotop --host=192.168.56.104:27017 -u superuser -p "THIS1sR3tard3dmongostat --host=192.168.56.104:27017 --username=user --password='THIS1sR3tard3d!!' --authenticationDatabase=admin " --authenticationDatabase=admin 5
2016-11-17T17:43:59.523+0000 Failed: error connecting to db server: server returned error on SASL authentication step: Authentication failed.

The issue is with the password with special characters. Apparently, even when you double quote a password, it still doesn’t ignore the special characters. t doesn’t help the situation when the error message is really vague.

#mongotop --host=192.168.56.104:27017 -u username -p 'THIS1sR3tard3d!!' --authenticationDatabase=admin 5

To solve the issue, simply single quote the password fieldūüôā

Update:

Apparently, escaping the special characters using single quote works too!

mongo -u username -p '1234'@''@'' --authenticationDatabase admin <ip>:port/db

Hope this helps

Regards,
Wei Shan

Leave a comment

Oracle 11gR2 – Default Audit Policies

Oracle 11gR2 – Default Audit Policies

By default, when we create a new database instance in Oracle, the default value for AUDIT_TRAIL is set to “none”. When you set it to “DB” or “DB,EXTENDED”, it will have a set of default logging policies.

SQL> select * from dba_priv_audit_opts;

USER_NAME PROXY_NAME PRIVILEGE SUCCESS FAILURE
------------------------------ ------------------------------ ---------------------------------------- ---------- ----------
 CREATE EXTERNAL JOB BY ACCESS BY ACCESS
 CREATE ANY JOB BY ACCESS BY ACCESS
 GRANT ANY OBJECT PRIVILEGE BY ACCESS BY ACCESS
 EXEMPT ACCESS POLICY BY ACCESS BY ACCESS
 CREATE ANY LIBRARY BY ACCESS BY ACCESS
 GRANT ANY PRIVILEGE BY ACCESS BY ACCESS
 DROP PROFILE BY ACCESS BY ACCESS
 ALTER PROFILE BY ACCESS BY ACCESS
 DROP ANY PROCEDURE BY ACCESS BY ACCESS
 ALTER ANY PROCEDURE BY ACCESS BY ACCESS
 CREATE ANY PROCEDURE BY ACCESS BY ACCESS
 ALTER DATABASE BY ACCESS BY ACCESS
 GRANT ANY ROLE BY ACCESS BY ACCESS
 CREATE PUBLIC DATABASE LINK BY ACCESS BY ACCESS
 DROP ANY TABLE BY ACCESS BY ACCESS
 ALTER ANY TABLE BY ACCESS BY ACCESS
 CREATE ANY TABLE BY ACCESS BY ACCESS
 DROP USER BY ACCESS BY ACCESS
 ALTER USER BY ACCESS BY ACCESS
 CREATE USER BY ACCESS BY ACCESS
 CREATE SESSION BY ACCESS BY ACCESS
 AUDIT SYSTEM BY ACCESS BY ACCESS
 ALTER SYSTEM BY ACCESS BY ACCESS

By default, it will log ALL sessions. This will increase the audit log size by a tremendous amount. Thus, unless you have a unique business requirement, it will be wiser if we just log failure login attempts only.

Disable audit logging on all session creation. Enable audit logging on unsuccessful session creation.

SQL> noaudit create session;
SQL> audit create session whenever not successful;

Verify Settings.

SQL> select * from dba_priv_audit_opts;

USER_NAME PROXY_NAME PRIVILEGE SUCCESS FAILURE
------------------------------ ------------------------------ ---------------------------------------- ---------- ----------
 CREATE SESSION NOT SET BY ACCESS

Hope this helps!

Regards,
Wei Shan

Leave a comment

MongoDB 3.2 – Could not find host matching read preference

MongoDB 3.2 – Could not find host matching read preference.

Environment:

  • MongoDB 3.2, WiredTiger
  • 2 shards
  • 3 node replica set

Error Message

mongos> db.collection.find()
Error: error: {
 "ok" : 0,
 "errmsg" : "could not find host matching read preference { mode: \"primary\", tags: [ {} ] } for set rs0",
 "code" : 133
}

The above error means that the query, after being routed through mongos[mongos caches config server shards information], was unable to find data from the shards to satisfy the query. 

This means there’s several possibilities why the error occurs:

  1. The metadata information in the config collections (stored in config server) is incorrect. This could mean you have the wrong IP address, wrong replica set names and etc.. Check the documents in the chunks, tags or shards collections
  2. The primary replica set in down and the automatic failover did not kick in for some reasons. Also, your read preference is set to primary and thus, there was not primary node to satify your query.

Solutions:

  1. Verify the metadata in your config database and use relevant MongoDB commands to rectify any mis-configurations. (DO NOT MODIFY MANUALLY THIS UNLESS ABSOLUTELY CONFIDENT!!)
  2. Veriy that your primary replica set is up or change your readPreference to a value that is not “primary”

Thanks!

Regards,
Wei Shan

Leave a comment

Linux YUM Best Kept Secret – whatprovides

Linux YUM Best Kept Secret – whatprovides

Most of the time, when we are installing a software like Oracle or DB2 on a Linux system, there will be required RPMs that has not been installed yet. If the RPM required is unfamiliar, we will be forced to research and find out what is the RPM about or where to download it.

From Linux Man Pages (whatprovides argument):

Is used to find out which package provides some feature or file. Just use a specific name or a file-glob-syntax wildcards to list the packages available or installed that provide that feature or file.

Usage Example:

When I tried to install DB2, it complains the following error:

DBT3514W ¬†The db2prereqcheck utility failed to find the following 32-bit library file: “libstdc++.so.6”.

So let’s try the argument out!

[root@centos68 server_t]# yum whatprovides libstdc++.so.6
libstdc++-4.4.7-17.el6.i686 : GNU Standard C++ Library
Repo : base
Matched from:
Other : libstdc++.so.6

[root@centos68 server_t]# yum install -y libstdc++-4.4.7-17.el6.i686

[root@centos68 server_t]# yum list libstdc++.i686 0:4.4.7-17.el6
Installed Packages
libstdc++.i686 4.4.7-17.el6 @base

Regards,
Wei Shan

Leave a comment

Investigating Oracle Database Memory Pressure Behavior

Investigating Oracle Database Memory Pressure Behavior

Oracle database will grab all and release as the O/S sees memory pressure.

I came across the above statement recently during a discussion., which led to this blog post. I wasn’t very sure if it was true so I did not reply to it. I’m aware of the behavior when it face CPU pressure but not memory pressure. So, this has definitely¬†piqued¬†my curiosity to find out more.


In Oracle 11gR2 and later, we can choose to use have the following options:

  • Automatic Memory Management (AMM)
    • We configure MEMORY_TARGET and MEMORY_MAX _TARGET and let Oracle database manage the SGA and PGA for us
  • Automatic¬†Shared Memory Management (ASMM)
    • We configure SGA_TARGET and SGA_MAX_SIZE¬†and let Oracle manage SGA for us.
  • Automatic PGA Memory (Usually used together with ASMM)
    • We configure PGA_AGGREGATE_TARGET and let Oracle manage PGA for us.

So, we can be sure that the parameters below affect Oracle database memory usage on a system. (Of course, we are not referring to manual memory tuning.)

  1. MEMORY_MAX _TARGET
  2. MEMORY_TARGET
  3. SGA_TARGET
  4. SGA_MAX_SIZE
  5. PGA_AGGREGATE_TARGET

It is widely understood that the above parameters are merely a a target. It may exceed the optimum size due to other circumstances. For example, the number of users increases beyond limit or insane SQL queries that exceed PGA value. PGA size is the unknown factor here, since we do not know how many users or what kind of queries will be running in the database.

The question here is, when it uses more memory than the target value, does it start to swap or does it grab the free memory from the non-allocated memory from the O/S. I would think that it will use free memory then swap space since it will ask the memory from O/S until it has memory pressure which then, it will start to swap.

Let’s test it out!

Let’s login and check all the baseline numbers.

[oracle@primary ~]$ sqlplus / as sysdba

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> show parameter pga_aggregate_target

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 200M

#free -m

total used free shared buffers cached
Mem: 2002 721 1281 183 24 451
-/+ buffers/cache: 245 1757
Swap: 4031 0 4031


select a.name, to_char(b.value, '999,999,999') bytes,
to_char(round(b.value/1024/1024,1), '99,999.9' ) mbytes
from v$statname a, v$mystat b
where a.statistic# = b.statistic#
and a.name like '%ga memory%';

NAME BYTES MBYTES
-------- ------------ ---------
session uga memory 2,515,088 2.4
session uga memory max 2,515,088 2.4
session pga memory 3,555,880 3.4
session pga memory max 22,889,000 21.8

The PGA_AGGREGATE_TARGET is configure to use 200MB. Currently it is only using 2.4MB. We have 4GB of swap space configured as well.

Let’s create a package and generate the workload

SQL> create or replace package demo_pkg
 as
 type array is table of char(2000) index by binary_integer;
 g_data array;
 end;
 /

SQL> begin
 for i in 1 .. 2000000
 loop
 demo_pkg.g_data(i) := 'x';
 end loop;
 end;
 /
 ERROR at line 1:
 ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pl/sql vc2)

The above SQL script definitely uses more than 2GB of memory for sorting, which lead to ORA-04030 error.

From another separate SSH session, let’s monitor the memory usage via vmstat

[root@primary ~]# vmstat 1 6000000
 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 0 1311424 24952 462624 0 0 1611 58 303 486 6 10 83 1 0
 0 0 0 1311392 24952 462628 0 0 0 16 151 282 0 1 99 0 0
 0 0 0 1311384 24952 462628 0 0 0 0 118 228 0 0 100 0 0
 0 0 0 1311384 24952 462628 0 0 0 32 151 287 0 0 100 0 0
 0 0 0 1311384 24952 462628 0 0 0 0 112 220 0 0 100 0 0
 0 0 0 1311384 24952 462628 0 0 0 0 141 263 0 1 99 0 0
 0 0 0 1311384 24960 462628 0 0 0 52 135 267 0 0 99 1 0
 0 0 0 1311384 24960 462628 0 0 0 0 141 271 0 0 100 0 0
 0 0 0 1311384 24960 462628 0 0 0 0 109 215 0 0 100 0 0
 0 0 0 1311384 24960 462628 0 0 0 32 150 283 0 0 100 0 0
 0 0 0 1311384 24960 462628 0 0 0 0 123 232 0 0 100 0 0
 0 0 0 1311384 24968 462624 0 0 0 20 144 276 0 1 99 0 0
 0 0 0 1311392 24968 462628 0 0 0 32 121 250 0 0 100 0 0
 0 0 0 1311136 24968 462628 0 0 0 0 167 329 1 1 98 0 0
 0 0 0 1311136 24968 462628 0 0 0 0 109 211 0 0 100 0 0
 1 0 0 226260 24968 463068 0 0 280 32 998 310 22 67 10 0 0
 2 4 86096 13744 60 105252 0 93124 12636 93144 2101 1698 13 86 0 1 0
 3 2 314548 14296 60 82612 1596 226304 29708 226304 2234 1695 9 91 0 0 0
 3 4 552488 13732 60 83128 9568 244640 35048 244664 2549 2140 8 90 0 2 0
 2 0 806904 13888 60 82228 1920 267312 34484 267344 2438 1904 9 89 0 2 0

The highlighted values above are the period where the SQL query executed. Noticed that the free memory was at 1.3GB until it drop to 200MB before it actually start to swap.

Summary

From the above, I have highlighted the important points:

  1. It is not possible to “overuse” SGA as it uses LRU algorithm to remove unnecessary objects under memory pressure.
  2. Although we configure PGA to only use a maximum of 200MB, it is possible to exceed this value due to user queries or sessions.
  3. Under memory pressure, the database will try to use available memory until it doesn’t have a choice, which it will choose to use swap space.
  4. When space space run out, the database will hit ORA-04030 error.

Regards,
Wei Shan

References: Expert Oracle Database Architecture -Thomas Kyte

 

 

 

 

Leave a comment

Story of recovering a PostgreSQL 9.2 database

Story of recovering a PostgreSQL 9.2 database

Background Story

It was a 2 node PostgreSQL 9.2 streaming replication setup. The master node crashed without possibility of recovery. Thus, there was a need to restore the standby database to resume the production services. The standby database also reboot when master node crashed. The standby node was performing database backup via pg_start_backup when it rebooted.

I will simulate the recovery below.


postgresql.conf

wal_level = hot_standby
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/9.2/data/pg_archive/%f'

recovery.conf

standby_mode = 'on'
primary_conninfo = 'host=10.1.0.3 port=5432 user=postgres application_name=testdb keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'cp /var/lib/pgsql/9.2/data/pg_archive/%f %p'
recovery_target_timeline = 'latest'

Start PostgreSQL backup on the database

postgres=# select pg_start_backup('test1');
 pg_start_backup
-----------------
 0/40000B8
(1 row)

Abort PostgreSQL database and start it again. (Notice we are not able to login the psql)

-bash-4.1$ /usr/pgsql-9.2/bin/pg_ctl -D /var/lib/pgsql/9.2/data -m immediate stop
waiting for server to shut down... done
server stopped
-bash-4.1$ /usr/pgsql-9.2/bin/pg_ctl -D /var/lib/pgsql/9.2/data -l logfile start
server starting
-bash-4.1$ psql
psql: FATAL: the database system is starting up

Notice that the startup process is waiting for a WAL file.

-bash-4.1$ ps -aef | grep postgres
root 2405 2283 0 18:19 pts/1 00:00:00 su - postgres
postgres 2407 2405 0 18:19 pts/1 00:00:00 -bash
postgres 2921 1 0 19:21 pts/1 00:00:00 /usr/pgsql-9.2/bin/postgres -D /var/lib/pgsql/9.2/data
postgres 2923 2921 0 19:21 ? 00:00:00 postgres: logger process
postgres 2924 2921 0 19:21 ? 00:00:00 postgres: startup process waiting for 000000010000000000000004
postgres 2925 2921 0 19:21 ? 00:00:00 postgres: checkpointer process
postgres 2926 2921 0 19:21 ? 00:00:00 postgres: writer process

Error logs in /var/log/messages

FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: could not connect to the primary server: could not connect to server: Connection timed out
 Is the server running on host "10.1.0.3" and accepting
 TCP/IP connections on port 5432?

cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/000000010000000000000004': No such file or directory
cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/000000010000000000000004': No such file or directory
cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/00000002.history': No such file or directory
LOG: received smart shutdown request
FATAL: terminating walreceiver process due to administrator command
LOG: shutting down
LOG: database system is shut down
LOG: database system was shut down in recovery at 2016-08-03 19:17:49 SGT
cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/00000002.history': No such file or directory
LOG: entering standby mode
LOG: restored log file "000000010000000000000004" from archive
LOG: redo starts at 0/4000020
LOG: record with zero length at 0/40000B8
LOG: record with zero length at 0/40000B8
cp: cannot stat `/var/lib/pgsql/9.2/data/pg_archive/00000002.history': No such file or directory
FATAL: the database system is starting up

The WAL file that it was waiting for existed in pg_xlog but not in pg_archive.

-bash-4.1$ ls /var/lib/pgsql/9.2/data/pg_archive
000000010000000000000002 000000010000000000000003

-bash-4.1$ ls /var/lib/pgsql/9.2/data/pg_xlog
000000010000000000000002 000000010000000000000003 000000010000000000000004

Solution: Remove recovery.conf and restart PostgreSQL. Noticed the we didn’t have to use¬†pg_stop_backup to remove the backup label.

-bash-4.1$ mv recovery.conf recovery.done
-bash-4.1$ /usr/pgsql-9.2/bin/pg_ctl -D /var/lib/pgsql/9.2/data -l logfile start
server starting
-bash-4.1$ psql
psql (9.2.17)
Type "help" for help.

postgres=# select pg_stop_backup();
ERROR: a backup is not in progress

Verification (Formatted for readability)

-bash-4.1$ /usr/pgsql-9.2/bin/pg_controldata
Database cluster state: in production
pg_control last modified: Wed 03 Aug 2016 07:22:27 PM SGT
Latest checkpoint location: 0/40000B8
Prior checkpoint location: 0/4000058
Latest checkpoint's REDO location: 0/40000B8
Minimum recovery ending location: 0/0
Backup start location: 0/4000020
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby

Summary

There is a few learning points from the story.

  • When a database crashed while it is running pg_start_backup() mode, after it reboots, it is not¬†required to run pg_stop_backup().
  • The reason why it was waiting for the WAL file that was in pg_xlog is due to recovery.conf. The restore command in recovery.conf was pointing to pg_archive to restore the WAL file.
  • It was a crash recovery, we didn’t have to use recovery.conf to restore the database. recovery.conf is only used when we are restoring the database from backup

Regards,
Wei Shan

Leave a comment