Archive for category NewSQL

A neutral perspective of the current database industry

A neutral perspective of the current database industry

According to db-engines, there are more than 317 database software in the market right now. If you have requirements to develop a new application , how do you know which database to use? When I started my career in 2011, the de facto standard was to use Oracle database. It was mature, reliable and able to support multiple types of workload. If you can’t afford Oracle database, you could use MySQL, if you require high performance with HA capabilities. If you require a mature product with strict compliant to ANSI SQL standards, there’s always PostgreSQL. However, Oracle is still the #1 choice for a lot of reasons. Also, Oracle DBAs were also paid the most, compared to other databases DBA.

Five years has passed, a lot has changed in the database industry. Even though Oracle is still the #1 in the market, there are many other comparable choices now. Oracle supporters will claim that Oracle database is still the best and no other database products will come close. They will also claim that NoSQL products are overhyped and will die down soon enough. Open-source supporters will spread hate on Oracle, claim that it’s overpriced and that you don’t need Oracle database at all.

In my opinion, enterprise companies AKA companies whom are loaded with cash, will still continue to pay 7 figures for Oracle support contract. They continue to do so because they are not willing to migrate their legacy application to something else. Also, It doesn’t make sense for them to use other database software when they have unlimited license usage for Oracle software. It’s also possible that the CIO aren’t willing to risk his/her neck by changing the database stack 🙂 For companies that aren’t as rich, usually their ERP software will still be stuck using Oracle database. That’s if they haven’t move to a cloud ERP yet. For new applications, they are already looking at alternatives.

Oracle is still the #1 database, in terms of technical superiority. No other databases vendors have implemented Oracle’s Real Application Cluster (RAC). It’s an extremely powerful feature that offers possibly, zero downtime for node failure. However, how many real life applications require that? Is it feasible to have that feature for an additional 23,000 USD/core? Is it acceptable to have 15s of downtime during node failure instead? That would decrease your cost to about nothing for licensing.

If you have a multiple terabyte application, the cost of having that on a Oracle database is not going to be cheap. A four-core EE licensed server is going to cost you at least 200,000 USD already. If you run it on MongoDB or MySQL, it’s not going to cost you anything. Obviously, your application needs to be able to able to support that (ACID compliant). If you help your company to save that 200k USD, you might find a sweeter bonus at the end of the year!

Oracle database has got a lot of great technology behind it. The database optimisation engine is far more impressive than any other vendors I know. They had parallel query even before version 8i. PostgreSQL only had it recently. However, Oracle database sales team has been trying to piss off their customers for awhile now.I know that several banks in Asean region are using PostgreSQL/MySQL for their new applications Also, many companies are banning new Oracle database implementations implicitly unless there’s no alternatives. People didn’t have much choice for choosing database software back then. It is vastly different now.

It’s a lot harder to pick the right database for new applications now. It used to be just Oracle and Oracle only. These days, you got to think of the nature and volume of the data/application. You also need to decide if your application requires ACID. If you are using NoSQL, there’s graph, key-value, document and columnar. You don’t really want to use the wrong database for your application. Using the wrong datastore for your application will result into sub-optimal performance or ugly workarounds that you wouldn’t like. For example, you could use MongoDB for text search. However, a better database for heavy text search use case would be Elasticsearch. Just because MongoDB have text indexes and is able to perform text search doesn’t mean it’s the right datastore for it. Heck, you could even use MySQL as a pure key-value storage. Databases is all about polyglot persistence these days.

I believe that in years to come, Oracle database will become the mainframe of database software. Highly profitable yet legacy piece of software whereby companies only use it because they were too afraid to use something else. For all my friends that still works actively on Oracle databases, perhaps it’s time to learn a new skill or move on to something else.

Regards,
Wei Shan

4 Comments

SQL Vs NoSQL Vs NewSQL Databases

SQL Vs NoSQL Vs NewSQL Databases

Before I join the database field in 2011, I was told that the role of a Database Administrator(DBA) is extremely dull and boring. However, until now, it has been nothing but exciting. Buzz words like Big Data and IoT have generated new found interest in databases. The number of new databases that have sprung out of nowhere is tremendous. CockroachDB, Riak, Voldemort and etc.. In this post, I will be sharing about 3 different category of databases.

SQL Databases

SQL, also known as relational database management system(RDBMS), is what most companies are running on today. Popular relation databases are Oracle, SQL Server, MySQL and PostgreSQL. Data are store in rows and columns. You can use SQL to query for data in a RDBMS. A lot of applications are running on either of them. Bank transactions, Data Warehouse operations and payment transactions. These are the core applications that any companies will be supporting.

Relational databases are famous for ACID. Atomicity, Consistency, Isolation and Durability. You can read about them over here. Basically, it means that your transaction, once committed, it will be safe and there should not be any data loss. Simultaneous transactions should not interfere with each other.

NoSQL Databases

NoSQL is a term for databases that does not store its data in rows and columns. They are typically not ACID but BASE compliant. This is because they sacrifice C for AP in CAP theorem.  Also, you don’t have to define your schema upfront and is considered “schema-less”. Below are the different types of NoSQL databases

  • Document => MongoDB
  • Key-Value => Redis
  • Graph => Neo4j
  • Column => Cassandra

Different types of NoSQL database are specific for different use case. For example, Neo4j is perfect as a data-store for social media sites because supports defining relations between entities. MongoDB are typically used for e-commerce sites because each cart can be defined as 1 collection. Redis is often being used as a queue or caching tier that sits in-front of another database. NoSQL can be an entirely subject of it’s own and I’m probably not qualified to write anything more than an introduction on it. (But this won’t be true after I finish my MongoDB course!)

NoSQL can be accessed use SQL-LIKE language or their own native language like Neo4j-Cypher.

NewSQL Databases

NewSQL is our shorthand for the various new scalable/high-performance SQL database vendors. We have previously referred to these products as “ScalableSQL” to differentiate them from the incumbent relational database products. Since this implies horizontal scalability, which is not necessarily a feature of all products, we adopted the term NewSQL in the new report. And to clarify, like NoSQL, NewSQL is not to be taken too literally: the new thing about the NewSQL vendors is the vendor, not the SQL. NewSQL is a set of various new scalable/high-performance SQL database vendors (or databases). These vendors have designed solutions to bring the benefits of the relational model to the distributed architecture, and improve the performance of relational databases to an extent that the scalability is no longer an issue.

– 451 Group’s senior analyst, Matthew Aslett

NewSQL are pretty new to the database industry. They are supposed to be ACID-compliant yet be highly scalable. Also, they are also using SQL-LIKE for CRUD operations. Below are some NewSQL databases.

  • VoltDB
  • Clustrix
  • NuoDB

In order to achieve the above requirements, different NewSQL vendors have implemented their design different.For example, NuoDB uses a “2 tier approach”. Transaction Engines hold a subset of the objects in memory while Storage Managers are servers that have a complete copy of all objects on disks. Also, NuoDB uses Durable Distributed Cache (DDC) which sound like a massive RAC-Cache Fusion to me. VoltDB uses a mixture of partitioning, stored procedures as a unit of transaction and deterministic command to attain the same. Each of the NewSQL databases have very different concepts to it!

Summary

In my opinion, NoSQL will never replace SQL databases. They are simply used for different scenario and it’s never meant to replace each other. The RDBMS is just so good in handling OTLP transactions that you will never want to migrate it to MongoDB or Riak because of performance or ACID compliance. Likewise, you wouldn’t want your 25-node MongoDB cluster to be migrated to Oracle databases. You will either die trying or your company will go bankrupt first.

However, I believe that we will see a huge increase in companies adopting NoSQL because storing and analysing data is becoming very crucial or business to succeed these days. It’s becoming a key differentiator in the corporate world. It helps understand your customers better. Look at companies like Uber or Amazon. They are real good examples of using data to succeed.

Some people may ask, if NoSQL/NewSQL is so powerful, why don’t people use it to replace their existing Oracle Data Warehouse. Typical Oracle shops spend at least 7-figure sum on the Data Warehouse licensing per year (EE Edition, RAC, AWR packs..). In my opinion, there are a couple of reasons why:

  1. Current analytics tool are pretty robust and mature. But they are only support on ANSI compliant SQL language. Using non-SQL or SQL-LIKE tool will require a re-write of all their existing software.
  2. RDBMS contains dimension and fact tables which are already “cleaned” as compared to NoSQL
  3. SQL is a very easy language to learn and the existing users are just too comfortable on it

However, I believe that NoSQL will soon rise to join the ranks of SQL database in the Data Warehouse scene. This is because companies now want to store their non-structured data as part of their data lake. So they could be using RDBMS for their existing applications and using NoSQL for non-structured data. Then they could use something like Big SQL to query data from both type of databases, forming something called Data Lake.

For DBAs like myself, it means you will get to play with more databases! 🙂
Cheers,
WeiShan

2 Comments