Relational databases are also called Relational Database Management Systems (RDBMSs) or SQL databases. Historically, the most popular of these have been Microsoft SQL Server, Oracle Database, MySQL, and IBM DB2. RDBMSs are used mostly in large enterprise scenarios, with the exception of MySQL, which is also widely used to store data for Web applications.
All relational databases can be used to manage transaction-oriented applications, known as online transaction processing (OLTP). Most non-relational databases in the Document Store and Column Store categories can also be used for OLTP, which adds to the confusion between the two. OLTP databases can be thought of as “operational” databases: they are characterized by frequent, short transactions that include updates and touch a small amount of data, and they support thousands (if not more) of concurrent transactions. Banking applications and online reservations are typical examples.
James Serra, a Big Data Evangelist at Microsoft, discussed the many differences, advantages and disadvantages, and various use cases of relational and non-relational databases during his Enterprise Data World 2016 Conference presentation.
He began by noting that data integrity is very important, which is why RDBMSs support ACID transactions (Atomicity, Consistency, Isolation, and Durability). RDBMSs have met data integrity needs for decades, but the exponential growth of data over the past 10 years or so, along with many new data types, has changed the data equation entirely, and non-relational databases have grown out of that need.
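To make the atomicity part of ACID concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not from Serra's presentation: the `accounts` table, the `transfer` and `balances` helpers, and the balances themselves are all hypothetical, chosen to mirror the banking example of an OLTP workload. The point is only that a transaction's changes are applied together or not at all.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (id, balance) VALUES (?, ?)",
    [(1, 100), (2, 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as a single transaction.

    Returns True if the transfer committed, False if it was rolled back.
    """
    try:
        # `with conn` commits if the block succeeds and rolls back
        # if the block raises an exception.
        with conn:
            # Credit first, so that a failed debit leaves a partial
            # change for the rollback to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as {account_id: balance}."""
    rows = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
    return dict(rows)


print(transfer(conn, 1, 2, 30))   # a transfer the source account can cover
print(balances(conn))
print(transfer(conn, 1, 2, 500))  # an overdraft attempt, rolled back as a whole
print(balances(conn))
```

In this sketch the second transfer fails on the debit, and the credit that had already been applied inside the same transaction is rolled back with it, so the balances are the same before and after the failed attempt.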
Non-relational databases are also called NoSQL databases. NoSQL has become an industry-standard term, but the name is beginning to lose popularity because it doesn’t fully cover the complexity and range of the non-relational data stores that are available. Some of the best-known NoSQL or non-relational databases that Serra discussed are MongoDB, DocumentDB, Cassandra, Couchbase, HBase, Redis, and Neo4j. There are hundreds, if not thousands, more.
Hadoop is also part of this entire discussion, said Serra. But, “keep in mind Hadoop is a file system with components made up of Hadoop Distributed File System (HDFS), YARN, and MapReduce.” So while it is a significant part of the relational and non-relational discussion, it includes many other components as well. For an outline of Hadoop, see the DATAVERSITY® article titled Hadoop Overview: A Big Data Toolkit.
Consider an organization that is using SQL Server, said Serra:
“And I need to index a few thousand documents and search them. No problem. I can use Full-Text Search. But what happens if I need to store and analyze a few million web pages?”
Enter Hadoop and non-relational databases. If an internal company application running on SQL Server needs to handle a few thousand transactions per second, that is no problem: SQL Server can handle the load on a suitably sized server. But when users can enter millions of transactions per second, this becomes a serious problem, and NoSQL enters as a solution, said Serra. Even so, most enterprise data still needs only an RDBMS.