MongoDB, Hadoop And The Democratization Of Data

When I was in engineering school and wanted to get some serious data crunching done (shout out to Patran/Nastran aficionados), I would go downstairs to the lab and chat up Harvey. He owned the interface to the powerful and expensive mainframe, and nothing was going to get slotted in or processed without the blessing of this high priest of what, at the time, was Big Data. As much as I liked Harvey, boy am I glad that times have changed. But better number-crunching technology didn't arrive overnight.

First Wave -- Easier Data Access

The democratization of data access within the business has been years in the making. Twenty years ago there were a limited number of databases, often running on large systems, with expensive licenses and high barriers to entry (they were usable mainly by people with advanced SQL training). Since then, the world has evolved in a number of directions. With the advent of MySQL in the mid-1990s, it became free and easy to start storing data, even with a minimum of relational database knowledge. MySQL went on to power much of the website revolution of the late 90s.

Second Wave -- Cost-Effective, Powerful Processing

Just as the first wave was building, the need for a second was already emerging. While it became easier to stand up a website and its underlying database, the explosive growth of the internet led to other problems. Indexing and searching all the content being created was a daunting task, and the major search engines such as Yahoo and Google were struggling with traditional approaches to searching stacks of data. So instead of indexing every piece of hay in the proverbial haystack, they found ways to break it up and “mapreduce” it over several batches. The foundations of Hadoop came out of these efforts, along with Doug Cutting's work on Nutch.
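To make the pattern concrete, here is a minimal word-count sketch of the map/shuffle/reduce idea in plain Python. It runs in a single process, and the chunking and function names are illustrative only, not Hadoop's actual API:

from collections import defaultdict

def map_phase(chunk):
    # Map: emit a (word, 1) pair for every word in a chunk of text.
    for word in chunk.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, across every chunk.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: collapse each group of counts into a single total.
    return {word: sum(counts) for word, counts in groups.items()}

# The "haystack", broken into batches that could live on separate machines.
chunks = ["the quick brown fox", "the lazy dog and the fox"]

mapped = (pair for chunk in chunks for pair in map_phase(chunk))
print(reduce_phase(shuffle(mapped)))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1, 'and': 1}

A real Hadoop job distributes the map and reduce steps across many machines and handles the shuffle over the network, but the division of labor is the same.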

Third Wave -- The Bridge to Easy and Powerful

While cost-effective, easy access sounds great, this democratization presents both opportunities and challenges, since most existing data already lives on legacy SQL systems. On the one hand, there are powerful new ways to combine NoSQL, SQL, and Hadoop for emerging areas such as IoT, as Matt Asay points out. On the other, a modern data supply chain (as Dan Woods notes) must be put in place to manage all of it. That complexity can be intimidating for data architects, who only a decade ago focused primarily on SQL and didn't have to stitch NoSQL and Hadoop together as well.
