
MongoDB, Hadoop And The Democratization Of Data


When I was in engineering school and wanted to get some serious data crunching done (shout out to Patran/Nastran aficionados), I would go downstairs to the lab and chat up Harvey. He owned the interface to the powerful and expensive mainframe, and nothing was going to get slotted in or processed without the blessing of this high priest of what, at the time, was Big Data. As much as I liked Harvey, boy am I glad that times have changed. But the advent of better number-crunching technology hasn’t happened overnight.

First Wave -- Easier Data Access

The democratization of data access within the business has been years in the making. Twenty years ago, a limited number of databases ran on large systems, with high licensing costs and barriers to entry that restricted their use to people with advanced SQL training. Since then, the world has evolved in several directions. With the advent of MySQL in the mid-1990s, it became free and easy to start storing data, even with a minimum of relational database knowledge. MySQL went on to power much of the website revolution of the late 90s.


Second Wave -- Cost-Effective, Powerful Processing

Just as the first wave was building, the need for a second wave was already emerging. While it became easier to stand up a website and its underlying database, the explosive growth of the internet was leading to other issues. Indexing and searching all the content being created was a daunting task, and the major search engines such as Yahoo and Google were struggling with traditional ways of searching stacks of data. So instead of indexing every piece of hay in the proverbial haystack, they found ways to break the work up and "mapreduce" it over several batches. The foundations for Hadoop grew out of these efforts, along with Doug Cutting's work on Nutch.
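The map-and-reduce idea behind Hadoop is simple enough to sketch without a cluster: a map phase emits key/value pairs from each input, and a reduce phase groups the pairs by key and aggregates them. The following is a minimal, single-machine Python illustration of that pattern (word counting is the classic example), not Hadoop's actual API:

```python
from itertools import groupby

def map_phase(documents):
    # Map step: emit a (word, 1) pair for every word in every document
    return [(word.lower(), 1) for doc in documents for word in doc.split()]

def reduce_phase(pairs):
    # Shuffle step: sorting brings identical keys together
    pairs = sorted(pairs)
    # Reduce step: sum the counts for each key
    return {key: sum(count for _, count in group)
            for key, group in groupby(pairs, key=lambda kv: kv[0])}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(map_phase(docs))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

Because each map call touches only one document and each reduce call touches only one key's pairs, both phases can be spread across many machines in batches, which is exactly what made the approach work at search-engine scale.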

Third Wave -- The Bridge to Easy and Powerful

While cost-effective processing and easy access sound great, this democratization presents both opportunities and challenges, since most existing data already lives on legacy SQL systems. On the one hand, there are powerful new ways to combine NoSQL, SQL, and Hadoop for new areas such as IoT, as Matt Asay points out. On the other hand, a modern Data Supply Chain (as Dan Woods notes) must be put in place to manage all of this. That complexity can be intimidating for data architects, who only a decade ago were focused primarily on SQL and didn't have to piece together NoSQL and Hadoop as well.
