Is Spark better than Hadoop MapReduce?

For anyone getting into the Big Data world, the terms Big Data and Hadoop seem synonymous at first. As they learn the ecosystem, its tools, and how those tools work, people become more aware of what big data actually means and what role Hadoop plays in the big data ecosystem.

According to Wikipedia, “Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate”.

To put it in simple terms, as the size of data increases, the usual processing methods take too long or prove too costly.

Hadoop was created in 2005 by Doug Cutting, who was inspired by Google’s papers on GFS and MapReduce. Hadoop is an open source software framework for distributed storage and distributed processing of very large data sets. In other words, it is designed to reduce the cost and time of processing large data sets.

Hadoop, with its distributed file system (HDFS) and distributed processing model (MapReduce), became the de facto standard in big data computing. The term ‘Hadoop’ refers not only to the base modules, but also to the ecosystem of other software packages that can be used along with Hadoop.

As time went on, data generation exploded and the need for processing large amounts of data also exploded. This eventually generated a variety of needs in big data computing, not all of which could be satisfied by Hadoop.

Most data analysis is iterative in nature. While iterative processing can be done in MapReduce, the data has to be read again from storage for each iteration of the process. Under normal circumstances this would be fine, but reading hundreds of gigabytes or a few terabytes of data takes time, and people are not patient.
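To make the cost of re-reading concrete, here is a minimal sketch in plain Python. It is not MapReduce or Hadoop code: `DiskDataset`, `write_dataset` and `iterate_rereading` are hypothetical names invented for this illustration, and a local text file stands in for data on a distributed file system. The iterative job is a simple gradient descent that estimates the mean of the data, and the point is only that every iteration triggers another full scan of storage.

```python
import os
import tempfile


def write_dataset(path, values):
    """Write one number per line; stands in for a large input file."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")


class DiskDataset:
    """Hypothetical stand-in for input stored on a distributed file system."""

    def __init__(self, path):
        self.path = path
        self.full_scans = 0  # how many times the whole file has been read

    def scan(self):
        """Read every record from storage, counting the pass."""
        self.full_scans += 1
        with open(self.path) as f:
            for line in f:
                yield float(line)


def iterate_rereading(dataset, iterations, step=0.5):
    """Estimate the mean by gradient descent, re-scanning storage each pass."""
    estimate = 0.0
    for _ in range(iterations):
        total, count = 0.0, 0
        for x in dataset.scan():  # a fresh read of the input, every iteration
            total += estimate - x
            count += 1
        estimate -= step * total / count
    return estimate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "numbers.txt")
        write_dataset(path, range(1, 101))
        dataset = DiskDataset(path)
        estimate = iterate_rereading(dataset, iterations=20)
        print("estimate:", estimate)
        print("full scans of storage:", dataset.full_scans)
```

With 20 iterations the file is scanned 20 times. On a file of a hundred numbers nobody notices, but the same pattern over terabytes is exactly the waiting time described above.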

Many people consider data analytics to be an art rather than a science. In any art, the creator creates a small piece of the puzzle and attaches it to the bigger one to witness its growth. Loosely translated, data analysts want to see the results of each step before proceeding to the next one. In other words, a lot of data analytics is interactive in nature. Traditionally, interactive analytics is done through SQL: analysts write queries that operate on data in databases. Although Hadoop had equivalents (Hive and Pig), these proved time-consuming, as each query spent a long time processing the data.

Both of these hurdles led to the birth of Spark, a new processing model that facilitates iterative programming and interactive analytics. Spark provides in-memory primitives that load the data into memory once and let you query it repeatedly. This makes Spark well suited for a lot of data analytics and machine learning algorithms.
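The load-once, query-many idea can be sketched in plain Python. This is an illustration of the principle only, not Spark code: `StoredTable` and `CachedTable` are hypothetical names made up for this example, and a local temporary file stands in for distributed storage. In Spark itself, the comparable step is marking a data set with `cache()` or `persist()` so that later operations reuse the in-memory copy.

```python
import os
import tempfile


class StoredTable:
    """Hypothetical stand-in for a data set kept on distributed storage."""

    def __init__(self, rows):
        fd, self.path = tempfile.mkstemp(suffix=".csv")
        with os.fdopen(fd, "w") as f:
            for name, amount in rows:
                f.write(f"{name},{amount}\n")
        self.storage_scans = 0  # how many times storage has been read in full

    def scan(self):
        """Read every row from storage, counting the pass."""
        self.storage_scans += 1
        with open(self.path) as f:
            for line in f:
                name, amount = line.rstrip("\n").split(",")
                yield name, float(amount)

    def close(self):
        """Delete the temporary file backing this table."""
        os.remove(self.path)


class CachedTable:
    """Reads the stored table once and answers later queries from memory."""

    def __init__(self, table):
        self.rows = list(table.scan())  # the only pass over storage

    def query(self, predicate):
        """Return the cached rows that satisfy the predicate."""
        return [row for row in self.rows if predicate(row)]


if __name__ == "__main__":
    table = StoredTable([("a", 10.0), ("b", 250.0), ("c", 75.0)])
    try:
        cached = CachedTable(table)
        # Several interactive-style queries, all served from memory.
        print(cached.query(lambda row: row[1] > 50))
        print(cached.query(lambda row: row[1] <= 50))
        print(cached.query(lambda row: row[0] == "c"))
        print("full scans of storage:", table.storage_scans)
    finally:
        table.close()
```

However many queries follow, storage is scanned only once. The trade-off is that the working set has to fit in memory, which is the assumption this sketch makes silently.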

Note that Spark only defines the distributed processing model. It does not address how the data is stored, and it still relies on Hadoop (HDFS) to store the data efficiently in a distributed way.

Spark is setting the big data ecosystem on hyperdrive. It promises to be 10-100 times faster than MapReduce. Many think this could be the end of MapReduce.
