What is Hadoop? Introduction to Hadoop Framework
- by 7wData
What comprises the Hadoop architecture/ecosystem?
The architecture can be broken down into two branches, i.e., core components and complementary/other components.
Core components:
There are four core components:
Common – A set of shared utilities and libraries that support the other Hadoop modules. It assumes hardware failures are routine and ensures the Hadoop cluster handles them automatically in software.
HDFS – The Hadoop Distributed File System stores data as fixed-size blocks and distributes them across the cluster. Each block is replicated multiple times to ensure data availability if a node fails.
YARN – Yet Another Resource Negotiator allocates cluster resources, allowing different users to run various applications concurrently without worrying about increased workloads.
MapReduce – A programming model that executes tasks in parallel by distributing the work across small blocks of data: a map phase processes each block independently, and a reduce phase aggregates the intermediate results.
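The interplay between HDFS-style block splitting and the map and reduce phases can be sketched in plain Python. This is a toy single-machine simulation, not the Hadoop API; the function names and block size are illustrative only:

```python
from collections import defaultdict

def split_into_blocks(text, block_size=2):
    """Toy stand-in for HDFS: split input records into fixed-size blocks."""
    lines = text.splitlines()
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]

def map_phase(block):
    """Map: emit (word, 1) pairs for one block (Hadoop runs this per block, in parallel)."""
    return [(word.lower(), 1) for line in block for word in line.split()]

def reduce_phase(pairs):
    """Shuffle + reduce: group intermediate pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

text = "Hadoop stores data\nHadoop processes data"
pairs = []
for block in split_into_blocks(text):  # each block is mapped independently
    pairs.extend(map_phase(block))
print(reduce_phase(pairs))  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster, the map calls run on the nodes that already hold the blocks (moving computation to the data), and the shuffle happens over the network between map and reduce tasks.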
Complementary/other components
Ambari – A web-based interface for provisioning, managing, and monitoring Hadoop clusters and their components, including HDFS, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. It provides a console for monitoring cluster health and for assessing the performance of components such as MapReduce, Pig, and Hive in a user-friendly way.
Cassandra – An open-source, highly scalable NoSQL distributed database designed to handle massive amounts of data across many commodity servers, providing high availability with no single point of failure.
Flume – A distributed, reliable tool for efficiently collecting, aggregating, and moving large volumes of streaming data into HDFS.
HBase – A non-relational distributed database running on the Hadoop cluster that stores large amounts of structured data. HBase can serve as input and output for MapReduce jobs.
HCatalog – A table and storage management layer that allows developers to access and share data across Hadoop tools.
Hive – A data warehouse infrastructure that supports summarizing, querying, and analyzing data using HiveQL, an SQL-like query language.
Oozie – A server-based workflow system that schedules and manages Hadoop jobs.
Pig – A dedicated high-level platform for manipulating data stored in HDFS, using a language called Pig Latin and a compiler that translates it into MapReduce jobs. It lets analysts extract, transform, and load (ETL) data without writing MapReduce code.
Solr – A highly scalable search tool that provides indexing, centralized configuration, failover, and recovery.
Spark – A fast, open-source processing engine that runs on Hadoop and supports streaming, SQL, machine learning, and graph processing.
Sqoop – A mechanism for transferring bulk data between Hadoop and structured datastores such as relational databases.
ZooKeeper – An open-source service that provides configuration management and synchronization for distributed systems.
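To illustrate the convenience that Hive and Pig add on top of raw MapReduce, here is a group-by aggregation expressed as a single declarative query. This uses Python's built-in sqlite3 module purely as a stand-in for HiveQL; the table and column names are invented for the example, and real Hive compiles such queries into jobs that run over data in HDFS:

```python
import sqlite3

# In-memory database standing in for a Hive table over HDFS data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_visits (page TEXT, visitor TEXT)")
conn.executemany(
    "INSERT INTO page_visits VALUES (?, ?)",
    [("home", "alice"), ("home", "bob"), ("about", "alice")],
)

# A HiveQL-style query: one declarative statement instead of
# hand-written map and reduce functions.
rows = conn.execute(
    "SELECT page, COUNT(*) FROM page_visits GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('about', 1), ('home', 2)]
```

The engine decides how to scan, group, and aggregate; this separation of "what" from "how" is exactly the productivity gain Hive and Pig bring to analysts working on Hadoop.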
Some interesting facts about the evolution of Hadoop:
The Google File System (GFS) paper gave rise to HDFS
Google's MapReduce was created to parse and index web pages
Google's BigTable directly inspired HBase
Why should we use Apache Hadoop?
With Big Data evolving around the world, demand for Hadoop developers is increasing at a rapid pace. Well-versed Hadoop developers with practical implementation experience are needed to add value to existing processes. Among many others, the following are the prime reasons to use this technology:
Extensive use of Big Data: More and more companies are realizing that to cope with the outburst of data, they must adopt a technology that can absorb such data and extract something meaningful and valuable from it. Hadoop addresses this concern, and companies are increasingly adopting it.