Lambda Architecture for Big Data Systems

Lambda Architecture for Big Data Systems

Lambda Architecture for Big Data Systems

Big data analytical ecosystem architecture is in early stages of development. Unlike traditional data warehouse / business intelligence (DW/BI) architecture which is designed for structured, internal data, big data systems work with raw unstructured and semi-structured data as well as internal and external data sources. Additionally, organizations may need both batch and (near) real-time data processing capabilities from big data systems. Lambda architecture - developed by Nathan Marz - provides a clear set of architecture principles that allows both batch and real-time or stream data processing to work together while building immutability and recomputation into the system. Batch processes high volumes of data where a group of transactions is collected over a period of time. Data is collected, entered, processed and then batch results produced. Batch processing requires separate programs for input, process and output. An example is payroll and billing systems. In contrast, real-time data processing involves a continual input, process and output of data. Data must be processed in a small time period (or near real-time). Customer services and bank ATMs are examples. Lambda architecture has three (3) layers:

Read Also:
Going with the stream: unbounded data processing with Apache Flink

Batch Layer (Apache Hadoop) Hadoop is an open source platform for storing massive amounts of data. Lambda architecture provides "human fault-tolerance" which allows simple data deletion (to remedy human error) where the views are recomputed (immutability and recomputation). The batch layer stores the master data set (HDFS) and computes arbitrary views (MapReduce). Computing views is continuous: new data is aggregated into views when recomputed during MapReduce iterations.

 



Data Innovation Summit 2017

30
Mar
2017
Data Innovation Summit 2017

30% off with code 7wData

Read Also:
Bridge the CxO Gap with a Data-Driven Approach

Big Data Innovation Summit London

30
Mar
2017
Big Data Innovation Summit London

$200 off with code DATA200

Read Also:
Rise In Cognitive Computing In 2017

Enterprise Data World 2017

2
Apr
2017
Enterprise Data World 2017

$200 off with code 7WDATA

Read Also:
Machine learning: The big disruptor, says Ovum

Data Visualisation Summit San Francisco

19
Apr
2017
Data Visualisation Summit San Francisco

$200 off with code DATA200

Read Also:
Why monetization of transaction data is not always worth the risk
Read Also:
How To Use Twitter to Learn Data Science (or anything)

Chief Analytics Officer Europe

25
Apr
2017
Chief Analytics Officer Europe

15% off with code 7WDCAO17

Read Also:
Machine learning: The big disruptor, says Ovum

Leave a Reply

Your email address will not be published. Required fields are marked *