Real-Time Big Data Processing with Spark and MemSQL

I got an opportunity to work extensively with big data and analytics at Myntra, an e-commerce store based in India. Data-driven intelligence is one of Myntra's core values, so crunching and processing data and reporting meaningful insights for the company are of utmost importance.

Every day, millions of users visit Myntra via the app or website, generating billions of clickstream events. The data platform team has to scale to this volume of incoming events, ingest them in real time with minimal or no loss, and process the unstructured or semi-structured data to generate insights.

We use a varied set of technologies and in-house products to achieve the above, including Go, Kafka, Secor, Spark, Scala, Java, S3, Presto, and Redshift.

As more and more business decisions came to be based on data and insights, batch and offline reporting were simply not enough. We needed real-time user behavior analysis, real-time traffic, real-time notification performance, and more, all available with minimal latency. We needed to ingest, filter, and process data in real time and persist it in a performant data store with fast writes for dashboarding and reporting.

Meterial is a pipeline that does exactly this and more: it includes a feedback loop that lets other teams act on the data in real time.

Our event collectors, written in Golang, sit behind Amazon ELB to receive events from the app or website. They add a timestamp to each incoming clickstream event and push it into Kafka. From Kafka, a Meterial ingestion layer based on Apache Spark Streaming ingests around four million events per minute, filters and transforms the incoming events based on a configuration file, and persists them to a MemSQL rowstore every minute.
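To make the flow above more concrete, here is a minimal sketch of the stamp, filter, transform, and per-minute batching steps in plain Python. It is not the Meterial code: the real pipeline uses Go collectors, Kafka, Spark Streaming, and MemSQL, and the configuration schema, field names, and function names below are all assumptions made for illustration.

```python
import time
from collections import defaultdict

# Hypothetical configuration (the real config file format is not described):
# which event types to keep, and how to rename source fields to column names.
CONFIG = {
    "allowed_event_types": ["product_view", "add_to_cart"],
    "fields": {"event_type": "type", "user_id": "user", "ts": "event_time"},
}


def stamp(event, now=None):
    """Collector step: return a copy of the event with a receive timestamp."""
    stamped = dict(event)
    stamped["ts"] = time.time() if now is None else now
    return stamped


def transform(event, config):
    """Drop events whose type is not allowed; project and rename the rest."""
    if event.get("event_type") not in config["allowed_event_types"]:
        return None
    return {target: event.get(source) for source, target in config["fields"].items()}


def minute_bucket(epoch_seconds):
    """Round a timestamp down to the start of its minute."""
    return int(epoch_seconds) // 60 * 60


def batch_by_minute(events, config):
    """Group the surviving rows by minute, mimicking a once-a-minute write."""
    batches = defaultdict(list)
    for event in events:
        row = transform(event, config)
        if row is not None:
            batches[minute_bucket(event["ts"])].append(row)
    return dict(batches)


if __name__ == "__main__":
    raw = [
        {"event_type": "product_view", "user_id": "u1"},
        {"event_type": "scroll", "user_id": "u2"},
    ]
    stamped = [stamp(e) for e in raw]
    for minute, rows in batch_by_minute(stamped, CONFIG).items():
        # In a real pipeline this is where a bulk insert into the store would go.
        print(minute, rows)
```

Keeping the filter and projection rules in configuration rather than code means new event types or columns can be added without redeploying the ingestion job, which is presumably the motivation for the configuration file mentioned above.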


