Benchmarks to prove you need an analytical database for Big Data

When it comes to big data, it’s important to ask “What’s next?” Sure, users find lower licensing costs when storing data in Hadoop, although they often do pay for subscriptions. Storing data efficiently in a cluster of nodes is table stakes for data management today. The next step is often performing analytics on the data as it sits in the Hadoop cluster, and here our internal benchmark testing reveals limitations of the Apache Hadoop platform.

We set up a 5-node cluster of Hewlett Packard Enterprise ProLiant DL380 servers. We created 3 TB of data in ORC, Parquet, and our own ROS format. Then we ran the TPC-DS benchmark queries against Vertica, Impala, Hive on Tez, and even Apache Spark. On the Hadoop side, we looked at CDH 5.7.1 with Impala 2.5.0, and HDP 2.4.2 with HAWQ 2.0, in comparison to Vertica.

We first took note of whether all the benchmark queries would run. This becomes important when you’re thinking about the analytical workload. Do you plan to perform any complex analytics? In our tests, Vertica completed 100% of the TPC-DS benchmark queries, while none of the others did.
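As a rough illustration of how a completion check like this can be tallied, here is a minimal Python sketch. It is not the harness used for these benchmarks: the `completion_rates` helper, the engine names, and the sample rows are all hypothetical, and the rows are made up purely to show the arithmetic.

```python
from collections import defaultdict

def completion_rates(runs):
    """Return {engine: fraction of queries completed}.

    `runs` is an iterable of (engine, query_id, completed) rows,
    a hypothetical log format assumed for this sketch.
    """
    totals = defaultdict(int)
    done = defaultdict(int)
    for engine, _query_id, completed in runs:
        totals[engine] += 1
        if completed:
            done[engine] += 1
    return {engine: done[engine] / totals[engine] for engine in totals}

# Made-up rows for illustration only; not measurements from this benchmark.
sample_runs = [
    ("engine_a", "q1", True),
    ("engine_a", "q2", True),
    ("engine_b", "q1", True),
    ("engine_b", "q2", False),
]

rates = completion_rates(sample_runs)
for engine, rate in sorted(rates.items()):
    print(f"{engine}: {rate:.0%} of queries completed")
```

Tracking completion separately from run time keeps a failed query from silently dropping out of a speed comparison.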

Initially, the results for the Hadoop-based solutions were very poor, until we found rewritten queries on GitHub for some of the tools.

Gaps like these carry a cost. For example, if you want to perform time series analytics and the queries are not available, how much will it cost you to engineer a solution? How many lines of code will you have to write and maintain to accomplish the desired analytics?

 


