SQL on Hadoop benchmarks get serious

Unless a "neutral" third party publishes them, we tend to view benchmarks as self-serving exercises that vendors typically stack in their own favor. But recent benchmarks issued by Cloudera and Hortonworks for their SQL on Hadoop engines point to something serious going on. In an era of Spark hype, SQL remains table stakes for Hadoop platforms.

Yes, you can perform machine learning, model customer ecosystems as social graphs, run streaming, and conduct sentiment analysis, but the first question most organizations ask is how fast the interactive SQL is. Using Hadoop only for SQL queries might seem like a waste, given its appeal to R or Python developers. But getting buy-in requires satisfying the BI crowd, because in many organizations, SQL is the gateway drug to Hadoop.

And looking at the benchmarking press releases, you get a sense of who's afraid of whom. For Cloudera, it's Amazon. Its competitive benchmarks pitted Impala 2.6, Cloudera's SQL-on-Hadoop MPP engine, against Amazon's Redshift columnar analytic database. The results, announced a couple of weeks back at Strata, showed Impala performing 4x to 10x faster on either S3 (which Redshift doesn't use) or EBS (which it does).

Cloudera is stating that a database decoupled from storage (Impala) can now outperform one that follows a traditional, tightly coupled data warehouse architecture (Redshift). It's a shot across the bow, given that conventional wisdom has been to use a database rather than Hadoop if you want consistent SLAs, high concurrency, or support for very complex SQL syntax. Cloudera's results don't change that reality, but they do put Impala in the same ballpark as Redshift. And they get those results using AWS's default S3 storage.

But Cloudera's underlying message is not just that Impala has been tuned to go faster. It knows that, while only a minority of customers are deploying to the cloud today, in the long term, the writing is on the wall.

 
