Expert Interview: HPE Vertica's Steve Sarsfield on Big Data Innovations

Syncsort’s Paige Roberts caught up with Steve Sarsfield from Hewlett Packard Enterprise (HPE) at the latest Strata. Steve is the product marketing manager for HPE Big Data Software, focused on their Vertica for SQL on Hadoop product. Steve is also a notable name in the arena of data quality and governance, and authored the book The Data Governance Imperative. Enjoy some keen industry insight in this interview between Paige and Steve.

So, we’re at Strata, and you’re a Vertica person. What do you feel the intersection is for Hadoop and Vertica?

HPE and Hadoop intersect quite a bit when it comes to some of the innovations that we’re working on, and we’re showing several of them [here at Strata]. One is our set of big data reference architectures, which we’ve designed to work in partnership with Hadoop, specifically HDFS and YARN. These reference architectures let you use YARN labels to separate compute from storage. So if you want to make that dynamic within the organization, you can use YARN labels to specify how much compute and how much storage you want to use for any job.

The second part is that we have HPE Vertica for SQL on Hadoop. That is a product that allows you to install our Vertica engine directly into the Hadoop cluster and perform SQL queries on Hadoop. It’s 100% TPC-DS compliant, fully ANSI SQL compliant and can be installed either in the Hadoop cluster or separately as a Vertica cluster. It’s a high-performance engine, and we’re happy to show that off here at Strata, too.

Syncsort and Vertica have been pretty tight over the years.

What do you see as the synergies? What makes it such a good partnership?

Our strength is in providing very fast analytics for massive amounts of data. We focus all of our effort, from the way we store data to the way we compress columns, on making the analysis fast. What Syncsort brings to the table is the basic concept of getting the data into the database. That’s really important, because although we ingest data, we don’t have that completely covered. If you have complex or particularly tricky data, we rely on partners like Syncsort. I think that’s a really important component, especially today, when there are so many different file formats, so much unstructured data and so many options for storing data. We need a partner like you guys to do it.

This is a question I’ve been asking everyone to get different perspectives. What do you think Hadoop is for?

It’s a “make you think” question.

Hadoop is a general term that describes many projects that are going on in the open source community. Hadoop, and specifically HDFS, is primarily for storing data at a very low cost. Companies gather data without being sure what it’s good for or what value it has, and they need some low-cost place to put it. Hadoop, or at least the HDFS component of Hadoop, is a really good place for that. The whole Hadoop community is based on the fact that more and more data is coming at us. However, what we aren’t seeing is IT budgets growing by a lot. What I hear is data volumes growing by 25 to 50 percent, or more in certain companies, but IT budgets are growing by about 4 percent. So companies are looking for ways to store data at a low cost, and that’s one of the functions Hadoop does well. The other thing is data discovery: understanding what data you have and getting into the data to see if there’s any value there. Those two components are what I think it’s for. Beyond that, it’s pretty exciting to see all the other things that the Hadoop community is incubating. There are countless projects that help companies manage big data.

What do you think of Spark?

Spark is really exciting technology. It seems like something that will be really powerful in the future.

 


