Seeing Graphs in Smart Data Lakes

Data lakes are synonymous with Hadoop to many people grappling with the promise and the peril of big data. That’s not surprising, considering Hadoop’s unparalleled capability to gobble up petabytes of messy data. But for Barry Zane and other folks at Cambridge Semantics, data lakes are taking on a decidedly graph-like appearance.

Cambridge Semantics, which acquired Zane’s latest startup SPARQL City earlier this year, is beginning to talk about its concept of the smart data lake. The data lake concept is a well-worn one by now. The “smart” part, you may have guessed, comes from semantics: how the data is stored, how it’s connected to other data in the lake, and how that affects people’s ability to extract meaningful information from it.

To Zane’s way of thinking, those who can get the most insights with the least amount of effort have an advantage. Of course, this has always been the case. The telling part is that Zane, who was founder and CTO of ParAccel (acquired by Actian) and a co-founder and VP of architecture at Netezza (acquired by IBM), sees graph databases and graph analytic technology as the best way to get there for at least the next 10 years.

“We strongly believe that this is an extremely effective approach, a future-proof approach,” Zane tells Datanami. “Just as Hadoop basically came of maturity because relational just wasn’t able to work with a certain class of question and wasn’t able to work at a certain scale, we pursue those classes of questions and scale using the graph standards, at an incredible cost and performance advantage, as compared to hiring programmers for every question and analytic you want to perform.”

Zane, who is Cambridge Semantics’ vice president of engineering, sees graph databases, such as the Anzo Graph Query Engine, as a natural evolution from relational databases, which he says have developed some pretty powerful analytic capabilities themselves over the past 40 years.

“Without a doubt what we’re doing is educated by learning from Netezza, educated from learning from ParAccel. So I really see it as just an evolution,” Zane says. “The difference is you’re able to ask more interesting questions of your data. You’re able to find relationships that are otherwise nearly impossible to find.”

The core problem with relational database technologies, even the massively parallel processing (MPP) technologies that he championed at ParAccel (which powers Amazon’s Redshift data warehousing service) and Netezza (which IBM has renamed into something that nobody can ever remember), is how hard it is to perform advanced analytics and how long it takes to get answers back.

“Being a longtime relational guy, one of the great things about the relational database is that you don’t need to be a programmer. You’re able to work with the database through either a set of application layer tools or in the SQL language,” he says.

“The best way to think of SPARQL and RDF is that they’re just the next evolution of relational database SQL,” he continues. “That’s the way I think about it, and that’s what got me excited because you can have people who are not super highly trained programmers be able to pose queries of the data in a matter of minutes or hours and get back a response in a matter of seconds or minutes, as opposed to hiring very highly trained and expensive programmers for any given query.”
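
For readers unfamiliar with RDF, the idea behind Zane’s comparison can be sketched in a few lines of Python. This is only a toy illustration, not how Anzo or any real SPARQL engine works: the triples, the predicate names and the `match` and `query` helpers are all made up for this example. Data is stored as subject-predicate-object triples, and a query is a list of patterns whose `?variables` are joined across triples, much like the WHERE clause of a SPARQL query.

```python
# Toy in-memory triple store. All data and names are hypothetical.
TRIPLES = {
    ("alice", "works_for", "acme"),
    ("bob", "works_for", "acme"),
    ("acme", "supplies", "globex"),
    ("carol", "works_for", "globex"),
}


def match(pattern, bindings):
    """Yield extended variable bindings for each stored triple that fits one pattern."""
    for triple in TRIPLES:
        new = dict(bindings)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject the triple if it is already
                # bound to a different value.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def query(patterns):
    """Join a list of triple patterns, carrying bindings from one pattern to the next."""
    results = [{}]
    for pattern in patterns:
        results = [b for bindings in results for b in match(pattern, bindings)]
    return results


# Roughly the SPARQL pattern:
#   ?person works_for ?company . ?company supplies ?customer
rows = query([
    ("?person", "works_for", "?company"),
    ("?company", "supplies", "?customer"),
])
people = sorted(row["?person"] for row in rows)
```

The two-hop question ("who works for a company that supplies someone else?") is expressed by sharing the `?company` variable between patterns rather than by writing explicit joins, which is the kind of relationship query the graph approach is meant to make easy to pose.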
