Strata HadoopWorld Fall 2016 postmortem: Maybe AI’s the future, but can we make the data science work?

Given all the hype over artificial intelligence (AI) these days, at first glance it would seem surprising that it appeared as almost an afterthought at Strata last week.

There were a handful of product announcements: Maana added semantic search-like capabilities in the newest release of its knowledge management platform for resource-intensive industries like oil and gas, and Splunk grafted machine learning onto its offerings for identifying and resolving incidents from IT system log files.

And in a keynote talk entitled "Connected Eyes," Microsoft's Joseph Sirosh spoke of a project with India's leading eye institute that applied machine learning over large patient populations to improve outcomes for eye surgery.

But this obscures the bigger picture. Conference sponsor O'Reilly acknowledged this by breaking out AI into a separate pre-event track the day before. And anyway, this wasn't a Google Cloud event, where AI was front and center.

So, get used to it. There's plenty of hype going around about whether AI can, will, or should replace humans (spoiler alert: the answers are "not"). But even if present-day AI is no smarter than a bunch of idiot savants, there are plenty of practical and often unglamorous jobs that AI's core ingredient, machine learning (ML), is already performing.

Last year at Strata, we saw ML becoming almost ubiquitous in tooling for data management and governance of data lakes from providers from A to Z.

The rationale for using ML, rather than static governance rules, stems from the nature of data lakes. Unlike with data warehouses, you won't know exactly what data will flow in, so it won't be practical to build rules ahead of time for dictating schema, enforcing data quality, de-duping, or identifying what data is likely to be sensitive (even weblogs could give PII data away).

Governance, whether it involves preparing data, building a catalog, or identifying master or reference data, may be a moving target requiring the system to "learn" how the norms are changing.

And there's ML elsewhere as well. Providers like Cloudera build ML into the trouble ticket tracking that backs the automated "phone home" function of subscriber client technical support.

As we noted with our take on DataRobot, there is a growing array of tools aimed at simplifying or accelerating different aspects of the lifecycle of building and deploying ML programs.

And ML is showing up in end user analytic tools that help humans parse the signals in data, wrangle it into shape, suggest which questions to ask, and help piece together the narrative.

In other words, when it comes to the packaged software tools that govern big data or analyze it, we're probably starting to take embedded machine learning for granted.

But what if your own data scientists want to get their hands dirty? As we noted a few weeks back, there's a lot of pent-up enthusiasm among R and Python programmers for ML, which many look at as the latest shiny new thing.

But for all the enthusiasm, at least among Spark users, SQL and streaming are more frequent workloads, according to the 2016 Spark Survey just released by Databricks.

