Spark’s New Deep Learning Tricks

Imagine being able to use your Apache Spark skills to build and execute deep learning workflows to analyze images or otherwise crunch vast reams of unstructured data. That’s the gist behind Deep Learning Pipelines, a new open source package unveiled yesterday by Databricks.

Deep Learning Pipelines, which was unveiled at the Spark Summit conference in San Francisco Tuesday, will essentially provide a way to extend the Spark MLlib library to popular deep learning frameworks like TensorFlow and Keras.

This will allow Spark users to build on work they’ve already done in MLlib, and to execute deep learning models directly in Spark’s existing machine learning library, says Reynold Xin, co-founder and chief architect at Databricks, the commercial outfit behind Apache Spark.

“It’s a library to integrate essentially all deep learning libraries with Spark to make deep learning substantially easier without having to actually learn about the specifics of deep learning,” Xin tells Datanami.

Deep Learning Pipelines will start out as its own open source project, separate from the Apache Spark project, Xin says. “It’s possible” that it will eventually become part of the main Apache Spark project, depending on how things go, he says. “We haven’t actually thought a lot about it. We want to get it out there and work with users.”

In the meantime, Databricks will include the new deep learning library in its own Spark-based software as a service (SaaS) offering. Databricks’ version will leverage the concept of transfer learning to take existing deep learning models available in the open domain and modify them to make them more applicable to its customers’ specific domains, Xin says.

“There might be a generic model for doing image classification, but maybe one of our customers wants to detect what kind of car is in a picture,” he says. “We have this technique called transfer learning built into this library that, with just a few lines of code, allows users to apply an existing model, published by pretty much anybody on the Internet, and then retrain it on a much smaller amount of data in a much faster fashion, in just a few minutes, and then get a better model for their domain.”
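To make the idea concrete, here is a minimal sketch of transfer learning in plain Python. It is not the Deep Learning Pipelines API: the `pretrained_features` function is a hypothetical stand-in for the frozen layers of a published model, and the four-example dataset is made up for illustration. Only the small classifier "head" on top of the frozen features is trained, which is why so little data and time are needed.

```python
import math

# Hypothetical stand-in for the frozen layers of a published, pretrained model.
# A real pipeline would load downloaded weights; nothing in here is ever updated.
def pretrained_features(pixels):
    mean = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [mean, contrast]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression head on top of the frozen features (plain SGD)."""
    feats = [pretrained_features(x) for x in examples]
    weights = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * v for w, v in zip(weights, f)]
            bias -= lr * err
    return weights, bias

def predict(head, pixels):
    """Probability that the input belongs to class 1, using the trained head."""
    weights, bias = head
    f = pretrained_features(pixels)
    return sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)

# Tiny made-up "domain" dataset: label 1 = high-contrast image, 0 = flat image.
train_x = [
    [0.1, 0.9, 0.2, 0.8],
    [0.0, 1.0, 0.1, 0.9],
    [0.5, 0.5, 0.5, 0.5],
    [0.4, 0.5, 0.45, 0.5],
]
train_y = [1, 1, 0, 0]

head = train_head(train_x, train_y)
```

Calling `predict(head, some_pixels)` then scores new inputs for the customer's domain. In a real deployment the frozen part would be a deep network and the head would be trained with a library such as MLlib, but the division of labor is the same.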

Another cool feature that Databricks is adding with Deep Learning Pipelines is the capability to expose a trained deep learning model as a SQL function.

“With one line of code now the data scientist or data engineer who actually trains the model can make this model available as a SQL function,” Xin says. “So even a business analyst will be able to build, for example, predictions in their BI tools.”
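The pattern Xin describes, registering a model once so that anyone can call it from SQL, can be sketched with Python's built-in `sqlite3` module. This is an analogy only: SQLite stands in for Spark SQL, and `predict_score`, the `cars` table, and its columns are hypothetical names invented for the example, not part of Deep Learning Pipelines.

```python
import sqlite3

# Hypothetical "trained model": a fixed scoring rule standing in for a real network.
def predict_score(mileage_km, age_years):
    return round(max(0.0, 1.0 - 0.000002 * mileage_km - 0.03 * age_years), 3)

conn = sqlite3.connect(":memory:")

# The one-line step: make the model callable as a two-argument SQL function.
conn.create_function("predict_score", 2, predict_score)

# Made-up data that an analyst might query.
conn.execute("CREATE TABLE cars (id INTEGER, mileage_km REAL, age_years REAL)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [(1, 50000, 2), (2, 200000, 10)],
)

# From here on, predictions are plain SQL; no Python knowledge is needed.
rows = conn.execute(
    "SELECT id, predict_score(mileage_km, age_years) AS score "
    "FROM cars ORDER BY id"
).fetchall()
```

Once the function is registered, a BI tool that speaks SQL can request predictions without knowing how the model was built, which is the appeal for business analysts.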

Deep Learning Pipelines supports TensorFlow and Keras now, but will likely be extended to support other popular deep learning frameworks. MXNet is popular on Amazon, while Theano, Torch, and Caffe are also gaining more attention as deep learning techniques become more popular.

This isn’t Spark’s first foray into deep learning or GPU computing. But the folks at Databricks are bullish that the new Deep Learning Pipelines project could revolutionize deep learning for a more general audience.

“We do see that this library has the potential to do for deep learning what Spark did for big data, to make deep learning much more accessible to everybody,” Xin says. “Deep learning is at a similar stage right now to what MapReduce was for big data.”
