Open Sourcing SparkADMM: a Massively-parallel Framework for Solving Big Data Problems

Training machine learning models over massive amounts of data is a cornerstone of many data analytics tasks. Usually this means solving large optimization problems with millions of variables and constraints. Running these computations on a parallel platform, like Spark or Hadoop, is crucial to making them scalable.

It is not always obvious how to solve large optimization problems in parallel. ADMM, which stands for the Alternating Direction Method of Multipliers, is a popular optimization technique that provides a methodology for doing so. It permits a broad array of important machine learning tasks, such as regression and classification, to be executed in a massively parallel fashion. For example, to train a classifier using ADMM over a very large dataset, a developer first partitions the dataset across multiple machines. A classifier is trained on each machine, based on the locally stored portion of the dataset. Then, a global classifier is extracted through consensus: ADMM averages the local classifiers and repeats the process over several iterations, forcing the local computations closer to the consensus value each time. This way, after several iterations, ADMM constructs a “consensus” classifier which, for convex training objectives, provably converges to the one that fits the entire dataset.
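The train, average, and repeat loop described above can be sketched in a few lines. The following is a minimal illustration in plain Python, not SparkADMM's API: it assumes a one-parameter least-squares model (fit y ≈ w·x) so that each machine's local problem has a closed-form solution, and the names `local_update`, `consensus_admm`, `rho`, and the toy `partitions` data are hypothetical choices made for this example. On a cluster, the per-partition step would run in parallel and the averaging would be a reduce; here both run in an ordinary loop.

```python
from typing import List, Tuple

# One partition = the (feature, target) pairs stored on one machine (toy data).
Partition = List[Tuple[float, float]]


def local_update(part: Partition, z: float, u: float, rho: float) -> float:
    """Solve one machine's subproblem in closed form:
    argmin_w 0.5 * sum((w*x - y)^2) + (rho / 2) * (w - z + u)^2
    """
    sxx = sum(x * x for x, _ in part)
    sxy = sum(x * y for x, y in part)
    return (sxy + rho * (z - u)) / (sxx + rho)


def consensus_admm(parts: List[Partition], rho: float = 10.0, iters: int = 500) -> float:
    """Consensus ADMM in scaled form; returns the consensus weight z."""
    n = len(parts)
    z = 0.0            # consensus value shared by all machines
    u = [0.0] * n      # one scaled dual variable per machine
    for _ in range(iters):
        # 1. Each machine fits its local data, pulled toward the consensus.
        w = [local_update(p, z, u_i, rho) for p, u_i in zip(parts, u)]
        # 2. Average the local solutions (plus duals) into a new consensus.
        z = sum(w_i + u_i for w_i, u_i in zip(w, u)) / n
        # 3. Each dual accumulates that machine's disagreement with the consensus.
        u = [u_i + w_i - z for u_i, w_i in zip(u, w)]
    return z


def global_least_squares(parts: List[Partition]) -> float:
    """Reference answer: least squares computed on the pooled dataset."""
    sxx = sum(x * x for p in parts for x, _ in p)
    sxy = sum(x * y for p in parts for x, y in p)
    return sxy / sxx


# Hypothetical toy dataset split across three "machines".
partitions: List[Partition] = [
    [(1.0, 2.1), (6.0, 11.8)],
    [(2.0, 4.2), (5.0, 9.9)],
    [(3.0, 5.9), (4.0, 8.1)],
]
consensus = consensus_admm(partitions)
print(consensus, global_least_squares(partitions))
```

The two printed values should agree closely: although no machine ever sees another machine's data, the consensus weight approaches the least-squares fit of the pooled dataset. The penalty `rho` is a tuning parameter. For a convex problem like this one any positive value converges, but the number of iterations needed depends on the choice.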

ADMM’s strength lies in its generality: it gives a template for taking any serial machine learning algorithm designed to operate locally on a single dataset and parallelizing its execution over thousands of machines.

 


