Spark 2.0 takes an all-in-one approach to big data

Apache Spark, the in-memory processing system that's fast become a centerpiece of modern big data frameworks, has officially released its long-awaited version 2.0.

Aside from some major usability and performance improvements, Spark 2.0's mission is to become a total solution for streaming and real-time data. This comes as a number of other projects -- including others from the Apache Software Foundation -- provide their own ways to boost real-time and in-memory processing.

Most of Spark 2.0's big changes have been known well in advance, which has made them even more hotly anticipated.

One of the largest and most technologically ambitious additions is Project Tungsten, a reworking of Spark's treatment of memory and code generation. Pieces of Project Tungsten have shown up in earlier releases, but 2.0 adds more, such as applying Tungsten's memory management to both caching and runtime execution.

For users, these changes, plus a great many other under-the-hood improvements, provide across-the-board performance gains. Spark's developers claim a two-to-tenfold increase in speed for common DataFrame and SQL operations, thanks to a new code generation system. Window functions, used for tasks like computing moving averages over data, have been reimplemented natively for further speed-ups.
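
For readers unfamiliar with window functions, the sketch below shows the kind of calculation a moving average performs. It is plain Python rather than Spark code; the function name `moving_average`, the `window` parameter, and the sample `daily_sales` figures are all made up for illustration and say nothing about how Spark implements the operation internally.

```python
from collections import deque
from typing import Iterable, List


def moving_average(values: Iterable[float], window: int) -> List[float]:
    """Average each value with up to `window - 1` values that precede it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque()      # values currently inside the sliding window
    running_sum = 0.0     # sum of the values in `recent`
    averages = []
    for value in values:
        recent.append(value)
        running_sum += value
        if len(recent) > window:
            # Slide the window forward by dropping the oldest value.
            running_sum -= recent.popleft()
        averages.append(running_sum / len(recent))
    return averages


# Hypothetical sample data, not taken from the article.
daily_sales = [10.0, 20.0, 30.0, 40.0]
print(moving_average(daily_sales, window=2))  # [10.0, 15.0, 25.0, 35.0]
```

Each output value depends on a small frame of neighboring rows rather than on the whole data set, which is what distinguishes a window function from an ordinary aggregate.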

Spark 2.0 also brings a major shift in programming APIs. DataFrames and Datasets, previously two different ways of accessing structured data, are now the same under the hood; DataFrames are now "just a type alias for Dataset of Row," per Spark's release notes.
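
To illustrate what "just a type alias" means in practice, here is a toy model in plain Python. It is not Spark's API: the `Dataset` class, its `map` and `collect` methods, and the dictionary-based `Row` are simplified stand-ins invented for this sketch. The point it tries to show is that an alias introduces a second name, not a second implementation.

```python
from typing import Any, Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Dataset(Generic[T]):
    """Toy stand-in for a typed collection of records (hypothetical)."""

    def __init__(self, records: List[T]) -> None:
        self.records = list(records)

    def map(self, fn: Callable[[T], U]) -> "Dataset[U]":
        # Apply fn to every record and wrap the results in a new Dataset.
        return Dataset([fn(record) for record in self.records])

    def collect(self) -> List[T]:
        return list(self.records)


# A generic row in this sketch: field names mapped to values.
Row = Dict[str, Any]

# The alias: DataFrame is not a separate class, only another name
# for a Dataset whose element type is Row.
DataFrame = Dataset[Row]

people = DataFrame([{"name": "Ada", "age": 36}, {"name": "Linus", "age": 28}])
names = people.map(lambda row: row["name"])

print(isinstance(people, Dataset))  # True
print(names.collect())              # ['Ada', 'Linus']
```

Because `people` is an ordinary `Dataset` instance, every method defined on `Dataset` is available on it, which mirrors the article's point that the two APIs now share one implementation under the hood.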

 


