Scala, the Language for Data Science

Let’s be honest: there are two reasons to learn a new programming language. The first is that you need it for your daily job; the second is that it’s fun.

If you work in Data Science, Scala is a language you will probably want to learn by the end of this post. Why? Because it is a distributed-ready language, it is Open Source, it runs on the JVM, it is interactive, and Apache Spark, which can deal with billions of records with good performance, is written almost entirely in Scala.

First, a bit of history. The Scala language was created by Martin Odersky in 2003. It is Open Source and, because it runs on the Java Virtual Machine (JVM), it interoperates well with the many Open Source tools written in Java. Java interoperability means you can call Java code from Scala, and you can write a Scala class that extends a Java class. I assume we can agree that no single tool can cover the whole process of data analysis; therefore, integration with other tools is key.
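To make the interoperability claim concrete, here is a minimal sketch, assuming Scala 2.13 or later (for `scala.jdk.CollectionConverters`). The names `Squares`, `InteropSketch` and `reversedTools` are made up for illustration; the Java classes involved (`java.util.ArrayList`, `java.util.AbstractList`, `java.util.Collections`) are part of the standard JDK.

```scala
import java.util.{AbstractList, ArrayList, Collections}
import scala.jdk.CollectionConverters._

// Hypothetical example class: a read-only Java List holding the first n squares.
// It is written in Scala but extends the Java class java.util.AbstractList.
class Squares(n: Int) extends AbstractList[Integer] {
  override def get(i: Int): Integer = {
    if (i < 0 || i >= n) throw new IndexOutOfBoundsException(s"index $i")
    i * i // the Scala Int is converted to java.lang.Integer automatically
  }
  override def size(): Int = n
}

object InteropSketch {
  // Calling plain Java code from Scala: fill a java.util.ArrayList,
  // reverse it with a static Java method, then convert it to a Scala List.
  def reversedTools(): List[String] = {
    val tools = new ArrayList[String]()
    tools.add("Spark")
    tools.add("Hadoop")
    tools.add("Kafka")
    Collections.reverse(tools)
    tools.asScala.toList
  }

  def main(args: Array[String]): Unit = {
    println(reversedTools())
    // The Scala-defined Squares is usable anywhere a java.util.List is expected.
    println(new Squares(4).asScala.toList)
  }
}
```

Because `Squares` is an ordinary `java.util.List`, any Java library that accepts a list could consume it without knowing it was written in Scala.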

Let’s agree that scaling out (adding more cores to the infrastructure), rather than scaling up (speeding up the cores), is the way to get more processing power these days. In this scenario, parallelization is the way to get good performance. Scala is a distributed-ready language, meaning the same code can run on a single-core machine or on as many cores as are available for the task. This is important if you want to run machine learning tasks and make sure they perform well: the language takes care of much of the infrastructure optimization. “Once you have distributed computing available the next step is to do Data Science” (Andy Petrella).
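As a rough illustration of "same code, one core or many", here is a minimal sketch that uses only `scala.concurrent.Future` from the Scala standard library. It covers just the single-machine, multi-core side of the story; it is not Spark and does not distribute work across machines. The names `ParallelSketch` and `sumOfSquares`, and the 30-second timeout, are assumptions made for this example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSketch {
  // Hypothetical helper: splits the data into roughly `chunks` pieces,
  // sums the squares of each piece in its own Future, then adds the partial sums.
  def sumOfSquares(data: Vector[Long], chunks: Int): Long = {
    require(chunks > 0, "chunks must be positive")
    val chunkSize = math.max(1, math.ceil(data.size.toDouble / chunks).toInt)
    // toVector forces every Future to be created (and started) before we wait.
    val partials: Vector[Future[Long]] =
      data.grouped(chunkSize).map(chunk => Future(chunk.map(x => x * x).sum)).toVector
    // The 30-second timeout is an arbitrary choice for this sketch.
    Await.result(Future.sequence(partials), 30.seconds).sum
  }

  def main(args: Array[String]): Unit = {
    val cores = Runtime.getRuntime.availableProcessors()
    val data = (1L to 1000000L).toVector
    // The call is identical whether the machine has one core or many;
    // only the number of chunks changes.
    println(sumOfSquares(data, cores))
  }
}
```

The result does not depend on how many chunks are used, which is the property that lets the same program run unchanged on a laptop or on a larger machine.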

 


