Scala, the Language for Data Science

Let’s be honest: there are two reasons to learn a new programming language. The first is that you need it for your daily job, and the second is that it’s fun.

If you work in Data Science, Scala is a language you will want to learn by the end of this post. Why? Because it is distributed-ready, Open Source, and interactive, it runs on the JVM, and Apache Spark, which can deal with billions of records with good performance, is written almost entirely in Scala.

First, a bit of history. The Scala language was created by Martin Odersky in 2003. It is Open Source and runs on the Java Virtual Machine (JVM), which gives it a high degree of interoperability with the many Open Source tools written in Java: you can call Java code from Scala, and you can create a Scala class that extends a Java class. I assume we can agree that no single tool can handle the whole data analysis process, so integration with other tools is key.
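To make the interoperability point concrete, here is a minimal sketch, assuming Scala 2.13 or later (needed for `scala.jdk.CollectionConverters`). The names `JavaInteropSketch`, `javaListOf`, `Squares`, and `total` are hypothetical and chosen only for illustration; `java.util.ArrayList` and `java.util.AbstractList` are standard JDK classes.

```scala
import java.util.{AbstractList => JAbstractList, ArrayList => JArrayList}
import scala.jdk.CollectionConverters._ // assumption: Scala 2.13+

object JavaInteropSketch {

  // Calling Java code from Scala: build a java.util.ArrayList through its Java API.
  def javaListOf(values: Double*): JArrayList[java.lang.Double] = {
    val list = new JArrayList[java.lang.Double]()
    values.foreach(v => list.add(java.lang.Double.valueOf(v)))
    list
  }

  // A Scala class extending a Java class: AbstractList only requires get and size.
  // The list is read-only and holds the squares 0, 1, 4, ... of the first n indices.
  class Squares(n: Int) extends JAbstractList[Integer] {
    override def get(index: Int): Integer = {
      if (index < 0 || index >= n)
        throw new IndexOutOfBoundsException(s"index $index, size $n")
      Integer.valueOf(index * index)
    }
    override def size(): Int = n
  }

  // A Java collection can be wrapped as a Scala collection and processed with Scala idioms.
  def total(xs: java.util.List[Integer]): Int =
    xs.asScala.map(_.intValue()).sum
}
```

For example, `JavaInteropSketch.total(new JavaInteropSketch.Squares(4))` adds up 0, 1, 4 and 9. Because `Squares` is an ordinary `java.util.List`, it could also be handed to any Java library that expects one.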

Let’s agree that these days you get more processing power by scaling out (adding more cores to the infrastructure) rather than by scaling up (making each core faster). In this scenario, parallelization is the way to get good performance. Scala is a distributed-ready language, meaning the same code can run on a single-core machine or on as many cores as are available for the task. This matters if you want to run machine learning tasks and make sure they perform well, because the language takes care of much of the infrastructure optimization. “Once you have distributed computing available the next step is to do Data Science” (Andy Petrella).
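As a rough illustration of "the same code on one core or many", here is a minimal sketch using `scala.concurrent.Future` from the standard library, assuming Scala 2.13 or later. The names `ParallelSumSketch` and `sumOfSquares` are hypothetical. Note that this only shows multi-core parallelism inside a single JVM; spreading work across several machines is what a framework such as Apache Spark adds on top.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSumSketch {

  // Sum of squares of 1..n, split into chunks that run as independent Futures.
  // The global ExecutionContext schedules them on a thread pool, so the code
  // itself does not mention how many cores the machine has.
  def sumOfSquares(n: Int, chunkSize: Int): Long = {
    val chunks = (1 to n).grouped(chunkSize).toList
    val partials: List[Future[Long]] = chunks.map { chunk =>
      Future(chunk.map(x => x.toLong * x).sum)
    }
    // Combine the partial results once every chunk has finished.
    Await.result(Future.sequence(partials), 30.seconds).sum
  }
}
```

Calling `ParallelSumSketch.sumOfSquares(10, 3)` returns 385 whether the chunks run one after another on a single core or side by side on several. Blocking with `Await.result` keeps the sketch short; a real application would more likely compose the Futures without blocking.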

 


