Why Python (IT Best Kept Secret Is Optimization)

Why are you recommending Python? That's the question a colleague of mine asked when I was pitching Python for data science work. It is a fair question, and I tried to answer with facts rather than opinions. Indeed, answering a question about why one language is better than others can quickly turn into a religious war. So, let me try to avoid that with some disclaimers. First of all, I don't think one size fits all: Python is not going to become THE programming language. Depending on the task, other languages are a much better fit. For instance, Java suits enterprise applications that solve well-defined problems. Fortran, C, and C++ are great for HPC. C is dominant for systems programming. JavaScript with Node.js, or PHP, are de facto standards for website implementation. I could go on forever, as many languages fit a niche. But when it comes to data science, Python has taken the lead. Let's look at the facts before you start arguing with me.

I am not the only one saying Python has the lead. Here is a first fact supporting this: the job trends for data science related topics on indeed.com.

These job trends are for three queries: Python and ("data science" or "big data" or "statistical analysis" or "data mining" or "machine learning"); Scala and ("data science" or "big data" or "statistical analysis" or "data mining" or "machine learning"); and R and ("data science" or "big data" or "statistical analysis" or "data mining" or "machine learning").
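
For readers who want to reproduce a similar comparison, here is a minimal sketch of how the three query strings can be composed from one shared topic clause. The names `LANGUAGES`, `TOPICS`, and `build_query` are my own and purely illustrative. The snippet only builds the strings; it does not call indeed.com or any other service.

```python
# Illustrative only: these names are assumptions, not part of any job-site API.
LANGUAGES = ["Python", "Scala", "R"]
TOPICS = [
    "data science",
    "big data",
    "statistical analysis",
    "data mining",
    "machine learning",
]


def build_query(language, topics=TOPICS):
    """Return a query of the form: <language> and ("t1" or "t2" or ...)."""
    topic_clause = " or ".join(f'"{topic}"' for topic in topics)
    return f"{language} and ({topic_clause})"


# One query per language, all sharing the same topic clause.
queries = [build_query(language) for language in LANGUAGES]

for query in queries:
    print(query)
```

Keeping the topic clause in one place ensures that the three languages are compared on exactly the same set of topics.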

I selected R, Python, and Scala for this comparison because they are the most popular open source languages for data science. R has long been the dominant open source language for statisticians and, by extension, for data science. But we see that Python has been taking over for the past couple of years. Scala is a recent contender because of its link to Spark and Spark ML, but it is still a fairly distant follower.

What about commercial software? I do think that SPSS Modeler, for instance, is here to stay as well. But its target is a bit different from that of R, Python, or Scala. Indeed, SPSS Modeler is point-and-click software aimed at non-programmers. With SPSS Modeler one draws the machine learning pipeline, whereas one programs it in Python, R, or Scala. It is because of this difference that I did not include SPSS Modeler in the comparison, as it would be comparing apples to oranges.

Back to open source, here are other signs of Python's popularity. The table below includes the number of questions on Stack Overflow, the number of packages in the main package repository for the language, and the TIOBE Programming Community Index from tiobe.com. For Scala, to be fair, one should count all Java libraries. I did not find a simple way to count them, hence I left that entry blank.

These figures measure the strength and popularity of the ecosystems built around these languages. Indeed, when comparing languages, one should not rely only on a feature-by-feature comparison or on efficiency benchmarks. Having a vibrant community that can help newcomers, and that can further advance the language, is key.

There are probably additional ways to evaluate the importance of an ecosystem, and I welcome suggestions.

We can also get facts about the main IDE data scientists use for each language: IPython/Jupyter for Python notebooks, RStudio for R scripts, and Apache Zeppelin for Scala notebooks. I looked at the number of Stack Overflow questions, at the number of GitHub repositories using these languages, and then at the stars, forks, commits, and contributors for the main GitHub repository of each: Jupyter/IPython, RStudio, and Zeppelin.
