Agile Data Scientists Do Scale

Due to the hype and rapid growth of Big Data Engineering and Data Science, many companies and practitioners seem to have become so excited by hiring, building infrastructure, fashionable models and shiny technology that one crucial part of the field is missing: delivery. I hear countless stories, in both small and large companies, where teams are built, clusters are bought, prototype algorithms are written and software is installed, yet it then takes months or even longer to deliver working data-driven applications, or for insights to be acted on. Hype is thick in the air but delivery is thin on the ground.

The review and related blogs correctly point out that we should focus on AI applications, that is, automation. My addition is that for many domains these applications cannot easily be bought in. In such cases they should be built in-house, and the builders ought to be Agile Big Data Engineers and Data Scientists who understand the importance of weekly or fortnightly iteration. The title of Data Scientist is not dead, but keeping Data Science alive will mean shifting the Data Scientist's focus away from hacking, ad hoc analysis and prototyping and on to high-quality code, automation, applications and Agile methodologies. Let's remember that the technology industry has a habit of automating the jobs of those who lack the imagination to become automators themselves, i.e. those who cannot be scaled.

Why Agile methodologies are so lacking in Data Science and Big Data is puzzling. Perhaps it's the relative youth of the industry? To be frank, I believe a smidgen of elitism is at play, an urge to distinguish these fields from regular software development, as though its practices and principles were beneath the concerns of mighty data minds. Another issue seems to be a widespread misconception that "exploratory" work precludes frequent iteration over automated end-to-end applications; that is, Data Scientists claim they need to "explore" for a month or two before they can deliver anything. I find this ironic, since the tension between exploratory work and continuous delivery is exactly what Agile resolves. Finally, another recurring misconception is that the day-to-day practices of Agile, such as tests, automation, clean code and clean structure, are "time consuming" and will slow down "exploratory" work. This too is ironic, since Agile aims to make exploratory work faster and less laborious. Hopefully the details in my posts will show why these objections are misconceptions.

Automated tests are absolutely critical to practicing Agile correctly, and TDD has spawned more acronyms and terms than many Data Scientists have written tests: TDD, BDD, DDD, ATDD, SDD, EDD, CDD, unit tests, integration tests, black box tests, end-to-end tests, system tests, acceptance tests, property-based tests, example-based tests, functional tests, contract-based tests, etc. At a glance, things like interactive work, long-running jobs, unclear objectives and peculiar development environments seem to preclude *DD approaches. Nevertheless, if one strips away the unnecessary verbosity of *DD, the remaining core can easily accommodate such problems.
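
To make two of those terms concrete, here is a minimal sketch in Python contrasting an example-based test with a property-based one. It assumes a hypothetical feature-scaling step, `min_max_scale`, standing in for real data code; neither the function nor the tests come from the original post. The property test is hand-rolled with the standard library `random` module rather than a dedicated library such as Hypothesis, and the tests are plain `test_` functions so a runner like pytest could collect them.

```python
import random


def min_max_scale(values):
    """Rescale numbers linearly into [0, 1].

    Hypothetical feature-engineering step, used only to illustrate testing.
    An empty input returns [], and a constant input maps to all zeros
    to avoid dividing by zero.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def test_min_max_scale_examples():
    # Example-based tests: hand-picked inputs with known answers.
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]
    assert min_max_scale([3, 3, 3]) == [0.0, 0.0, 0.0]
    assert min_max_scale([]) == []


def test_min_max_scale_properties(trials=200, seed=42):
    # Property-based test, hand-rolled with the standard library:
    # generate many random inputs and check invariants, not exact values.
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        xs = [rng.uniform(-1e3, 1e3) for _ in range(rng.randint(1, 50))]
        ys = min_max_scale(xs)
        # One output per input, every output inside [0, 1].
        assert len(ys) == len(xs)
        assert all(0.0 <= y <= 1.0 for y in ys)
        # Scaling preserves the ordering of the inputs.
        order = sorted(range(len(xs)), key=xs.__getitem__)
        assert all(ys[i] <= ys[j] for i, j in zip(order, order[1:]))
        # When the inputs vary, the extremes map exactly to 0 and 1.
        if max(xs) > min(xs):
            assert min(ys) == 0.0 and max(ys) == 1.0
```

The example-based test pins down a few known answers, while the property-based test states what must hold for any input, which is often easier to write when the exact expected output of an exploratory step is still unclear. One way the same idea might extend to long-running jobs is to keep the logic in small functions like this one, so the suite runs on a small, fixed sample on every commit while the full job runs elsewhere.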

Sometimes Data Science can feel like academia, except much better paid. So until the bubble bursts, which still doesn't seem likely any time soon, should we just have as much fun as possible?