The Growing Significance Of DevOps For Data Science

DevOps involves infrastructure provisioning, configuration management, continuous integration and deployment, testing and monitoring. DevOps teams have long worked closely with development teams to manage the application lifecycle effectively.

Data science brings additional responsibilities to DevOps. Data engineering, a niche domain that builds the complex pipelines that transform data, demands close collaboration between data science teams and DevOps. Operators are expected to provision highly available clusters of Apache Hadoop, Apache Kafka and Apache Spark, along with workflow tools such as Apache Airflow, to handle data extraction and transformation. Data engineers acquire data from a variety of sources and then use these Big Data clusters and pipelines to transform it.
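
The core idea behind such pipelines is that each stage runs only after the stages it depends on have finished. The sketch below illustrates that ordering with Python's standard-library `graphlib`; it is not Airflow's API, and the task names in `PIPELINE` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_join"},
}


def run_pipeline(dag):
    """Return the tasks in an order that respects every dependency."""
    executed = []
    for task in TopologicalSorter(dag).static_order():
        # A real orchestrator would launch a Spark job or similar here;
        # this sketch only records the execution order.
        executed.append(task)
    return executed
```

Calling `run_pipeline(PIPELINE)` yields both extract tasks before `transform_join`, and `load_warehouse` last.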

Data scientists explore the transformed data to find insights and correlations. They use a different set of tools, including Jupyter Notebooks, Pandas, Tableau and Power BI, to explore and visualize data. DevOps teams are expected to support data scientists by creating environments for data exploration and visualization.

Building machine learning models is fundamentally different from traditional application development: the process is not only iterative but also heterogeneous. Data scientists and developers use a variety of languages, libraries, toolkits and development environments to evolve ML models. Popular ML languages such as Python, R and Julia are used in environments such as Jupyter Notebooks, PyCharm, Visual Studio Code, RStudio and Juno. DevOps teams must make these environments available to the data scientists and developers working on ML problems.

Machine learning and deep learning demand massive compute infrastructure built on powerful CPUs and GPUs. Frameworks such as TensorFlow, Caffe, Apache MXNet and Microsoft CNTK exploit GPUs to perform the complex computations involved in training ML models. Provisioning, configuring, scaling and managing these clusters is a typical DevOps function. DevOps teams may have to write scripts that automate the provisioning and configuration of infrastructure across a variety of environments, as well as the termination of instances once a training job completes.
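
Automating termination is easiest to get right when cleanup is tied to the job's scope. The following is a minimal sketch, assuming a hypothetical `FakeCloud` class in place of a real provider SDK: a context manager provisions instances and terminates them in a `finally` block, so GPU nodes are released whether training succeeds or fails.

```python
from contextlib import contextmanager


class FakeCloud:
    """Hypothetical stand-in for a cloud provider API (not a real SDK)."""

    def __init__(self):
        self.running = set()
        self._next_id = 0

    def launch(self, instance_type):
        self._next_id += 1
        instance_id = f"{instance_type}-{self._next_id}"
        self.running.add(instance_id)
        return instance_id

    def terminate(self, instance_id):
        self.running.discard(instance_id)


@contextmanager
def training_cluster(cloud, instance_type, count):
    """Provision `count` instances and always terminate them afterwards."""
    instances = []
    try:
        for _ in range(count):
            instances.append(cloud.launch(instance_type))
        yield instances
    finally:
        # Runs on success, on error, and if a launch fails partway through.
        for instance_id in instances:
            cloud.terminate(instance_id)


def train(instances):
    # Hypothetical placeholder for submitting a training job to the nodes.
    return f"trained on {len(instances)} GPU nodes"
```

For example, `with training_cluster(cloud, "gpu", 4) as nodes: train(nodes)` leaves `cloud.running` empty once the block exits.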

Like modern application development, machine learning development is iterative. New datasets lead to new or retrained ML models, which must then be made available to users. Established continuous integration and deployment (CI/CD) practices are being applied to ML lifecycle management: each version of an ML model is packaged as a container image with its own distinct tag. DevOps teams bridge the gap between the ML training environment and the model deployment environment through sophisticated CI/CD pipelines.
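
One way to give each model version a distinct, traceable tag is to combine a version label with a short hash of the training data. The convention below (`<registry>/<model>:<version>-<data digest>`) is hypothetical, a sketch rather than a standard; the validation follows Docker's documented tag rules (letters, digits, `_`, `.`, `-`, at most 128 characters, not starting with `.` or `-`).

```python
import hashlib
import re

# Docker tag rules: first char [A-Za-z0-9_], then up to 127 of [A-Za-z0-9_.-].
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def model_image_tag(registry, model_name, version, dataset_bytes):
    """Build an image reference for one trained model version (hypothetical scheme)."""
    # A short content hash ties the image to the exact data it was trained on.
    digest = hashlib.sha256(dataset_bytes).hexdigest()[:12]
    tag = f"{version}-{digest}"
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"invalid image tag: {tag}")
    return f"{registry}/{model_name}:{tag}"
```

Retraining the same version on different data produces a different tag, so a CI/CD pipeline never silently overwrites an earlier image.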

Once a fully trained ML model is available, DevOps teams are expected to host it in a scalable environment. They may use orchestration engines such as Apache Mesos or Kubernetes to scale the model deployment.
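
To make "scale the model deployment" concrete: the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping changes when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that core rule plus min/max clamping; it omits stabilization windows and pod-readiness handling, and the parameter names are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count for a model-serving deployment, following the HPA core rule."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For instance, three replicas running at 90% CPU against a 50% target scale to six, while a reading of 52% leaves the count unchanged.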

The rise of containers and container management tools makes ML development manageable and efficient. DevOps teams use containers to provision development environments, data processing pipelines, training infrastructure and model deployment environments. Emerging technologies such as Kubeflow and MLflow aim to help DevOps teams tackle the new challenges of managing ML infrastructure.

Machine learning brings a new dimension to DevOps. Along with developers, operators will have to collaborate with data scientists and data engineers to support businesses embracing the ML paradigm.

Data science and machine learning are often associated with mathematics, statistics, algorithms and data wrangling, yet DevOps is becoming just as important to their success.
