Azure Machine Learning base images

This repository contains Dockerfiles for the base images used in Azure Machine Learning.

Introduction

These Docker images serve as base images for training and inference in Azure ML. When you submit a training job on AmlCompute or any other target with Docker enabled, Azure ML runs your job in a conda environment within a Docker container.

You can also use these Docker images as base images for your custom Azure ML Environments. If you specify any conda dependencies in your Environment, the extra dependencies are installed on top of the dependencies in the Docker image.

Note that these base images do not come with Python packages, notably the Azure ML Python SDK, installed. If you require the Azure ML SDK package for your job, make sure you also install the appropriate package.

Please note that images supporting Ubuntu 16.04 are now deprecated. We recommend using images supporting Ubuntu 18.04 for the time being as we transition towards providing 20.04 images.

Base image dependencies

Azure ML currently supports cuda9, cuda10, and cuda11 base images. The major dependencies installed in the base images are Miniconda, OpenMPI, CUDA, cuDNN, NCCL, and git. For more detailed information, see the Dockerfiles.

The CPU images are built from ubuntu18.04 and ubuntu20.04.

The GPU images for cuda9 are built from nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04.

The GPU images for cuda10 are built from:

  • nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
  • nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
  • nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04

The GPU images for cuda11 are built from:

  • nvidia/cuda:11.0.3-cudnn8-devel-ubuntu18.04
  • nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
  • nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04
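
The upstream images listed above can be captured in a small lookup table, for example when a script needs to pick an upstream image by CUDA family and Ubuntu release. The following is a minimal sketch that only restates the lists in this README; the names `UPSTREAM_GPU_IMAGES` and `upstream_images` are hypothetical and are not part of any Azure ML tooling.

```python
# Upstream NVIDIA images as listed in this README, keyed by CUDA family.
# This table is a hand-written copy of the lists above, not a registry query.
UPSTREAM_GPU_IMAGES = {
    "cuda9": ["nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04"],
    "cuda10": [
        "nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04",
        "nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04",
        "nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04",
    ],
    "cuda11": [
        "nvidia/cuda:11.0.3-cudnn8-devel-ubuntu18.04",
        "nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04",
        "nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04",
    ],
}


def upstream_images(cuda_family, ubuntu=None):
    """Return the upstream images for a CUDA family (hypothetical helper).

    If `ubuntu` is given (for example "18.04"), keep only the images
    whose name ends with that Ubuntu release.
    """
    images = UPSTREAM_GPU_IMAGES.get(cuda_family, [])
    if ubuntu is None:
        return list(images)
    return [image for image in images if image.endswith("ubuntu" + ubuntu)]
```

Calling `upstream_images("cuda11", ubuntu="20.04")` would return the single Ubuntu 20.04 entry from the cuda11 list.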

How to get the Azure ML images

All images in this repository are published to Microsoft Container Registry (MCR).

You can pull these images from MCR using the following commands:

  • CPU image example: docker pull mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
  • GPU image example: docker pull mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04

As the naming convention shows, the image tag maps to the name of the folder that contains the corresponding Dockerfile.
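
The mapping between folder names and image references can be sketched in a few lines of Python. This is an illustration only: the `mcr.microsoft.com/azureml/` prefix comes from the pull commands above, while the helper names and the regular expression are assumptions inferred from the two example names, so folders that follow a different pattern will not match.

```python
import re

# Registry prefix taken from the docker pull examples in this README.
MCR_PREFIX = "mcr.microsoft.com/azureml/"

# Pattern inferred from the two example names; it is an assumption,
# not an official naming specification.
NAME_PATTERN = re.compile(
    r"^openmpi(?P<openmpi>[\d.]+)"
    r"(?:-cuda(?P<cuda>[\d.]+)-cudnn(?P<cudnn>\d+))?"
    r"-ubuntu(?P<ubuntu>[\d.]+)$"
)


def image_reference(folder_name):
    """Build the MCR image reference for a Dockerfile folder name."""
    return MCR_PREFIX + folder_name


def describe(folder_name):
    """Split a folder name into its version components.

    CPU images have no CUDA or cuDNN part, so those values are None.
    """
    match = NAME_PATTERN.match(folder_name)
    if match is None:
        raise ValueError("unrecognized folder name: " + folder_name)
    return match.groupdict()
```

For example, `describe("openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04")` yields the OpenMPI, CUDA, cuDNN, and Ubuntu versions as strings, and `image_reference` of the same name gives the reference used in the GPU pull command above.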

GPU images pulled from MCR can only be used with Azure services. See the LICENSE.txt file inside the Docker container for more information. GPU images are built from NVIDIA images. For NVIDIA’s license terms covering CUDA and cuDNN, see the ThirdPartyNotices.txt file inside the Docker container.

Featured tags

Below is the list of tags:

Using Azure ML base images for training

In some cases, the Azure ML base images will be used by default:

  • If no base image is explicitly set for a training run, Azure ML uses the image corresponding to azureml.core.environment.DEFAULT_CPU_IMAGE.

  • If you are using an Azure ML curated environment, it is already configured with one of the Azure ML base images. To see which base image a specific curated environment uses, you can run the following:

    from azureml.core import Environment, Workspace
    
    # Load the workspace that holds the curated environment
    ws = Workspace.from_config()
    
    curated_env_name = 'AzureML-pytorch-1.7-ubuntu18.04-py37-cuda11-gpu'
    pytorch_env = Environment.get(workspace=ws, name=curated_env_name)
    print(pytorch_env.docker.base_image)

If you instead want to explicitly use one of the Azure ML base images for your job, follow the steps below.

Prerequisites

Create an Azure ML Environment

If your training script requires additional dependencies, create a YAML file that defines the conda dependencies. In the example below, the file is named conda_dependencies.yml:

channels:
- conda-forge
dependencies:
- python=3.6.2
- pip:
  - azureml-defaults
  - tensorflow-gpu==2.2.0
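
If you prefer to generate this file from a script rather than write it by hand, the following minimal sketch renders the same content using only the Python standard library. The function names `render_conda_spec` and `write_conda_spec` are hypothetical helpers for illustration; they are not part of the Azure ML SDK.

```python
from pathlib import Path


def render_conda_spec(python_version, pip_packages, channels=("conda-forge",)):
    """Render a conda environment file as text (hypothetical helper).

    Only covers the shape used in this README: a channel list, a pinned
    Python version, and a list of pip packages.
    """
    lines = ["channels:"]
    lines += ["- " + channel for channel in channels]
    lines += ["dependencies:", "- python=" + python_version, "- pip:"]
    lines += ["  - " + package for package in pip_packages]
    return "\n".join(lines) + "\n"


def write_conda_spec(path, python_version, pip_packages):
    """Write the rendered specification to `path` and return the path."""
    target = Path(path)
    target.write_text(render_conda_spec(python_version, pip_packages))
    return target
```

Calling `write_conda_spec("conda_dependencies.yml", "3.6.2", ["azureml-defaults", "tensorflow-gpu==2.2.0"])` would produce a file matching the example above.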

Then, create an Azure ML environment from this conda environment specification.

from azureml.core import Environment

env = Environment.from_conda_specification(name='my-env', file_path='./conda_dependencies.yml')

If your script does not require any additional dependencies and you would like to use the base image directly, instantiate an Environment object as follows:

from azureml.core import Environment

env = Environment(name='my-env')

Then, for both of the above cases, set the base image you would like to use. For example, here we will specify the openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04 image:

env.docker.enabled = True
env.docker.base_image = 'mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04'

Configure and submit the training job

Create a ScriptRunConfig object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on.

from azureml.core import Workspace, Experiment
from azureml.core import ScriptRunConfig

ws = Workspace.from_config()
compute_target = ws.compute_targets['my-cluster-name']

src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target=compute_target,
                      environment=env)
                      
run = Experiment(workspace=ws, name='my-experiment').submit(src)
run.wait_for_completion(show_output=True)

What happens during job execution

As the job is executed, it goes through the following stages:

  • Preparing: A Docker image is created according to the environment definition. A new image is built the first time a given combination of dependencies is used in a workspace; it is then uploaded to the workspace's Azure Container Registry and cached, so later runs reuse the cached image. Build logs are streamed to the run history and can be viewed to monitor progress. If a curated environment is specified instead, the cached image backing that curated environment is used.

  • Scaling: The cluster attempts to scale up if the cluster requires more nodes to execute the run than are currently available.

  • Running: All scripts in the script folder are uploaded to the compute target, any datasets specified are mounted or downloaded, and the script is executed. Outputs from stdout and the ./logs folder are streamed to the run history and can be used to monitor the run.

  • Post-Processing: The ./outputs folder of the run is copied over to the run history.
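
The build-or-reuse decision in the Preparing stage can be illustrated with a small sketch. This is not how Azure ML implements its image cache; it is a simplified model, under the assumption that an image is identified by a hash of the base image and the dependency list. The names `environment_key` and `prepare_image`, and the `example-registry/env-` tag format, are hypothetical.

```python
import hashlib
import json

# The four stages described above, in execution order.
STAGES = ("Preparing", "Scaling", "Running", "Post-Processing")


def environment_key(base_image, dependencies):
    """Derive a cache key from an environment definition (assumed scheme)."""
    payload = json.dumps(
        {"base_image": base_image, "dependencies": list(dependencies)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def prepare_image(cache, base_image, dependencies):
    """Return (image_tag, was_built) for an environment definition.

    `cache` is a plain dict standing in for the workspace's container
    registry: a hit reuses the stored tag, a miss records a new one.
    """
    key = environment_key(base_image, dependencies)
    if key in cache:
        return cache[key], False
    image_tag = "example-registry/env-" + key[:12]  # hypothetical tag format
    cache[key] = image_tag
    return image_tag, True
```

In this model, the first call for a given base image and dependency list reports that an image was built, and a second call with the same arguments reuses the cached tag.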

Using your own custom Docker image or Dockerfile for training

If you want to use your own custom Docker image or Dockerfile for your training job instead of the Azure ML base images, refer to the documentation Train using a custom image.

Resources

For additional documentation and tutorials, see the following:

Projects using Azure Machine Learning

Visit the following repositories to see the projects contributed by Azure ML users:

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
