azure-pipelines-agent

Self-hosted GPU agents for Azure Pipelines

Microsoft's documentation for self-hosted Linux agents: https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/v2-linux?view=azure-devops

The following steps are marked (HOST) or (WEBSITE) to indicate whether the step should be carried out on the machine intended to be the self-hosted agent, or on the Azure Pipelines website.

(WEBSITE) set up a Personal Access Token

Go to https://dev.azure.com/c3srdev

profile >> security >> personal access tokens

Select new token.

Under scopes, show all scopes and select Agent Pools (Read & manage).

The token is saved in Bitwarden under "azure-pipelines-agent token".

(WEBSITE) Create an agent pool

Azure Pipelines needs to know which pool each agent belongs to, and each job is routed to a specific pool. In C3SR we therefore organize pools according to the capabilities of the underlying host. For example, the pool for an amd64 host with Ubuntu 16.04 and CUDA 10.0 is named amd64-ubuntu1604-cuda100.
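The naming convention can be expressed as a small helper. This is a hypothetical sketch (pool_name is not part of this repository) that assumes every pool follows the arch-ubuntuXXXX-cudaXXX pattern of the example, with the dots removed from the version numbers.

```python
def pool_name(arch: str, ubuntu_version: str, cuda_version: str) -> str:
    """Build a pool name such as 'amd64-ubuntu1604-cuda100'.

    Hypothetical helper: dots are dropped from the version strings,
    so Ubuntu '16.04' -> '1604' and CUDA '10.0' -> '100'.
    """
    ubuntu = ubuntu_version.replace(".", "")
    cuda = cuda_version.replace(".", "")
    return f"{arch}-ubuntu{ubuntu}-cuda{cuda}"


if __name__ == "__main__":
    print(pool_name("amd64", "16.04", "10.0"))  # amd64-ubuntu1604-cuda100
```

The same rule gives amd64-ubuntu1604-cuda92 for CUDA 9.2, which matches the tag style of the agent images listed further down.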

To create a new agent pool, go to

https://dev.azure.com/c3srdev/_settings/agentpools >> new agent pool

You will use the name of the agent pool in your azure-pipelines.yml.
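As a minimal sketch of where the pool name goes, the helper below renders a tiny azure-pipelines.yml. The pool: name: keys select a self-hosted pool in Azure Pipelines YAML; the pipeline_yaml function and the nvidia-smi step are illustrative assumptions, not files from this repository.

```python
def pipeline_yaml(pool: str) -> str:
    """Return a minimal azure-pipelines.yml that targets a self-hosted pool.

    'pool: name:' selects the agent pool; the single script step is a
    placeholder that reports the GPUs visible inside the agent container.
    """
    return (
        "pool:\n"
        f"  name: {pool}\n"
        "\n"
        "steps:\n"
        "- script: nvidia-smi\n"
        "  displayName: Show GPUs\n"
    )


if __name__ == "__main__":
    # Print the YAML; commit the output as azure-pipelines.yml in your repository.
    print(pipeline_yaml("amd64-ubuntu1604-cuda100"))
```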

(HOST) Install CUDA, docker, nvidia-docker, and python3

The Azure Pipelines agent runs inside a Docker container so that each job gets a fresh environment. The manager that launches these containers is written in Python.
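A quick way to confirm the host prerequisites are installed is to look for their executables on PATH. This is a hypothetical check, not part of manager.py; the tool list is an assumption based on the heading above (nvidia-smi stands in for the CUDA driver).

```python
import shutil

# Executables the host is assumed to need; nvidia-smi ships with the NVIDIA driver.
REQUIRED_TOOLS = ["docker", "nvidia-docker", "nvidia-smi", "python3"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing on PATH:", ", ".join(missing) if missing else "none")
```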

(HOST) Run the agents using Docker

The Docker agent is configured to accept a single job and then exit. This ensures that each job has a fresh environment. python/manager.py is responsible for making sure new agents are created whenever the number of active agents falls below a threshold. The manager runs until it is interrupted; when it is interrupted, it tries to clean up any dangling containers that it created.
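The replenish-and-clean-up behaviour described above can be sketched as a loop over injected callables. This is a minimal illustration under stated assumptions: run_manager, count_active, start_agent and cleanup are hypothetical names, and the real manager talks to Docker instead.

```python
import time


def run_manager(count_active, start_agent, cleanup, threshold=1,
                poll_seconds=10.0, max_iterations=None):
    """Keep at least `threshold` single-use agents running.

    count_active() -> int, start_agent() -> None and cleanup() -> None are
    hypothetical callables standing in for Docker operations. The loop runs
    forever when max_iterations is None; cleanup() runs however the loop
    ends, including on Ctrl-C (KeyboardInterrupt is re-raised afterwards).
    """
    iterations = 0
    try:
        while max_iterations is None or iterations < max_iterations:
            # Each agent exits after one job, so top the pool back up.
            shortfall = threshold - count_active()
            for _ in range(max(0, shortfall)):
                start_agent()
            iterations += 1
            time.sleep(poll_seconds)
    finally:
        cleanup()
```

Injecting the callables keeps the control flow testable without a Docker daemon; max_iterations exists only so the loop can be exercised in a test.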

  1. Start the manager:

      python3 python/manager.py <PAT> <URL> <POOL>

The manager needs the Personal Access Token you created earlier, the Azure DevOps organization URL, and the name of the pool the agents should be registered to.

For example:

      python3 python/manager.py [long string of letters and numbers] https://dev.azure.com/c3srdev amd64-ubuntu1604-cuda100

The manager queries the host system and tries to choose the agent image with the newest CUDA support. These agent images are hosted on Docker Hub and defined in the dockeragent directory of this repository:

  • cwpearson/azp-cuda-agent:amd64-ubuntu1604-cuda92
  • cwpearson/azp-cuda-agent:amd64-ubuntu1604-cuda100
  • cwpearson/azp-cuda-agent:amd64-ubuntu1604-cuda101
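How manager.py autodetects the image is not described here. One plausible approach, sketched as an assumption, is to take the host's CUDA version (however it is obtained) and pick the newest listed image whose CUDA version does not exceed it. The KNOWN_TAGS table mirrors the list above and may not match what is actually published.

```python
# Tags listed above; the set actually published on Docker Hub may differ.
KNOWN_TAGS = {
    (9, 2): "cwpearson/azp-cuda-agent:amd64-ubuntu1604-cuda92",
    (10, 0): "cwpearson/azp-cuda-agent:amd64-ubuntu1604-cuda100",
    (10, 1): "cwpearson/azp-cuda-agent:amd64-ubuntu1604-cuda101",
}


def parse_cuda_version(text):
    """Parse 'major.minor' or 'major.minor.patch' (e.g. '10.1.243') into (10, 1)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def pick_image(host_cuda, images=KNOWN_TAGS):
    """Return the image with the newest CUDA version <= host_cuda, or None."""
    usable = [version for version in images if version <= host_cuda]
    return images[max(usable)] if usable else None
```

Restricting the choice to a table of known tags means a newer host (say CUDA 10.2) falls back to the cuda101 image instead of requesting a tag that may not exist.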

If the manager fails to understand your system, or your system is not supported by one of those images, you can supply your own docker image with the -d flag.
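The command line described so far (three positional arguments plus -d) maps naturally onto argparse. This is a sketch of the interface, not the parser inside manager.py; the destination name docker_image is a hypothetical choice.

```python
import argparse


def build_parser():
    """Command-line interface mirroring the usage described above (a sketch)."""
    parser = argparse.ArgumentParser(
        description="Keep single-use Azure Pipelines agents running in Docker.")
    parser.add_argument("pat", help="Personal Access Token with Agent Pools read/manage scope")
    parser.add_argument("url", help="organization URL, e.g. https://dev.azure.com/c3srdev")
    parser.add_argument("pool", help="agent pool name, e.g. amd64-ubuntu1604-cuda100")
    # Hypothetical dest name; None means "autodetect the image".
    parser.add_argument("-d", dest="docker_image", default=None,
                        help="Docker image to run instead of the autodetected one")
    return parser
```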

If you want to make host directories available to the containers, you can use the --volume option, like so

  • --volume hostpath:containerpath:envvar

This maps hostpath into the container at containerpath and runs the container with the environment variable envvar set to containerpath. --volume may be specified more than once.
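A hostpath:containerpath:envvar spec can be turned into the volumes and environment arguments accepted by the Docker SDK for Python's containers.run(). This is a hedged sketch; parse_volume and docker_run_kwargs are hypothetical helpers, and the optional read-only mode is an assumption rather than a documented flag.

```python
import argparse
import os


def parse_volume(spec):
    """Split 'hostpath:containerpath:envvar' into its three parts.

    rsplit keeps any ':' inside the host path intact; expanduser handles a
    literal '~' that reached the program unexpanded (for example when quoted).
    """
    host, container, envvar = spec.rsplit(":", 2)
    return os.path.abspath(os.path.expanduser(host)), container, envvar


def docker_run_kwargs(specs, read_only=False):
    """Build the volumes= and environment= arguments for containers.run()."""
    volumes, environment = {}, {}
    mode = "ro" if read_only else "rw"
    for spec in specs:
        host, container, envvar = parse_volume(spec)
        volumes[host] = {"bind": container, "mode": mode}
        environment[envvar] = container
    return {"volumes": volumes, "environment": environment}


parser = argparse.ArgumentParser()
# action="append" collects every --volume occurrence into one list.
parser.add_argument("--volume", action="append", default=[])
```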

  2. If you want to build your own agent:
    1. Define a Docker image compatible with your system

      Use the Dockerfiles in dockeragent as examples. You will probably need to change which Azure Pipelines agent binary is downloaded, as well as the NVIDIA CUDA base image.

    2. Build the Docker image yourself

      cd dockeragent
      docker build -f <your docker file> -t myazpagent .
      
    3. Run python/manager.py

      cd python
      python3 manager.py <PAT> <URL> <POOL> -d myazpagent
      

FAQ

The pool stopped accepting my jobs

If the agent binary in the Docker image is out of date, the pool eventually seems to stop accepting jobs.

Check https://dev.azure.com/{organization}/_settings/agentpools for the current download link and make sure it matches the one in your Dockerfile.
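The comparison can be scripted once you have copied the current version number from the agent pools page. This sketch assumes the Dockerfile downloads an archive named vsts-agent-linux-x64-VERSION.tar.gz; adjust the pattern if your Dockerfile uses a different file name. The function names are hypothetical.

```python
import re

# Assumed file-name pattern of the agent archive downloaded by the Dockerfile.
AGENT_ARCHIVE = re.compile(r"vsts-agent-linux-x64-(\d+(?:\.\d+)+)\.tar\.gz")


def agent_versions(dockerfile_text):
    """Return every agent version referenced in the Dockerfile text."""
    return AGENT_ARCHIVE.findall(dockerfile_text)


def _version_tuple(version):
    """'2.155.1' -> (2, 155, 1), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_outdated(dockerfile_text, current_version):
    """True if any referenced agent version is older than current_version."""
    versions = agent_versions(dockerfile_text)
    if not versions:
        raise ValueError("no agent archive name found in the Dockerfile")
    return any(_version_tuple(v) < _version_tuple(current_version) for v in versions)
```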

Issues

Lower CPU priority for

  1. Create a user that the Docker agent will run as. This is done in the Dockerfile:

      RUN useradd -ms /bin/bash azp
      USER azp
      WORKDIR /home/azp

  2. Use /etc/security/limits.conf to raise the niceness of that user:

      RUN echo azp soft priority 20 >> /etc/security/limits.conf
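The issue proposes lowering priority through limits.conf in the Dockerfile. As a different, process-level illustration of the same goal (not what the issue describes), a launcher can raise the niceness of a child process just before it executes. run_low_priority is a hypothetical helper and works only on POSIX systems.

```python
import os
import subprocess
import sys


def run_low_priority(cmd, niceness=19):
    """Run cmd with its niceness raised by `niceness` (POSIX only).

    os.nice() runs in the child just before exec, so the calling process keeps
    its own priority; the kernel caps the result at the lowest priority.
    """
    return subprocess.run(cmd, preexec_fn=lambda: os.nice(niceness),
                          check=True, capture_output=True, text=True)


if __name__ == "__main__":
    result = run_low_priority([sys.executable, "-c", "import os; print(os.nice(0))"])
    print("child niceness:", result.stdout.strip())
```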

Allow manager to map static data into containers

To map large graph data into a test environment:

  1. the manager could map it as a read-only volume
  2. the manager could add an environment variable saying that the large test data was available and where it was
  3. If that environment variable exists, the test could run, otherwise it could be skipped
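On the test side, step 3 can be expressed with unittest's skip decorators. This sketch assumes the manager exposes the data through PANGOLIN_GRAPH_DIR, the variable name used in the issue below; graph_data_dir and LargeGraphTest are hypothetical names.

```python
import os
import unittest

GRAPH_ENV_VAR = "PANGOLIN_GRAPH_DIR"


def graph_data_dir():
    """Return the mapped data directory, or None if it was not provided."""
    path = os.environ.get(GRAPH_ENV_VAR)
    if path and os.path.isdir(path):
        return path
    return None


class LargeGraphTest(unittest.TestCase):
    # The condition is evaluated once, when the class is defined.
    @unittest.skipUnless(graph_data_dir(), GRAPH_ENV_VAR + " not set; skipping large-graph test")
    def test_graph_files_present(self):
        self.assertTrue(os.listdir(graph_data_dir()))
```

Run it with python3 -m unittest; outside a container started with --volume the test is reported as skipped rather than failed.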

`python manager.py ...` exits with 404

python manager.py [redacted] https://dev.azure.com/c3srdev amd64-ubuntu1604-cuda100 --volume ~/graph/:/data/graph:PANGOLIN_GRAPH_DIR
testing for nvidia-docker
nvidia-docker looks good
autodetected docker image cwpearson/azp-cuda-agent:amd64-ubuntu1604-cuda102
pulling cwpearson/azp-cuda-agent:amd64-ubuntu1604-cuda102
Traceback (most recent call last):
  File "/home/pearson/.local/share/virtualenvs/azure-pipelines-agent-fYRwYNLf/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status
    response.raise_for_status()
  File "/home/pearson/.local/share/virtualenvs/azure-pipelines-agent-fYRwYNLf/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.35/images/create?tag=amd64-ubuntu1604-cuda102&fromImage=cwpearson%2Fazp-cuda-agent

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "manager.py", line 147, in <module>
    client.images.pull(DOCKER_IMAGE)
  File "/home/pearson/.local/share/virtualenvs/azure-pipelines-agent-fYRwYNLf/lib/python3.6/site-packages/docker/models/images.py", line 444, in pull
    repository, tag=tag, stream=True, **kwargs
  File "/home/pearson/.local/share/virtualenvs/azure-pipelines-agent-fYRwYNLf/lib/python3.6/site-packages/docker/api/image.py", line 414, in pull
    self._raise_for_status(response)
  File "/home/pearson/.local/share/virtualenvs/azure-pipelines-agent-fYRwYNLf/lib/python3.6/site-packages/docker/api/client.py", line 263, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/home/pearson/.local/share/virtualenvs/azure-pipelines-agent-fYRwYNLf/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.NotFound: 404 Client Error: Not Found ("manifest for cwpearson/azp-cuda-agent:amd64-ubuntu1604-cuda102 not found")
$ docker version
Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:21:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       2d0083d
  Built:            Thu Jun 27 17:23:02 2019
  OS/Arch:          linux/amd64
  Experimental:     false
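The log shows the autodetected tag amd64-ubuntu1604-cuda102 does not exist on Docker Hub. One way to make the manager tolerate this, sketched as an assumption rather than what manager.py does, is to try candidate images in order and fall back when a pull fails. With the Docker SDK, pull could be docker.from_env().images.pull and not_found could be docker.errors.NotFound (the exception in the traceback above).

```python
def pull_first_available(candidates, pull, not_found=LookupError):
    """Try each image reference in order and return the first one that pulls.

    `pull` is any callable taking an image reference; `not_found` is the
    exception type it raises for a missing image. Other errors propagate.
    """
    errors = []
    for image in candidates:
        try:
            pull(image)
            return image
        except not_found as exc:
            errors.append(f"{image}: {exc}")
    raise RuntimeError("no candidate image could be pulled:\n" + "\n".join(errors))
```

Ordering the candidates newest CUDA first keeps the "most CUDA support" preference while still succeeding when the newest tag has not been published.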
