You've found the batch-orchestration framework. Here you'll find a set of tools to help you scale out simulators using Azure Batch.
- An Azure account.
- A Bonsai workspace. You can find instructions on provisioning a Bonsai workspace here.
- Anaconda or Miniconda.
- Create a virtual environment with the library dependencies described in the environment.yml file:
conda env create -f environment.yml
conda activate bonsai-preview
- Create your resources:
python batch_creation.py create_resources
- Build your image:
python batch_creation.py build_image
- Run your tasks:
python batch_containers.py run_tasks
- Create your brain and start training:
bonsai brain version start-training --name <brain-name>
- Attach your simulators:
bonsai simulator unmanaged connect -b <brain-name> -a Train -c <concept_name> --simulator-name <simulator-name>
There are two executable scripts in this repository:
batch_creation.py -> creates the necessary resources on Azure to scale your simulations: Azure Batch, Azure Container Registry, and Azure Blob Storage, all within a single resource group. NOTE: Resource names may contain only lowercase alphanumeric characters and must be between 3 and 25 characters long.
batch_containers.py -> executes a set of simulation jobs as tasks on the Azure Batch account created by batch_creation.py.
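The naming rule in the NOTE can be checked before you call create_resources. The sketch below is a hypothetical helper (is_valid_resource_name is not part of batch_creation.py) that encodes only the rule stated above: lowercase letters and digits, 3 to 25 characters.

```python
import re

# Naming rule from this README: only lowercase letters and digits, 3 to 25 characters.
RESOURCE_NAME_PATTERN = re.compile(r"[a-z0-9]{3,25}")


def is_valid_resource_name(name: str) -> bool:
    """Return True if `name` follows the naming rule (hypothetical helper)."""
    # fullmatch anchors the pattern to the whole string, so any uppercase
    # letter, hyphen, or other symbol makes the name invalid.
    return RESOURCE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["hvacdemo01", "My-Group", "ab", "a" * 26]:
        print(f"{candidate!r}: {is_valid_resource_name(candidate)}")
```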
Both scripts use the fire package to expose their functions as command-line commands. To see how to use them, view their arguments and documentation:
python batch_creation.py -h
NAME
    batch_creation.py

SYNOPSIS
    batch_creation.py GROUP | COMMAND

GROUPS
    GROUP is one of the following:
     configparser
       Configuration file parser.
     pathlib
     re
       Support for regular expressions (RE).
     Dict
       The central part of internal API.
     Union
       Internal indicator of special typing constructs. See _doc instance attribute for specific docs.
     fire
       The Python Fire module.

COMMANDS
    COMMAND is one of the following:
     get_default_cli
     azure_cli_run
       Run Azure CLI command
     AzCreateBatch
     AzExtract
     AcrBuild
     delete_resources
       Delete resource group
     write_azure_config
     create_resources
       Main function to create azure resources and write out credentials to config file
     build_image
       Build ACR image from a source directory containing a dockerfile and src files.
python batch_containers.py -h
NAME
    batch_containers.py

SYNOPSIS
    batch_containers.py GROUP | COMMAND

GROUPS
    GROUP is one of the following:
     configparser
       Configuration file parser.
     datetime
       Fast implementation of the datetime type.
     pathlib
     sys
       This module provides access to some objects used or maintained by the interpreter and to functions that interact strongly with the interpreter.
     time
       This module provides various functions to manipulate time values.
     List
       The central part of internal API.
     batch_auth
     batch
     batchmodels
     blobxfer
     fire
       The Python Fire module.
     xfer_utils
       Run and scale simulation experiments on Azure Batch.

COMMANDS
    COMMAND is one of the following:
     AzureBatchContainers
     run_tasks
       Run simulators in Azure Batch.
     stop_job
     upload_files
       Upload files into attached batch storage account.
Although many functions are exposed, the most common workflow uses only two from batch_creation and one from batch_containers:
- python batch_creation.py create_resources: create the resources
- python batch_creation.py build_image --image-name <image-name>: build your Docker image on Azure Container Registry
- python batch_containers.py run_tasks: create a Batch pool and run your simulator tasks on it
The main advantage of this repository is that it streamlines scaling simulators with Docker images built on Azure Container Registry. The only thing you need to do is write a Dockerfile that packages the source code for running your simulator. In most cases this is a very simple Docker image, so the Dockerfile is very concise. The image is built on Azure Container Registry and run on Azure Batch, which means you don't even need to install Docker locally!
To specify the number of nodes in the pool, pass the following arguments:
python batch_containers.py run_tasks --dedicated-nodes=<#_of_dedicated_nodes> --low-pri-nodes=<#_of_low_pri_nodes>
The command will prompt you for the number of sims to run and the brain name.
The number of tasks per node is deduced automatically as number_of_sims / (number_low_pri_nodes + number_dedicated_nodes).
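The tasks-per-node formula can be sketched as follows. tasks_per_node is a hypothetical helper, and rounding up is an assumption made here so that every simulator gets a slot when the division is not exact; batch_containers.py may round differently.

```python
def tasks_per_node(num_sims: int, low_pri_nodes: int, dedicated_nodes: int) -> int:
    """Compute num_sims / (low_pri_nodes + dedicated_nodes), rounded up.

    Rounding up is an assumption of this sketch, not documented behavior of
    batch_containers.py.
    """
    if num_sims < 0 or low_pri_nodes < 0 or dedicated_nodes < 0:
        raise ValueError("counts must be non-negative")
    total_nodes = low_pri_nodes + dedicated_nodes
    if total_nodes == 0:
        raise ValueError("the pool needs at least one node")
    # Ceiling division in integer arithmetic (no floating-point rounding).
    return (num_sims + total_nodes - 1) // total_nodes


if __name__ == "__main__":
    print(tasks_per_node(num_sims=10, low_pri_nodes=2, dedicated_nodes=3))
    print(tasks_per_node(num_sims=12, low_pri_nodes=2, dedicated_nodes=3))
```

For example, 10 sims on 2 low-priority plus 3 dedicated nodes gives 2 tasks per node, while 12 sims on the same pool gives 3 under this rounding.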
Note: deleting pools is the best way to ensure you don't incur additional costs once brain training has completed.
To delete pools from the command line, you have the following options:
- Delete the last created pool:
python batch_containers.py delete_pool
- Delete a specific pool:
python batch_containers.py delete_pool --pool_name="pool-name"
- Delete all pools within the last created resource:
python batch_containers.py delete_pool --delete-all=True
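If you prefer the Azure CLI to the delete_pool command, a pool can also be deleted with az batch. The sketch below only assembles and prints the CLI calls (a dry run by default); the account, resource group, and pool names are placeholders, and this is not how delete_pool is implemented.

```python
import shutil
import subprocess
from typing import List


def delete_pool_commands(account_name: str, resource_group: str, pool_id: str) -> List[List[str]]:
    """Build the Azure CLI calls that delete a single Batch pool."""
    return [
        # Authenticate the CLI against the Batch account that owns the pool.
        ["az", "batch", "account", "login",
         "--name", account_name, "--resource-group", resource_group],
        # Delete the pool; --yes skips the interactive confirmation prompt.
        ["az", "batch", "pool", "delete", "--pool-id", pool_id, "--yes"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Resolve the executable path (on Windows, az is a .cmd wrapper).
            executable = shutil.which(cmd[0]) or cmd[0]
            subprocess.run([executable, *cmd[1:]], check=True)


if __name__ == "__main__":
    # Placeholder names: replace with your resource group, Batch account, and pool.
    run(delete_pool_commands("mygroupbatch", "mygroup", "my-pool"))
```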
If you want to see or manage your current pools in the Azure portal, follow these steps:
- Search for the resource group you selected when running python batch_creation.py create_resources.
- On the Overview tab, click the item whose TYPE is "Batch Account" (by default: "<your_group_name>batch").
- In the left pane, under the 'Features' section, click 'Pools'.
- You can now see the list of previously created pools.
If you want to resize an existing pool, use the resize_pool function:
python batch_containers.py resize_pool --low_pri_nodes <new-low-pri-node-count> --dedicated_nodes <new-dedicated-node-count>
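The Azure CLI counterpart of a resize is az batch pool resize with new target node counts. This sketch (resize_pool_command is a hypothetical helper) only builds the call and assumes you have already authenticated with az batch account login; it is not the implementation of resize_pool.

```python
from typing import List


def resize_pool_command(pool_id: str, dedicated_nodes: int, low_pri_nodes: int) -> List[str]:
    """Build an `az batch pool resize` call with new target node counts."""
    if dedicated_nodes < 0 or low_pri_nodes < 0:
        raise ValueError("node counts must be non-negative")
    return [
        "az", "batch", "pool", "resize",
        "--pool-id", pool_id,
        "--target-dedicated-nodes", str(dedicated_nodes),
        "--target-low-priority-nodes", str(low_pri_nodes),
    ]


if __name__ == "__main__":
    # Placeholder pool name; assumes `az batch account login` has already been run.
    print(" ".join(resize_pool_command("my-pool", dedicated_nodes=0, low_pri_nodes=10)))
```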
If you would like to change the pool altogether, for example so that it uses a new image, delete the pool first and then run new tasks on a new pool:
python batch_containers.py delete_pool --pool_name <pool-to-delete>
python batch_containers.py run_tasks
The build_image function accepts a few arguments for specifying the platform, the image name, and the Docker path. Here is an example that specifies a Windows platform with a different Dockerfile location:
python batch_creation.py build_image \
--docker_folder=examples/cs-house-energy \
--dockerfile_path=Dockerfile-windows \
--platform=windows --image_name=winhouse
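Building on ACR without local Docker corresponds to the Azure CLI command az acr build. The sketch below shows how the build_image arguments above plausibly map onto its flags; that mapping, the registry name, and the default tag are assumptions, not the script's actual implementation.

```python
from typing import List, Optional


def acr_build_command(registry: str, image_name: str, docker_folder: str,
                      dockerfile_path: Optional[str] = None,
                      platform: str = "linux", tag: str = "latest") -> List[str]:
    """Build an `az acr build` call; the image is built on ACR, not locally."""
    cmd = [
        "az", "acr", "build",
        "--registry", registry,
        "--image", f"{image_name}:{tag}",
        "--platform", platform,
    ]
    if dockerfile_path is not None:
        # Non-default Dockerfile name such as Dockerfile-windows
        # (see `az acr build --help` for how this path is resolved).
        cmd += ["--file", dockerfile_path]
    # The build context (source folder) is the final positional argument.
    cmd.append(docker_folder)
    return cmd


if __name__ == "__main__":
    # "myacr" is a placeholder registry name.
    print(" ".join(acr_build_command(
        "myacr", "winhouse", "examples/cs-house-energy",
        dockerfile_path="Dockerfile-windows", platform="windows")))
```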
After building, you can run your tasks with the specific image you've created:
python batch_containers.py run_tasks --image_name=winhouse
There is currently no updated batch_orchestration package. The best way to use this repository is to create the conda environment defined in environment.yml (follow this link if you need to install conda):
conda update -n base -c defaults conda
conda env update -f environment.yml
conda activate bonsai-preview
This provides the exact package versions and Python environment we used to test this library, and therefore gives you the highest chance of success.
The first time you use this package, you'll also need to log in to Azure and set your subscription:
az login
az account list -o table
az account set -s <subscription-id>
The only caveat is that if you need to debug your Docker image, you will need to install Docker locally (or write a batch script to run on ACR, which is a rather inefficient way to debug). For example, after running the batch_creation script above, you could test your image with:
docker login azhvacacr.azurecr.io
# your username and password are available in the newconf.ini file
docker pull azhvacacr.azurecr.io/hvac:1.0
docker run -it azhvacacr.azurecr.io/hvac:1.0 bash
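Rather than copying credentials by hand, you can read them from newconf.ini with Python's configparser and pass the password on stdin so it never appears on the command line. The section name ACR and the keys registry, username, and password are hypothetical; match them to what your newconf.ini actually contains.

```python
import configparser
import textwrap
from typing import List, Tuple


def docker_login_args(config_text: str, section: str = "ACR") -> Tuple[List[str], str]:
    """Return (docker login command, password) read from INI-formatted text.

    The section name and the keys "registry", "username", "password" are
    hypothetical; adjust them to your newconf.ini.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    acr = parser[section]
    # --password-stdin keeps the password out of the process list and shell history.
    cmd = ["docker", "login", acr["registry"],
           "--username", acr["username"], "--password-stdin"]
    return cmd, acr["password"]


if __name__ == "__main__":
    sample = textwrap.dedent("""\
        [ACR]
        registry = azhvacacr.azurecr.io
        username = azhvacacr
        password = not-a-real-password
    """)
    cmd, password = docker_login_args(sample)
    print(" ".join(cmd))
    # To actually log in (requires Docker installed locally):
    # subprocess.run(cmd, input=password, text=True, check=True)
```

In practice you would pass Path("newconf.ini").read_text() instead of the inline sample.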
Simulators may unregister from the Bonsai platform for any of the following reasons:
- Software update to Bonsai platform
- WaitForState timeout
- WaitForAction timeout
When using managed simulators, the platform automatically re-registers and connects sims when they unregister. When using unmanaged simulators, such as with the bonsai-batch scripts, you are responsible for registering the simulators again. To help with this, run reconnect.py with the following flags to repeatedly look for sims with an Unset purpose and connect those specific session IDs back to your brain:
python reconnect.py --simulator-name HouseEnergy --brain-name 20201116_he --brain-version 1 --concept-name SmartHouse --interval 1
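The reconnect behavior described above boils down to a polling loop: list sessions, pick the ones whose purpose is Unset, and connect each one. The sketch below is not reconnect.py itself; the session record shape and the list_sessions and connect callables are hypothetical stand-ins for whatever Bonsai CLI or API calls you use. The unit of reconnect.py's --interval flag is not stated here, so the sketch simply takes seconds.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical session record shape, e.g. parsed from Bonsai CLI output.
Session = Dict[str, str]


def unset_session_ids(sessions: Iterable[Session], simulator_name: str) -> List[str]:
    """Return IDs of sessions for `simulator_name` whose purpose is still Unset."""
    return [
        s["session_id"]
        for s in sessions
        if s.get("simulator_name") == simulator_name and s.get("purpose") == "Unset"
    ]


def reconnect_loop(list_sessions: Callable[[], List[Session]],
                   connect: Callable[[str], None],
                   simulator_name: str,
                   interval_s: float,
                   max_iterations: Optional[int] = None) -> int:
    """Poll for Unset sims and connect each one; return how many connects were issued.

    Runs at least once; with max_iterations=None it polls forever.
    """
    connects = 0
    iteration = 0
    while True:
        for session_id in unset_session_ids(list_sessions(), simulator_name):
            connect(session_id)  # hypothetical: attach this session to the brain's concept
            connects += 1
        iteration += 1
        if max_iterations is not None and iteration >= max_iterations:
            return connects
        time.sleep(interval_s)
```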
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.