NeuSomatic with Azure Batch AI
In this example, the 'preprocess.py' script is run on-premises, the data are transferred to the cloud with 'upload_data.sh', and the training phase is run on Azure Batch AI.
Directory structure:
- 'dataout/models' blob container: the pretrained models
- 'data' file share: the input files for training (artificial case)
- 'test' file share: the input files for training (AJT case)
The job configuration is defined in '1stgpujob.json'.
Before the training phase, the 'install_stable.sh' script is executed on every node; 'install.sh' contains possible optimizations.
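Because the setup script runs on every node before training, it helps if it is safe to run more than once. A minimal sketch of that idea, assuming a hypothetical helper `run_once` and a marker-file path of your choosing (neither is part of 'install_stable.sh'):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a setup command only if a marker file is absent,
# then create the marker so later invocations on the same node skip it.
run_once() {
  local marker="$1"; shift
  if [ -e "$marker" ]; then
    echo "skip: $marker exists"
    return 0
  fi
  # Propagate the command's failure; no marker is written in that case.
  "$@" || return $?
  mkdir -p "$(dirname "$marker")"
  touch "$marker"
}
```

For example, `run_once /tmp/neusomatic_setup.done bash install_stable.sh` (the marker path is only an illustration).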
rgname=hpc-batchai
wsname=neusomatic_workspace
storaccname=neusomaticstorage
expname=pytorch_experiment
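The commands that follow all expand these variables, so an unset or empty one silently produces a malformed 'az' call. One way to fail early, sketched with a hypothetical `require_vars` function (bash-specific, since it uses indirect expansion `${!name}`):

```shell
#!/usr/bin/env bash
# Hypothetical guard: report every listed variable that is unset or empty,
# and return non-zero if any of them is missing.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} expands the variable whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_vars rgname wsname storaccname expname || exit 1`.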
az group create -n $rgname -l westeurope
az batchai workspace create -g $rgname -n $wsname -l westeurope
az batchai experiment create -g $rgname -w $wsname -n $expname
clustername=nc6
az batchai cluster create -n $clustername -g $rgname -w $wsname -s Standard_NC6 -t 2 --generate-ssh-keys
az storage account create -n $storaccname --sku Standard_LRS -g $rgname
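Storage account names must be 3 to 24 characters long and contain only lowercase letters and digits (they must also be globally unique, which only Azure can check). A small local check of the format, using a hypothetical `valid_storage_account_name` function:

```shell
#!/usr/bin/env bash
# Check the storage account naming rule: 3-24 characters,
# ASCII lowercase letters and digits only.
valid_storage_account_name() {
  local name="$1"
  # Length must be between 3 and 24 characters.
  (( ${#name} >= 3 && ${#name} <= 24 )) || return 1
  # Reject any other character; the allowed set is spelled out explicitly
  # so the check does not depend on the locale's collation order.
  case "$name" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  return 0
}
```

Usage: `valid_storage_account_name "$storaccname" || echo "invalid storage account name"`.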
az storage share create -n logs --account-name $storaccname
az storage share create -n scripts --account-name $storaccname
az storage share create -n data --account-name $storaccname
az storage share create -n test --account-name $storaccname
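The four share-creation commands differ only in the share name, so they can be folded into a loop. A minimal sketch: `create_shares` and the `DRY_RUN` switch are hypothetical, while the 'az storage share create' invocation is the one used above.

```shell
#!/usr/bin/env bash
# Create each named file share in the given storage account.
# With DRY_RUN=1 the commands are only printed, not executed.
create_shares() {
  local account="$1"; shift
  local share
  for share in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "az storage share create -n $share --account-name $account"
    else
      az storage share create -n "$share" --account-name "$account" || return $?
    fi
  done
}
```

Usage: `create_shares "$storaccname" logs scripts data test`.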
az storage directory create -n dataout -s data --account-name $storaccname
az storage file upload -s scripts --source install.sh --path prep --account-name $storaccname
az storage file upload-batch -s /mnt/bigdata/output_dir/standalone/dataset --pattern '*/candidates*.tsv*' --destination data --account-name $storaccname
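Before uploading, it can be useful to see which local files the pattern selects. The sketch below approximates that selection locally, assuming the pattern is matched against paths relative to the source directory and that `*` may also match `/`; `preview_upload` is a hypothetical helper, not part of the Azure CLI.

```shell
#!/usr/bin/env bash
# List files under a source directory whose relative path matches a glob.
preview_upload() {
  local src="$1" pattern="$2" rel
  (
    cd "$src" || exit 1
    find . -type f | sed 's|^\./||' | while IFS= read -r rel; do
      # $pattern is deliberately unquoted so it is treated as a glob pattern.
      case "$rel" in
        $pattern) printf '%s\n' "$rel" ;;
      esac
    done
  ) | sort
}
```

Usage: `preview_upload /mnt/bigdata/output_dir/standalone/dataset '*/candidates*.tsv*'`.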
jobname=n1
az batchai job create -c $clustername -n $jobname -g $rgname -w $wsname -e $expname -f 1stgpujob.json --storage-account-name $storaccname
az batchai job file stream -j $jobname -g $rgname -w $wsname -e $expname -f stdout-0.txt
az batchai job show -n $jobname -g $rgname -w $wsname -e $expname --query jobOutputDirectoryPathSegment
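Instead of checking the job by hand, its status can be polled until it finishes. A minimal sketch, assuming the job's `executionState` property reports values such as `queued`, `running`, `succeeded` and `failed`; `wait_for_job` is hypothetical and takes the status command as arguments so it can be exercised without Azure.

```shell
#!/usr/bin/env bash
# Poll a status command until it prints a terminal state.
# Returns 0 on "succeeded", 1 on "failed", 2 if the status query itself fails.
wait_for_job() {
  local interval="$1"; shift
  local state
  while true; do
    state=$("$@") || return 2
    # Normalise case, since the exact capitalisation is an assumption here.
    state=$(printf '%s' "$state" | tr '[:upper:]' '[:lower:]')
    case "$state" in
      succeeded) return 0 ;;
      failed)    return 1 ;;
      *)         sleep "$interval" ;;  # e.g. queued, running, terminating
    esac
  done
}
```

Usage: `wait_for_job 60 az batchai job show -n "$jobname" -g "$rgname" -w "$wsname" -e "$expname" --query executionState -o tsv`.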
The results can be viewed with Azure Storage Explorer. Delete the cluster once it is no longer needed:
az batchai cluster delete -n $clustername -g $rgname -w $wsname