oe-engine's Introduction

Open Enclave Engine - Azure template generator for SGX-capable VMs

The Open Enclave Engine (oe-engine) generates Azure Resource Manager (ARM) templates for SGX-capable virtual machines. oe-engine takes a VM definition file in JSON format and generates an ARM template file and a parameters file.

The VM definition file describes properties of the VMs, such as compute power, OS image, and credentials.

Refer to the deployment documentation for more details.
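A minimal sketch of building a VM definition file programmatically, useful when many VMs share one configuration. The field names (vmProfiles, osImageName, vmSize, linuxProfile.adminUsername) are taken from the examples in the issues further down this page; the helper name make_vm_definition and its defaults are assumptions made for illustration and are not part of oe-engine.

```python
import json


def make_vm_definition(prefix, count, vm_size="Standard_DC2s",
                       os_image="UbuntuServer_16.04", admin="azureuser"):
    """Build a VM definition dict with `count` identically configured VMs.

    Hypothetical helper: only the JSON field names come from the examples
    on this page.
    """
    return {
        "properties": {
            "vmProfiles": [
                {"name": f"{prefix}-{i}", "osImageName": os_image, "vmSize": vm_size}
                for i in range(count)
            ],
            "linuxProfile": {"adminUsername": admin},
        }
    }


definition = make_vm_definition("coco-bench", 5, admin="coco")
# Print the JSON; redirect it to a file to pass to `oe-engine generate`.
print(json.dumps(definition, indent=2))
```

The printed JSON has the same shape as the five-VM definition used in the vnetProfile issue below.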

License

MIT

oe-engine's People

Contributors

achamayou, brmclaren, dmitsh, francis-liu, ionutbalutoiu, microsoft-github-policy-service[bot], microsoftopensource, msftgits, oprinmarius, paulcallen, shruti25ratnam, vtikoo, yakman2020

oe-engine's Issues

Support for non-ACC machine sizes?

Unless I am missing something, it seems that it would be fairly easy to allow machine sizes other than DC2s and DC4s.

We really like oe-engine (because it's so simple and elegant!), and we would love to be able to use it for other types of VMs we create. Are there any fundamental obstacles to this? If not, are you open to a PR along those lines?

vnetProfile feature not working across resource groups

I first created a group of 5 machines in East US:

{
  "properties": {
    "vmProfiles": [
      {
        "name": "coco-bench-0",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      },
      {
        "name": "coco-bench-1",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      },
      {
        "name": "coco-bench-2",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      },
      {
        "name": "coco-bench-3",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      },
      {
        "name": "coco-bench-4",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      }
    ],
    "linuxProfile": {
      "adminUsername": "coco"
    }
  }
}

Ran:

./oe-engine generate eastus.json --ssh-public-key ~/.ssh/id_rsa.pub
az group deployment create --name coco-bench-us --resource-group coco-bench-us --template-file _output/azuredeploy.json --parameters @_output/azuredeploy.parameters.json --subscription Gryffindor

That worked fine, and put them all on a vnet it created called coco-bench-us-vnet.

Then I created a set of 5 VMs in West EU, with the vnetProfile set to use the coco-bench-us-vnet:

{
  "properties": {
    "vmProfiles": [
      {
        "name": "coco-bench-0",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      },
      {
        "name": "coco-bench-1",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      },
      {
        "name": "coco-bench-2",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      },
      {
        "name": "coco-bench-3",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      },
      {
        "name": "coco-bench-4",
        "osImageName": "UbuntuServer_16.04",
        "vmSize": "Standard_DC2s"
      }
    ],
    "vnetProfile": {
      "vnetResourceGroup": "coco-bench-us",
      "vnetName": "coco-bench-us-vnet"
    },
    "linuxProfile": {
      "adminUsername": "coco"
    }
  }
}

Ran:

./oe-engine generate westeu.json --ssh-public-key ~/.ssh/id_rsa.pub
az group deployment create --name coco-bench-eu --resource-group coco-bench-eu --template-file _output/azuredeploy.json --parameters @_output/azuredeploy.parameters.json --subscription Gryffindor

They were created, but they're all on a newly created vnet called coco-bench-eu-vnet, rather than coco-bench-us-vnet, which is unexpected.
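One way to catch this before deploying is to inspect the generated template and see whether it declares a virtual network of its own. The sketch below relies only on the general ARM template layout (a top-level "resources" array whose entries carry "type" and "name"); the sample template is a hypothetical excerpt, not actual oe-engine output, and nested resources are not examined.

```python
import json

VNET_TYPE = "Microsoft.Network/virtualNetworks"


def vnets_created_by(template):
    """Return the names of virtual networks declared at the template's top level."""
    return [
        resource.get("name")
        for resource in template.get("resources", [])
        if resource.get("type", "").lower() == VNET_TYPE.lower()
    ]


# Hypothetical excerpt standing in for _output/azuredeploy.json.
sample_template = {
    "resources": [
        {"type": "Microsoft.Network/virtualNetworks", "name": "coco-bench-eu-vnet"},
        {"type": "Microsoft.Compute/virtualMachines", "name": "coco-bench-0"},
    ]
}

created = vnets_created_by(sample_template)
if created:
    print("Template creates its own vnet(s):", ", ".join(created))
else:
    print("Template creates no vnet; it presumably relies on an existing one.")

# Against a real file (path taken from the commands above):
# with open("_output/azuredeploy.json") as f:
#     print(vnets_created_by(json.load(f)))
```

If the list is non-empty even though vnetProfile is set, the definition's vnet settings were not applied when the template was generated.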

attested_tls test hangs during validate.sh

@oprinmarius reported this to me yesterday, and so I gathered some information from a repro environment.

About 30 minutes ago I deployed a new VM using oe-engine based on master (and the latest Intel drivers), and I'm encountering the attested_tls test hanging.

azureuser@acc-ub1804:~$ ps -aef | grep tls
root      10299  10260  0 20:37 ?        00:00:00 sh -c echo Running ./attested_tls; cd ./attested_tls && make && make run
root      10393      1  0 20:37 ?        00:00:00 ./server/host/tls_server_host ./server/enc/tls_server_enc.signed -port:12341
root      10396  10390  0 20:38 ?        00:00:00 ./client/host/tls_client_host ./client/enc/tls_client_enclave.signed -server:localhost -port:12341
azureus+  12378  12356  0 21:09 pts/0    00:00:00 grep --color=auto tls
azureuser@acc-ub1804:~$ date
Thu Aug 22 21:10:01 UTC 2019

This issue happens every time we deploy a node using oe-engine and is causing deployment failures in our Jenkins E2E tests:
https://oe-jenkins.eastus.cloudapp.azure.com/job/oe-acc-ubuntu-16.04-eastus/
https://oe-jenkins.eastus.cloudapp.azure.com/job/oe-acc-ubuntu-18.04-eastus/

PR #58 stops the attested_tls test from running as part of validate.sh, which should mask this problem. Masking aside, we should disable this test anyway because we don't run the remote_attestation test (and the attested_tls test relies on remote attestation to work).

In any case, I wanted to file this to draw attention to this issue because it probably deserves a bit of investigation as to why this test is hanging during validate.sh. It may point to an infrastructure problem with the VM's network during boot or possibly some unknown test issue.

Tagging @soccerGB @shruti25ratnam @jazzybluesea for their info!

(Note that this issue doesn't seem to happen on VMs that have existed for a while. I don't have any problem with the attested_tls test on my dev VM, only while validate.sh runs during deployment.)
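Until the root cause is found, a hang like this can at least be turned into a quick, visible failure by running each test step under a timeout. This is a generic sketch, not how validate.sh works: the function name run_with_timeout is an assumption, and the demo command is a placeholder that merely sleeps.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_code, timed_out).

    subprocess.run kills the direct child when the timeout expires, so a
    hang surfaces as timed_out=True instead of blocking forever.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        return None, True


# Placeholder for a hanging test: a child process that sleeps for 30 seconds.
hanging_cmd = [sys.executable, "-c", "import time; time.sleep(30)"]
exit_code, timed_out = run_with_timeout(hanging_cmd, timeout_s=1)
print("timed out" if timed_out else f"exit code {exit_code}")
```

Only the direct child is killed. In the ps output above, tls_server_host has parent PID 1, so a background process like it would survive the timeout and would need separate cleanup.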

Regression when using custom VHD option

After 368bd09 got merged into master, the following error is thrown when using a custom image VHD option:

INFO[0000] Generating assets into _output...            
FATA[0000] error generating template oe-engine-template.json: template: resources.t:52:25: executing "resources.t" at <.>: wrong type for value; expected *api.VMProfile; got *api.Properties 

This is the oe-engine JSON used:

{
  "properties": {
    "vmProfiles": [
      {
        "name": "<VM_NAME>",
        "osType": "Linux",
        "vmSize": "Standard_DC2s",
        "ports": [22],
        "isVanilla": true,
        "hasDNSName": true
      }
    ],
    "vnetProfile": {
      "vnetResourceGroup": "<VNET_RG>",
      "vnetName": "<VNET_NAME>",
      "subnetName": "default"
    },
    "linuxProfile": {
      "adminUsername": "azureuser",
      "sshPublicKeys": [
        {
          "keyData": "<SSH_KEY>"
        }
      ],
      "osImage": {
        "url": "<VHD_URL>"
      }
    },
    "diagnosticsProfile": {
      "enabled": false
    }
  }
}

If master is reverted just before 368bd09, everything is just fine.

The regression occurs only when a custom image VHD is provided via the osImage.url option.

OE-Engine 2019 Support

oe-engine only uses the Windows 2016 SKUs. Are there plans to add 2019 support, given that 2016 has reached end of life?

Add support for enabling WinRM for Ansible

In our OpenEnclave CI system we are building OpenEnclave machines and saving their VHDs for faster future deployments.
We are deploying Jenkins agents from those VHDs with oe-engine.
We sysprep the images before saving the VHDs.
For Windows agents we need to add a custom script that runs at first boot, in order to allow further provisioning with Ansible.

Currently oe-engine templates don't have an option to add custom_data to the Azure VMs we are trying to deploy.
We should consider adding this feature.
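For context, ARM templates carry custom data as a base64-encoded string in the VM's osProfile.customData property. The sketch below shows only that encoding step. The script text is a placeholder, and how oe-engine would expose such an option is an open question, not something this sketch answers.

```python
import base64


def to_custom_data(script_text):
    """Encode script text as the base64 string expected by osProfile.customData."""
    return base64.b64encode(script_text.encode("utf-8")).decode("ascii")


# Placeholder first-boot script; the real WinRM setup script is not shown here.
first_boot_script = "Write-Output 'first boot placeholder'\n"

# Fragment that would sit inside a VM resource's osProfile in an ARM template.
os_profile_fragment = {"customData": to_custom_data(first_boot_script)}
print(os_profile_fragment)
```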

Support the SGX DCAP scenario for Windows deployments through oe-engine

There is ongoing work to support the SGX DCAP scenario on Windows for Open Enclave and its dependencies (like the Intel SGX DCAP driver, the Azure-DCAP-Client, etc).

Eventually, we'll want a Windows script that installs all of the appropriate dependencies for both runtime (being able to run an enclave that relies on DCAP support) and development scenarios (being able to develop and test an enclave that relies on DCAP support).

This script is currently what gets executed if you deploy a Windows system and probably installs most of the required development tools besides the DCAP dependencies:
https://github.com/microsoft/oe-engine/blob/master/parts/windowsProvision.ps1

We'll want to extend this script to support the SGX DCAP scenario. Much of the work to install the DCAP prerequisites appears to have been done by @ionutbalutoiu and the Cloudbase team in an Ansible task here:
https://github.com/microsoft/openenclave/blob/master/scripts/ansible/roles/windows/az-dcap-client/tasks/environment-setup.yml

There's some discussion about the work that led to that Ansible task here:
openenclave/openenclave#1320

@pushkarcMS can give plenty more information about the full requirements, and possibly @ionutbalutoiu and I can give some insight as well :)

SGX driver fails to install on Ubuntu 18.04

Intel SGX driver fails to install:

Unpacking Intel SGX Driver ... done.
Verifying the integrity of the install package ... done.
Installing Intel SGX Driver ...
/tmp/sgx-driver-OG3SRI /opt/azure/acc
install -d /opt/intel/sgxdriver/package
install -d /opt/intel/sgxdriver/scripts
cp -r package/* /opt/intel/sgxdriver/package
install scripts/* /opt/intel/sgxdriver/scripts
/opt/azure/acc
/opt/intel/sgxdriver/package /opt/azure/acc

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
'make' sign KDIR=/lib/modules/5.0.0-1014-azure/build...(bad exit status: 2)
ERROR (dkms apport): binary package for sgx: 0.10 not found
Error! Bad return status for module build on kernel: 5.0.0-1014-azure (x86_64)
Consult /var/lib/dkms/sgx/0.10/build/make.log for more information.

/var/lib/dkms/sgx/0.10/build/make.log contents:

DKMS make.log for sgx-0.10 for kernel 5.0.0-1014-azure (x86_64)
Mon Aug 26 08:31:06 UTC 2019
make -C /lib/modules/5.0.0-1014-azure/build SUBDIRS=/var/lib/dkms/sgx/0.10/build CFLAGS_MODULE="-I/var/lib/dkms/sgx/0.10/build -I/var/lib/dkms/sgx/0.10/build/include" modules LE_ACTION=SIGN
make[1]: Entering directory '/usr/src/linux-headers-5.0.0-1014-azure'
Makefile:223: ================= WARNING ================
Makefile:224: 'SUBDIRS' will be removed after Linux 5.3
Makefile:225: Please use 'M=' or 'KBUILD_EXTMOD' instead
Makefile:226: ==========================================
  CC      /var/lib/dkms/sgx/0.10/build/le/main.o
  AS      /var/lib/dkms/sgx/0.10/build/le/entry.o
  CC      /var/lib/dkms/sgx/0.10/build/le/string.o
  LD      /var/lib/dkms/sgx/0.10/build/le/sgx_le_proxy
  AS [M]  /var/lib/dkms/sgx/0.10/build/sgx_le_proxy_piggy.o
  CC [M]  /var/lib/dkms/sgx/0.10/build/sgx_ioctl.o
  CC [M]  /var/lib/dkms/sgx/0.10/build/sgx_encl.o
/var/lib/dkms/sgx/0.10/build/sgx_encl.c: In function ‘sgx_process_add_page_req’:
/var/lib/dkms/sgx/0.10/build/sgx_encl.c:195:8: error: implicit declaration of function ‘vm_insert_pfn’; did you mean ‘vmf_insert_pfn’? [-Werror=implicit-function-declaration]
  ret = vm_insert_pfn(vma, addr, SGX_EPC_PFN(epc_page));
        ^~~~~~~~~~~~~
        vmf_insert_pfn
cc1: some warnings being treated as errors
scripts/Makefile.build:284: recipe for target '/var/lib/dkms/sgx/0.10/build/sgx_encl.o' failed
make[2]: *** [/var/lib/dkms/sgx/0.10/build/sgx_encl.o] Error 1
Makefile:1606: recipe for target '_module_/var/lib/dkms/sgx/0.10/build' failed
make[1]: *** [_module_/var/lib/dkms/sgx/0.10/build] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.0.0-1014-azure'
Makefile:96: recipe for target 'sign' failed
make: *** [sign] Error 2
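The actual compiler error is buried in the make output. A small filter such as the sketch below pulls out only the GCC-style "file:line:col: error:" lines. The regular expression assumes that message format, and the sample text is copied from the log above. Here it isolates the implicit declaration of vm_insert_pfn, for which the compiler itself suggests vmf_insert_pfn.

```python
import re

# Matches GCC-style diagnostics such as "path/file.c:195:8: error: message".
ERROR_RE = re.compile(r"^(?P<file>\S+?):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$")


def compiler_errors(log_text):
    """Return one dict (file, line, col, msg) per compiler error line in the log."""
    matches = (ERROR_RE.match(line) for line in log_text.splitlines())
    return [m.groupdict() for m in matches if m]


sample_log = """\
  CC [M]  /var/lib/dkms/sgx/0.10/build/sgx_encl.o
/var/lib/dkms/sgx/0.10/build/sgx_encl.c: In function ‘sgx_process_add_page_req’:
/var/lib/dkms/sgx/0.10/build/sgx_encl.c:195:8: error: implicit declaration of function ‘vm_insert_pfn’; did you mean ‘vmf_insert_pfn’? [-Werror=implicit-function-declaration]
scripts/Makefile.build:284: recipe for target '/var/lib/dkms/sgx/0.10/build/sgx_encl.o' failed
make[2]: *** [/var/lib/dkms/sgx/0.10/build/sgx_encl.o] Error 1
"""

errors = compiler_errors(sample_log)
for err in errors:
    print(f"{err['file']}:{err['line']}: {err['msg']}")

# Against the real file (path taken from the DKMS message above):
# with open("/var/lib/dkms/sgx/0.10/build/make.log") as f:
#     errors = compiler_errors(f.read())
```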

Request for release

The last release of oe-engine is very old (Nov '18). Can we please have a more recent release?

Windows provision script is broken

  • The Windows provision script throws the following error and does not install anything on the instance:
PS C:\AzureData> .\oeWindowsProvision.ps1
At C:\AzureData\oeWindowsProvision.ps1:378 char:5
+     "@
+     ~~
White space is not allowed before the string terminator.
At C:\AzureData\oeWindowsProvision.ps1:370 char:31
+ function Add-RegistrySettings {
+                               ~
Missing closing '}' in statement block or type definition.
    + CategoryInfo          : ParserError: (:) [], ParseException
    + FullyQualifiedErrorId : WhitespaceBeforeHereStringFooter
  • After fixing the whitespace typo, the installation hangs while installing the Azure DCAP client.
    InstallAzureDCAP.ps1 waits for input because it expects the localPath parameter to be defined.

  • The Add-RegistrySettings function restarts the instance before provisioning is complete.
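The first error comes from a PowerShell rule: the closing "@ (or '@) of a here-string must start at the very beginning of its line. A simple scan like the sketch below can flag indented terminators before a script ships. The function name and the sample script are made up for illustration, since the real contents of oeWindowsProvision.ps1 are not shown here, and the scan is a line-based heuristic rather than a PowerShell parser.

```python
def indented_here_string_terminators(script_text):
    """Return 1-based line numbers where a here-string terminator is indented."""
    problems = []
    quote = None  # quote character of the here-string currently open, if any
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        if quote is None:
            stripped = line.rstrip()
            # A here-string opens with @" or @' as the last token on a line.
            if stripped.endswith('@"') or stripped.endswith("@'"):
                quote = stripped[-1]
        else:
            terminator = quote + "@"
            if line.startswith(terminator):
                quote = None  # correctly placed terminator
            elif line.lstrip().startswith(terminator):
                problems.append(lineno)
                quote = None  # assume this line was meant to close the string
    return problems


# Hypothetical script reproducing the kind of mistake reported above.
bad_script = "\n".join([
    "function Add-RegistrySettings {",
    '    $text = @"',
    "    some registry content",
    '    "@',
    "}",
])

print(indented_here_string_terminators(bad_script))
```

For the sample script this reports line 4, the indented terminator. Moving that "@ to column one makes the list empty.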
