microsoft / azuretre

An accelerator to help organizations build Trusted Research Environments on Azure.

Home Page: https://microsoft.github.io/AzureTRE

License: MIT License

Dockerfile 0.56% Makefile 0.87% Python 52.56% Shell 9.69% HCL 23.01% HTML 0.10% Java 2.43% PowerShell 0.12% SCSS 0.23% TypeScript 10.33% PLpgSQL 0.03% TSQL 0.07%

azuretre's Introduction

Azure Trusted Research Environment

Azure TRE documentation site: https://microsoft.github.io/AzureTRE/

Background

Across the health industry, be it a pharmaceutical company interrogating clinical trial results, or a public health provider analyzing electronic health records, there is the need to enable researchers, analysts, and developers to work with sensitive data sets.

Trusted Research Environments (TREs) enable organisations to provide research teams secure access to these data sets alongside appropriate tooling to ensure researchers can remain efficient and productive despite the security controls in place.

Further information on TREs in general can be found in many places; one good resource is HDR UK's website.

The Azure Trusted Research Environment project is an accelerator to assist Microsoft customers and partners who want to build out Trusted Research environments on Azure. This project enables authorized users to deploy and configure secure workspaces and researcher tooling without a dependency on IT teams.

This project is typically implemented alongside a data platform that provides research ready datasets to TRE workspaces.

TREs are not “one size fits all”. Although the Azure TRE has a number of out-of-the-box features, the project has been built to be extensible, and hence tooling and data platform agnostic.

Core features include:

Self-service workspace management for TRE administrators
Self-service provisioning of research tooling for research teams
Package and repository mirroring - PyPI, R-CRAN, Apt and more.
Extensible architecture - build your own service templates as required
Microsoft Entra ID integration
Airlock - import and export
Cost reporting
Ready-to-go workspace templates including:
- Restricted with data exfiltration control
- Unrestricted for open data
Ready to go workspace service templates including:
- Virtual Desktops: Windows, Linux
- AzureML (Jupyter, R Studio, VS Code)
- MLflow
- Gitea

Project Status and Support

This project's code base is still under development and breaking changes will happen. Whilst the maintainers will do our best to minimise disruption to existing deployments, this may not always be possible. Stable releases will be published when the project is more mature.

The aim is to bring together learnings from past customer engagements where TREs have been built into a single reference solution. This is a solution accelerator aiming to be a great starting point for a customized TRE solution. You're encouraged to download and customize the solution to meet your requirements.

This project does not have a dedicated team of maintainers but relies on you and the community to maintain and enhance the solution. Microsoft will, on a project-by-project basis, continue to extend the solution in collaboration with customers and partners. No guarantees can be offered as to response times on issues, feature requests, or the long-term roadmap for the project.

It is important before deployment of the solution that the Support Policy is read and understood.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Note: maintainers should refer to the maintainers guide

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Repository structure

├── .github
│   ├── ISSUE_TEMPLATE     - Templates for GitHub issues
│   ├── linters            - Linter definitions for workflows
│   └── workflows          - GitHub Actions workflows (CI/CD)
│
├── devops
│   ├── scripts            - DevOps scripts
│   └── terraform          - Terraform specific DevOps files/scripts for bootstrapping
│
├── docs                   - Documentation
│
├── e2e_tests              - pytest-based end-to-end tests
│
├── api_app                - API source code and docs
│
├── resource_processor     - VMSS Porter Runner
│
├── scripts                - Utility scripts
│
└── templates
    ├── core/terraform     - Terraform definitions of Azure TRE core resources
    ├── shared_services    - Terraform definitions of shared services
    ├── workspace_services - Workspace services
    └── workspaces         - Workspace templates

azuretre's Issues

Baseline shared services infrastructure template

Initial Infrastructure template for shared services.

Conda package management

TBD

Create initial scripts and docs for deploying the TRE solution into an existing Azure subscription

User story

As a cloud administrator
I need basic instructions on how to clone the repo and how to install the core TRE pieces into my subscription
So that I can participate in the test cycle from the beginning.

Acceptance criteria

Initial instructions on how to install the TRE solution, created in the docs section of the repo
Initial scripts for installing the TRE into an existing azure subscription created and available within the repo
Cloud admin can follow the instructions and use the scripts to repeatedly install the baseline TRE solution

Configure branch policies

Lock main
Require PR with multiple approvers

Explore: Using Azure Information Protection with Azure Machine Learning

The TRE secures data by not allowing data transfer into or out of the closed VNET. With Azure Information Protection (https://docs.microsoft.com/en-us/azure/information-protection/what-is-information-protection), data can be classified and protected, with access governed by an AAD identity, to enhance security and minimize the risk of data leakage. For example, if data left the TRE, users would still not be able to read the content.

For Azure Machine Learning to be able to process/read data, it must be able to act as the identity of a user with permissions to read the data. This feature is in private preview: https://github.com/MayMSFT/identity-based-data-access/

Is this feature mature for TRE?

App packages shared service

Researchers will need all kinds of packages/libraries when working with data. All packages within a feed will be accessible.

Requirements:

Support as many package feeds types as possible.
It should be possible to add a new package feed.
All package feeds should be read-only, meaning publishing new packages must not be possible.
System package managers like apt-get and winget.
Code/SDK package managers like PyPI, Conda, NuGet etc.

Implementation:

#437

Explore Azure ML configuration options and requirements for closed VNet setup

We need to deploy AzureML running within the context of an AzTRE workspace so that we can deploy InnerEye and adhere to these requirements:

Deployed to VNet and adheres to the TRE workspace egress rules.
Only accessible from the Virtual Desktop service in the same workspace.
Azure AD authentication
Ability to manage AML clusters and compute instances within the TRE workspace and not outside of the TRE workspace.
AML storage provisioned in the TRE workspace.
No public endpoints.
Configuration on which services within AML should be enabled and accessible. e.g. Interactive Notebooks on/off.
Ability to connect to AML compute from Virtual Machines within the same TRE workspace.

We need to explore and document the requirements and our options and potential trade-off for having as many features as possible of AML running within the context of an AzTRE workspace.

Please see #19 for additional detail on requirements

Create initial private AML deployment
Research restriction of FQDNs for base AML deployment
Test out simple AML model deployment
Document exceptions and findings

After this task is completed there should be:

A PoC AML deployment with maximum possible set of restrictions (minimum egress, no ingress from outside of VNET)
A dataset available within AML
A deployed model that has been trained and registered
A document describing implementation details, restrictions and exceptions, including a network architecture diagram

Enable markdown linting

To ensure correctness and consistency across markdown files.

Composition service with initial management API

To be detailed, based on outcomes from the design task #13

Complete exploration of possible options and provide recommendation on approach #13
Add additional detail to this issue based on outcomes of above

API

Implement POST /workspaces #112
Implement GET /workspaces #113
Register workspace template #180 , #181 , #182
Validate API requests against workspaceTemplate parameters #142
Trigger workspace bundle #96

API Auth

Register API with Azure AD #95
Register workspace with Azure AD, #211
Enforce auth on /workspaces endpoint #96
Enforce auth on /workspaces/{workspace_id} endpoint #221
Retrieve 'my' workspaces #208
Docs #236

Bundles & Deployment execution

Composition service logs and events forwarded to App Insights #174
Add bundle registry to TRE templates #65
Workspace bundle registration #138
Create sample workspace bundle #43
Invoke and install workspace bundle #64
Resource processor reports bundle execution events to state store #135
Document workspace authoring experience #59

Create resource processor function app skeleton

Integrate the resource processor function app into the Terraform templates, and create a basic function app that listens for state store additions via a Service Bus queue.

Acceptance criteria

IaC template(s) created for SB and Function App
Module integrated with TRE main template
Function app skeleton created in source in Python
Function app triggers on new workspace request

Apache Guacamole Virtual Desktop workspace service

High-level requirements

Virtual Desktops are in many cases the entry point into the TRE.
Virtual Desktops can be configured to be accessed from the Internet over a secure channel.
Virtual Desktops can be configured to be accessed only from a private network.
Ability to provision both Windows and Linux Data Science Virtual Machines.
- Add additional base images in case the DSVM image is not desirable.
Offer a defined set of Virtual Machine configurations, e.g. Small, Medium, Large and image.
End-users should be able to connect to Virtual Desktops from Windows, Linux and Mac OSes.
Ability to control how data is allowed to be moved from/to the VM and the connecting client
- Clipboard: one-way / no-way / both-ways
- Import and export data from the VM.
Ability to mount user scoped storage across VMs
Ability for end-user to start and stop VMs
Scheduled start and stop of individual VMs
- Whole workspace scheduling
Ability to resize VM
Updates management - Ability to pause or reschedule or apply updates to the VM.
Ability to install licensed applications, e.g. Office 365 and Matlab
Note: Azure subscription tenant might not be the same as the tenant the researcher was authenticated by.

Implementation

Milestone 0.2

Currently planned for milestone 0.3

Update readme with an intro to Azure TRE

Add introductory text to the repo readme.

Define default logging retention policies

For auditing and troubleshooting purposes, all Azure resources provisioned should have logging and auditing enabled.
At the moment, we don't have any policies explicitly defined. These need to be defined, implemented and clearly documented.

Originally posted by @deniscep in #49 (comment)

STORY: Create workspaces API resource collection

Workspaces are created, updated, retrieved and deleted through the /workspaces resource collection on the API.

Expose resource collection from HTTP API.
CRUD operations for workspaces based on defined data model
Resource state stored in state store
Initialize required Cosmos containers and data.

Acceptance criteria

Bootstrap cosmos db with /workspaces container #111
Workspaces resource collection exposed in api
Workspaces persisted, and retrieved from State store
- [GET] /workspaces - returns all resource documents that are not marked as deleted #114
- [POST] /workspaces - creates a resource document in the state store #112
- [GET] /workspaces/{workspace_id} - returns the given resource document from the state store #113
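
The three operations above can be sketched with an in-memory stand-in for the state store. This is only an illustration of the intended behaviour: the `WorkspaceStore` class, its method names and the `deleted` flag are hypothetical, and the real implementation would persist documents in Cosmos DB behind the HTTP API.

```python
import uuid
from typing import Dict, List, Optional


class WorkspaceStore:
    """In-memory stand-in for the state store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def create(self, name: str) -> dict:
        # POST /workspaces: create a resource document in the state store.
        doc = {"id": str(uuid.uuid4()), "name": name, "deleted": False}
        self._docs[doc["id"]] = doc
        return doc

    def list_active(self) -> List[dict]:
        # GET /workspaces: return only documents not marked as deleted.
        return [doc for doc in self._docs.values() if not doc["deleted"]]

    def get(self, workspace_id: str) -> Optional[dict]:
        # GET /workspaces/{workspace_id}: return the document, or None if unknown.
        return self._docs.get(workspace_id)

    def mark_deleted(self, workspace_id: str) -> None:
        # Soft delete: the document stays in the store but is hidden from listings.
        self._docs[workspace_id]["deleted"] = True


store = WorkspaceStore()
first = store.create("Workspace One")
second = store.create("Workspace Two")
store.mark_deleted(second["id"])
active_names = [doc["name"] for doc in store.list_active()]
```

Marking a document as deleted rather than removing it keeps the record available for auditing while hiding it from the collection listing.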

Define subnets address space

As a TRE Administrator
I want to define the size of an address space used by the workspace virtual network
So that I do not create a workspace with too few IP addresses.

Subnets should have sensible defaults given the number of IP-addresses needed. For workspaces we should consider the user being able to choose from a set of defaults like "small", "medium", "large".
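
One way to reason about such defaults is to map each size to a prefix length and check how many addresses remain once Azure's five reserved addresses per subnet are subtracted. The sketch below uses Python's standard `ipaddress` module. The prefix lengths, the `10.1.0.0/16` range and the function names are assumptions for illustration, not values taken from the project.

```python
import ipaddress
import itertools

# Hypothetical presets: prefix length of the workspace address space per size.
WORKSPACE_PREFIX = {"small": 24, "medium": 22, "large": 20}

# Azure reserves five addresses in every subnet.
AZURE_RESERVED_PER_SUBNET = 5


def workspace_network(core_range: str, size: str, index: int):
    """Return the index-th block of the requested size inside core_range."""
    blocks = ipaddress.ip_network(core_range).subnets(new_prefix=WORKSPACE_PREFIX[size])
    return next(itertools.islice(blocks, index, None))


def usable_addresses(network, subnet_count: int = 1) -> int:
    """Addresses left for resources if the block is split into subnet_count subnets."""
    return network.num_addresses - AZURE_RESERVED_PER_SUBNET * subnet_count


small = workspace_network("10.1.0.0/16", "small", 2)
medium = workspace_network("10.1.0.0/16", "medium", 1)
```

With these assumed presets, `small` is `10.1.2.0/24` with 251 usable addresses in a single subnet, and `medium` is `10.1.4.0/22`.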

Originally posted by @deniscep in #49 (comment)

Acceptance criteria

Explore Virtual Desktop service options

Review the available implementation options and previous learnings.

See #18 for additional detail on requirements.

Review previous implementations with Guacamole
Test disabling screen capture with RDP and Guacamole
Using RDP directly without Guacamole #35
VM sign-in options #36

Add issue and pr templates

Issue template: Bug 🐛
PR template
Spike Research Template

Issue and Spike templates should have acceptance criteria and need/value

Azure administrator deploys TRE with custom certs

S
As an Azure Administrator I want to deploy the TRE with my pre-created SSL certificate so that the application gateway can expose connections over HTTPS.

Depends on #66

Provide SSL certificate when deploying the TRE base infrastructure
Certificate stored in KeyVault
Certificate added to application gateway
Management API exposed over HTTPS at domain.com/api

Originally posted by @deniscep in #49 (comment)

TRE developer creates SSL certificate for development purposes

Description

As a developer I want to acquire a certificate that can be used when I deploy the TRE.

So that I can provide the certificate when deploying the TRE #51

Acceptance criteria

Set of commands to create certificate request and process certificate request
Works cross platform
Document the commands
Wildcard certificate to enable subdomains to be used on workspaces
Mirrors the process of acquiring a certificate from corporate IT as closely as possible, i.e. create request, then process request, so that the same process can be used to create a request for corporate IT to process.

Design Management API surface

The Management API should allow for control plane and eventually data plane operations to provision resources, list metadata about resources, update the state of resources, etc.

The Management API will evolve as new concepts and features are introduced but we anticipate the core operations to manage workspaces and services to be part of the initial API version.

We envision a RESTful HTTP API along the lines of:

GET /workspaces - Get workspaces
POST /workspaces - Create workspace
GET /workspaces/{workspace_id} - Get specific workspace
DELETE /workspaces/{workspace_id} - Delete specific workspace
GET /sharedservices - Get core (shared) services
GET /sharedservices/{shared_service_id} - Get core (shared) service
GET /workspaces/{workspace_id}/services - Get workspace services
POST /workspaces/{workspace_id}/services - Create workspace service
DELETE /workspaces/{workspace_id}/services - Delete workspace service
GET /workspaces/{workspace_id}/services/{workspace_service_id}/resources - Get workspace service resources
POST /workspaces/{workspace_id}/services/{workspace_service_id}/resources - Create a workspace service resource
GET /workspaces/{workspace_id}/services/{workspace_service_id}/resources/{workspace_service_resource_id} - Get workspace service resource
DELETE /workspaces/{workspace_id}/services/{workspace_service_id}/resources/{workspace_service_resource_id} - Delete workspace service resource

Examples

These requests are very simplistic just to illustrate the conceptual journey of creating a workspace containing a Data Science Virtual Machine. Actual spec to be developed as part of closing this issue.

Create workspace

POST /workspaces
Content-Type: application/json

{
    "name": "Workspace One",
    "owner": "chris"
}

# this will probably have to be async with a 202 Accepted in reality
201 Created
Location: /workspaces/1

{
    "id": 1,
    "name": "Workspace One",
    "owner": "chris"
}

Create workspace service - Virtual Desktops with shared Firewall

POST /workspaces/1/services
Content-Type: application/json

{
    "service_type": "virtual_desktops",
    "firewall": "shared"
}

201 Created
Location: /workspaces/1/services/2

{
    "id": 2,
    "service_type": "virtual_desktops",
    "firewall": "shared"
}

Create workspace service resource - Linux DS VM

POST /workspaces/1/services/2/resources
Content-Type: application/json

{
    "name": "DS VM One",
    "size": "Standard_DS3_v2",
    "image": {
        "publisher": "microsoft-dsvm",
        "offer": "ubuntu-1804",
        "version": "latest"
    }
}

201 Created
Location: /workspaces/1/services/2/resources/3

{
    "id": 3,
    "name": "DS VM One",
    "size": "Standard_DS3_v2",
    "image": {
        "publisher": "microsoft-dsvm",
        "offer": "ubuntu-1804",
        "version": "latest"
    }
}

Explore options for Source code mirror shared service

Please see parent feature #22 for requirements and additional detail

Source code mirror shared service

Researchers need the ability to access source code imported from an external location from within a TRE workspace.

High-level requirements

Git repositories
Support Git LFS
Support both public and private sources

Implementation

Explore viable Source code mirror options.
#544

Create sample workspace bundle

A really simple workspace bundle that is quick to deploy and can be used for development and to validate the architecture.

Contains a VNET, RG

Deploy InnerEye to AzureML

Explore: Using RDP directly without Guacamole

Investigate whether RDP meets the requirements for the Virtual Desktop Service #18 when accessing a VM in the closed VNET behind an Azure Firewall.

Benefits of using RDP:

Faster user experience
Graphics acceleration
Disabled screen capture
Known and secure technology
Supports both Windows and Linux
No need to manage Guacamole + TomCat server

Acceptance criteria:

Great Linux support for RDP
Disable copy and paste for both Linux and Windows (normally handled via Group Policy)
Disable download and possibly upload of data/files
Make use of JIT access via Azure Firewall

Design OCI deployment container interface

The Composition Service delegates responsibility for the CRUD operations on Workspaces and Workspace Services to an OCI deployment container. To be able to pass parameters and get back results with information about the Workspace Services provisioned, an interface/contract must be defined.

Describe how an action is passed to the OCI Deployment Container e.g. Install, Delete...
Describe a structured approach to return Workspace Service information (e.g. VM address, IP..)
Describe a way to return execution success/failed
Document the interface with examples
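
As a starting point for discussion, such a contract could be expressed as a JSON request carrying the action and parameters, and a JSON result carrying a success flag, structured outputs and an error message. Everything in the sketch below (the field names, the `install`/`delete` action set, the `run` function and the placeholder output) is an assumption for illustration. The actual interface is what this issue is meant to define.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict

# Hypothetical action names, based on the "Install, Delete..." examples above.
SUPPORTED_ACTIONS = {"install", "delete"}


@dataclass
class DeploymentRequest:
    action: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class DeploymentResult:
    succeeded: bool
    outputs: Dict[str, str] = field(default_factory=dict)
    error: str = ""


def run(request_json: str) -> str:
    """Parse a request, perform the action, and return the result as JSON."""
    request = DeploymentRequest(**json.loads(request_json))
    if request.action not in SUPPORTED_ACTIONS:
        result = DeploymentResult(False, error=f"unsupported action: {request.action}")
    else:
        # A real container would provision or remove resources here.
        # This stub only returns a placeholder output value.
        result = DeploymentResult(True, outputs={"vm_address": "10.1.2.4"})
    return json.dumps(asdict(result))


ok = json.loads(run(json.dumps({"action": "install", "parameters": {"size": "small"}})))
bad = json.loads(run(json.dumps({"action": "reboot"})))
```

Returning failure as data rather than only as an exit code lets the caller record why a deployment failed alongside any partial outputs.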

Automated update of shared service properties - rule on the firewall, package on package mirror

A shared resource used by multiple services may need to be updated.

For example:

Firewall rules on the firewall
Mirrored packages on the package mirror
Workspace app registration by workspace services

Requirements

For the firewall:
As a TRE service integrator
I want an uncomplicated way of enabling egress traffic for a Workspace/Workspace Service
So that I can create a bundle that can access the required resources

As an Azure administrator
I want to know which changes to a shared service have been made by which resource
So that I can get an overview of what has made changes

As an Azure administrator
I want changes made to a shared service to be undone when the resource that requires them is uninstalled
So that I only have the configuration I need

Implementation

High level flow

User submits a POST to create a new resource (eg a VM)
The resource bundle is applied and the VM created
Outputs from the process (eg an IP address) are collected
A message is returned to the API layer with details of which shared resource needs updating, and which values to update (eg a new rule on the shared firewall)
The API layer GETs the shared resource from the data store, merges the changes, and PATCHes it, polling the operation until the shared service is updated.
The original resource operation is completed.
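
The merge step in this flow, together with the two administrator stories above (knowing which resource made a change, and undoing it on uninstall), can be sketched by tagging each rule with the resource that required it. The document shape, the `owner` field and both function names are hypothetical.

```python
from typing import List


def merge_rules(shared: dict, owner_id: str, new_rules: List[dict]) -> dict:
    """Return a copy of the shared-service document with owner_id's rules replaced."""
    kept = [rule for rule in shared["rules"] if rule["owner"] != owner_id]
    tagged = [{**rule, "owner": owner_id} for rule in new_rules]
    return {**shared, "rules": kept + tagged}


def remove_rules(shared: dict, owner_id: str) -> dict:
    """Return a copy without the rules that owner_id required (used on uninstall)."""
    kept = [rule for rule in shared["rules"] if rule["owner"] != owner_id]
    return {**shared, "rules": kept}


firewall = {"id": "firewall", "rules": []}
firewall = merge_rules(firewall, "vm-1", [{"name": "allow-pypi", "fqdn": "pypi.org"}])
firewall = merge_rules(firewall, "vm-2", [{"name": "allow-cran", "fqdn": "cran.r-project.org"}])
after_uninstall = remove_rules(firewall, "vm-1")
```

Because `merge_rules` first drops the owner's previous rules, re-applying the same resource is idempotent and does not duplicate entries.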

Shared services to-dos

Create baseline resource templates

This task is to setup the initial resource templates from which we will continue to build. Let's ensure we have the below baseline templates and can deploy them into our shared dev environment.

Shared services template #28
Workspace template #29
Workspace service template #30
Provision resources with above templates in shared dev environment.
- Provision two workspaces individually after the shared services have been provisioned.
Structure the repository, templates, etc. in accordance with any requirements from #13

Unless explicitly mentioned otherwise, all services are exposed only through private endpoints to restrict access from the Internet.

Core (Shared) services

VNet
KeyVault
App Service plan
- KeyVault integration using Managed Identity

Workspace

VNet
- Peered with Core (Shared) services VNet
KeyVault
App Service plan
- KeyVault integration using Managed Identity

Workspace service

Storage account

Configure milestones to depict current roadmap and the planning process

Initial milestones for roadmap

Backlog for issues we plan to include in future releases
Next for issues we plan to include in the next (or shortly thereafter) release
<current> issues committed for the current release

Initial issues

Create initial set of features and stories currently in the plan

Researcher should only have network access resources within my current workspace and ...

As a Researcher I should ONLY have network access to resources within my current workspace, shared services, and whitelisted internet locations
So that I cannot upload data outside of the workspace virtual network.

Modify current TF templates to include the default NSG rules. See original conversation in PR #49

Default route directs Internet traffic through the Firewall
NSG on workspace services subnet allows traffic to shared services
NSG on workspace services subnet blocks outbound access to all other subnets
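
The intended effect of these rules can be written down as a small decision function, which is handy as a checklist when reviewing the NSG and firewall definitions. The address ranges and the allow-listed FQDN below are made-up examples, and the function models the policy only. It does not configure or query any Azure resource.

```python
import ipaddress

# Hypothetical address ranges and allow-list, for illustration only.
WORKSPACE_RANGE = ipaddress.ip_network("10.1.2.0/24")
SHARED_SERVICES_RANGE = ipaddress.ip_network("10.0.0.0/24")
ALLOWED_FQDNS = {"pypi.org"}


def is_allowed(destination: str) -> bool:
    """Decide whether outbound traffic from the workspace may reach destination."""
    try:
        address = ipaddress.ip_address(destination)
    except ValueError:
        # Not an IP address: treat it as an Internet FQDN, which the firewall
        # only lets through when it is on the allow-list.
        return destination.lower() in ALLOWED_FQDNS
    # Private traffic is limited to the current workspace and shared services.
    return address in WORKSPACE_RANGE or address in SHARED_SERVICES_RANGE
```

For instance, `is_allowed("10.1.3.7")` is `False` under these assumptions because the address belongs to a different workspace range.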

Originally posted by @deniscep in #49 (comment)

Create doc on customer pre-reqs to be able to deploy the TRE solution

Create documentation on what's needed from a customer side to be able to deploy and run the Azure TRE solution. Also provide high-level guidance on user roles.

Describe the technology requirements
- Azure Subscription
- Azure Active Directory
- ...
Describe the roles in the solution
- Cloud admin / ops
- Workspace owner
- Workspace contributor
- ...

Add bundle registry to TRE templates

User story

As a TRE service integrator
I need to place my workspace bundles in the bundle registry
So that the services can be instantiated by researchers and read by the deployment client / resource processor.

Acceptance criteria

Bundle registry defined in IaC and integrated into TRE IaC
Bundle registry deployed as part of TRE deployment
Sample bundle added to bundle registry in shared environment for demo

Explore: VM sign-in options

Can we use AAD identities directly to sign-in to a VM?

Benefits

Less complexity compared to the current Guacamole implementation, where the Researcher credentials are mapped to VM credentials stored in Key Vault.
The Researcher will be accessing other resources like storage as their own identity making auditing and permission management simpler.

Tasks:

Document initial workspace bundle authoring experience

User story

As a TRE service integrator
I want to know the rules and conventions to create a new workspace bundle with a simple workspace service
so that I can create a new bundle

Acceptance criteria

Initial documentation on how to author a workspace bundle.

How you create the bundle
How you pass and store input parameters (static, dynamic [ex. ip range], secrets)
What interfaces the bundle needs to implement
Where the output is going, and what output is expected
How templates and bundles are updated and versioned
How is the workspace bundle added to the registry on private connection. (Possible we can't use this due to how ACI and App Service work with registries on private networks)
Where are the workspace templates and bundles stored?

Document the TRE logical data model

User story

As a TRE developer
I need to know the TRE logical data model - at least in its initial version -
So that I can start implementing the API and the resource processors.

Acceptance criteria

Schema defined for workspace, workspace service and shared service
Schema documentation created incl any supporting diagrams

Baseline workspace service template

Create initial infrastructure template for a workspace service

Invoke and install workspace bundle

Initial implementation for Composition Service executing a workspace bundle to have it installed in the TRE instance from a user request.

Acceptance criteria

CNAB container is built as part of make process and pushed into ACR
Bundle installation triggered by new work detected by the resource processor
Resource change operator creates a new deployment client and passes the required params
Bundle executed based on input parameters sent by resource change operator

March 2021.1 Plan

For this milestone we should focus on getting ready to implement the first workspace services. Create the baseline templates and explore how we should implement the deployment service and its siblings.

Establish repo with contributions and support guidelines. #2, #4
Start documentation with initial section on concepts. #3
Design management API contract. #12
Explore deployment service and management API and configuration store options, #13
Explore Azure Machine Learning configuration options for closed VNet setup #19, #25
Explore Virtual Desktop service options #18, #26
Build baseline resource templates. #14
Design authentication and authorization. #15

Describe the key concepts of the TRE

Describe key concepts such as trusted research environment, workspace, service etc. Let's add this as the first page to the Docs/Concepts folder.

Research Azure CAF and Azure Blueprints

What are Azure Landing Zones and Azure Blueprints?
Which features are they providing?
Are there any patterns we can reuse?

Explore deployment service and management API and configuration store options

This issue is to explore how to implement three components which are interconnected namely the:

Deployment Service - reliable and auditable deployments of workspaces and services (aka resources)
Configuration Store - metadata and desired configuration for resources
Management API - API surface for apps and scripts to invoke the deployment service and read (and update) the configuration store.

High-level requirements

Repeatable and reliable deployment runs - Ability to re-run and retry in cases of intermittent errors etc.
Auditable deployment runs - Need to know who initiated a deployment run when.
Declarative resource definitions
Open for extensibility - Allow others to extend with additional service types etc.
Management API needs to retrieve configuration data to serve requests from callers - e.g. list services in workspace 'a'
Handle operations across several control planes such as identity provider, resource providers, resource data plane.
Clean up and removal - e.g. delete a workspace
Deployment services are managed by an Azure operations generalist - monitor health and troubleshoot deployments.
Also see #12

Edge cases we need to cater for

Simultaneous updates of a shared resource - Example: Two workspace deployment runs are trying to update the same shared Firewall with their required rules
Hide the eventual consistency of the system from users - show the actual resource state and not the desired state. Example: resource being updated or deleted.
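
For the first edge case, one common option is optimistic concurrency: each write carries the version (ETag) that was read, the store rejects stale writes, and the writer re-reads and retries. The sketch below shows the idea with an in-memory store. The class, the integer ETag and the retry count are assumptions, not a description of how the project resolves this.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale version of the document."""


class SharedResourceStore:
    """In-memory document store with a version counter acting as an ETag."""

    def __init__(self, document: dict) -> None:
        self._document = dict(document)
        self._etag = 0

    def read(self):
        return dict(self._document), self._etag

    def replace(self, document: dict, etag: int) -> None:
        if etag != self._etag:
            raise ConflictError("document changed since it was read")
        self._document = dict(document)
        self._etag += 1


def add_rule(store: SharedResourceStore, rule: str, max_attempts: int = 3) -> int:
    """Append rule, re-reading and retrying on conflict; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        document, etag = store.read()
        document["rules"] = document["rules"] + [rule]
        try:
            store.replace(document, etag)
            return attempt
        except ConflictError:
            continue
    raise ConflictError(f"gave up after {max_attempts} attempts")


store = SharedResourceStore({"rules": []})
stale_doc, stale_etag = store.read()   # deployment A reads the firewall
add_rule(store, "workspace-b-rule")    # deployment B updates it first
try:
    store.replace({**stale_doc, "rules": ["workspace-a-rule"]}, stale_etag)
    conflict_detected = False
except ConflictError:
    conflict_detected = True           # A's stale write is rejected
add_rule(store, "workspace-a-rule")    # A re-reads and retries
final_rules = store.read()[0]["rules"]
```

Without the version check, deployment A's write would silently overwrite the rule added by deployment B.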

Implementation

Review past implementations
Create spike

Ideas

Given above requirements, Terraform and GitHub actions would probably take us a long way. Let's discuss.

Baseline workspace template

Initial infrastructure template for workspace

Azure Machine Learning for InnerEye workspace service

High-level requirements

Deployed to VNet and adheres to the TRE workspace egress rules.
Only accessible from the Virtual Desktop service in the same workspace.
- Azure AD authentication
Ability to manage AML clusters and compute instances within the TRE workspace and not outside of the TRE workspace.
AML storage provisioned in the TRE workspace.
No public endpoints.
Pass specified level of IG (Information Governance) - see #16
- Capture potential show-stoppers.
Configuration on which services within AML should be enabled and accessible. e.g. Interactive Notebooks on/off.
Model export between AML instances, potentially in different TRE workspaces (this will be handled as part of the data sharing feature #33).
Ability to deploy models.
Ability to connect to AML compute from Virtual Machines within the same TRE workspace.

Spikes

Explore AML configuration options and requirements for closed VNet setup #25

Implementation

Refactor templates from spike #25
Create workspace service bundle #132
App Package shared service #21
Source code mirror shared service #22
#584

Airlock - workspace data export (Design)

Preventing data exfiltration is of absolute importance but there is a need to be able to export certain products of the work that have been done within the workspace such as ML models, new data sets to be pushed back to the data platform, reports, and similar artifacts.

A high level egress workflow would look like:

Researcher uploads (or links to?) data via the TRE web portal from within a workspace
Data is scanned for viruses (and potentially PII)
Data export approver receives notification. They log onto a secure VM and can view file, and results of scans.
Once approved data gets moved to a staging location
Data can be downloaded by the researcher via the portal from outside the workspace
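
The workflow above reads naturally as a state machine, which makes it explicit that data cannot reach the download stage without passing the scan and the approval. The state names are hypothetical, and the `BLOCKED` and `REJECTED` outcomes are assumptions added to cover a failed scan and a declined request, which the workflow above does not spell out.

```python
from enum import Enum


class ExportState(Enum):
    SUBMITTED = "submitted"   # researcher uploaded data from within the workspace
    IN_REVIEW = "in_review"   # scan passed, approver notified
    BLOCKED = "blocked"       # scan failed (assumed outcome)
    APPROVED = "approved"     # approver accepted the request
    REJECTED = "rejected"     # approver declined the request (assumed outcome)
    AVAILABLE = "available"   # data moved to staging and downloadable


# Allowed transitions; any state not listed as a key is terminal.
TRANSITIONS = {
    ExportState.SUBMITTED: {ExportState.IN_REVIEW, ExportState.BLOCKED},
    ExportState.IN_REVIEW: {ExportState.APPROVED, ExportState.REJECTED},
    ExportState.APPROVED: {ExportState.AVAILABLE},
}


def advance(current: ExportState, target: ExportState) -> ExportState:
    """Move to target if the workflow allows it, otherwise raise ValueError."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = ExportState.SUBMITTED
for step in (ExportState.IN_REVIEW, ExportState.APPROVED, ExportState.AVAILABLE):
    state = advance(state, step)
```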

Package/Source mirror management

Researchers will need all kinds of packages/libraries when working with data.

What types of package feeds are needed?
How will the process of enabling a new package or updated version be handled?

Requirements: TBD

Update SUPPORT.MD with content about this project's support experience

Design authentication and authorization

High level requirements

Access to each workspace is restricted to a group of users.
One user can have access to multiple workspaces.
Support Active Directory scenarios for Conditional Access and Privileged Identity.
Auditing of auth events on workspace level.
- Check HIPAA requirements
A user can have the role of either workspace owner or researcher in a workspace.
A user can be a TRE administrator to manage shared services and other aspects that span workspaces.
The roles a user has in each workspace determine which actions can be performed.
Users who need access to a workspace can originate from multiple organizations.
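
These requirements amount to a per-workspace role lookup plus a separate TRE administrator check for cross-workspace actions. A minimal sketch follows. The role names mirror the list above, while the action names, the sample users and the `can` function are hypothetical. In practice the assignments would come from Azure Active Directory rather than from dictionaries in code.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping of workspace roles to permitted actions.
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "workspace_owner": {"view_workspace", "manage_services"},
    "researcher": {"view_workspace"},
}

# One user can hold different roles in different workspaces.
WORKSPACE_ROLES: Dict[str, Dict[str, str]] = {
    "alice": {"ws-1": "workspace_owner", "ws-2": "researcher"},
    "bob": {"ws-1": "researcher"},
}

TRE_ADMINS: Set[str] = {"carol"}


def can(user: str, action: str, workspace_id: Optional[str] = None) -> bool:
    """Check an action, scoped to a workspace or (if None) to the whole TRE."""
    if workspace_id is None:
        # Actions spanning workspaces are reserved for TRE administrators.
        return action == "manage_shared_services" and user in TRE_ADMINS
    role = WORKSPACE_ROLES.get(user, {}).get(workspace_id)
    return role is not None and action in ROLE_ACTIONS[role]
```

Here `can("alice", "manage_services", "ws-1")` is allowed, while the same action in `ws-2` is denied because alice is only a researcher there.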

What we're not doing

User and group management - Managing roles/groups and users will initially be managed via Azure Active Directory

Shared Service

How will the shared services work, and what will they provide?

Multiple instances of shared services?
Should shared services be specialized? E.g. data share, packages, egress firewall?

Requirements: TBD

Create doc on user roles

Describe the user roles of the TRE

TRE Developer
TRE Service Integrator
Azure Administrator
TRE Administrator
TRE Workspace owner
Researcher

Related to #16

Update concepts.md to describe the Composition Service

The concepts doc describes and depicts the Management API, which is only one piece of the Composition Service.
Add the Composition Service to the diagram instead of the API and update the text to describe the broader scope of the Composition Service
