azure-player / azure.synapse.tools

PowerShell module to deploy Synapse workspace (and more) in Microsoft Azure.

License: MIT License

Languages: PowerShell 100.00%
Topics: publish, synapse, azure, ci-cd, azure-synapse, synapsetools

azure.synapse.tools's Introduction

azure.synapse.tools

What is supported

The deployment of these objects:

  • Workspace instance
  • dataset
  • dataflow
  • integration runtime
  • linked service
  • pipeline
  • KQL script *
  • SQL script *
  • notebook *
  • Spark job definition *

* via REST API only

What is NOT yet supported

The deployment of these objects:

  • credential
  • 'AzResource' deployment method
  • Apache Spark pools (BigDataPool - #11)

How to start

Install-Module

To install the module, open a PowerShell window and run the following commands:

Install-Module -Name azure.synapse.tools -Scope CurrentUser
Import-Module -Name azure.synapse.tools

To upgrade the module from a previous version:

Update-Module -Name azure.synapse.tools

To check the currently installed version of the module:

Get-Module -Name azure.synapse.tools

The module is available on PowerShell Gallery.
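Once installed, a typical deployment call looks like the sketch below. It uses the parameter names that appear in the examples further down this page; the folder and resource names are placeholders:

# placeholders - replace with your own values
$RootFolder           = "C:\repo\SynapseWorkspace"    # folder containing the workspace JSON files
$ResourceGroupName    = "rg-synapse-dev"
$SynapseWorkspaceName = "my-synapse-ws"
$Location             = "WestEurope"

Publish-SynapseFromJson -RootFolder "$RootFolder" `
    -ResourceGroupName "$ResourceGroupName" `
    -SynapseWorkspaceName "$SynapseWorkspaceName" `
    -Location "$Location"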

Publish Options

  • DeleteNotInSource: Deletes objects in the destination workspace that do not exist in the source.
  • IncrementalDeployment: Uses a deployment state file so that only objects changed in the source are deployed (see the sketch below).
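The options are applied by passing an option object to Publish-SynapseFromJson. The sketch below reuses the placeholder variables from the previous example and is based on the New-SynapsePublishOption usage shown in the issues further down; treating DeleteNotInSource and IncrementalDeployment as boolean properties of that object is an assumption:

$opt = New-SynapsePublishOption
$opt.DeleteNotInSource     = $true    # assumed boolean property
$opt.IncrementalDeployment = $true    # assumed boolean property

Publish-SynapseFromJson -RootFolder "$RootFolder" `
    -ResourceGroupName "$ResourceGroupName" `
    -SynapseWorkspaceName "$SynapseWorkspaceName" `
    -Location "$Location" `
    -Option $opt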

Incremental Deployment

The Synapse service does not offer global parameters as Azure Data Factory (ADF) does. To keep track of which objects have changed, the module therefore stores the deployment state as a JSON file in a storage account, using the naming convention <synapse-workspace-name>_deployment_state.json. If IncrementalDeployment is used, the following prerequisites apply (a setup sketch follows the list).

  1. The authenticated user must hold the Storage Blob Data Contributor RBAC role on the destination storage account.
  2. A container named azure-synapse-tools must exist in that storage account before the Synapse workspace is deployed.
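A sketch of this prerequisite setup with Az PowerShell; the storage account, resource group, and user names are placeholders:

# placeholders - replace with your own values
$storageAccount = "mydeploystate"
$resourceGroup  = "rg-synapse-dev"
$userUpn        = "deploy.user@contoso.com"

# 1. Grant the deploying identity the Storage Blob Data Contributor role
$scope = (Get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccount).Id
New-AzRoleAssignment -SignInName $userUpn `
    -RoleDefinitionName "Storage Blob Data Contributor" `
    -Scope $scope

# 2. Create the required container before the first deployment
$ctx = New-AzStorageContext -StorageAccountName $storageAccount -UseConnectedAccount
New-AzStorageContainer -Name "azure-synapse-tools" -Context $ctx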

Release Notes

New features, bug fixes and changes can be found here.

Misc

New feature requests

Share your thoughts or describe your specific case or problem.
For any new feature request, please raise a new issue here: New issue

azure.synapse.tools's People

Contributors

liquorichris, mikegoatly-coeo, nowinskik


azure.synapse.tools's Issues

Deployment of a pipeline which references a Notebook fails

I have a simple pipeline with one activity of type Notebook that references a Spark notebook. When deploying this pipeline I get the error message:
ASWT0014: Type [Notebook] is not supported.

Pipeline definition:
{
    "name": "pl_execute_notebook_cicd",
    "properties": {
        "description": "Pipeline to trigger from Azure DevOps release to execute deployed Spark notebook dynamically",
        "activities": [
            {
                "name": "execute_notebook",
                "type": "SynapseNotebook",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "notebook": {
                        "referenceName": "generate_job_pipeline",
                        "type": "NotebookReference"
                    },
                    "snapshot": true,
                    "sparkPool": {
                        "referenceName": "labsparkpool01",
                        "type": "BigDataPoolReference"
                    }
                }
            }
        ],
        "parameters": {
            "notebook_name": {
                "type": "string"
            },
            "sparkpool_name": {
                "type": "string"
            }
        },
        "folder": {
            "name": "Lakehouse/Maintenance"
        },
        "annotations": []
    }
}

The same pipeline with a dynamic Notebook reference taken from a pipeline parameter:
"typeProperties": {
    "notebook": {
        "referenceName": {
            "value": "@pipeline().parameters.notebook_name",
            "type": "Expression"
        },
        "type": "NotebookReference"
    },
    "snapshot": true,
    "sparkPool": {
        "referenceName": {
            "value": "@pipeline().parameters.sparkpool_name",
            "type": "Expression"
        },
        "type": "BigDataPoolReference"
    }
}

With the dynamic Notebook reference I got a different error message:
ASWT0029: Unknown object type: parameters.

The desired behaviour is to ignore the reference if it is set by an Expression.
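A minimal sketch of that skip logic (a hypothetical helper, not the module's actual code): a reference is resolvable only when referenceName is a plain string, and is skipped when it is an Expression object.

function Test-IsStaticReference {
    param ([Parameter(Mandatory)] $reference)

    $name = $reference.referenceName
    # a static reference holds the target name as a plain string
    if ($name -is [string]) { return $true }
    # a dynamic reference holds an object such as:
    #   { "value": "@pipeline().parameters.notebook_name", "type": "Expression" }
    if ($name.type -eq 'Expression') { return $false }
    return $true
}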

STEP: Replacing all properties environment-related fails

When including a config file (-Stage "$configCsvFile") I get an error.

STEP: Replacing all properties environment-related...
##[error]A parameter cannot be found that matches parameter name 'option'.

Code in Publish-SynapseFromJson.ps1:

if (![string]::IsNullOrEmpty($Stage)) {
        Update-PropertiesFromFile -synapse $synapse -stage $Stage -option $opt
    } else {
        Write-Host "Stage parameter was not provided - action skipped."
    }

Code in Update-PropertiesFromFile.ps1:

function Update-PropertiesFromFile {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)] [Synapse] $synapse,
        [Parameter(Mandatory)] [string] $stage,
        [switch] $dryRun = $false
        )

I guess this is similar/related to issue #2
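A sketch of one possible reconciliation (not the module's actual fix), assuming the option object can be accepted untyped: add the missing parameter to Update-PropertiesFromFile so that the -option argument in Publish-SynapseFromJson.ps1 binds.

function Update-PropertiesFromFile {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)] [Synapse] $synapse,
        [Parameter(Mandatory)] [string] $stage,
        [Parameter()] $option,          # hypothetical addition, type left open
        [switch] $dryRun = $false
        )
    # ... existing body unchanged ...
}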

SQLScript Replacement \\n

When trying to use config-prod.csv to replace the script in a sqlscript as part of a Synapse deployment, I found that the library saves the file with doubly escaped values, e.g. \\n instead of \n. After debugging I found that it is due to the $output = ($obj.Body | ConvertTo-Json -Compress:$true -Depth 100) line in Save-SynapseObectAsFile.ps1.

Test script:

$output = "{type=SqlScript; name=Populate serverless; path=content.query; value=IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseDeltaFor
mat') \n\t"

$output | ConvertTo-Json -Depth 100

Output:

{type=SqlScript; name=Populate serverless; path=content.query; value=IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = \u0027SynapseDeltaFor\r\nmat\u0027) \\n\\t

DocDiagram Pipeline Filter

enhancement

In a large/complex project the DocDiagram generated can become cluttered and confusing to understand due to the number of dependencies.

Please add an option to filter to a specific pipeline and display that particular pipeline and all downstream dependencies.
This will allow separate diagrams to be created for each pipeline if required.

Example:
$synapse = Import-SynapseFromFolder -RootFolder $RootFolder -SynapseWorkspaceName 'whatever' -PipelineFilter 'pipelinename'

  • Include optional Pipeline filter - Default all pipelines

Include Pipeline Activities in DocDiagram

enhancement

I've found the SynapseDocDiagram function that generates code to produce a mermaid diagram very useful for understanding dependencies when taking over an existing project.

I'd like the option to also include Activities within the diagram to have a holistic view of all elements included within a pipeline.

Example of Activities option:
$synapse = Import-SynapseFromFolder -RootFolder $RootFolder -SynapseWorkspaceName 'whatever' -IncludeActivities 'Yes'

Example of Activity displayed within diagram:
[activities:type].[activities:name]

  • Options to include Activities within diagram - Default 'No'
  • Display 'Activities Type' in diagram
  • Display 'Activities Name' in diagram

STEP: Deployment of all Synapse objects fails

I have tried deploying but it keeps giving me errors, with all the different object types. I tried a single linked service (LS) with dependencies, an LS without dependencies, and I even deployed one manually to see whether it would reach the point where it checks that the object is already deployed, but it never gets that far. Here's an example; unfortunately there is no helpful error message.

======================================================================================
### azure.synapse.tools                                            Version 0.16.000 ###
======================================================================================
Invoking Publish-SynapseFromJson (https://github.com/SQLPlayer/azure.synapse.tools)
with the following parameters:
======================================================================================
RootFolder:         D:\a\1\a\SynapseAnalytics\
ResourceGroupName: [REDACTED]
Synapse Workspace:  [REDACTED]
Location:           WestEurope
Stage:              
Options provided:   True
Publishing method:  AzResource
======================================================================================
Publish options are provided.
STEP: Verifying whether Synapse workspace exists...
Synapse Workspace exists.
===================================================================================
STEP: Reading Synapse Workspace from JSON files...
IntegrationRuntimes: 2 object(s) loaded.
LinkedServices: 6 object(s) loaded.
Pipelines: 11 object(s) loaded.
DataSets: 4 object(s) loaded.
DataFlows: 1 object(s) loaded.
Triggers: 4 object(s) loaded.
SqlScripts: 0 object(s) loaded.
KqlScripts: 0 object(s) loaded.
Notebooks: 0 object(s) loaded.
Managed VNet: 1 object(s) loaded.
Managed Private Endpoints: 5 object(s) loaded.
# Number of objects marked as to be deployed: 1/34
- [linkedService].[LS_KEV]
===================================================================================
STEP: Replacing all properties environment-related...
Stage parameter was not provided - action skipped.
===================================================================================
STEP: Stopping triggers...
Getting triggers...
===================================================================================
STEP: Deployment of all Synapse objects...
Start deploying object: [linkedService].[LS_KEV] (0 dependency/ies)
##[error]
CorrelationId: 381e728a-bf2a-44dd-a9fb-bd68b7ebfc73
##[error]PowerShell exited with code '1'.

Unsupported Type: SparkConfigurations

Deployment throws the error "ASWT0029: Unknown object type: SparkConfiguration" when deploying a notebook (example shown below) that references a custom Spark configuration. I believe this is very similar to issue #11: SparkConfiguration would need to be added to the allowed valid types in private/!SynapseObject.class.ps1 as well as private/Get-SynapseObjectByName.ps1, so that the referenced object is treated as valid and existing.
Notebook:

{
	"name": "nb_example",
	"properties": {
		"folder": {
			"name": "utility"
		},
		"nbformat": 4,
		"nbformat_minor": 2,
		"bigDataPool": {
			"referenceName": "sparkSm",
			"type": "BigDataPoolReference"
		},
		"targetSparkConfiguration": {
			"referenceName": "CustomSparkConfig",
			"type": "SparkConfigurationReference"
		},

Failure:

VERBOSE: Analyzing notebook dependencies...
VERBOSE: Folder: D:\a\***\drop\***\notebook
VERBOSE: - nb_example.json
Failure occurred while publishing artifacts to ***
##[debug]Error record:
##[debug]Exception: D:\a\_temp\a560ab36-63de-480f-888a-bc877***f23f0a.ps***:38
##[debug]     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##[debug]     | ASWT0029: Unknown object type: SparkConfiguration.

azure.synapse.tools Version 0.23.000

Delete not in source

Hello,

I was trying to use the DeleteNotInSource option, but it does not look like the option is being applied during the publish step in Synapse. Can you please add that as part of the Publish-AzSynapseFromJson function?

Thanks.

DevOps Extension

Hello @NowinskiK,

This is not really an issue but more of a question related to the Synapse module. Are you planning to develop an Azure DevOps task similar to the Data Factory tasks? We currently use those and they are awesome! Just curious whether there are any plans for Synapse.

Thanks again,
Chris

Stopping triggers fails

STEP: Stopping triggers...
Getting triggers...
##[error]The property 'RuntimeState' cannot be found on this object. Verify that the property exists.

My triggers were deployed successfully when none were present yet. On the next run, the step that stops them fails.

As the error message says, the RuntimeState property is missing for Synapse triggers, whereas it is present for DataFactoryV2. It seems like a vital property to me.
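A hedged sketch of a defensive check (not the module's actual code) that treats triggers without a RuntimeState property as not started; $triggers and $workspaceName are assumed to have been populated earlier:

foreach ($t in $triggers) {
    $state = $null
    if ($t.PSObject.Properties.Name -contains 'RuntimeState') {
        $state = $t.RuntimeState
    }
    if ($state -eq 'Started') {
        # only stop triggers that actually report a started state
        Stop-AzSynapseTrigger -WorkspaceName $workspaceName -Name $t.Name
    }
}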

Unsupported Type: Apache Spark pools

The error occurs when deploying notebooks that have a Spark pool defined. A sample of the notebook JSON:

{
	"name": "my_notebook",
	"properties": {
		"folder": {
			"name": "my_folder"
		},
		"nbformat": 4,
		"nbformat_minor": 2,
		"bigDataPool": {
			"referenceName": "sspklbdp01",
			"type": "BigDataPoolReference"
		},
            ....

Returns error:

VERBOSE: Analyzing notebook dependencies...
VERBOSE: Folder: D:\a\1\b\synapse_deploy\notebook
VERBOSE: - my_notebook.json
##[debug]Error record:
##[debug]
##[debug]Exception: C:\Users\VssAdministrator\Documents\PowerShell\Modules\azure.synapse.tools\0.18.0\private\Import-SynapseObjects.ps1:20
##[debug]Line |
##[debug]  20 |      Get-ChildItem "$folder" -Filter "*.json" | Where-Object { !$_.Nam …
##[debug]     |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##[debug]     | ASWT0029: Unknown object type: BigDataPool.

azure.synapse.tools Version 0.18.000

Error on DevOps Viewer

I don't know whether it works in DevOps or not.

::: mermaid
graph LR
pipeline.pipeline1 --> dataset.DelimitedText1
dataset.DelimitedText1 --> linkedService.AzureBlobStorage1
IntegrationRuntime.AutoResolveIntegrationRuntime --> managedVirtualNetwork.default
:::

image

Parse error on line 4:
...nagedVirtualNetwork.default
-----------------------^
Expecting 'SEMI', 'NEWLINE', 'SPACE', 'EOF', 'SQS', 'AMP', 'STYLE_SEPARATOR', 'PS', '(-', 'STADIUMSTART', 'SUBROUTINESTART', 'CYLINDERSTART', 'DIAMOND_START', 'TAGEND', 'TRAPSTART', 'INVTRAPSTART', 'START_LINK', 'LINK', 'DOWN', 'NUM', 'COMMA', 'ALPHA', 'COLON', 'MINUS', 'BRKT', 'DOT', 'PUNCTUATION', 'UNICODE_TEXT', 'PLUS', 'EQUALS', 'MULT', 'UNDERSCORE', got 'DEFAULT'
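The parse error appears to come from the node name ending in .default: the Mermaid lexer tokenizes default as its DEFAULT keyword, which is not allowed in that position (hence "got 'DEFAULT'"). One possible workaround (an assumption, not a confirmed fix) is to give the node an id that does not contain the word default and put the full name in a quoted label:

graph LR
pipeline.pipeline1 --> dataset.DelimitedText1
dataset.DelimitedText1 --> linkedService.AzureBlobStorage1
IntegrationRuntime.AutoResolveIntegrationRuntime --> mvnet1["managedVirtualNetwork.default"]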

Pipelines with a single activity fail when deployed with stage config

When running Publish-SynapseFromJson with a CSV environment configuration, pipelines that contain only a single activity fail.

The resulting "~pipeline.json" file is missing the array [] syntax. I suppose this is due to PowerShell unboxing single-element arrays.

Example:

pipeline.json
{
    "name": "SetVariable",
    "properties": {
        "activities": [
            {
                "name": "SetVar",
                "type": "SetVariable",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "variableName": "MyDummyVar",
                    "value": "MyDummyVal"
                }
            }
        ],
        "variables": {
            "MyDummyVar": {
                "type": "String"
            }
        },
        "annotations": []
    }
}

environment.csv
type,name,path,value
pipeline,pipeline,activities[0].typeProperties.value,"MyProductionValue"

~pipeline.json
{
    "name": "SetVariable",
    "properties": {
        "activities": {
            "name": "SetVar",
            "type": "SetVariable",
            "dependsOn": [],
            "userProperties": [],
            "typeProperties": {
                "variableName": "MyDummyVar",
                "value": "MyProductionValue"
            }
        },
        "variables": {
            "MyDummyVar": {
                "type": "String"
            }
        },
        "annotations": []
    }
}

PL with WebActivity with Basic Authentication throws exception

I have a pipeline with a WebActivity that uses Basic Authentication, which throws an exception. I've narrowed it down to this part, because when I change authentication to MSI, it works fine.

Start deploying object: [pipeline].[PL_CHECK_DEVOPS_STATUS] (2 dependency/ies)
VERBOSE: 1) Depends on: [LinkedService].[LS_KEV]
VERBOSE: Object [linkedService].[LS_KEV] is already deployed.
VERBOSE: 2) Depends on: [IntegrationRuntime].[SelfHostedIntegrationRuntime]
Start deploying object: [IntegrationRuntime].[SelfHostedIntegrationRuntime] (0 dependency/ies)
VERBOSE: Ready to deploy from file: C:\Synapse\integrationRuntime\SelfHostedIntegrationRuntime.json
VERBOSE: Integration Runtime type detected: Self-Hosted
Finished deploying object: [IntegrationRuntime].[SelfHostedIntegrationRuntime]
VERBOSE: Ready to deploy from file: C:\Synapse\pipeline\PL_CHECK_DEVOPS_STATUS.json
Set-AzSynapsePipeline: C:\PowerShell\Modules\azure.synapse.tools\0.18.0\private\Deploy-SynapseObjectOnly.ps1:100
Line |
 100 |              Set-AzSynapsePipeline `
     |              ~~~~~~~~~~~~~~~~~~~~~~~
     | Exception has been thrown by the target of an invocation.

Finished deploying object: [pipeline].[PL_CHECK_DEVOPS_STATUS]

It's related to this part of code in the pipeline:

"authentication": {
	"type": "Basic",
	"username": {
		"value": "@pipeline().parameters.devopsUsername",
		"type": "Expression"
	},
	"password": {
		"type": "AzureKeyVaultSecret",
		"store": {
			"referenceName": "LS_KEV",
			"type": "LinkedServiceReference"
		},
		"secretName": "<mySecret>"
	}
}

When I change it to this, it works fine:

"authentication": {
	"type": "MSI",
	"resource": "https://management.azure.com/"
}

The secret exists, I selected it using the UI:
image

Deployment fails for pipeline with single element in an array after update of properties

Hi again Kamil, a new customer for me and a new use case for your great tool :).
I got an error (Exception has been thrown by the target of an invocation) from Set-AzSynapsePipeline for a pipeline in which I replace properties with the Stage parameter. By comparing the pipeline before and after the replacement I found that "activities" with only one child activity was missing [] in the final JSON file. In my case it was a ForEach activity with only one child, but the same error happens in a simple pipeline with one activity.
I googled a bit on ConvertTo-Json, which is known to treat single-element arrays wrongly, although the -Depth parameter should normally have resolved this error. So I figured my "activities" should be converted to an array earlier. Since PowerShell is not my strongest side, the solution became a simple fix in the form of the following code:
$arr = @()
$arr += $Item.$prop
$Item.$prop = $arr

in ConvertFrom-OrderedHashTablesToArrays, at the end of the if ($Item.$prop.GetType().Name -eq "OrderedDictionary") statement. You will surely find a more elegant fix.
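For reference, the same workaround can be written more compactly with PowerShell's array subexpression operator (a sketch based on the snippet above, not the module's actual fix):

# force the property into an array even when it holds a single object
$Item.$prop = @($Item.$prop)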

Here is a test pipeline:
{
    "name": "TestSingleActivity",
    "properties": {
        "activities": [
            {
                "name": "Set variable1",
                "type": "SetVariable",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "variableName": "TestVariable",
                    "value": "1"
                }
            }
        ],
        "variables": {
            "TestVariable": {
                "type": "String",
                "defaultValue": "test"
            }
        },
        "annotations": []
    }
}

And a test config file:
type,name,path,value
pipeline,TestSingleActivity,variables.TestVariable.defaultValue,"test1"

Add Incremental Deployment

Capability of having an incremental deployment. Since there are no global parameters in Synapse, the idea is to utilize a storage account to hold state.

Add new line character to config

Hello,

I am trying to add a config replacement for a notebook parameter, but it requires a new line at the end of the string.

Example:

type,name,path,value

notebook,notebook1,$.properties.cells[0].source[14],"output_storage_account = 'myStorage'\r\n"

However, in the notebook there is no new line, and the value gets combined with the following line:

output_storage_account = 'myStorage'\\r\\ntemp_output="temp"

Is there a specific syntax I should be using in the csv?

Thanks.

Including an option gives an error

##[error]A parameter cannot be found that matches parameter name 'option'.

Using this code:

$opt = New-SynapsePublishOption
$opt.StopStartTriggers = $false
Publish-SynapseFromJson -RootFolder "$RootFolder" -ResourceGroupName "$ResourceGroupName" -SynapseWorkspaceName "$SynapseWorkspaceName" -Location "$Location" -Option $opt

Probably because ApplyExclusionOptions.ps1 only has the $synapse parameter.

The Synapse code differs from the ADF code:
synapse:

    # Apply Deployment Options if applicable
    if ($null -ne $Option) {
        ApplyExclusionOptions -synapse $synapse -option $opt
    }

adf:

    # Apply Deployment Options if applicable
    if ($null -ne $Option) {
        ApplyExclusionOptions -adf $adf
    }

Mermaid diagram plugin in VS Code expects three backticks instead of three colons

I've tried out this tool this morning and it gives a nice overview of the mess I've made 😄

The only thing is: I've added the Markdown Preview Mermaid Support plugin and found that your code starts with ::: mermaid, but to show the diagram instead of the code, I needed ``` at the beginning of the file.

When I googled Mermaid markdown, I found a lot of backticks and not so many colons.
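For comparison, the fence style the VS Code plugin renders is standard fenced Markdown with three backticks. Shown here with a fragment of the diagram from the earlier issue; whether the tool should emit this by default or behind an option is left open:

```mermaid
graph LR
pipeline.pipeline1 --> dataset.DelimitedText1
dataset.DelimitedText1 --> linkedService.AzureBlobStorage1
```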
