databrickslabs / cicd-templates
Manage your Databricks deployments and CI with code.
License: Other
Running on Windows, the project.json values (and possibly local file locations) seem to cause an error: Python appears to be replacing forward slashes with backslashes.
windows_file_error.txt
The documentation notes that the dbx commands are executed in bash. Is Windows not supported?
(dbconnect_aws) C:\Users\user1\sample_aws>dbx execute --cluster-name=test-cluster --job=sample_aws-sample-integration-test
Traceback (most recent call last):
File "c:\local\miniconda3\envs\dbconnect_aws\lib\site-packages\databricks_cli\sdk\api_client.py", line 131, in perform_query
resp.raise_for_status()
File "c:\local\miniconda3\envs\dbconnect_aws\lib\site-packages\requests\models.py", line 941, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://x.databricks.com/api/2.0/workspace/mkdirs
...
Response from server:
{ 'error_code': 'INVALID_PARAMETER_VALUE',
'message': "Path (\Shared\dbx\projects) doesn't start with '/'"}
In the onpush workflow we have the following step:
- name: Prepare profile
run: |
echo "[dbx-dev-azure]" >> ~/.databrickscfg
echo "host = $DBX_AZURE_HOST" >> ~/.databrickscfg
echo "token = $DBX_AZURE_TOKEN" >> ~/.databrickscfg
As I understand it, this is required for the dbx tool. Is it possible to add support for environment variables to dbx, or to pass HOST and TOKEN as arguments?
I think this is important for two reasons: if dbx can read the environment, we can remove this step, and it also means one less place where we handle sensitive data. As far as I know, databricks-cli works correctly with environment settings or an environment file.
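As a rough sketch of what such support could look like (this uses the existing databricks-cli provider, not a dbx feature), credentials could be picked up from DATABRICKS_HOST and DATABRICKS_TOKEN:

from databricks_cli.configure.provider import EnvironmentVariableConfigProvider
from databricks_cli.sdk.api_client import ApiClient

# Reads DATABRICKS_HOST / DATABRICKS_TOKEN from the environment, if set.
config = EnvironmentVariableConfigProvider().get_config()
if config is None:
    raise RuntimeError("DATABRICKS_HOST / DATABRICKS_TOKEN are not set")
client = ApiClient(host=config.host, token=config.token)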
Documentation stored in README.md is becoming too big to manage manually.
A better approach would be to use GitHub Pages as the main storage for documentation.
AC:
I need to write pipeline tests as Scala Notebooks. Will the databrickslabs_cicdtemplates Python module support more job specs?
For example:
"notebook_task": {
"notebook_path": "will get filled automatically"
}
Hello,
We would like to play with the bundled dbx tool; is there any source available?
For instance, how would you get the deployed full path in Databricks (aka dbfs://Shared/Projects/xxxxx/yourlibname) out of the "dbx deploy" command?
Is there a way to pass / export a custom path to the dbx deploy command?
Thank you!
Best regards,
Andrej
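Not an official interface, but since dbx deploy logs its artifacts to the project's MLflow experiment, one hedged way to recover the deployed artifact location is to query that experiment; the experiment name below is a placeholder for your project's workspace directory:

import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("databricks")  # auth via DATABRICKS_HOST/DATABRICKS_TOKEN or a CLI profile
client = MlflowClient()

# Placeholder: dbx uses the project's workspace_dir as the experiment name.
experiment = client.get_experiment_by_name("/Shared/dbx/cicd_sample_project")
latest_run = client.search_runs([experiment.experiment_id],
                                order_by=["attributes.start_time DESC"],
                                max_results=1)[0]
print(latest_run.info.artifact_uri)  # e.g. dbfs:/Shared/dbx/projects/cicd_sample_project/<run>/artifacts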
Just noticed that the entire structure of the template has been changed. Was wondering if you could tell me what should happen to all the projects built using the old template? Should they be migrated to follow the new template?
When using GitLab as the CI/CD tool, the Gitlab Pipeline fails when deploying Databricks jobs. I believe this is because .gitlab-ci.yml uses {project_name} instead of {{cookiecutter.project_name}} in the deploy and launch commands.
Following the instructions in the readme results in an error (before adding any additional code). Run 'dbx --version' before adding any actual application code.
Environment: Windows 10 with Anaconda. Same error for AWS and Azure cookiecutter
Security is always important, so explanations should be given for workspace_dir (with the underlying MLflow experiment) and artifact_location (on the underlying FS, ADLS/S3). Acceptance criteria: workspace_dir and artifact_location are described.
Databricks Repos for Git Integration seems like a relatively new way to manage / deploy code in a Databricks environment. I understand that cicd-templates uses dbx, which (this part may be wrong) packages up jobs into .whl files to be installed in Databricks environments.
From the "Repos for Git Integration" docs, one part says:
If you are using %run commands to make Python or R functions defined in a notebook available to another notebook, or are installing custom .whl files on a cluster, consider including those custom modules in a Databricks repo.
This seems like a different approach vs. the dbx approach. Just wondering if there are any guidelines here on what's appropriate when, how these approaches overlap / complement each other, and if the future of this project / dbx involves a deeper integration with the "Repos" approach.
Some users might use a self-hosted PyPI repository.
API changes
Introduce a new parameter for dbx deploy, which will disable the automated editing of the job spec with the package location:
dbx deploy --no-rebuild --no-package
This change is backward-compatible.
Doc changes
We need to add the documentation example for the self-hosted PyPI repo.
Acceptance criteria:
Hi,
I am running into what are likely authentication issues when running dbx deploy --jobs=<job> --files-only --no-package.
I tried the solutions below, e.g. "pypi": { "package": "package_name --index-url repourl" }. Any thoughts?
Hello,
I was wondering if it is possible to define a persistent cluster within the deployment.json file, side by side with the jobs definition, with something like the following syntax:
{
    "default": {
        "clusters": [{
            "cluster_name": "test",
            ...
        }],
        "jobs": [{
            "name": "test",
            "existing_cluster_id": "test",
            ...
        }]
    }
}
In the demo workflow, it seems a new Databricks cluster gets created. How can I make the job run on an existing Databricks cluster? Should I change dev_cicd_pipeline.py in order to specify a cluster id?
Btw, it seems the source code of dev_cicd_pipeline.py is not available, although it is listed in README.md:
Deployment
│ ├── __init__.py
│ ├── deployment.py
│ ├── dev_cicd_pipeline.py
│ └── release_cicd_pipeline.py
Was playing with the project today and noticed a small issue in the docs that get generated after creating the project via cookiecutter, specifically the following snippet in the README (https://github.com/databrickslabs/cicd-templates/blob/master/%7B%7Bcookiecutter.project_slug%7D%7D/README.md#testing):
For a test on an automated job cluster, use launch instead of execute:
dbx execute --job=cicd-sample-project-sample-integration-test
Based on my understanding of the project and its commands, I think the snippet should be the following
dbx launch --job=cicd-sample-project-sample-integration-test
If you guys are taking pull requests, I'll fork and put the fix up for review to help out. I've been enjoying going through this project, so kudos to you.
Hello -
I've recently stumbled upon this framework and so far am really enjoying the workflow put forth by the team. Thank you!
I'm working on a PySpark application that relies on Structured Streaming to get data out of Kafka, and I'm running into issues getting the pipeline to complete after running a deployment to our release cluster. I have versions of this code working in Databricks notebooks, so I'm trying to decipher what I'm missing in the context of this CI/CD framework.
The dbx deployments via Azure Pipelines and the sample jobs execute as expected, so I can rule out a deployment issue.
I'm aware of the limitation that databricks-connect has regarding structured streaming support, but am wondering if that's getting passed along to the job cluster I'm spinning up for the job I've crafted.
Any wisdom you could impart would be greatly appreciated. Below is the full stack-trace of the standard error from the spark job.
Thanks in advance,
RS
/databricks/python/lib/python3.7/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated since IPython 4.0. You should import from traitlets.config instead.
"You should import from traitlets.config instead.", ShimWarning)
/databricks/python/lib/python3.7/site-packages/IPython/nbconvert.py:13: ShimWarning: The `IPython.nbconvert` package has been deprecated since IPython 4.0. You should import from nbconvert instead.
"You should import from nbconvert instead.", ShimWarning)
Thu Feb 25 00:55:55 2021 py4j imported
Thu Feb 25 00:55:55 2021 Python shell started with PID 2490 and guid 1c65e51ed47b4b95a033bef96013d3f8
Thu Feb 25 00:55:55 2021 Initialized gateway on port 46845
Thu Feb 25 00:55:56 2021 py4j imported
Thu Feb 25 00:55:56 2021 Python shell executor start
Dropped logging in PythonShell:
b'/local_disk0/tmp/1614214540907-0/PythonShell.py:1084: DeprecationWarning: The `use_readline` parameter is deprecated and ignored since IPython 6.0.\n parent=self,\n'
/databricks/python/lib/python3.7/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated since IPython 4.0. You should import from traitlets.config instead.
"You should import from traitlets.config instead.", ShimWarning)
/databricks/python/lib/python3.7/site-packages/IPython/nbconvert.py:13: ShimWarning: The `IPython.nbconvert` package has been deprecated since IPython 4.0. You should import from nbconvert instead.
"You should import from nbconvert instead.", ShimWarning)
Thu Feb 25 00:55:59 2021 py4j imported
Thu Feb 25 00:55:59 2021 Python shell started with PID 2514 and guid 5d2f02b085044e0e81d7961dfacf6ab7
Thu Feb 25 00:55:59 2021 Initialized gateway on port 36749
Thu Feb 25 00:56:00 2021 py4j imported
Thu Feb 25 00:56:00 2021 Python shell executor start
Dropped logging in PythonShell:
b'/local_disk0/tmp/1614214540907-0/PythonShell.py:1084: DeprecationWarning: The `use_readline` parameter is deprecated and ignored since IPython 6.0.\n parent=self,\n'
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<command--1> in <module>
12
13 with open(filename, "rb") as f:
---> 14 exec(f.read())
15
<string> in <module>
<string> in launch(self)
<string> in run_topic_ingestion(self, topic, topic_idx)
/databricks/spark/python/pyspark/sql/streaming.py in load(self, path, format, schema, **options)
418 return self._df(self._jreader.load(path))
419 else:
--> 420 return self._df(self._jreader.load())
421
422 @since(2.0)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
125 def deco(*a, **kw):
126 try:
--> 127 return f(*a, **kw)
128 except py4j.protocol.Py4JJavaError as e:
129 converted = convert_exception(e.java_exception)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o399.load.
: java.lang.NullPointerException
at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateGeneralOptions(KafkaSourceProvider.scala:243)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.org$apache$spark$sql$kafka010$KafkaSourceProvider$$validateStreamOptions(KafkaSourceProvider.scala:331)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:71)
at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:242)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:122)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:122)
at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:35)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:221)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
To make sure that new functionality is working properly, we need to add integration testing pipelines at least for basic use-cases.
By default, in the onpush workflow, integration tests are first deployed as a job on Databricks and then launched as a job.
What I would like to do here is to directly execute the integration test on a specific cluster without deploying the test as a job on Databricks and executing the job.
To achieve this, I have removed:
- name: Deploy integration test
run: |
dbx deploy --jobs=<job_name>
- name: Run integration test
run: |
dbx launch --job=<job_name> --trace
and added into the workflow file (onpush.yml):
- name: Run integration test
run: |
dbx execute --cluster-id=<id> --job=<job_name> --requirements-file=unit-requirements.txt
When the GitHub Actions workflow is executed, it breaks at the Run integration test job, with the error:
FileNotFoundError: [Errno 2] No such file or directory: '.dbx/lock.json'
I understand that this file contains the execution context and hence is in the .gitignore.
As a result, is there a way that an integration test can be run without deploying it as a job on Databricks?
When trying to run dbx deploy on the following job, it fails.
{
"name": "anabricks_cd-sample",
"new_cluster": {
"spark_version": "7.3.x-scala2.12",
"spark_conf": {
"spark.databricks.delta.preview.enabled": "true",
"spark.master": "local[*]",
"spark.databricks.cluster.profile": "singleNode"
},
"node_type_id": "Standard_DS3_v2",
"custom_tags": {
"ResourceClass": "SingleNode",
"job": "anabricks_sample"
},
"enable_elastic_disk": true,
"init_scripts": [
{
"dbfs": {
"destination": "dbfs:/monitoring/datadog_install_driver_only.sh"
}
}
],
"azure_attributes": {
"availability": "ON_DEMAND_AZURE"
},
"num_workers": 0
},
"libraries": [],
"email_notifications": {
"on_start": [],
"on_success": [],
"on_failure": []
},
"max_retries": 0,
"spark_python_task": {
"python_file": "anabricks_cd/jobs/sample/entrypoint.py",
"parameters": [
"--conf-file",
"conf/test/sample.json"
]
}
}
The error is
Traceback (most recent call last):
File "c:\tools\miniconda3\envs\risbricks\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\tools\miniconda3\envs\risbricks\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\tools\miniconda3\envs\risbricks\Scripts\dbx.exe\__main__.py", line 7, in <module>
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\click\core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\click\core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\dbx\commands\deploy.py", line 94, in deploy
requirements_payload, package_requirement, _file_uploader)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\dbx\commands\deploy.py", line 191, in _adjust_job_definitions
_walk_content(adjustment_callback, job)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\dbx\commands\deploy.py", line 244, in _walk_content
_walk_content(func, item, content, key)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\dbx\commands\deploy.py", line 244, in _walk_content
_walk_content(func, item, content, key)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\dbx\commands\deploy.py", line 244, in _walk_content
_walk_content(func, item, content, key)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\dbx\commands\deploy.py", line 249, in _walk_content
parent[index] = func(content)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\dbx\commands\deploy.py", line 188, in <lambda>
adjustment_callback = lambda p: _adjust_path(p, artifact_base_uri, file_uploader)
File "c:\tools\miniconda3\envs\risbricks\lib\site-packages\dbx\commands\deploy.py", line 256, in _adjust_path
elif pathlib.Path(candidate).exists():
File "c:\tools\miniconda3\envs\risbricks\lib\pathlib.py", line 1336, in exists
self.stat()
File "c:\tools\miniconda3\envs\risbricks\lib\pathlib.py", line 1158, in stat
return self._accessor.stat(self)
File "c:\tools\miniconda3\envs\risbricks\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'local[*]'
The _adjust_path function needs to be changed to skip spark_conf values.
If I remove "spark.master": "local[*]", the job will deploy; however, it won't start, since the driver keeps throwing this warning:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
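A minimal sketch of the kind of guard that would avoid the WinError above, assuming a helper similar to dbx's _adjust_path that probes whether a job-spec value is a local file; the function name here is hypothetical:

import pathlib

def _looks_like_local_file(candidate: str) -> bool:
    # Values coming from spark_conf, such as "local[*]", are not file paths and make
    # pathlib raise OSError on Windows; treat them as non-files instead of crashing.
    try:
        return pathlib.Path(candidate).exists()
    except OSError:
        return False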
We are trying the Azure DevOps flavour of the CI/CD pipeline. When we get to the stage that runs the integration tests, it hangs for over an hour.
- script: |
dbx launch --job=cicd_demo_1-sample-integration-test --trace
displayName: 'Launch integration on test'
Actual behaviour:
We get the error message below repeated for over an hour, until the Azure pipeline times out (the default is 1 hour):
Skipping this run because the limit of 1 maximum concurrent runs has been reached.
Expected behaviour:
If there is a concurrent run of the integration test job, then dbx launch --trace should either fail immediately or have a configurable retry window (retry every minute for 5 minutes) and otherwise exit with an error.
dbx launch --existing-runs cancel
must wait for a full stop of the canceled run, otherwise behavior might be unstable if the job stops too slowly.
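A rough sketch of the waiting behaviour described above, calling the Jobs Runs REST endpoints directly (host, token, and run_id are placeholders):

import time
import requests

def cancel_and_wait(host: str, token: str, run_id: int, poll_seconds: int = 5) -> None:
    # Cancel the run, then block until it reaches a terminal lifecycle state.
    headers = {"Authorization": f"Bearer {token}"}
    requests.post(f"{host}/api/2.0/jobs/runs/cancel", json={"run_id": run_id}, headers=headers)
    while True:
        state = requests.get(f"{host}/api/2.0/jobs/runs/get",
                             params={"run_id": run_id}, headers=headers).json()["state"]
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return
        time.sleep(poll_seconds)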
When Databricks runs the Job on the cluster we get the following error;
Message: Library installation failed for library due to infra fault for whl: "dbfs:/databricks/mlflow-tracking/x/x/artifacts/dist/pipeline-0.1.0-py3-none-any.whl" . Error messages: java.lang.RuntimeException: ManagedLibraryInstallFailed: com.google.common.util.concurrent.UncheckedExecutionException: com.databricks.backend.daemon.data.common.InvalidMountException: Error while using path /databricks/mlflow-tracking/x/x/artifacts/dist/pipeline-0.1.0-py3-none-any.whl for resolving path '/x/x/artifacts/dist/pipeline-0.1.0-py3-none-any.whl' within mount at '/databricks/mlflow-tracking'. for library:PythonWhlId(dbfs:/databricks/mlflow-tracking/x/x/artifacts/dist/pipeline-0.1.0-py3-none-any.whl,,NONE),isSharedLibrary=false
Is this anything to do with the below?
Is this commit related to the issue? We pulled it, but it was still failing.
Hello,
I'm trying to check my tests' code coverage using coverage.py. It's easy to do for the unit tests because they are launched using pytest: pytest --cov=standard_datatamers_lib tests/unit --cov-report=xml:cobertura.xml, but I'm struggling to find a way to do it for the integration tests, because they are launched using dbx: dbx launch --job=standard_datatamers_lib-sample-integration-test-feature --trace.
Do you have a way to do that? If not, are you planning to create something in regard to it?
Thanks in advance :)
Exception: INVALID_PARAMETER_VALUE: Expected id '3624504707471432' to be an MLflow experiment or Databricks notebook, found 'DIRECTORY'. If this experiment id was previously valid, your experiments may have been migrated. For more information, see 'https://docs.databricks.com/applications/mlflow/experiments.html#experiment-migration'.".
I'm not able to run dbx execute --cluster-name my-cluster --job job-sample. As far as I understand, in dbx execute the wheel is built, copied to DBFS, and jobs/sample/entrypoint.py is serialized to run directly on the interactive cluster, without providing --conf-file. However, the Job class defined in project_name/common.py requires --conf-file, so an exception is thrown.
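One hedged workaround, assuming the template's Job class parses --conf-file with argparse, is to make the flag optional so dbx execute can run the entrypoint without it (this is a sketch, not the actual template code):

import argparse
import json

def read_config() -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument("--conf-file", required=False, default=None)
    args, _ = parser.parse_known_args()
    if args.conf_file is None:
        return {}  # fall back to an empty config when the flag is omitted
    with open(args.conf_file, "r") as f:
        return json.load(f)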
========================================
So far, it seems to me that:
Hello,
I would like to write integration tests that need to use one or many Databricks secrets. Unfortunately, for the tests that use dbutils.secrets.get(), the following error message is thrown:
This happens when I execute the test using dbx directly from the command line, as well as when I deploy it as a job on Databricks and launch it from there.
Is there a way that secrets can be accessed when running the tests?
Greetings,
I am starting to play a little with what this repository offers and there is a use case I cannot seem to make work.
Basically, I have a single entrypoint that imports classes from sibling files located in the same directory, like this:
|--projectname/
|----jobs/
|------jobsfolder/
|--------jobA.py
|--------jobB.py
|--------entrypoint.py
With entrypoint being this:
from jobA import jobA
from jobB import jobB
if __name__=="__main__":
jobA().launch()
jobB().launch()
Relevant part of the job definition is the following:
"spark_python_task": {
"python_file": "projectname/jobs/jobsfolder/entrypoint.py"
}
When trying to deploy or execute this job, I get a ModuleNotFound error on jobA. It seems only logical, since only entrypoint.py was uploaded to MLflow.
Dipping my toes into the code I unpacked from the dbx wheel, it looks like the intended behavior, but I feel it is a fairly frequent use case when trying to keep code readable and well documented.
Am I missing something about this use case, or is it just not supported as of now?
Anyway, thank you for your work on this project; it has been very useful and a lot of fun to use so far!
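Not an official recommendation, but one workaround, assuming jobA.py and jobB.py are shipped inside the project package (i.e. jobsfolder has an __init__.py and is included in the built wheel), is to import them through the package path so they resolve from the installed wheel rather than relative to the uploaded entrypoint file:

# entrypoint.py -- hedged sketch: package-qualified imports instead of sibling imports
from projectname.jobs.jobsfolder.jobA import jobA
from projectname.jobs.jobsfolder.jobB import jobB

if __name__ == "__main__":
    jobA().launch()
    jobB().launch()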
I'm running this pipeline in Azure DevOps. The Build Artifact task exits with code '1' with the error details below.
Script contents:
python setup.py bdist_wheel
========================== Starting Command Output ===========================
/bin/bash --noprofile --norc /home/vsts/work/_temp/9ca428f0-653f-47d8-8a33-6ef61518c30c.sh
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: invalid command 'bdist_wheel'
##[error]Bash exited with code '1'.
The earlier task, Install Python Dependencies, was successful and appears to have installed wheel:
Collecting wheel
Downloading wheel-0.34.2-py2.py3-none-any.whl (26 kB)
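In my experience, "error: invalid command 'bdist_wheel'" usually means the interpreter that runs setup.py cannot import the wheel package (for example, because it was installed into a different Python). A small hedged pre-check:

import importlib.util
import sys

# If wheel is not importable for this interpreter, setuptools cannot provide bdist_wheel.
if importlib.util.find_spec("wheel") is None:
    sys.exit("wheel is not installed for this interpreter; install it before running "
             "'python setup.py bdist_wheel'")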
In the readme, the 2nd point under "Short instructions" needs to be modified from cookiecutter git@github.com:databricks/mlflow-deployments.git to cookiecutter git@github.com:databrickslabs/cicd-templates.git.
The concept of run submit is widely used and should also be covered by dbx.
Traceback (most recent call last):
File "", line 116, in test_datawarehouse
File "", line 247, in test_mergePk
AssertionError
Ran 2 tests in 210.850s
FAILED (failures=1)
Then I'm expecting to see a failure in the related Azure DevOps pipeline, but that's not the case. How could I make the pipeline fail if there's an error in the job?
On Azure DevOps I see the following:
[dbx][2021-01-11 15:17:10.918] Job run is not yet finished, current status message: In run
[dbx][2021-01-11 15:17:16.011] Job run finished successfully
[dbx][2021-01-11 15:17:16.011] Launch command finished
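A hedged sketch of one way to make such failures propagate: have the integration-test entrypoint raise when the suite reports failures, so the Databricks run (and therefore dbx launch --trace and the DevOps step) ends in a failed state. The test path is a placeholder:

import unittest

suite = unittest.TestLoader().discover("tests/integration")  # placeholder location
result = unittest.TextTestRunner(verbosity=2).run(suite)

if not result.wasSuccessful():
    # Raising marks the job run as failed instead of "Job run finished successfully".
    raise RuntimeError(f"{len(result.failures) + len(result.errors)} integration test(s) failed")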
If you attempt to run dbx execute, it won't work on ML clusters, since
%pip install -U -r {path}
is only supported on the standard Databricks runtime and not on Databricks ML clusters.
I am following the template with Databricks hosted on AWS and running with GitHub Actions, but bumped into the error below.
What could cause an error like this? Cannot read the python file...
Appreciate any help. Thank you!
Run dbx launch --job=databricks-pipelines-sample-integration-test --as-run-submit --trace
dbx launch --job=databricks-pipelines-sample-integration-test --as-run-submit --trace
shell: /usr/bin/bash -e {0}
env:
DATABRICKS_HOST: ***
DATABRICKS_TOKEN: ***
pythonLocation: /opt/hostedtoolcache/Python/3.7.5/x64
LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.7.5/x64/lib
[dbx][2021-05-15 09:24:53.764] Launching job databricks-pipelines-sample-integration-test on environment default
[dbx][2021-05-15 09:24:53.765] Using configuration from the environment variables
[dbx][2021-05-15 09:24:55.251] No additional tags provided
[dbx][2021-05-15 09:24:55.254] Successfully found deployment per given job name
[dbx][2021-05-15 09:24:56.436] Launching job via run submit API
[dbx][2021-05-15 09:24:56.943] Run URL: ***#job/2703892/run/1
[dbx][2021-05-15 09:24:56.943] Tracing run with id 3032139
[dbx][2021-05-15 09:25:02.037] [Run Id: 3032139] Current run status info - result state: None, lifecycle state: PENDING, state message: Installing libraries
[dbx][2021-05-15 09:25:07.129] [Run Id: 3032139] Current run status info - result state: None, lifecycle state: PENDING, state message: Installing libraries
[dbx][2021-05-15 09:25:12.227] [Run Id: 3032139] Current run status info - result state: None, lifecycle state: RUNNING, state message: In run
[dbx][2021-05-15 09:25:17.325] [Run Id: 3032139] Current run status info - result state: None, lifecycle state: RUNNING, state message: In run
[dbx][2021-05-15 09:25:22.417] [Run Id: 3032139] Current run status info - result state: None, lifecycle state: RUNNING, state message: In run
[dbx][2021-05-15 09:25:27.509] [Run Id: 3032139] Current run status info - result state: FAILED, lifecycle state: INTERNAL_ERROR, state message: Cannot read the python file dbfs:/Shared/dbx/projects/databricks_pipelines/2dc5616b50a943dc96e014e06174abda/artifacts/tests/integration/sample_test.py. Please check driver logs for more details.
[dbx][2021-05-15 09:25:27.510] Finished tracing run with id 3032139
Environment:
I followed the cookiecutter template and successfully ran it the first time. However, on the second CI/CD run, it's stuck at the Installing libraries step.
For this, I'm using an existing cluster id instead of a new cluster for each CI/CD run. When I check the cluster, I see that the library was installed (during the 1st run) and now it's trying to install the same library again.
How should we avoid this situation?
[dbx][2021-05-18 16:49:23.342] [Run Id: 12] Current run status info - result state: None, lifecycle state: PENDING, state message: Installing libraries
[dbx][2021-05-18 16:49:28.564] [Run Id: 12] Current run status info - result state: None, lifecycle state: PENDING, state message: Installing libraries
[dbx][2021-05-18 16:49:33.787] [Run Id: 12] Current run status info - result state: None, lifecycle state: PENDING, state message: Installing libraries
[dbx][2021-05-18 16:49:39.012] [Run Id: 12] Current run status info - result state: None, lifecycle state: PENDING, state message: Installing libraries
[dbx][2021-05-18 16:49:44.276] [Run Id: 12] Current run status info - result state: None, lifecycle state: PENDING, state message: Installing libraries
I keep getting this error when using this project.
response = verify_rest_response(response, endpoint)
File "/Users/charalampossouris/miniforge3/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 137, in verify_rest_response
raise MlflowException("%s. Response body: '%s'" % (base_msg, response.text))
mlflow.exceptions.MlflowException: API request to endpoint was successful but the response body was not in a valid JSON format. Response body: '<!doctype html><title>Databricks - Sign In</title>
I am almost confident that my DATABRICKS-HOST and DATABRICKS-TOKEN are correct and placed in the right places.
I will continue debugging it but I was hoping for some hints :)
When executing "databricks workspace list" I can see the workspaces, which indicates to me that my credentials are correctly placed.
$ cookiecutter https://github.com/databrickslabs/cicd-templates.git
You've downloaded /home/zhen/.cookiecutters/cicd-templates before. Is it okay to delete and re-download it? [yes]:
project_name [cicd-sample-project]:
version [0.0.1]:
description [Databricks Labs CICD Templates Sample Project]:
author []:
Select cloud:
1 - AWS
2 - Azure
3 - Google Cloud
Choose from 1, 2, 3 (1, 2, 3) [1]:
Select cicd_tool:
1 - GitHub Actions
2 - Azure DevOps
3 - GitLab
Choose from 1, 2, 3 (1, 2, 3) [1]: 3
project_slug [cicd_sample_project]:
workspace_dir [/Shared/dbx/cicd_sample_project]:
artifact_location [dbfs:/Shared/dbx/projects/cicd_sample_project]:
profile [DEFAULT]:
Traceback (most recent call last):
File "/tmp/tmp5scvfog5.py", line 5, in <module>
from path import Path
ModuleNotFoundError: No module named 'path'
ERROR: Stopping generation because post_gen_project hook script didn't exit successfully
Hook script failed (exit status: 1)
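The post_gen_project hook imports 'from path import Path', which comes from the third-party path package (the successor of path.py), so the failure above suggests that package is missing in the environment running cookiecutter. A hedged pre-check before generating the project:

import importlib.util
import subprocess
import sys

# Install the third-party "path" package if the hook's import would fail.
if importlib.util.find_spec("path") is None:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "path"])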