crflynn / databricks-api
A simplified, autogenerated API client interface using the databricks-cli package
License: MIT License
According to the documentation I think I should be able to do from databricks_api import DatabricksAPI and then access DatabricksAPI.workspace, but this doesn't seem to work. My main goal is to check whether a directory structure exists and, if not, create it.
In [1]: from databricks_api import DatabricksAPI
In [2]: DatabricksAPI.workspace.list()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-09fa92501b4f> in <module>
----> 1 DatabricksAPI.workspace.list()
AttributeError: type object 'DatabricksAPI' has no attribute 'workspace'
In [3]: DatabricksAPI.workspaces
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-12099cc2227b> in <module>
----> 1 DatabricksAPI.workspaces
AttributeError: type object 'DatabricksAPI' has no attribute 'workspaces'
It looks like the DatabricksAPI class doesn't have any publicly accessible attributes:
DatabricksAPI.__dict__
mappingproxy({'__module__': 'databricks_api.databricks',
'__init__': <function databricks_api.databricks.DatabricksAPI.__init__(self, **kwargs)>,
'__dict__': <attribute '__dict__' of 'DatabricksAPI' objects>,
'__weakref__': <attribute '__weakref__' of 'DatabricksAPI' objects>,
'__doc__': None})
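The __dict__ dump above is consistent with the service clients being attached per-instance inside __init__ rather than on the class, which would explain the AttributeError on class access. A minimal sketch of that pattern (FakeService and Client here are illustrative stand-ins, not the real package):

```python
# Sketch of why DatabricksAPI.workspace can raise AttributeError: if the
# wrapper binds its service clients inside __init__, they exist only on
# instances, never on the class (FakeService/Client are stand-ins).

class FakeService:
    def list(self, path="/"):
        return {"objects": []}

class Client:
    def __init__(self):
        # attribute created per instance, not on the class
        self.workspace = FakeService()

assert not hasattr(Client, "workspace")   # class access fails
assert hasattr(Client(), "workspace")     # instance access works
```

So with the real package you would instantiate first, e.g. db = DatabricksAPI(host="...", token="...") and then call db.workspace.mkdirs(path); per the Workspace API docs, mkdirs succeeds even when the directory already exists, which covers the "create if missing" case without a separate existence check.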
Hello
I forked your project and added the following method to be able to create new users through the API.
https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html
```python
def create_user(self, user_name=None, headers=None):
    _data = {}
    if user_name is not None:
        _data['schemas'] = ["urn:ietf:params:scim:schemas:core:2.0:User"]
        _data['userName'] = user_name
        _data['entitlements'] = [{'value': 'allow-cluster-create'}]
    return self.client.perform_query('POST', '/preview/scim/v2/Users', data=_data, headers=headers)
```
Of course it could be improved to allow passing groups and so on. Just in case you want it.
Hi,
I am having issues with the method db.jobs.list_runs: the limit parameter doesn't seem to be honored, and the number of runs returned often differs from limit.
For example:
len(db.jobs.list_runs(limit=25)['runs'])
returns 21 elements instead of the expected 25.
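A workaround (assuming the server is filtering or capping pages rather than the client dropping results) is to page with offset until has_more is false instead of trusting a single limit-sized call. A sketch, where list_runs stands for any callable with the Runs List shape such as db.jobs.list_runs:

```python
def iter_all_runs(list_runs, page_size=25, **filters):
    """Yield every run by paging with offset/limit until has_more is false.

    list_runs is assumed to accept offset/limit keywords and return a dict
    with 'runs' and 'has_more', like db.jobs.list_runs in this wrapper.
    """
    offset = 0
    while True:
        page = list_runs(offset=offset, limit=page_size, **filters)
        runs = page.get("runs", [])
        yield from runs
        if not page.get("has_more") or not runs:
            break
        offset += len(runs)
```

Usage would then be all_runs = list(iter_all_runs(db.jobs.list_runs)), which sidesteps the per-call count entirely.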
Hello guys, I'm using this library as an interface for job manipulation, but I couldn't work out how to add a new task to an existing job.
How could I do this?
Example:
There is the method
db.jobs.reset_job("job_id", new_settings)
and I need something like:
db.jobs.update_job("job_id", new_settings)
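Until the wrapper exposes update_job, one hedged workaround is to call the Jobs API 2.1 jobs/update endpoint through the underlying client. Unlike jobs/reset, which replaces the entire settings object, jobs/update patches only the fields you send. This sketch assumes the databricks_cli ApiClient is exposed as db.client and that its perform_query accepts a version keyword:

```python
def update_job(client, job_id, new_settings, fields_to_remove=None):
    """Sketch: POST to the Jobs 2.1 jobs/update endpoint, which updates
    only the provided top-level fields of the job's settings.

    Assumes client is the databricks_cli ApiClient (db.client) and that
    perform_query takes a version keyword.
    """
    data = {"job_id": job_id, "new_settings": new_settings}
    if fields_to_remove:
        data["fields_to_remove"] = fields_to_remove
    return client.perform_query("POST", "/jobs/update", data=data, version="2.1")
```

To add a task you would still fetch the job's current tasks, append the new one, and send the whole tasks array in new_settings, since top-level fields appear to be replaced wholesale (assumption based on the Jobs 2.1 docs).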
Right now the package depends on databricks-cli 0.12.x only, while the latest version is 0.14.3. It would be useful to relax the dependency constraint to avoid pinning to an old version.
Hello,
Thanks for providing this nice wrapper.
I was wondering if there is any way to set the 'init_scripts' parameter in the db.cluster.create_cluster command (which exists on the official site for Cluster API 2.0 here: https://docs.databricks.com/dev-tools/api/latest/clusters.html).
Thanks,
Does the library support Jobs API 2.1? We want to try out multi-task jobs using this library.
Hi
I like your API very much and I will use it in my CI pipeline. Unfortunately I have a problem adding my init_script to the cluster.
This is my code:
```python
cluster_json = db.cluster.create_cluster(
    num_workers=2,
    cluster_name="az-ckw-uieb-databricks-devops_test",
    spark_version="5.5.x-scala2.11",
    spark_conf=None,
    node_type_id="Standard_DS3_v2",
    spark_env_vars={
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    autotermination_minutes=120,
    enable_elastic_disk=True,
    init_scripts=[{'dbfs': {'destination': 'dbfs:/databricks/scripts/oracle-install.sh'}}],
)
```
However, when I execute it, I get this error message:
TypeError: create_cluster() got an unexpected keyword argument 'init_scripts'
Any idea?
Many Thanks
Christoph
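The generated create_cluster signature in older databricks-cli releases simply predates init_scripts, hence the TypeError. A hedged workaround is to skip the generated method and POST the full spec through the underlying client, since the REST endpoint itself accepts init_scripts (assumption: the databricks_cli ApiClient is exposed as db.client):

```python
def create_cluster_raw(client, **cluster_spec):
    """Send the cluster spec verbatim to clusters/create so fields the
    generated wrapper doesn't know about (e.g. init_scripts) pass through.

    client is assumed to be the databricks_cli ApiClient (db.client).
    """
    return client.perform_query("POST", "/clusters/create", data=cluster_spec)
```

You would then call create_cluster_raw(db.client, cluster_name=..., init_scripts=[...], ...) with the same arguments as above.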
Is there any timeline to support Azure-Databricks API?
Currently the methods db.cluster.create_cluster(), db.cluster.edit_cluster(), db.instance_pool.create_instance_pool(), and db.instance_pool.edit_instance_pool() only support API calls to AWS-based Databricks workspaces. I'd recommend adding azure_attributes and gcp_attributes as parameters to these functions to support API calls on all platforms.
edit: opened an issue on databricks-cli here
Specific traceback:
method_whitelist=set({'POST'}) | set(Retry.DEFAULT_METHOD_WHITELIST),
AttributeError: type object 'Retry' has no attribute 'DEFAULT_METHOD_WHITELIST'
urllib3 was updated to use neutral language, which affects method_whitelist. Specifically, Retry.DEFAULT_METHOD_WHITELIST was renamed to Retry.DEFAULT_ALLOWED_METHODS.
The JobsCreate portion of the API now has a git_source
parameter in the 2.1 API. When I try to use this, however, I get the following error:
TypeError: create_job() got an unexpected keyword argument 'git_source'
I'm assuming this means the parameters need to be rescanned in a new update of the package; is there a plan to do this any time soon?
The snippet:
db.cluster.edit_cluster(cluster_id, spark_version="10.3.x-scala2.12")
Could you share a sample of using this edit_cluster API?
HTTPError Traceback (most recent call last)
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/api_client.py in perform_query(self, method, path, data, headers, files, version)
137 try:
--> 138 resp.raise_for_status()
139 except requests.exceptions.HTTPError as e:
~/Library/Python/3.7/lib/python/site-packages/requests/models.py in raise_for_status(self)
940 if http_error_msg:
--> 941 raise HTTPError(http_error_msg, response=self)
942
HTTPError: 400 Client Error: Bad Request for url: https://adb-xyz.azuredatabricks.net/api/2.0/clusters/edit
During handling of the above exception, another exception occurred:
HTTPError Traceback (most recent call last)
/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/3832242952.py in <module>
      1 if __name__ == "__main__":
----> 2     main()
/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/3778288670.py in main()
69 unravel_spark2_cluster_configs,
70 unravel_spark3_cluster_configs,
---> 71 output_directory_path)
72
/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/2351089658.py in configureInteractiveClustersWithUnravel(cluster_list, workspace_id2api, workspace_spark_verisons, unravel_spark2_cluster_configs, unravel_spark3_cluster_configs, output_path)
91
92 db.cluster.edit_cluster(cluster_id,
---> 93 spark_version="10.3.x-scala2.12")
94
95
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/service.py in edit_cluster(self, cluster_id, num_workers, autoscale, cluster_name, spark_version, spark_conf, aws_attributes, node_type_id, driver_node_type_id, ssh_public_keys, custom_tags, cluster_log_conf, spark_env_vars, autotermination_minutes, enable_elastic_disk, cluster_source, instance_pool_id, headers)
365 _data['instance_pool_id'] = instance_pool_id
366 print(_data)
--> 367 return self.client.perform_query('POST', '/clusters/edit', data=_data, headers=headers)
368
369 def get_cluster(self, cluster_id, headers=None):
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/api_client.py in perform_query(self, method, path, data, headers, files, version)
144 except ValueError:
145 pass
--> 146 raise requests.exceptions.HTTPError(message, response=e.response)
147 return resp.json()
148
HTTPError: 400 Client Error: Bad Request for url: https://adb-xyz.azuredatabricks.net/api/2.0/clusters/edit
Response from server:
{ 'error_code': 'INVALID_PARAMETER_VALUE',
'message': 'Missing required field: Size'}
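The "Missing required field: Size" 400 suggests that clusters/edit is a full replace rather than a patch: the request must carry the complete spec (cluster size via num_workers or autoscale, node type, Spark version, and so on), so sending spark_version alone fails. A hedged sketch that emulates a partial edit by fetching the current spec, merging the change, and resending only editable fields (the field list is an assumption based on the Clusters 2.0 docs, since clusters/get also returns server-populated fields that must not be echoed back):

```python
# Fields clusters/edit accepts (assumption from the Clusters 2.0 docs).
EDITABLE_FIELDS = {
    "cluster_id", "num_workers", "autoscale", "cluster_name", "spark_version",
    "spark_conf", "node_type_id", "driver_node_type_id", "ssh_public_keys",
    "custom_tags", "cluster_log_conf", "init_scripts", "spark_env_vars",
    "autotermination_minutes", "enable_elastic_disk", "instance_pool_id",
}

def patch_cluster(client, cluster_id, **changes):
    """Emulate a partial edit: read the full spec, merge changes, resend.

    client is assumed to be the databricks_cli ApiClient (db.client).
    """
    spec = client.perform_query("GET", "/clusters/get",
                                data={"cluster_id": cluster_id})
    spec.update(changes)
    payload = {k: v for k, v in spec.items() if k in EDITABLE_FIELDS}
    return client.perform_query("POST", "/clusters/edit", data=payload)
```

With this, patch_cluster(db.client, cluster_id, spark_version="10.3.x-scala2.12") sends the size fields the endpoint requires alongside the new Spark version.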