crflynn / databricks-api
A simplified, autogenerated API client interface using the databricks-cli package
License: MIT License
According to the documentation I think I should be able to do from databricks_api import DatabricksAPI and then access DatabricksAPI.workspace, but this doesn't seem to work. My main goal is to check whether a directory structure exists and, if not, create it.
In [1]: from databricks_api import DatabricksAPI
In [2]: DatabricksAPI.workspace.list()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-09fa92501b4f> in <module>
----> 1 DatabricksAPI.workspace.list()
AttributeError: type object 'DatabricksAPI' has no attribute 'workspace'
In [3]: DatabricksAPI.workspaces
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-12099cc2227b> in <module>
----> 1 DatabricksAPI.workspaces
AttributeError: type object 'DatabricksAPI' has no attribute 'workspaces'
It looks like the DatabricksAPI class doesn't have any publicly accessible attributes:
DatabricksAPI.__dict__
mappingproxy({'__module__': 'databricks_api.databricks',
'__init__': <function databricks_api.databricks.DatabricksAPI.__init__(self, **kwargs)>,
'__dict__': <attribute '__dict__' of 'DatabricksAPI' objects>,
'__weakref__': <attribute '__weakref__' of 'DatabricksAPI' objects>,
'__doc__': None})
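The __dict__ dump above is consistent with the service clients being attached per-instance inside __init__ rather than on the class, which would explain the AttributeError on class access. A minimal sketch of that pattern (FakeService and Client here are illustrative stand-ins, not the real package):

```python
# Sketch of why DatabricksAPI.workspace can raise AttributeError: if the
# wrapper binds its service clients inside __init__, they exist only on
# instances, never on the class (FakeService/Client are stand-ins).

class FakeService:
    def list(self, path="/"):
        return {"objects": []}

class Client:
    def __init__(self):
        # attribute created per instance, not on the class
        self.workspace = FakeService()

assert not hasattr(Client, "workspace")   # class access fails
assert hasattr(Client(), "workspace")     # instance access works
```

So with the real package you would instantiate first, e.g. db = DatabricksAPI(host="...", token="...") and then call db.workspace.mkdirs(path); per the Workspace API docs, mkdirs succeeds even when the directory already exists, which covers the "create if missing" case without a separate existence check.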
Hello
I forked your project and added the following method to be able to create new users through the API.
https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html
```python
def create_user(self, user_name=None, headers=None):
    _data = {}
    if user_name is not None:
        _data['schemas'] = ["urn:ietf:params:scim:schemas:core:2.0:User"]
        _data['userName'] = user_name
        _data['entitlements'] = [{'value': 'allow-cluster-create'}]
    return self.client.perform_query('POST', '/preview/scim/v2/Users', data=_data, headers=headers)
```
Of course it could be improved to allow passing groups and so on. Just in case you want it.
Hi,
I am having issues with the method db.jobs.list_runs: the limit parameter doesn't seem to be honored, and the number of runs returned often differs from limit.
For example:
len(db.jobs.list_runs(limit=25)['runs'])
returns 21 elements instead of the expected 25.
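A workaround (assuming the server is filtering or capping pages rather than the client dropping results) is to page with offset until has_more is false instead of trusting a single limit-sized call. A sketch, where list_runs stands for any callable with the Runs List shape such as db.jobs.list_runs:

```python
def iter_all_runs(list_runs, page_size=25, **filters):
    """Yield every run by paging with offset/limit until has_more is false.

    list_runs is assumed to accept offset/limit keywords and return a dict
    with 'runs' and 'has_more', like db.jobs.list_runs in this wrapper.
    """
    offset = 0
    while True:
        page = list_runs(offset=offset, limit=page_size, **filters)
        runs = page.get("runs", [])
        yield from runs
        if not page.get("has_more") or not runs:
            break
        offset += len(runs)
```

Usage would then be all_runs = list(iter_all_runs(db.jobs.list_runs)), which sidesteps the per-call count entirely.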
Hello guys, I'm using this library as an interface for job manipulation, but I couldn't work out how to add a new task to an existing job.
How could I do this?
Example:
There is the method
db.jobs.reset_job("job_id", new_settings)
and I need something like:
db.jobs.update_job("job_id", new_settings)
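Until the wrapper exposes update_job, one hedged workaround is to call the Jobs API 2.1 jobs/update endpoint through the underlying client. Unlike jobs/reset, which replaces the entire settings object, jobs/update patches only the fields you send. This sketch assumes the databricks_cli ApiClient is exposed as db.client and that its perform_query accepts a version keyword:

```python
def update_job(client, job_id, new_settings, fields_to_remove=None):
    """Sketch: POST to the Jobs 2.1 jobs/update endpoint, which updates
    only the provided top-level fields of the job's settings.

    Assumes client is the databricks_cli ApiClient (db.client) and that
    perform_query takes a version keyword.
    """
    data = {"job_id": job_id, "new_settings": new_settings}
    if fields_to_remove:
        data["fields_to_remove"] = fields_to_remove
    return client.perform_query("POST", "/jobs/update", data=data, version="2.1")
```

To add a task you would still fetch the job's current tasks, append the new one, and send the whole tasks array in new_settings, since top-level fields appear to be replaced wholesale (assumption based on the Jobs 2.1 docs).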
Right now the package depends on databricks-cli 0.12.x only, while the latest version is 0.14.3. It would be useful to relax the dependency constraint to avoid pinning to an old version.
Hello,
Thanks for providing this nice wrapper.
I was wondering if there is any way to set the 'init_scripts' parameter in the db.cluster.create_cluster command (which exists on the official site for Cluster API 2.0 here: https://docs.databricks.com/dev-tools/api/latest/clusters.html).
Thanks,
Does the library support Jobs API 2.1? We want to try out multi-task jobs using this library.
Hi
I like your API very much and I will use it in my CI pipeline. Unfortunately I have a problem adding my init_script to the cluster.
This is my code:
```python
cluster_json = db.cluster.create_cluster(
    num_workers=2,
    cluster_name="az-ckw-uieb-databricks-devops_test",
    spark_version="5.5.x-scala2.11",
    spark_conf=None,
    node_type_id="Standard_DS3_v2",
    spark_env_vars={
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    autotermination_minutes=120,
    enable_elastic_disk=True,
    init_scripts=[{'dbfs': {'destination': 'dbfs:/databricks/scripts/oracle-install.sh'}}],
)
```
However, when I execute it, I get this error message:
TypeError: create_cluster() got an unexpected keyword argument 'init_scripts'
Any idea?
Many Thanks
Christoph
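The generated create_cluster signature in older databricks-cli releases simply predates init_scripts, hence the TypeError. A hedged workaround is to skip the generated method and POST the full spec through the underlying client, since the REST endpoint itself accepts init_scripts (assumption: the databricks_cli ApiClient is exposed as db.client):

```python
def create_cluster_raw(client, **cluster_spec):
    """Send the cluster spec verbatim to clusters/create so fields the
    generated wrapper doesn't know about (e.g. init_scripts) pass through.

    client is assumed to be the databricks_cli ApiClient (db.client).
    """
    return client.perform_query("POST", "/clusters/create", data=cluster_spec)
```

You would then call create_cluster_raw(db.client, cluster_name=..., init_scripts=[...], ...) with the same arguments as above.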
Is there any timeline to support Azure-Databricks API?
Currently the methods db.cluster.create_cluster(), db.cluster.edit_cluster(), db.instance_pool.create_instance_pool(), and db.instance_pool.edit_instance_pool() only support API calls to AWS-based Databricks workspaces. I'd recommend adding azure_attributes and gcp_attributes as parameters to these functions to support API calls on all platforms.
edit: opened an issue on databricks-cli here
Specific traceback:
method_whitelist=set({'POST'}) | set(Retry.DEFAULT_METHOD_WHITELIST),
AttributeError: type object 'Retry' has no attribute 'DEFAULT_METHOD_WHITELIST'
urllib3 was updated to use neutral language, which affects method_whitelist. Specifically, Retry.DEFAULT_METHOD_WHITELIST was renamed to Retry.DEFAULT_ALLOWED_METHODS.
The JobsCreate portion of the API now has a git_source
parameter in the 2.1 API. When I try to use this, however, I get the following error:
TypeError: create_job() got an unexpected keyword argument 'git_source'
I'm assuming this means the parameters need to be rescanned in a new update of the package; is there a plan to do this any time soon?
The snippet:
db.cluster.edit_cluster(cluster_id, spark_version="10.3.x-scala2.12")
Could you share a sample of using this edit_cluster API?
HTTPError Traceback (most recent call last)
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/api_client.py in perform_query(self, method, path, data, headers, files, version)
137 try:
--> 138 resp.raise_for_status()
139 except requests.exceptions.HTTPError as e:
~/Library/Python/3.7/lib/python/site-packages/requests/models.py in raise_for_status(self)
940 if http_error_msg:
--> 941 raise HTTPError(http_error_msg, response=self)
942
HTTPError: 400 Client Error: Bad Request for url: https://adb-xyz.azuredatabricks.net/api/2.0/clusters/edit
During handling of the above exception, another exception occurred:
HTTPError Traceback (most recent call last)
/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/3832242952.py in <module>
      1 if __name__ == "__main__":
----> 2     main()
/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/3778288670.py in main()
69 unravel_spark2_cluster_configs,
70 unravel_spark3_cluster_configs,
---> 71 output_directory_path)
72
/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/2351089658.py in configureInteractiveClustersWithUnravel(cluster_list, workspace_id2api, workspace_spark_verisons, unravel_spark2_cluster_configs, unravel_spark3_cluster_configs, output_path)
91
92 db.cluster.edit_cluster(cluster_id,
---> 93 spark_version="10.3.x-scala2.12")
94
95
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/service.py in edit_cluster(self, cluster_id, num_workers, autoscale, cluster_name, spark_version, spark_conf, aws_attributes, node_type_id, driver_node_type_id, ssh_public_keys, custom_tags, cluster_log_conf, spark_env_vars, autotermination_minutes, enable_elastic_disk, cluster_source, instance_pool_id, headers)
365 _data['instance_pool_id'] = instance_pool_id
366 print(_data)
--> 367 return self.client.perform_query('POST', '/clusters/edit', data=_data, headers=headers)
368
369 def get_cluster(self, cluster_id, headers=None):
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/api_client.py in perform_query(self, method, path, data, headers, files, version)
144 except ValueError:
145 pass
--> 146 raise requests.exceptions.HTTPError(message, response=e.response)
147 return resp.json()
148
HTTPError: 400 Client Error: Bad Request for url: https://adb-xyz.azuredatabricks.net/api/2.0/clusters/edit
Response from server:
{ 'error_code': 'INVALID_PARAMETER_VALUE',
'message': 'Missing required field: Size'}
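The "Missing required field: Size" 400 suggests that clusters/edit is a full replace rather than a patch: the request must carry the complete spec (cluster size via num_workers or autoscale, node type, Spark version, and so on), so sending spark_version alone fails. A hedged sketch that emulates a partial edit by fetching the current spec, merging the change, and resending only editable fields (the field list is an assumption based on the Clusters 2.0 docs, since clusters/get also returns server-populated fields that must not be echoed back):

```python
# Fields clusters/edit accepts (assumption from the Clusters 2.0 docs).
EDITABLE_FIELDS = {
    "cluster_id", "num_workers", "autoscale", "cluster_name", "spark_version",
    "spark_conf", "node_type_id", "driver_node_type_id", "ssh_public_keys",
    "custom_tags", "cluster_log_conf", "init_scripts", "spark_env_vars",
    "autotermination_minutes", "enable_elastic_disk", "instance_pool_id",
}

def patch_cluster(client, cluster_id, **changes):
    """Emulate a partial edit: read the full spec, merge changes, resend.

    client is assumed to be the databricks_cli ApiClient (db.client).
    """
    spec = client.perform_query("GET", "/clusters/get",
                                data={"cluster_id": cluster_id})
    spec.update(changes)
    payload = {k: v for k, v in spec.items() if k in EDITABLE_FIELDS}
    return client.perform_query("POST", "/clusters/edit", data=payload)
```

With this, patch_cluster(db.client, cluster_id, spark_version="10.3.x-scala2.12") sends the size fields the endpoint requires alongside the new Spark version.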