preset-io / backend-sdk
Superset allows users to use Jinja directly in the SQL query of a virtual dataset. However, if the dataset contains Jinja, importing it through the preset-cli throws an error.
Steps: create a virtual dataset whose SQL uses Jinja, for example:
{{ "'" + "','".join(filter_values('<ColumnName>')) + "'" }}
or {{ filter_values('ColumnName')|where_in }}
and import it through the preset-cli.
Expected result: the import operation works.
Actual result: the error below is thrown:
Traceback (most recent call last):
File "/home/vavila/.pyenv/versions/preset-cli/bin/preset-cli", line 8, in <module>
sys.exit(preset_cli())
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/cli/superset/main.py", line 89, in new_command
ctx.invoke(command, *args, **kwargs)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/cli/superset/sync/native/command.py", line 126, in native
content = template.render(**env)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/jinja2/environment.py", line 1301, in render
self.environment.handle_exception()
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/jinja2/environment.py", line 936, in handle_exception
raise rewrite_traceback_stack(source=source)
File "<template>", line 9, in top-level template code
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/jinja2/utils.py", line 83, in from_obj
if hasattr(obj, "jinja_pass_arg"):
jinja2.exceptions.UndefinedError: 'filter_values' is undefined
Workaround: perform the import through the UI.
Currently, the dashboard state is not included on the exported YAML file. As a consequence, dashboards are always imported in a draft state.
Provide the ability to publish (change the status from draft to published) dashboards in bulk using the CLI.
The CLI currently doesn't support key pair authentication for dbt connections to Snowflake.
Support key pair authentication in the CLI.
When using the exposures option, some of the data is missing or there is no way to control its content.
As you can see in the example file below, I have the following issues:
Example exposure file:
version: 2
exposures:
  - name: Number of Customers per Day [chart]
    type: analysis
    maturity: low
    url: https://*********.app.preset.io/superset/explore/?form_data=********
    description: ''
    depends_on:
      - ref('ref_contacts')
    owner:
      name: Dustin Weaver
      email: unknown
  - name: Customer Dashboard [dashboard]
    type: dashboard
    maturity: low
    url: https://**********.app.preset.io/superset/dashboard/9/
    description: ''
    depends_on:
      - ref('ref_contacts')
    owner:
      name: Dustin Weaver
      email: unknown
Currently, it's possible to use the preset-cli to export all assets from a Workspace. However, in some cases users might want to export only modified assets, or specific ones.
The CLI only supports exporting all assets from the source Workspace.
Implement the ability to specify which assets should be exported. A possible solution would be to specify the IDs to be exported.
Currently we only prevent UI edits; we should also disable the deletion button.
When you run a sync command, you can specify a target parameter. In dbt, this would set the schema for the models as they are defined in the profiles.yml file.
However, with the preset-cli, this parameter merely sets the database label used in Preset. The actual connection is based on the manifest.json file, which could have been compiled with a different target.
The target parameter should set the schema in the Preset connection to match the one defined in the profiles.yml file in dbt. This could be accomplished by either reading the YML file or recompiling dbt with the given target.
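The first option (reading the YML file) could look roughly like this. It's a sketch under assumptions: the profiles.yml has already been parsed into a dict (e.g. with PyYAML), and resolve_target_schema is a hypothetical helper name, not part of the actual codebase.

```python
# Hypothetical sketch: resolve the schema configured for a dbt target from a
# parsed profiles.yml structure (profile -> outputs -> target -> schema).
from typing import Any, Dict, Optional


def resolve_target_schema(
    profiles: Dict[str, Any], profile: str, target: str
) -> Optional[str]:
    """Return the schema configured for ``target``, or None if not found."""
    outputs = profiles.get(profile, {}).get("outputs", {})
    return outputs.get(target, {}).get("schema")


# Example structure, as it would come out of yaml.safe_load(profiles.yml).
profiles = {
    "my_project": {
        "target": "dev",
        "outputs": {
            "dev": {"type": "postgres", "schema": "dev_schema"},
            "prod": {"type": "postgres", "schema": "analytics"},
        },
    }
}

print(resolve_target_schema(profiles, "my_project", "dev"))   # dev_schema
print(resolve_target_schema(profiles, "my_project", "prod"))  # analytics
```

The returned schema could then be written into the Preset connection settings alongside the database label.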
It would be very useful to be able to delete resources in bulk via the CLI. Three use-cases:
username: foo
team-role:
  - user
workspace_role:
  - Limited Contributor
data_access_role:
  - role1
  - rol2
rls:
  - rls1
  - rls2
Running this command (Please note that I removed company sensitive elements):
preset-cli --workspaces=https://[REMOVED].app.preset.io/ superset sync dbt target/manifest.json --project=[REMOVED] --target=dev --import-db --disallow-edits --external-url-prefix=[REMOVED] --exposures=models/exposures.yml --select [REMOVED]+
Results in the following error:
[14:19:52] INFO [[14:19:52]] INFO: preset_cli.cli.superset.sync.dbt.databases: Found an existing database, updating it databases.py:54
Traceback (most recent call last):
File "/usr/local/bin/preset-cli", line 8, in <module>
sys.exit(preset_cli())
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/main.py", line 103, in new_command
ctx.invoke(command, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/command.py", line 111, in dbt_core
models = apply_select(models, select, exclude)
File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 349, in apply_select
*[
File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 350, in <listcomp>
{model["unique_id"] for model in filter_models(models, condition)}
File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 235, in filter_models
return filter_plus_operator(models, condition)
File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 291, in filter_plus_operator
for child_id in model["children"]
KeyError: 'children'
CLI Version: preset-cli 0.1.0.post1.dev5+gae8587f
dbt Version: dbt 1.2.1
When syncing a project where the target is of type redshift, the database that ends up in the Superset instance still has the postgres backend type. I'd imagine this could cause problems further down the line, or at least be confusing to admins.
If a user doesn't specify the job ID we could prompt them for an account, then project, then job ID, similar to how we do for teams and workspaces.
After the implementation of #130, users should be able to trigger a sync only for exposures, by using the --select parameter and filtering for a tag that isn't used in dbt.
However, this approach is not working properly - after running the command below, no changes are applied to the exposures.yml file:
preset-cli --workspaces=$PRESET_WORKSPACE \
superset sync dbt-core $MANIFEST_PATH \
--project=$PROJECT_NAME --target=$TARGET_NAME --profiles=$PROFILES_PATH \
--exposures=$EXPOSURES_PATH \
--import-db \
--select tag:not-existent
Also, it would be beneficial to implement a dedicated flag to allow this approach (something like --exposures-only) rather than having to use the --select filter to exclude everything.
It seems this is because our staging tables are located in a different database than the default defined in the profiles.yml file for dbt.
The command I'm running is this (removed company sensitive info):
preset-cli --workspaces=https://[REMOVED].app.preset.io/ superset sync dbt-core target/manifest.json --project=[REMOVED] --target=dev --import-db --disallow-edits --external-url-prefix=[REMOVED] --exposures=models/exposures.yml --select [REMOVED]
When run, we get the following result:
[09:03:49] INFO [[09:03:49]] INFO: preset_cli.cli.superset.sync.dbt.databases: Found an existing database, updating it databases.py:54
[09:03:52] INFO [[09:03:52]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.[REMOVED].[REMOVED] datasets.py:50
[09:03:57] ERROR [[09:03:57]] ERROR: preset_cli.lib: {"message":"Fatal error"}
Desired result: Create a new connection for the staging tables to point to the correct database.
Currently, the --import-roles operation won't properly replace existing DARs (Data Access Roles) on the destination Workspace.
Make the --import-roles operation idempotent, so that it can be used for continuous syncs.
The CLI is a very good tool to perform migrations (Superset -> Superset / Superset -> Preset). However, the current behavior has some issues:
Implement a new flag on the CLI (something like --large-migration) that would make the CLI import asset per asset, instead of in bulk. For example:
- import each dataset individually (the dataset and the database YAML files);
- import each chart individually (the chart, dataset and database YAML files);
- import each dashboard individually (the dashboard, chart, dataset and database YAML files).
This would solve these problems.
When using the dbt sync functionality of the preset-cli, the command fails because it attempts to find a table under the model's name, which doesn't exist because the model was created with an alias. Aliases are important functionality within dbt, because they allow you to have unique models across many schemas (or datasets in BigQuery) that may share the same table name.
Traceback Error:
[14:23:56] INFO [[14:23:56]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.metamap.verifications_all datasets.py:98
ERROR [[14:23:56]] ERROR: preset_cli.lib: {"message":{"table_name":["Table [verifications_all] could not be found, please double check your database connection, schema, and table name"]}} lib.py:98
ERROR [[14:23:56]] ERROR: preset_cli.cli.superset.sync.dbt.datasets: Unable to create dataset datasets.py:102
Traceback (most recent call last):
File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 100, in sync_datasets
dataset = create_dataset(client, database, model)
File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 66, in create_dataset
return client.create_dataset(**kwargs)
File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/api/clients/superset.py", line 518, in create_dataset
return self.create_resource("dataset", **kwargs)
File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/api/clients/superset.py", line 450, in create_resource
validate_response(response)
File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/lib.py", line 99, in validate_response
raise SupersetError(errors=errors)
preset_cli.exceptions.SupersetError
To reproduce, create a model with an alias different from the actual name of the model (which is typically the name of the .sql file):
-- my_model.sql
{{ config(alias='model_alias') }}
SELECT * FROM table
Then run:
preset-cli --workspaces=https://{workspace_id}.{region}.app.preset.io/ superset sync dbt-core target/manifest.json --target={target} --profiles={path_to_profiles.yml} --project={name_of_project} --import-db
It seems that in create_dataset() in line 512 of api/clients/superset.py, **kwargs contains the following dictionary:
{"database": 4, "schema": "verifications", "table_name": "verifications_all"}
"table_name" should be using the alias instead of the name of the model. Unsure how far upstream this needs to be adjusted, or how it affects how the final Dataset name is decided in Preset.
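A minimal sketch of the likely fix, under the assumption that the model payload from manifest.json carries an alias key when one is configured; physical_table_name is a hypothetical helper, not the actual patch:

```python
# Illustrative sketch (not the actual preset-cli code): prefer the dbt
# model's alias over its name when building the dataset's table_name,
# since dbt materializes a model under its alias when one is configured.
from typing import Any, Dict


def physical_table_name(model: Dict[str, Any]) -> str:
    """Return the table name dbt actually created for this model."""
    return model.get("alias") or model["name"]


model = {"name": "verifications_all", "alias": "model_alias"}
print(physical_table_name(model))               # model_alias
print(physical_table_name({"name": "orders"}))  # orders
```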
Derived metrics are not properly created on Superset. For example, the schema below:
metrics:
  - name: countries_count
    label: "Count of unique countries"
    model: ref('modeled_sales')
    description: "Count distinct countries"
    calculation_method: count_distinct
    expression: country
    timestamp: date
    time_grains: [day, week, month, year]
  - name: total_revenue
    label: SUM of all price_each
    model: ref('modeled_sales')
    description: "The SUM of all price_each"
    calculation_method: sum
    expression: price_each
    timestamp: date
    time_grains: [day, week, month, year]
  - name: avg_sum_per_country
    label: "AVG revenue per country"
    description: "Let's try two expressions"
    calculation_method: derived
    expression: "{{metric('total_revenue')}} / {{metric('countries_count')}}"
    timestamp: date
    time_grains: [day, week, month, year]
would create 3 metrics on Superset; however, the nested SQL syntax wouldn't be resolved. Instead, the metric name is used verbatim.
Superset can't handle the metrics by their name, since they are not actual columns on the dataset, so using those metrics on a chart results in an error.
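One way the CLI could handle this is to inline nested metric references into plain SQL before creating the dataset metric. The sketch below is an assumption about the approach, not the actual preset-cli implementation; resolve_expression and the regex are illustrative:

```python
# Recursively expand {{ metric('x') }} references into the referenced
# metric's own SQL, so Superset receives a valid expression.
import re
from typing import Dict

METRIC_REF = re.compile(r"\{\{\s*metric\('([^']+)'\)\s*\}\}")


def aggregation_sql(method: str, expression: str) -> str:
    """Map a simple dbt calculation_method to a SQL aggregation."""
    if method == "count_distinct":
        return f"COUNT(DISTINCT {expression})"
    return f"{method.upper()}({expression})"


def resolve_expression(name: str, metrics: Dict[str, Dict[str, str]]) -> str:
    """Inline nested metric references until only plain SQL remains."""
    metric = metrics[name]
    if metric["calculation_method"] != "derived":
        return aggregation_sql(metric["calculation_method"], metric["expression"])
    return METRIC_REF.sub(
        lambda match: resolve_expression(match.group(1), metrics),
        metric["expression"],
    )


metrics = {
    "countries_count": {"calculation_method": "count_distinct", "expression": "country"},
    "total_revenue": {"calculation_method": "sum", "expression": "price_each"},
    "avg_sum_per_country": {
        "calculation_method": "derived",
        "expression": "{{metric('total_revenue')}} / {{metric('countries_count')}}",
    },
}
print(resolve_expression("avg_sum_per_country", metrics))
# SUM(price_each) / COUNT(DISTINCT country)
```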
During dataset creation via the CLI, some datasets might get the is_active attribute for each column set to null. This causes future updates to the dataset via the CLI to fail with the error below:
ERROR: preset_cli.lib: {"message":{"columns":{"0":{"is_active":["Field may not be null."]},"1":{"is_active":["Field may not be null."]},"2":{"is_active":["Field may not be null."]},"3":{"is_active":["Field may not be null."]},"4":{"is_active":["Field may not be null."]},"5":{"is_active":["Field may not be null."]}}}}
Ensure the is_active attribute is set to true for all columns on datasets created/updated via the CLI.
In longer sessions, the JWT can expire and subsequent requests will fail.
Currently, trying to sync a derived metric that relies on metrics from multiple models fails with the error below:
Metric {metricName} cannot be calculated because it depends on multiple models
This error prevents the sync from finishing successfully.
Handle this scenario - some suggestions involve using the metric's meta field.

dbt-core 1.3 changes the metric spec as follows:
- type --> calculation_method
- sql --> expression
- type: expression --> calculation_method: derived
superset-cli should support both the old and the new spec (e.g. when loading metrics via Schema.from_dict).
From looking at the dbt-cloud API, there seems to be no way to dynamically detect the dbt-core version of a particular dbt-cloud environment.
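A compatibility shim could normalize both specs to the 1.3 key names before further processing. This is a sketch under that assumption, not the actual superset-cli code:

```python
# Translate dbt-core < 1.3 metric keys (type, sql) to the 1.3 spec
# (calculation_method, expression) so downstream code sees one shape.
from typing import Any, Dict

RENAMES = {"type": "calculation_method", "sql": "expression"}


def normalize_metric(metric: Dict[str, Any]) -> Dict[str, Any]:
    """Return the metric with pre-1.3 keys renamed to the 1.3 spec."""
    out = {RENAMES.get(key, key): value for key, value in metric.items()}
    if out.get("calculation_method") == "expression":
        # type: expression --> calculation_method: derived
        out["calculation_method"] = "derived"
    return out


old = {"name": "t", "type": "expression", "sql": "{{metric('a')}} * 2"}
print(normalize_metric(old))
# {'name': 't', 'calculation_method': 'derived', 'expression': "{{metric('a')}} * 2"}
```

Metrics already in the new format pass through unchanged, so the same code path can serve both dbt-core versions.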
In this case my companion file would be named dashboards/Jaffle_Shop_8.overrides.yaml and only contain:
dashboard_title: Jaffle Shop {{ " (staging)" if env.get("SUPERSET_ENV") == "staging" else "" }}
Users would like to be able to sync all exposures, but not all models from dbt to Preset. So, rather than determining which models exist in Preset, selecting those models, and syncing and showing exposures for those models, they would like to skip the first two steps and just show exposures.
If you run a dbt sync with the preset-cli, then regardless of whether the --import-db flag is set, any database found in Preset has its connection overwritten. The problem is that the database user used during the sync may not be the same user that should be used in Preset.
Make it so that when the --import-db flag is not present, the database connection is not overwritten.
Let's make it work with Superset instances as well.
The current implementation imports all resources in manifest.json. As dbt provides the model selection syntax, it would be great to filter imported models, sources and metrics based on a passed condition. As far as I know, the model selection syntax is not applied when generating manifest.json, and it might not be realistic to implement the same in the CLI. So, it would be OK to start with passing tags to select imported targets.
The command would be used to select targets which have all the passed tags:
% preset-cli --workspaces=https://abcdef12.us1a.app.preset.io/ \
> superset sync dbt /path/to/dbt/my_project/target/manifest.json \
> --project=my_project --target=dev --profile=${HOME}/.dbt/profiles.yml \
> --exposures=/path/to/dbt/my_project/models/exposures.yaml \
> --import-db \
> --external-url-prefix=http://localhost:8080/ \
> --tags tag_a --tags tag_b
When I tried to sync dbt models on BigQuery to a Superset instance with the following command:
superset-cli -u USER_NAME -p USER_PASS SUPERSET_URL sync dbt-core PATH/TO/manifest.json --project PROJECT_NAME --profiles PATH/TO/profiles.yml --exposures PATH/TO/exposures.yaml --import-db
it returned the error message {"message":"Connection failed, please check your connection settings"}.
However, all dbt commands work well in the same project.
In api/clients/dbt.py there's a class called MetricSchema that's missing two important fields needed to save metrics to Preset. Please add the following lines at 564 and 565:
calculation_method = fields.String()
expression = fields.String()
This prevents the following error when syncing with dbt:
% preset-cli --workspaces=https://****.****.app.preset.io/ superset sync dbt-core target/manifest.json --target=local-prod-service-account --profiles=profiles.yml --project=metamap --import-db
https://****.****.app.preset.io/
[15:50:59] INFO [[15:50:59]] INFO: preset_cli.cli.superset.sync.dbt.databases: Found an existing database, updating it databases.py:57
[15:51:00] INFO [[15:51:00]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Updating dataset model.metamap.verifications datasets.py:96
Traceback (most recent call last):
File "/Users/***/Documents/GitHub/dbt/.venv/bin/preset-cli", line 8, in <module>
sys.exit(preset_cli())
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/main.py", line 118, in new_command
ctx.invoke(command, *args, **kwargs)
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/command.py", line 171, in dbt_core
datasets = sync_datasets(
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 122, in sync_datasets
"expression": get_metric_expression(name, model_metrics),
File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/metrics.py", line 36, in get_metric_expression
type_ = metric["type"]
KeyError: 'type'
I'm unsure if this has any effect on the rest of the package (as I'm new to using it), but this was a quick fix for me.
(Btw, would love to contribute to this project if possible, since my team will have a large stake in it going forward. I love what you all are doing with this to make it easy to sync dbt with Preset. One of the best features I've seen in a BI tool.)
In order to export resources, the CLI sends a GET request to /api/v1/$resourceType, and then uses all the IDs returned in a request to /api/v1/$resourceType/export/?q=!($ResourceIDs). However, if the instance has a lot of resources, the URL becomes too long to be handled, resulting in a 414 Request-URI Too Large response, which causes the error below:
File "/Github/backend-sdk/src/preset_cli/cli/superset/export.py", line 42, in export
export_resource(resource, root, client, overwrite)
File "/Github/backend-sdk/src/preset_cli/cli/superset/export.py", line 58, in export_resource
with ZipFile(buf) as bundle:
File "/.pyenv/versions/3.9.1/lib/python3.9/zipfile.py", line 1257, in __init__
self._RealGetContents()
File "/.pyenv/versions/3.9.1/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Check the URL length and paginate the IDs as needed, to perform the migration in smaller batches.
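The batching could be sketched as follows; the 4000-character cap and the batch_ids helper are assumptions for illustration, not actual preset-cli code:

```python
# Split the exported IDs into batches so that each /export/?q=!(...) URL
# stays under a safe length, avoiding 414 Request-URI Too Large responses.
from typing import Iterable, Iterator, List

MAX_URL_LENGTH = 4000  # assumed safe cap; servers often reject ~4-8 KB URIs


def batch_ids(ids: Iterable[int], base_url: str) -> Iterator[List[int]]:
    """Yield lists of IDs whose joined URL length stays under the cap."""
    batch: List[int] = []
    length = len(base_url)
    for resource_id in ids:
        id_len = len(str(resource_id)) + 1  # +1 for the comma separator
        if batch and length + id_len > MAX_URL_LENGTH:
            yield batch
            batch, length = [], len(base_url)
        batch.append(resource_id)
        length += id_len
    if batch:
        yield batch


base = "/api/v1/chart/export/?q=!()"
batches = list(batch_ids(range(10000), base))
print(len(batches))  # several small requests instead of one huge URL
```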
Please support Redshift.
Currently the preset-cli allows selection of dbt models using the graph selectors and tags. However, it would be nice to also allow selecting models by their file names.
Example would be:
preset-cli ... --select models/datamarts/my_model.sql #sync only "my_model"
preset-cli ... --select models/datamarts/my_model.sql --select models/datamarts/my_other_model.sql #sync "my_model" and "my_other_model"
We currently read injected metadata to determine exposures, requiring the user to have synced datasets from dbt models. We should be able to do that by reading the metadata in physical datasets.
Hi @betodealmeida,
I have tried the sync dbt command with a database / dataset on Superset, with the following syntax:
superset-cli -u USER_NAME -p USER_PASS SUPERSET_URL sync dbt-core PATH/TO/manifest.json --project PROJECT_NAME --profiles PATH/TO/profiles.yml --exposures PATH/TO/exposures.yaml --import-db
There are two issues I currently encounter: after syncing the dbt models (no error message appears), the datasets are not shown in Superset.
Sample log info:
[15:34:30] INFO [[15:34:30]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.jaffle_shop.my_first_dbt_model datasets.py:84
INFO [[15:34:30]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.jaffle_shop.my_second_dbt_model datasets.py:84
Still figuring out why the datasets don't show up...
The error could come from here: changing connection_params.get("encrypted_extra") to connection_params.get("masked_encrypted_extra") should fix this issue.
On Postgres, creating new datasets from superset-cli sync dbt quietly fails. No error message is displayed either in superset-cli or superset. Datasets are just not created.
Hi. Was very excited to find this project via the conversation here: apache/superset#18098
Is this a WIP and/or being kept under wraps for now?
I know people in the dbt Slack community are still using dbt-superset-lineage, which can do about 10% of what is described in this project's README.
Is backend-sdk stable enough to use for internal projects today?
If the API credentials previously configured with the preset-cli are no longer valid, preset-cli auth --overwrite won't work properly, preventing users from entering new valid values.
Steps: configure the preset-cli with an API key, invalidate the key, and then run preset-cli auth --overwrite.
Expected result: the user can update the key using this command.
Actual result: the exception below is thrown:
Traceback (most recent call last):
File "/home/vavila/.pyenv/versions/preset-cli/bin/preset-cli", line 8, in <module>
sys.exit(preset_cli())
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1654, in invoke
super().invoke(ctx)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/cli/main.py", line 144, in preset_cli
jwt_token = get_access_token(manager_url, api_token, api_secret)
File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/auth/lib.py", line 29, in get_access_token
return payload["payload"]["access_token"]
KeyError: 'payload'
The CLI can be a very powerful tool to migrate from Superset (on-prem) to Preset. However, the current export flow doesn't include ownership data in the ZIP file, so when the files are imported on the destination, everything gets mapped to the user performing the operation.
Implement in the CLI the ability to automatically re-map the content on the destination, based on the user info.
When syncing models that are in different databases, I get the following error:
[[09:32:48]] ERROR: preset_cli.cli.superset.sync.dbt.datasets: Unable to create dataset datasets.py:99
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 97, in sync_datasets
dataset = create_dataset(client, database, model)
File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 63, in create_dataset
return client.create_dataset(**kwargs)
File "/usr/local/lib/python3.9/site-packages/preset_cli/api/clients/superset.py", line 534, in create_dataset
elif column["type"].lower() == "string":
AttributeError: 'NoneType' object has no attribute 'lower'
The models in the same database sync OK.
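Judging from the traceback, the crash happens because a reflected column comes back with type set to None. A defensive guard, sketched as an assumption rather than the actual patch:

```python
# Guard against columns whose reflected type is None before calling
# .lower() on it, which is what raises the AttributeError above.
from typing import Any, Dict


def is_string_column(column: Dict[str, Any]) -> bool:
    """True only when the column has a type and that type is 'string'."""
    column_type = column.get("type")
    return column_type is not None and column_type.lower() == "string"


print(is_string_column({"name": "a", "type": "STRING"}))  # True
print(is_string_column({"name": "b", "type": None}))      # False
```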
I'm wondering if the fact that we have Snowflake, and that the profiles.yml file requires a database key, is throwing things off?
For example, on BigQuery, the profiles.yml file does not require a database to be defined. But in Snowflake, you must have this defined. However, you can override this database in the dbt_project.yml file.
It seems like the preset-cli is using the database from the profiles.yml file and not what's defined in the dbt_project.yml file.
Setup is as follows:
OS: MacOS 12.6
dbt version: 1.2.1
preset-cli version: 0.1.1.post1.dev74+g77b5cfd
Users are able to specify metrics in the schema.yml file, which should be created on Superset as dataset metrics. However, if the metric is an expression, since it doesn't have a model key, it doesn't get associated with any dataset on Superset.
Declare metrics in the schema.yml file. For example:
metrics:
  - name: total_revenue
    label: SUM of all price_each
    model: ref('modeled_sales')
    description: "The SUM of all price_each"
    type: sum
    sql: price_each
    timestamp: date
    time_grains: [day, week, month, year]
  - name: revenue_multiplied
    label: "Revenue calculated differently"
    description: "Let's try to use an expression"
    type: expression
    sql: "{{metric('total_revenue')}} * 1.25"
    timestamp: date
    time_grains: [day, week, month, year]
Expected result: both total_revenue and revenue_multiplied are created on the modeled_sales dataset in Superset.
Actual result: only total_revenue is created on the modeled_sales dataset in Superset.
Hi Beto,
First off, big kudos to you for the last three weeks work on this project. I hadn't been paying attention to the project in the last three weeks and was just catching up with all the updates. This is really quickly turning into an indispensable tool for me!
Question:
I was wondering about the choice of using the dbt metric's 'name' field as the value for the Metric Label in Superset, instead of using the dbt metric's 'label' field value.
Is there a specific reason for that? At first glance through the preset-cli codebase I couldn't spot whether this is something that's even happening in the preset-cli codebase, or if this might just be a bug in the Superset API?
The CLI is working as expected, but the assets that are produced sometimes have diffs without any changes having been made. It's challenging for the developer to have a clean development + deployment workflow using git. I'm not totally sure which pieces of this are CLI-related or Superset-related, so I can make other issues in the Superset repo if that's needed. I'm also happy to give more feedback, contribute, or test any changes out!
The ideal workflow that I'm hoping to implement:
Challenges:
The CLI allows users to export resources from a Preset Workspace/Superset tenant. However, it currently doesn't handle pagination, so only 20 items are exported for each type.
Expected result: the operation exports all assets from the Workspace.
Actual result: the operation only exports up to 20 items per type.
This happens because the CLI initially sends a GET request to /api/v1/{{asset_type}}/?q=(filters:!()) to get the ids that would be exported, which by default returns only 20 results. This could be increased to 100 by sending ?q=(page_size:100), but in case there are more than 100 items, pagination support is required.
Regardless of the page_size, the payload always includes a count, so if count > page_size then a new request should be sent (?q=(page_size:100,page:2)) and so on until every page has been fetched.
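The pagination loop described above could be sketched like this; fetch_page stands in for the CLI's GET helper, and the names are illustrative rather than actual preset-cli code:

```python
# Walk Superset-style paginated listings (?q=(page_size:N,page:P)) until
# the reported count has been reached, yielding every resource id.
from typing import Callable, Dict, Iterator

PAGE_SIZE = 100


def iter_resource_ids(
    fetch_page: Callable[[int], Dict], page_size: int = PAGE_SIZE
) -> Iterator[int]:
    """Yield every resource id across all pages of a listing endpoint."""
    page = 0
    seen = 0
    while True:
        payload = fetch_page(page)  # e.g. GET /api/v1/chart/?q=(page_size:100,page:0)
        for resource in payload["result"]:
            yield resource["id"]
        seen += len(payload["result"])
        if seen >= payload["count"] or not payload["result"]:
            break
        page += 1


# Fake paginated endpoint with 250 resources, for demonstration.
def fake_fetch(page: int) -> Dict:
    all_ids = list(range(250))
    chunk = all_ids[page * PAGE_SIZE : (page + 1) * PAGE_SIZE]
    return {"count": len(all_ids), "result": [{"id": i} for i in chunk]}


print(len(list(iter_resource_ids(fake_fetch))))  # 250
```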
Users are able to create a schema.yml file to declare/explain/specify columns from their models. However, if a temporal column is included in the file, the Is Temporal flag is removed from the column on Superset.
Steps: sync a model that has a temporal column (a DATETIME/TIMESTAMP column) and include that column in schema.yml.
Expected result: the Is Temporal flag is kept on the Superset side.
Actual result: the Is Temporal flag is removed on Superset.
This is only reproduced on version 0.1.1.post1.dev146+g1905869.
Workaround: users are able to manually flag the column as Is Temporal.
In the Preset UI, on the database connections front end, the rule for the name of a database is 'Copy the name of the database you are trying to connect to.' Because of that, the database where the final models are stored in Snowflake is the default name of the database connection in Preset. For example, in Snowflake, the database I read my models off of is called 'ANALYTICS_PROD'.
In sync_database, it seems that the database_name being searched for is:
database_name = meta.pop("database_name", f"{project_name}_{target_name}")
When I run dbt-core, it's not finding the database and I am curious if it's looking for that database_name. I think it should be looking in dbt_project --> profile --> target --> outputs --> database.
Currently, the superset-cli allows users to interact with a local Superset installation. However, it gets the csrf_token from the login HTML content, and then works on getting a session/cookie to perform future requests.
Implement support for handling the authentication directly through the API:
- sending a POST request to /api/v1/security/login; and/or
- allowing a --jwt-token parameter to be specified on the command.
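A sketch of the first option, using Superset's /api/v1/security/login endpoint; the payload fields follow Superset's documented login API, while the helper names here are illustrative:

```python
# Authenticate against Superset's REST API instead of scraping the login
# page for a csrf_token; the returned JWT goes into Authorization headers.
import json
from typing import Dict
from urllib import request


def build_login_payload(username: str, password: str) -> Dict[str, object]:
    return {
        "username": username,
        "password": password,
        "provider": "db",  # or "ldap", depending on the instance
        "refresh": True,   # also return a refresh token
    }


def api_login(base_url: str, username: str, password: str) -> str:
    """Return a JWT access token for subsequent API requests."""
    body = json.dumps(build_login_payload(username, password)).encode()
    req = request.Request(
        f"{base_url}/api/v1/security/login",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as response:  # network call; no retry logic here
        return json.loads(response.read())["access_token"]


payload = build_login_payload("admin", "secret")
print(payload["provider"])  # db
```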