backend-sdk's People

Contributors

betodealmeida, craig-rueda, eschutho, hughhhh, mtrentz, vitor-avila

backend-sdk's Issues

Unable to import virtual datasets that have Jinja in their SQL query

Description

Superset allows users to use Jinja directly in the SQL query of a virtual dataset. However, if the dataset uses Jinja, importing it through the preset-cli throws an error.

Steps to reproduce

  1. Create a Virtual Dataset using Jinja on its query (for example, {{ "'" + "','".join(filter_values('<ColumnName>')) + "'" }} - or {{ filter_values('ColumnName')|where_in }}).
  2. Use it on a dashboard.
  3. Export the dashboard.
  4. Try to import the dashboard using the preset-cli.

Expected behavior

The import operation works.

Actual behavior

The error below is thrown:

Traceback (most recent call last):
  File "/home/vavila/.pyenv/versions/preset-cli/bin/preset-cli", line 8, in <module>
    sys.exit(preset_cli())
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/cli/superset/main.py", line 89, in new_command
    ctx.invoke(command, *args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/cli/superset/sync/native/command.py", line 126, in native
    content = template.render(**env)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/jinja2/environment.py", line 1301, in render
    self.environment.handle_exception()
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 9, in top-level template code
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/jinja2/utils.py", line 83, in from_obj
    if hasattr(obj, "jinja_pass_arg"):
jinja2.exceptions.UndefinedError: 'filter_values' is undefined

Workaround

Perform the import through the UI.
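Another possible workaround is to escape the Superset Jinja in the exported YAML before importing. The traceback suggests the CLI renders asset files through a stock Jinja2 environment, which leaves {% raw %} blocks untouched; a minimal sketch of the idea (plain jinja2, no CLI internals assumed):

from jinja2 import Template

# Superset-style Jinja survives the CLI's rendering pass when wrapped in raw tags:
sql = "SELECT * FROM t WHERE c IN ({% raw %}{{ filter_values('ColumnName')|where_in }}{% endraw %})"
print(Template(sql).render())
# Output: SELECT * FROM t WHERE c IN ({{ filter_values('ColumnName')|where_in }})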

Ability to publish dashboards in bulk via the CLI

Current scenario

Currently, the dashboard state is not included in the exported YAML file. As a consequence, dashboards are always imported in a draft state.

Suggested improvement

Provide the ability to publish (change the status from draft to published) dashboards in bulk using the CLI.
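Until the CLI supports this natively, a rough sketch of what the bulk operation could do, assuming the Workspace accepts the standard Superset dashboard PUT payload with a published field (publish_dashboards and its parameters are hypothetical):

import requests

def publish_dashboards(base_url: str, token: str, dashboard_ids: list[int]) -> None:
    """Flip each dashboard from draft to published via the REST API."""
    headers = {"Authorization": f"Bearer {token}"}
    for pk in dashboard_ids:
        response = requests.put(
            f"{base_url}/api/v1/dashboard/{pk}",
            json={"published": True},
            headers=headers,
            timeout=30,
        )
        response.raise_for_status()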

Exposure option YML data

When using the exposures option, some of the data is missing, and there is no way to control its content.

As you can see in the example file below, I have the following issues:

  1. Email is missing
  2. Unable to change the following fields:
  • Type
  • Maturity
  • Description

Example exposure file:

version: 2
exposures:
- name: Number of Customers per Day [chart]
  type: analysis
  maturity: low
  url: https://*********.app.preset.io/superset/explore/?form_data=********
  description: ''
  depends_on:
  - ref('ref_contacts')
  owner:
    name: Dustin Weaver
    email: unknown
- name: Customer Dashboard [dashboard]
  type: dashboard
  maturity: low
  url: https://**********.app.preset.io/superset/dashboard/9/
  description: ''
  depends_on:
  - ref('ref_contacts')
  owner:
    name: Dustin Weaver
    email: unknown

Ability to export specific resources from a Workspace using the preset-cli

Motivation

Currently, the preset-cli only supports exporting all assets from the source Workspace. However, in some cases users might want to export only modified assets, or specific ones.

Proposed solution

Implement the ability to specify which assets should be exported. A possible solution would be to specify the IDs to be exported.
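For illustration, Superset's export endpoints already accept a rison list of IDs, so the CLI could expose something like the sketch below (export_dashboards and its parameters are hypothetical):

import requests

def export_dashboards(base_url: str, token: str, ids: list[int], out_path: str) -> None:
    """Export only the given dashboard IDs as a ZIP bundle."""
    rison_ids = "!(" + ",".join(str(i) for i in ids) + ")"
    response = requests.get(
        f"{base_url}/api/v1/dashboard/export/",
        params={"q": rison_ids},
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
    )
    response.raise_for_status()
    with open(out_path, "wb") as fh:
        fh.write(response.content)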

dbt sync "target" parameter does not work as expected

When you run a sync command, you can specify a target parameter. In dbt, this would set the schema for the models as they are defined in the profiles.yml file.

However, with the preset-cli, this parameter merely sets the database label used in Preset. The actual connection is based on the manifest.json file, which could have been compiled with a different target.

Desired Behavior

The target parameter should set the schema in the Preset connection to match the one defined in the profiles.yml file in dbt. This could be accomplished by either reading the YML file or recompiling dbt with the given target.
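A sketch of the first option, reading the schema for a target straight out of profiles.yml (get_target_schema is a hypothetical helper; it assumes the standard profiles.yml layout of profile -> outputs -> target -> schema):

import yaml

def get_target_schema(profiles_path: str, profile: str, target: str) -> str:
    """Resolve the schema configured for a given dbt profile/target."""
    with open(profiles_path) as fh:
        profiles = yaml.safe_load(fh)
    return profiles[profile]["outputs"][target]["schema"]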

Ability to delete elements in bulk via the CLI

It would be very useful to be able to delete resources in bulk via the CLI. Three use-cases:

  • Delete the examples data only
  • Delete all assets (useful when working with stg/dev workspaces, or performing migration tests)
  • Delete specific assets (in bulk) using their IDs (see the sketch below)
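A sketch of the third use case, assuming the Workspace accepts Superset's standard bulk-delete endpoint with a rison list of IDs (delete_charts is a hypothetical helper):

import requests

def delete_charts(base_url: str, token: str, chart_ids: list[int]) -> None:
    """Bulk-delete charts by ID."""
    rison_ids = "!(" + ",".join(str(i) for i in chart_ids) + ")"
    response = requests.delete(
        f"{base_url}/api/v1/chart/",
        params={"q": rison_ids},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()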

Improve user import format

username: foo
team_role:
- user
workspace_role:
- Limited Contributor
data_access_role:
- role1
- role2
rls:
- rls1
- rls2

Error when using the children/parent selection method for dbt syncing

Running this command (note that I removed company-sensitive elements):

preset-cli --workspaces=https://[REMOVED].app.preset.io/ superset sync dbt target/manifest.json --project=[REMOVED] --target=dev --import-db --disallow-edits --external-url-prefix=[REMOVED] --exposures=models/exposures.yml --select [REMOVED]+

Results in the following error:

[14:19:52] INFO     [[14:19:52]] INFO: preset_cli.cli.superset.sync.dbt.databases: Found an existing database, updating it                                databases.py:54
Traceback (most recent call last):
  File "/usr/local/bin/preset-cli", line 8, in <module>
    sys.exit(preset_cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/main.py", line 103, in new_command
    ctx.invoke(command, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/command.py", line 111, in dbt_core
    models = apply_select(models, select, exclude)
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 349, in apply_select
    *[
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 350, in <listcomp>
    {model["unique_id"] for model in filter_models(models, condition)}
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 235, in filter_models
    return filter_plus_operator(models, condition)
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 291, in filter_plus_operator
    for child_id in model["children"]
KeyError: 'children'

CLI Version: preset-cli 0.1.0.post1.dev5+gae8587f
dbt Version: dbt 1.2.1
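The traceback points at model["children"], so manifest nodes without that key crash the + operator. A minimal defensive sketch (descendant_ids is a hypothetical helper, not the actual lib.py code):

def descendant_ids(model: dict, models_by_id: dict) -> set:
    """Collect a model's descendants, tolerating nodes without a "children" key."""
    ids = set()
    for child_id in model.get("children", []):  # .get avoids KeyError: 'children'
        ids.add(child_id)
        child = models_by_id.get(child_id)
        if child is not None:
            ids |= descendant_ids(child, models_by_id)
    return ids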

Sync only exposures is not working properly

After the implementation of #130, users should be able to trigger a sync only for exposures, by using the --select parameter and filtering for a tag that isn't used in dbt.

However, this approach is not working properly: after running the command below, no changes are applied to the exposures.yml file:

preset-cli --workspaces=$PRESET_WORKSPACE \
superset sync dbt-core $MANIFEST_PATH \
--project=$PROJECT_NAME --target=$TARGET_NAME --profiles=$PROFILES_PATH \
--exposures=$EXPOSURES_PATH \
--import-db \
--select tag:not-existent

Also, it would be beneficial to implement a dedicated flag for this approach (something like --exposures-only) rather than having to use the --select filter to exclude everything.

Syncing a dbt model that is in a different database than the one defined in the profiles.yml file

It seems this happens because our staging tables are located in a different database than the default defined in the profiles.yml file for dbt.

The command I'm running is this (company-sensitive info removed):

preset-cli --workspaces=https://[REMOVED].app.preset.io/ superset sync dbt-core target/manifest.json --project=[REMOVED] --target=dev --import-db --disallow-edits --external-url-prefix=[REMOVED] --exposures=models/exposures.yml --select [REMOVED] 

When run, we get the following result:

[09:03:49] INFO     [[09:03:49]] INFO: preset_cli.cli.superset.sync.dbt.databases: Found an existing database, updating it                                databases.py:54
[09:03:52] INFO     [[09:03:52]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.[REMOVED].[REMOVED]                     datasets.py:50
[09:03:57] ERROR    [[09:03:57]] ERROR: preset_cli.lib: {"message":"Fatal error"}   

Desired result: Create a new connection for the staging tables to point to the correct database.

Make import-roles idempotent

Current Scenario

Currently, the --import-roles operation won't properly replace existing DARs (Data Access Roles) on the destination Workspace.

Suggested improvement

Make the --import-roles operation idempotent, so that it can be used for continuous sync.

Support large imports with thousands of files

Description

The CLI is a very good tool to perform migrations (Superset -> Superset / Superset -> Preset). However, the current behavior has some issues:

  • Superset imports are atomic - if a file fails, the entire operation is rolled back
  • When trying to import thousands of assets, Superset might take longer to execute the operation and fail to respond to the import request in time
  • It's difficult to follow the progress of the import operation

Suggested Solution

Implement a new flag on the CLI (something like --large-migration) that would make the CLI import assets one by one, instead of in bulk; a rough sketch of the bundling step follows the lists below. For example:

  1. Start with the databases: import one database at a time, logging which database is being imported and its status.
  2. Then move to the datasets: import one dataset at a time, logging which dataset is being imported and its status (the ZIP file must include the dataset and database YAML files).
  3. Then move to charts: import one chart at a time, logging which chart is being imported and its status (the ZIP file must include the chart, dataset, and database YAML files).
  4. Lastly, move to dashboards: import one dashboard at a time, logging which dashboard is being imported and its status (the ZIP file must include the dashboard, chart, dataset, and database YAML files).

This would solve these problems:

  • If the import fails, the previous imports would be kept since they were isolated operations
  • Superset would be able to process the request and reply in time
  • Users can easily follow the progress via the logs
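A very rough sketch of the per-asset bundling step (build_single_asset_bundle is hypothetical; real Superset import bundles also need a metadata.yaml and a specific directory layout, omitted here):

import io
from pathlib import Path
from zipfile import ZipFile

def build_single_asset_bundle(asset_yaml: Path, dependencies: list[Path]) -> bytes:
    """Bundle one asset plus its dependency YAML files into an import ZIP."""
    buf = io.BytesIO()
    with ZipFile(buf, "w") as bundle:
        for path in [asset_yaml, *dependencies]:
            bundle.write(path, arcname=str(path))
    return buf.getvalue()

The CLI would then POST each bundle to the import endpoint, logging the asset name and the response status before moving on to the next one.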

dbt sync does not recognize models with "alias" config as the name of the table

When using the dbt sync functionality of the preset-cli, the command will fail because it attempts to find a table for a model that doesn't exist, since the model was created with an alias. Aliases are important functionality within dbt, because they allow you to have unique models across many schemas (or datasets in BigQuery) that may have the same table name.

Traceback Error:

[14:23:56] INFO     [[14:23:56]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.metamap.verifications_all              datasets.py:98
           ERROR    [[14:23:56]] ERROR: preset_cli.lib: {"message":{"table_name":["Table [verifications_all] could not be found, please double check lib.py:98
                    your database connection, schema, and table name"]}}                                                                                      
                                                                                                                                                              
           ERROR    [[14:23:56]] ERROR: preset_cli.cli.superset.sync.dbt.datasets: Unable to create dataset                                    datasets.py:102
                    Traceback (most recent call last):                                                                                                        
                      File                                                                                                                                    
                    "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py",                
                    line 100, in sync_datasets                                                                                                                
                        dataset = create_dataset(client, database, model)                                                                                     
                      File                                                                                                                                    
                    "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py",                
                    line 66, in create_dataset                                                                                                                
                        return client.create_dataset(**kwargs)                                                                                                
                      File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/api/clients/superset.py",                   
                    line 518, in create_dataset                                                                                                               
                        return self.create_resource("dataset", **kwargs)                                                                                      
                      File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/api/clients/superset.py",                   
                    line 450, in create_resource                                                                                                              
                        validate_response(response)                                                                                                           
                      File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/lib.py", line 99, in                        
                    validate_response                                                                                                                         
                        raise SupersetError(errors=errors)                                                                                                    
                    preset_cli.exceptions.SupersetError   

Replication steps:

  1. Use a dbt project with a model that has an alias different from the actual name of the model (which is typically the name of the .sql file):
-- my_model.sql
{{ config(alias='model_alias') }}
SELECT * FROM table
  2. Use the preset-cli to attempt to sync dbt with a Superset/Preset instance.
preset-cli --workspaces=https://{workspace_id}.{region}.app.preset.io/ superset sync dbt-core target/manifest.json --target={target} --profiles={path_to_profiles.yml} --project={name_of_project} --import-db

Possible Solution

It seems that in create_dataset() at line 512 of api/clients/superset.py, **kwargs contains the following dictionary:

{"database": 4, "schema": "verifications", "table_name": "verifications_all"}

"table_name" should be using the alias instead of the name of the model. Unsure how far upstream this needs to be adjusted, or how it affects what the final Dataset name is decided in Preset.

Derived metrics are not created properly on Superset

Derived metrics are not properly created on Superset. For example, the schema below:

metrics:
  - name: countries_count
    label: "Count of unique countries"
    model: ref('modeled_sales')
    description: "Count distinct countries"
    calculation_method: count_distinct
    expression: country
    timestamp: date
    time_grains: [day, week, month, year]
  - name: total_revenue
    label: SUM of all price_each
    model: ref('modeled_sales')
    description: "The SUM of all price_each"
    calculation_method: sum
    expression: price_each
    timestamp: date
    time_grains: [day, week, month, year]
  - name: avg_sum_per_country
    label: "AVG revenue per country"
    description: "Let's try to two expressions"
    calculation_method: derived
    expression: "{{metric('total_revenue')}} / {{metric('countries_count')}}"
    timestamp: date
    time_grains: [day, week, month, year]

would create 3 metrics on Superset; however, the nested SQL wouldn't be expanded. Instead, the metric name is used as the expression.

Superset can't resolve the metrics by their names, since they are not actual columns on the dataset, so using those metrics on a chart results in an error.
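The expected behavior would be to recursively inline the referenced metrics' SQL before creating the dataset metric. A minimal sketch (expand_metric and sql_by_name are hypothetical; it assumes each base metric's SQL has already been rendered, e.g. SUM(price_each)):

import re

METRIC_REF = re.compile(r"\{\{\s*metric\(['\"](\w+)['\"]\)\s*\}\}")

def expand_metric(name: str, sql_by_name: dict) -> str:
    """Recursively inline {{metric('...')}} references into plain SQL."""
    sql = sql_by_name[name]
    return METRIC_REF.sub(lambda m: expand_metric(m.group(1), sql_by_name), sql)

# expand_metric("avg_sum_per_country", {
#     "countries_count": "COUNT(DISTINCT country)",
#     "total_revenue": "SUM(price_each)",
#     "avg_sum_per_country": "{{metric('total_revenue')}} / {{metric('countries_count')}}",
# })
# -> "SUM(price_each) / COUNT(DISTINCT country)"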

Datasets are intermittently created with `is_active` set as null

Description

During dataset creation (via the CLI), some datasets might get the is_active attribute of each column set as null. This causes future updates to the dataset via the CLI to fail, resulting in the error below:

ERROR: preset_cli.lib: {"message":{"columns":{"0":{"is_active":["Field may not be null."]},"1":{"is_active":["Field may not be null."]},"2":{"is_active":["Field may not be null."]},"3":{"is_active":["Field may not be null."]},"4":{"is_active":["Field may not be null."]},"5":{"is_active":["Field may not be null."]}}}} 

Possible fix

Ensure that the is_active attribute is set to true for all columns on datasets created/updated via the CLI.
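A sketch of that guard, applied to the column payload before it is sent (ensure_active_columns is a hypothetical helper):

def ensure_active_columns(columns: list[dict]) -> list[dict]:
    """Force is_active to True wherever it is missing or null, so updates validate."""
    for column in columns:
        if column.get("is_active") is None:
            column["is_active"] = True
    return columns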

Handle derived metrics related to multiple models

Current scenario

Currently, trying to sync a derived metric that relies on metrics from multiple models fails with the error below:

Metric {metricName} cannot be calculated because it depends on multiple models

This error prevents the sync from finishing successfully.

Suggested solution

Handle this scenario - some suggestions:

  • Notify the user with a warning that the metric wasn't created for this reason, but finish the operation properly;
  • Create the metric on all target/involved datasets;
  • Allow specifying the dataset that should receive this metric on Preset, using the metric's meta field.

Support dbt-core 1.3

dbt-core 1.3 changes the metric spec as follows:

  • type --> calculation_method
  • sql --> expression
  • type: expression --> calculation_method: derived

superset-cli should:

  1. Either detect which version of dbt-core is in use and pick the correct marshmallow MetricSchema version based on the version number
  2. Or dynamically infer the MetricSchema at runtime (with Schema.from_dict)

From looking at the dbt-cloud API, there seems to be no way to dynamically detect the dbt-core version of a particular dbt-cloud environment.
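One way to support both specs without version detection is to normalize the old keys into the 1.3 names before validation. A sketch under that assumption (normalize_metric and OLD_TO_NEW are hypothetical):

OLD_TO_NEW = {"type": "calculation_method", "sql": "expression"}

def normalize_metric(metric: dict) -> dict:
    """Map pre-1.3 metric keys to the dbt-core 1.3 spec."""
    normalized = {OLD_TO_NEW.get(key, key): value for key, value in metric.items()}
    if normalized.get("calculation_method") == "expression":
        normalized["calculation_method"] = "derived"
    return normalized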

Companion YAML templates

In this case my companion file would be named dashboards/Jaffle_Shop_8.overrides.yaml and would only contain dashboard_title: Jaffle Shop {{ " (staging)" if env.get("SUPERSET_ENV") == "staging" else "" }}
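A sketch of how the CLI could apply such a companion file, assuming a shallow top-level merge is enough for keys like dashboard_title (apply_overrides is hypothetical; any Jinja in the override values would still be rendered afterwards):

import yaml

def apply_overrides(asset_path: str, overrides_path: str) -> dict:
    """Merge a companion .overrides.yaml on top of an exported asset file."""
    with open(asset_path) as fh:
        asset = yaml.safe_load(fh)
    with open(overrides_path) as fh:
        overrides = yaml.safe_load(fh) or {}
    asset.update(overrides)  # top-level keys from the companion file win
    return asset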

Ability to sync only exposures from dbt, without syncing models

Users would like to be able to sync all exposures, but not all models from dbt to Preset. So, rather than determining which models exist in Preset, selecting those models, and syncing and showing exposures for those models, they would like to skip the first two steps and just show exposures.

The import-db flag is not functioning as expected

Current Problem:

If you run a dbt sync with the preset-cli, then regardless of whether the --import-db flag is set, a database found in Preset will have its connection overwritten. The problem is that the database user used during the sync may not be the same user that should be used in Preset.

Solution:

If the --import-db flag is not present, skip overwriting the database connection.

Import only dbt resources that match passed conditions

Motivation

The current implementation imports all resources in manifest.json. As dbt provides the model selection syntax, it would be great to filter imported models, sources, and metrics based on a passed condition. As far as I know, the model selection syntax is not applied when generating manifest.json, and it might not be realistic to implement the same in the CLI. So, it would be OK to start with passing tags to select imported targets.

Proposal

The command would be used to select targets which have all the passed tags.

% preset-cli --workspaces=https://abcdef12.us1a.app.preset.io/ \
> superset sync dbt /path/to/dbt/my_project/target/manifest.json \
> --project=my_project --target=dev --profile=${HOME}/.dbt/profiles.yml \
> --exposures=/path/to/dbt/my_project/models/exposures.yaml \
> --import-db \
> --external-url-prefix=http://localhost:8080/ \
> --tags tag_a --tags tag_b

Syncing dbt with BigQuery fails when using a GCP service account

When I tried to sync dbt models on BigQuery to a Superset instance with the following command:
superset-cli -u USER_NAME -p USER_PASS SUPERSET_URL sync dbt-core PATH/TO/manifest.json --project PROJECT_NAME --profiles PATH/TO/profiles.yml --exposures PATH/TO/exposures.yaml --import-db
it returned the error message {"message":"Connection failed, please check your connection settings"}.

However, everything works well when running dbt commands directly in the same project.

Support for dbt-core 1.3: Missing fields in `MetricSchema` Class

In api/clients/dbt.py there's a class called MetricSchema that's missing two fields needed to save metrics to Preset.

Please add the following two lines at lines 564 and 565:

calculation_method = fields.String()
expression = fields.String()

This prevents the following error when syncing with dbt:

% preset-cli --workspaces=https://****.****.app.preset.io/ superset sync dbt-core target/manifest.json --target=local-prod-service-account --profiles=profiles.yml --project=metamap --import-db

https://****.****.app.preset.io/
[15:50:59] INFO     [[15:50:59]] INFO: preset_cli.cli.superset.sync.dbt.databases: Found an existing database, updating databases.py:57
                    it                                                                                                                 
[15:51:00] INFO     [[15:51:00]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Updating dataset                       datasets.py:96
                    model.metamap.verifications 
                                                                                                           
Traceback (most recent call last):
  File "/Users/***/Documents/GitHub/dbt/.venv/bin/preset-cli", line 8, in <module>
    sys.exit(preset_cli())
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/main.py", line 118, in new_command
    ctx.invoke(command, *args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/command.py", line 171, in dbt_core
    datasets = sync_datasets(
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 122, in sync_datasets
    "expression": get_metric_expression(name, model_metrics),
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/metrics.py", line 36, in get_metric_expression
    type_ = metric["type"]
KeyError: 'type'

I'm unsure if this has any effect on the rest of the package (as I'm new to using it), but this was a quick fix for me.

(Btw, I would love to contribute to this project if possible, since my team will have a large stake in it going forward. I love what you all are doing with this to make it easy to sync dbt with Preset. One of the best features I've seen in a BI tool. 😄)

The CLI export function doesn't work when the source tenant has a lot of resources

Description

In order to export resources, the CLI sends a GET request to /api/v1/$resourceType, and then uses all returned IDs in /api/v1/$resourceType/export/?q=!($ResourceIDs). However, if the instance has a lot of resources, the URL becomes too long to be handled, resulting in a 414 Request-URI Too Large, which causes the error below:

File "/Github/backend-sdk/src/preset_cli/cli/superset/export.py", line 42, in export
    export_resource(resource, root, client, overwrite)
  File "/Github/backend-sdk/src/preset_cli/cli/superset/export.py", line 58, in export_resource
    with ZipFile(buf) as bundle:
  File "/.pyenv/versions/3.9.1/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/.pyenv/versions/3.9.1/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Suggested Solution

Check the URL length and paginate the IDs as needed, performing the export in smaller batches.
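A sketch of the batching step (chunk_ids is hypothetical, and the thresholds are illustrative, since the real URL limit depends on the server):

def chunk_ids(ids: list[int], max_url_len: int = 2000, base_len: int = 100) -> list[list[int]]:
    """Split export IDs into batches whose rison list keeps the URL short."""
    batches, current, current_len = [], [], base_len
    for i in ids:
        added = len(str(i)) + 1  # digits plus a separating comma
        if current and current_len + added > max_url_len:
            batches.append(current)
            current, current_len = [], base_len
        current.append(i)
        current_len += added
    if current:
        batches.append(current)
    return batches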

Allow selection of dbt models based on file name

Currently the preset-cli allows selecting dbt models using the graph selectors and tags. However, it would be nice to also allow selecting models by file name.

An example would be:

preset-cli ... --select models/datamarts/my_model.sql                                                  #sync only "my_model"
preset-cli ... --select models/datamarts/my_model.sql  --select models/datamarts/my_other_model.sql    #sync "my_model" and "my_other_model"

Infer exposures from datasets

We currently read injected metadata to determine exposures, requiring the user to have synced datasets from dbt models. We should be able to do that by reading the metadata in physical datasets.

Sync dbt issues with superset-cli commands

Hi @betodealmeida ,

I have tried the sync dbt command with a database / dataset on Superset, using the following syntax:

superset-cli -u USER_NAME -p USER_PASS SUPERSET_URL sync dbt-core PATH/TO/manifest.json --project PROJECT_NAME --profiles PATH/TO/profiles.yml --exposures PATH/TO/exposures.yaml --import-db

There are two issues I'm currently encountering:

  • After successfully creating datasets from dbt models (no error messages appear), the datasets are not shown in Superset.

Sample log info:

[15:34:30] INFO     [[15:34:30]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.jaffle_shop.my_first_dbt_model                                                      datasets.py:84
           INFO     [[15:34:30]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.jaffle_shop.my_second_dbt_model                                                     datasets.py:84

Still figuring out why the datasets don't show up...

  • It fails when there is already a BigQuery database connection.

The error could come from here:

masked_encrypted_extra=connection_params.get("encrypted_extra"),

Changing connection_params.get("encrypted_extra") to connection_params.get("masked_encrypted_extra") should fix this issue.

Is this project active and alive?

Hi. I was very excited to find this project via the conversation in apache/superset#18098.
Is this a WIP and/or being kept under wraps for now?

I know people in the dbt slack community are still using dbt-superset-lineage, which can do about 10% of what is described in this project's README.

Is backend-sdk stable enough to use for internal projects today?

Unable to replace the API token when it's no longer valid

Description

If the API credentials previously configured with the preset-cli are no longer valid, preset-cli auth --overwrite won't work properly, preventing users from entering new valid values.

How to reproduce the bug

  1. Generate an API key.
  2. Configure the preset-cli to use it.
  3. Delete/deactivate this API key on Superset/Preset.
  4. Run preset-cli auth --overwrite.

Expected Results

User can update the key using this command.

Actual Results

The exception below is thrown:

Traceback (most recent call last):
  File "/home/vavila/.pyenv/versions/preset-cli/bin/preset-cli", line 8, in <module>
    sys.exit(preset_cli())
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1654, in invoke
    super().invoke(ctx)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/cli/main.py", line 144, in preset_cli
    jwt_token = get_access_token(manager_url, api_token, api_secret)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/auth/lib.py", line 29, in get_access_token
    return payload["payload"]["access_token"]
KeyError: 'payload'

Handle Ownership re-mapping on Migrations

Description

The CLI can be a very powerful tool to migrate from Superset (on-prem) to Preset. However, the current export flow doesn't include ownership data in the ZIP file, so when the files are imported on the destination, everything gets mapped to the user performing the operation.

Suggested Solution

Implement in the CLI the ability to automatically re-map content ownership on the destination, based on the user info.

Syncing to different Snowflake databases produces an error

When syncing models that are in different databases, I get the following error:

[[09:32:48]] ERROR: preset_cli.cli.superset.sync.dbt.datasets: Unable to create dataset                                                                                                           datasets.py:99
                    Traceback (most recent call last):                                                                                                                                                                              
                      File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 97, in sync_datasets                                                                                         
                        dataset = create_dataset(client, database, model)                                                                                                                                                           
                      File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 63, in create_dataset                                                                                        
                        return client.create_dataset(**kwargs)                                                                                                                                                                      
                      File "/usr/local/lib/python3.9/site-packages/preset_cli/api/clients/superset.py", line 534, in create_dataset                                                                                                 
                        elif column["type"].lower() == "string":                                                                                                                                                                    
                    AttributeError: 'NoneType' object has no attribute 'lower'                                  

The models in the same database sync OK.

I'm wondering if the fact that we have Snowflake, and that the profiles.yml file requires a database key, is throwing things off?

For example, on BigQuery, the profiles.yml file does not require a database to be defined. But in Snowflake, you must have this defined. However, you can override this database in the dbt_project.yml file.

It seems like the preset-cli is using the database from the profiles.yml file and not what's defined in the dbt_project.yml file.

Setup is as follows:
OS: MacOS 12.6
dbt version: 1.2.1
preset-cli version: 0.1.1.post1.dev74+g77b5cfd

Expression metrics are not imported to Superset

Description

Users are able to specify metrics in the schema.yml file that should be created on Superset as dataset metrics. However, if a metric is an expression, since it doesn't have a model key it doesn't get associated with any dataset on Superset.

How to reproduce the issue

  1. Create expression metrics on your schema.yml file. For example:
metrics:
  - name: total_revenue
    label: SUM of all price_each
    model: ref('modeled_sales')
    description: "The SUM of all price_each"
    type: sum
    sql: price_each
    timestamp: date
    time_grains: [day, week, month, year]
  - name: revenue_multiplied
    label: "Revenue calculated differently"
    description: "Let's try to use an expression"
    type: expression
    sql: "{{metric('total_revenue')}} * 1.25"
    timestamp: date
    time_grains: [day, week, month, year]
  2. Sync data to Superset.

Expected Results

Both total_revenue and revenue_multiplied should be created on the modeled_sales dataset in Superset.

Actual Results

Only total_revenue is created on the modeled_sales dataset in Superset.

Discussion: DBT Metrics Label vs Name

Hi Beto,

First off, big kudos to you for the last three weeks' work on this project. I hadn't been paying attention to the project in the last three weeks and was just catching up with all the updates. This is really quickly turning into an indispensable tool for me!

Question:

I was wondering about the choice of using the DBT metric's 'name' field as the value for the Metric Label in Superset, instead of the DBT metric's 'label' field value.

Is there a specific reason for that? At first glance through the preset-cli codebase I couldn't tell whether this is something that's happening in the preset-cli codebase, or if it might just be a bug in the Superset API.

Exported files should only have git diffs when assets change

The CLI is working as expected, but the assets that are produced sometimes have diffs without any changes being made. It's challenging for the developer to have a clean development + deployment workflow using git. I'm not totally sure which pieces of this are CLI related or Superset related, so I can make other issues in the Superset repo if that's needed. I'm also happy to give more feedback, contribute, or test any changes out!

The ideal workflow that I'm hoping to implement:

  1. clone git repo containing Superset assets
  2. spin up local Superset instance
  3. import Superset assets to local instance
  4. make changes in the UI
  5. export Superset assets to local directory
  6. optionally make edits to the exported assets if custom behavior is needed like using Jinja templating
  7. commit changes to git branch
  8. get code reviews from teammates. Potentially have staging instances to share with reviewers.
  9. merge changes, which kicks off a CI/CD pipeline that pushes the updated assets to the production instance of open source Superset or Preset

Challenges:

  1. Chart assets have an ID in their file name and that ID changes sometimes.
  2. Related to number 1, when importing charts their IDs look like they might be changing. Maybe they get imported in a different order. It ends up causing all the chart assets to have diffs when nothing has changed in them. Also the dashboards that reference those charts get diffs, because they have chart IDs that need updating to match the new IDs.
  3. Dashboard assets have a list of charts by position that doesn't seem to be consistently ordered. Git shows diffs in the charts but really it's just a reordering.
  4. When renaming a chart it becomes a totally new asset, so the diff looks like a file was deleted and another was added. This isn't ideal when you update the name and also make a change to the contents, because the code reviewer won't be able to view the changes, since git thinks the whole file is new. It might be challenging since the file name is the chart name, but ideally a chart name update would just be an update to a line in the YAML rather than a file deletion and addition.
  5. [Nice to have] If I cloned my repo, imported my assets into a local instance, deleted a few charts in my dashboard using the UI, and then exported again, the stale assets would still persist in my directory. It would be up to the developer to make sure they remove the stale assets. Having a way to detect a deletion would be ideal, so the developer would have less of a chance of leaving stale assets in the repo.
  6. [Nice to have] When importing my assets to the production instance, I would want to delete any assets that no longer exist in the git repo. Having a way to detect deletions on import would be great.

cc @betodealmeida

The export process doesn't include all assets from the Workspace/tenant

Description

The CLI allows users to export resources from a Preset Workspace/Superset tenant. However, it currently doesn't handle pagination, so only 20 items of each type are exported.

How to reproduce the issue

  1. Spin up a Superset installation/Preset workspace.
  2. Run the appropriate command to export resources from it.

Expected behavior

The operation should export all assets from the Workspace.

Actual behavior

The operation only exports up to 20 items per type.

More details

This happens because the CLI initially sends a GET request to /api/v1/{{asset_type}}/?q=(filters:!()) to get the IDs to be exported, which by default returns only 20 results. This can be increased to 100 by sending ?q=(page_size:100), but if there are more than 100 items, pagination support is required.

Regardless of the page_size, the payload always includes a count, so if count > page_size a new request should be sent (?q=(page_size:100,page:2)), and so on until every page has been fetched (page_size * pages >= count).
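A sketch of that pagination loop (get_all_ids is a hypothetical helper; it assumes the list payload exposes count and ids, as described above):

import requests

def get_all_ids(base_url: str, token: str, asset_type: str, page_size: int = 100) -> list[int]:
    """Page through /api/v1/{asset_type}/ until all IDs have been collected."""
    headers = {"Authorization": f"Bearer {token}"}
    ids: list[int] = []
    page = 0
    while True:
        query = f"(page_size:{page_size},page:{page})"
        response = requests.get(
            f"{base_url}/api/v1/{asset_type}/",
            params={"q": query},
            headers=headers,
            timeout=30,
        )
        response.raise_for_status()
        payload = response.json()
        ids.extend(payload["ids"])
        if len(ids) >= payload["count"]:
            return ids
        page += 1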

Declaring a temporal column in the schema.yml file removes the Is Temporal flag from Superset

Description

Users are able to create a schema.yml file to declare/explain/specify columns from their models. However, if a temporal column is included in the file, the Is Temporal flag is removed from that column on Superset.

How to reproduce the issue

  1. Create a dbt model (make sure it has a DATETIME/TIMESTAMP column).
  2. Declare this column on the schema.yml.
  3. Run/compile your project.
  4. Sync it to Superset using the CLI.

Expected Results

The Is Temporal flag is kept on Superset side.

Actual results

The Is Temporal flag is removed on Superset.

Version

This is only reproduced on version 0.1.1.post1.dev146+g1905869.

Workaround

Users are able to manually flag the column as Is Temporal.

sync_database doesn't find existing database

In the Preset UI (the database connections front end), the rule for the name of a database is 'Copy the name of the database you are trying to connect to.' Because of that, the database where the final models are stored in Snowflake becomes the default name of the database connection in Preset. For example, in Snowflake, the database that I read my models from is called 'ANALYTICS_PROD'.

In sync_database, it seems that the database_name being searched for is:
database_name = meta.pop("database_name", f"{project_name}_{target_name}")

When I run dbt-core, it's not finding the database, and I'm curious whether it's because it's looking for that database_name. I think it should be looking at dbt_project --> profile --> target --> outputs --> database.

Handle authentication (JWT generation) when interacting with Superset (local)

Description

Currently, the superset-cli allows users to interact with a local Superset installation. However, it gets the csrf_token from the login HTML content, and then works on getting a session/cookie to perform future requests.

Suggested Improvement

Implement support for handling the authentication directly through the API (a sketch of the first option follows the list):

  • Sending a POST request to /api/v1/security/login; and/or
  • Accept a --jwt-token parameter to be specified on the command.
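A sketch of the first option (get_jwt is a hypothetical helper; /api/v1/security/login with a db provider is the standard Superset login endpoint):

import requests

def get_jwt(base_url: str, username: str, password: str) -> str:
    """Authenticate against the Superset API and return a bearer token."""
    response = requests.post(
        f"{base_url}/api/v1/security/login",
        json={"username": username, "password": password, "provider": "db", "refresh": True},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["access_token"]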
