backend-sdk's People

Contributors

betodealmeida, craig-rueda, eschutho, hughhhh, mtrentz, vitor-avila

backend-sdk's Issues

Unable to import virtual datasets that have Jinja in their SQL query

Description

Superset allows users to use Jinja directly in the SQL query of a virtual dataset. However, if the dataset uses Jinja, importing it through the preset-cli throws an error.

Steps to reproduce

  1. Create a Virtual Dataset using Jinja on its query (for example, {{ "'" + "','".join(filter_values('<ColumnName>')) + "'" }} - or {{ filter_values('ColumnName')|where_in }}).
  2. Use it on a dashboard.
  3. Export the dashboard.
  4. Try to import the dashboard using the preset-cli.

Expected behavior

The import operation works.

Actual behavior

The error below is thrown:

Traceback (most recent call last):
  File "/home/vavila/.pyenv/versions/preset-cli/bin/preset-cli", line 8, in <module>
    sys.exit(preset_cli())
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/cli/superset/main.py", line 89, in new_command
    ctx.invoke(command, *args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/cli/superset/sync/native/command.py", line 126, in native
    content = template.render(**env)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/jinja2/environment.py", line 1301, in render
    self.environment.handle_exception()
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 9, in top-level template code
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/jinja2/utils.py", line 83, in from_obj
    if hasattr(obj, "jinja_pass_arg"):
jinja2.exceptions.UndefinedError: 'filter_values' is undefined

Workaround

Perform the import through the UI.
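Another possible workaround is to escape the Superset Jinja in the exported YAML before importing. The traceback suggests the CLI renders asset files through a stock Jinja2 environment, which leaves {% raw %} blocks untouched; a minimal sketch of the idea (plain jinja2, no CLI internals assumed):

from jinja2 import Template

# Superset-style Jinja survives the CLI's rendering pass when wrapped in raw tags:
sql = "SELECT * FROM t WHERE c IN ({% raw %}{{ filter_values('ColumnName')|where_in }}{% endraw %})"
print(Template(sql).render())
# Output: SELECT * FROM t WHERE c IN ({{ filter_values('ColumnName')|where_in }})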

Ability to publish dashboards in bulk via the CLI

Current scenario

Currently, the dashboard state is not included in the exported YAML file. As a consequence, dashboards are always imported in a draft state.

Suggested improvement

Provide the ability to publish (change the status from draft to published) dashboards in bulk using the CLI.
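Until the CLI supports this natively, a rough sketch of what the bulk operation could do, assuming the Workspace accepts the standard Superset dashboard PUT payload with a published field (publish_dashboards and its parameters are hypothetical):

import requests

def publish_dashboards(base_url: str, token: str, dashboard_ids: list[int]) -> None:
    """Flip each dashboard from draft to published via the REST API."""
    headers = {"Authorization": f"Bearer {token}"}
    for pk in dashboard_ids:
        response = requests.put(
            f"{base_url}/api/v1/dashboard/{pk}",
            json={"published": True},
            headers=headers,
            timeout=30,
        )
        response.raise_for_status()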

Exposure option YML data

When using the exposures option, some of the data is missing, and there is no way to control its content.

As you can see in the example file below, I have the following issues:

  1. Email is missing
  2. Unable to change the following fields:
  • Type
  • Maturity
  • Description

Example exposure file:

version: 2
exposures:
- name: Number of Customers per Day [chart]
  type: analysis
  maturity: low
  url: https://*********.app.preset.io/superset/explore/?form_data=********
  description: ''
  depends_on:
  - ref('ref_contacts')
  owner:
    name: Dustin Weaver
    email: unknown
- name: Customer Dashboard [dashboard]
  type: dashboard
  maturity: low
  url: https://**********.app.preset.io/superset/dashboard/9/
  description: ''
  depends_on:
  - ref('ref_contacts')
  owner:
    name: Dustin Weaver
    email: unknown

Ability to export specific resources from a Workspace using the preset-cli

Motivation

Currently, the preset-cli only supports exporting all assets from the source Workspace. However, in some cases users might want to export only modified assets, or specific ones.

Proposed solution

Implement the ability to specify which assets should be exported. A possible solution would be to specify the IDs to be exported.
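For illustration, Superset's export endpoints already accept a rison list of IDs, so the CLI could expose something like the sketch below (export_dashboards and its parameters are hypothetical):

import requests

def export_dashboards(base_url: str, token: str, ids: list[int], out_path: str) -> None:
    """Export only the given dashboard IDs as a ZIP bundle."""
    rison_ids = "!(" + ",".join(str(i) for i in ids) + ")"
    response = requests.get(
        f"{base_url}/api/v1/dashboard/export/",
        params={"q": rison_ids},
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
    )
    response.raise_for_status()
    with open(out_path, "wb") as fh:
        fh.write(response.content)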

dbt sync "target" parameter does not work as expected

When you run a sync command, you can specify a target parameter. In dbt, this would set the schema for the models as they are defined in the profiles.yml file.

However, with the preset-cli, this parameter merely sets the database label used in Preset. The actual connection is based on the manifest.json file, which could have been compiled with a different target.

Desired Behavior

The target parameter should set the schema in the Preset connection to match the one defined in the profiles.yml file in dbt. This could be accomplished by either reading the YML file or recompiling dbt with the given target.
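A sketch of the first option, reading the schema for a target straight out of profiles.yml (get_target_schema is a hypothetical helper; it assumes the standard profiles.yml layout of profile -> outputs -> target -> schema):

import yaml

def get_target_schema(profiles_path: str, profile: str, target: str) -> str:
    """Resolve the schema configured for a given dbt profile/target."""
    with open(profiles_path) as fh:
        profiles = yaml.safe_load(fh)
    return profiles[profile]["outputs"][target]["schema"]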

Ability to delete elements in bulk via the CLI

It would be very useful to be able to delete resources in bulk via the CLI. Three use-cases:

  • Delete the examples data only
  • Delete all assets (useful when working with stg/dev workspaces, or performing migration tests)
  • Delete specific assets (in bulk) using their IDs (see the sketch below)
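A sketch of the third use case, assuming the Workspace accepts Superset's standard bulk-delete endpoint with a rison list of IDs (delete_charts is a hypothetical helper):

import requests

def delete_charts(base_url: str, token: str, chart_ids: list[int]) -> None:
    """Bulk-delete charts by ID."""
    rison_ids = "!(" + ",".join(str(i) for i in chart_ids) + ")"
    response = requests.delete(
        f"{base_url}/api/v1/chart/",
        params={"q": rison_ids},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()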

Improve user import format

username: foo
team_role:
- user
workspace_role:
- Limited Contributor
data_access_role:
- role1
- role2
rls:
- rls1
- rls2

Error when using the children/parent selection method for dbt syncing

Running this command (note that I removed company-sensitive elements):

preset-cli --workspaces=https://[REMOVED].app.preset.io/ superset sync dbt target/manifest.json --project=[REMOVED] --target=dev --import-db --disallow-edits --external-url-prefix=[REMOVED] --exposures=models/exposures.yml --select [REMOVED]+

Results in the following error:

[14:19:52] INFO     [[14:19:52]] INFO: preset_cli.cli.superset.sync.dbt.databases: Found an existing database, updating it                                databases.py:54
Traceback (most recent call last):
  File "/usr/local/bin/preset-cli", line 8, in <module>
    sys.exit(preset_cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/main.py", line 103, in new_command
    ctx.invoke(command, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/command.py", line 111, in dbt_core
    models = apply_select(models, select, exclude)
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 349, in apply_select
    *[
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 350, in <listcomp>
    {model["unique_id"] for model in filter_models(models, condition)}
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 235, in filter_models
    return filter_plus_operator(models, condition)
  File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/lib.py", line 291, in filter_plus_operator
    for child_id in model["children"]
KeyError: 'children'

CLI Version: preset-cli 0.1.0.post1.dev5+gae8587f
dbt Version: dbt 1.2.1
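The traceback points at model["children"], so manifest nodes without that key crash the + operator. A minimal defensive sketch (descendant_ids is a hypothetical helper, not the actual lib.py code):

def descendant_ids(model: dict, models_by_id: dict) -> set:
    """Collect a model's descendants, tolerating nodes without a "children" key."""
    ids = set()
    for child_id in model.get("children", []):  # .get avoids KeyError: 'children'
        ids.add(child_id)
        child = models_by_id.get(child_id)
        if child is not None:
            ids |= descendant_ids(child, models_by_id)
    return ids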

Sync only exposures is not working properly

After the implementation of #130, users should be able to trigger a sync only for exposures, by using the --select parameter and filtering for a tag that isn't used in dbt.

However, this approach is not working properly: after running the command below, no changes are applied to the exposures.yml file:

preset-cli --workspaces=$PRESET_WORKSPACE \
superset sync dbt-core $MANIFEST_PATH \
--project=$PROJECT_NAME --target=$TARGET_NAME --profiles=$PROFILES_PATH \
--exposures=$EXPOSURES_PATH \
--import-db \
--select tag:not-existent

Also, it would be beneficial to implement a dedicated flag for this approach (something like --exposures-only) rather than having to use the --select filter to exclude everything.

Syncing a dbt model that is in a different database than the one defined in the profiles.yml file

It seems this happens because our staging tables are located in a different database than the default defined in the profiles.yml file for dbt.

The command I'm running is this (company-sensitive info removed):

preset-cli --workspaces=https://[REMOVED].app.preset.io/ superset sync dbt-core target/manifest.json --project=[REMOVED] --target=dev --import-db --disallow-edits --external-url-prefix=[REMOVED] --exposures=models/exposures.yml --select [REMOVED] 

When run, we get the following result:

[09:03:49] INFO     [[09:03:49]] INFO: preset_cli.cli.superset.sync.dbt.databases: Found an existing database, updating it                                databases.py:54
[09:03:52] INFO     [[09:03:52]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.[REMOVED].[REMOVED]                     datasets.py:50
[09:03:57] ERROR    [[09:03:57]] ERROR: preset_cli.lib: {"message":"Fatal error"}   

Desired result: Create a new connection for the staging tables to point to the correct database.

Make import-roles idempotent

Current Scenario

Currently, the --import-roles operation won't properly replace existing DARs (Data Access Roles) on the destination Workspace.

Suggested improvement

Make the --import-roles operation idempotent, so that it can be used for continuous sync.

Support large imports with thousands of files

Description

The CLI is a very good tool to perform migrations (Superset -> Superset / Superset -> Preset). However, the current behavior has some issues:

  • Superset imports are atomic - if a file fails, the entire operation is rolled back
  • When trying to import thousands of assets, Superset might take longer to execute the operation and fail to respond to the import request in time
  • It's difficult to follow the progress of the import operation

Suggested Solution

Implement a new flag on the CLI (something like --large-migration) that would make the CLI import assets one by one, instead of in bulk; a rough sketch of the bundling step follows the lists below. For example:

  1. Start with the databases: import one database at a time, logging which database is being imported and its status.
  2. Then move to the datasets: import one dataset at a time, logging which dataset is being imported and its status (the ZIP file must include the dataset and database YAML files).
  3. Then move to charts: import one chart at a time, logging which chart is being imported and its status (the ZIP file must include the chart, dataset, and database YAML files).
  4. Lastly, move to dashboards: import one dashboard at a time, logging which dashboard is being imported and its status (the ZIP file must include the dashboard, chart, dataset, and database YAML files).

This would solve these problems:

  • If the import fails, the previous imports would be kept since they were isolated operations
  • Superset would be able to process the request and reply in time
  • Users can easily follow the progress via the logs
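A very rough sketch of the per-asset bundling step (build_single_asset_bundle is hypothetical; real Superset import bundles also need a metadata.yaml and a specific directory layout, omitted here):

import io
from pathlib import Path
from zipfile import ZipFile

def build_single_asset_bundle(asset_yaml: Path, dependencies: list[Path]) -> bytes:
    """Bundle one asset plus its dependency YAML files into an import ZIP."""
    buf = io.BytesIO()
    with ZipFile(buf, "w") as bundle:
        for path in [asset_yaml, *dependencies]:
            bundle.write(path, arcname=str(path))
    return buf.getvalue()

The CLI would then POST each bundle to the import endpoint, logging the asset name and the response status before moving on to the next one.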

dbt sync does not recognize models with "alias" config as the name of the table

When using the dbt sync functionality of the preset-cli, the command will fail because it attempts to find a table for a model that doesn't exist, since the model was created with an alias. Aliases are important functionality within dbt, because they allow you to have unique models across many schemas (or datasets in BigQuery) that may have the same table name.

Traceback Error:

[14:23:56] INFO     [[14:23:56]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.metamap.verifications_all              datasets.py:98
           ERROR    [[14:23:56]] ERROR: preset_cli.lib: {"message":{"table_name":["Table [verifications_all] could not be found, please double check lib.py:98
                    your database connection, schema, and table name"]}}                                                                                      
                                                                                                                                                              
           ERROR    [[14:23:56]] ERROR: preset_cli.cli.superset.sync.dbt.datasets: Unable to create dataset                                    datasets.py:102
                    Traceback (most recent call last):                                                                                                        
                      File                                                                                                                                    
                    "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py",                
                    line 100, in sync_datasets                                                                                                                
                        dataset = create_dataset(client, database, model)                                                                                     
                      File                                                                                                                                    
                    "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py",                
                    line 66, in create_dataset                                                                                                                
                        return client.create_dataset(**kwargs)                                                                                                
                      File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/api/clients/superset.py",                   
                    line 518, in create_dataset                                                                                                               
                        return self.create_resource("dataset", **kwargs)                                                                                      
                      File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/api/clients/superset.py",                   
                    line 450, in create_resource                                                                                                              
                        validate_response(response)                                                                                                           
                      File "/Users/paxonfischer/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/lib.py", line 99, in                        
                    validate_response                                                                                                                         
                        raise SupersetError(errors=errors)                                                                                                    
                    preset_cli.exceptions.SupersetError   

Replication steps:

  1. Use a dbt project with a model that has an alias different from the actual name of the model (which is typically the name of the .sql file):
-- my_model.sql
{{ config(alias='model_alias') }}
SELECT * FROM table
  2. Use the preset-cli to attempt to sync dbt with a Superset/Preset instance.
preset-cli --workspaces=https://{workspace_id}.{region}.app.preset.io/ superset sync dbt-core target/manifest.json --target={target} --profiles={path_to_profiles.yml} --project={name_of_project} --import-db

Possible Solution

It seems that in create_dataset() at line 512 of api/clients/superset.py, **kwargs contains the following dictionary:

{"database": 4, "schema": "verifications", "table_name": "verifications_all"}

"table_name" should be using the alias instead of the name of the model. Unsure how far upstream this needs to be adjusted, or how it affects what the final Dataset name is decided in Preset.

Derived metrics are not created properly on Superset

Derived metrics are not properly created on Superset. For example, the schema below:

metrics:
  - name: countries_count
    label: "Count of unique countries"
    model: ref('modeled_sales')
    description: "Count distinct countries"
    calculation_method: count_distinct
    expression: country
    timestamp: date
    time_grains: [day, week, month, year]
  - name: total_revenue
    label: SUM of all price_each
    model: ref('modeled_sales')
    description: "The SUM of all price_each"
    calculation_method: sum
    expression: price_each
    timestamp: date
    time_grains: [day, week, month, year]
  - name: avg_sum_per_country
    label: "AVG revenue per country"
    description: "Let's try to two expressions"
    calculation_method: derived
    expression: "{{metric('total_revenue')}} / {{metric('countries_count')}}"
    timestamp: date
    time_grains: [day, week, month, year]

would create 3 metrics on Superset; however, the nested SQL wouldn't be expanded. Instead, the metric name is used as the expression.

Superset can't resolve the metrics by their names, since they are not actual columns on the dataset, so using those metrics on a chart results in an error.
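The expected behavior would be to recursively inline the referenced metrics' SQL before creating the dataset metric. A minimal sketch (expand_metric and sql_by_name are hypothetical; it assumes each base metric's SQL has already been rendered, e.g. SUM(price_each)):

import re

METRIC_REF = re.compile(r"\{\{\s*metric\(['\"](\w+)['\"]\)\s*\}\}")

def expand_metric(name: str, sql_by_name: dict) -> str:
    """Recursively inline {{metric('...')}} references into plain SQL."""
    sql = sql_by_name[name]
    return METRIC_REF.sub(lambda m: expand_metric(m.group(1), sql_by_name), sql)

# expand_metric("avg_sum_per_country", {
#     "countries_count": "COUNT(DISTINCT country)",
#     "total_revenue": "SUM(price_each)",
#     "avg_sum_per_country": "{{metric('total_revenue')}} / {{metric('countries_count')}}",
# })
# -> "SUM(price_each) / COUNT(DISTINCT country)"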

Datasets are intermittently created with `is_active` set as null

Description

During dataset creation (via the CLI), some datasets might get the is_active attribute of each column set as null. This causes future updates to the dataset via the CLI to fail, resulting in the error below:

ERROR: preset_cli.lib: {"message":{"columns":{"0":{"is_active":["Field may not be null."]},"1":{"is_active":["Field may not be null."]},"2":{"is_active":["Field may not be null."]},"3":{"is_active":["Field may not be null."]},"4":{"is_active":["Field may not be null."]},"5":{"is_active":["Field may not be null."]}}}} 

Possible fix

Ensure that the is_active attribute is set to true for all columns on datasets created/updated via the CLI.
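A sketch of that guard, applied to the column payload before it is sent (ensure_active_columns is a hypothetical helper):

def ensure_active_columns(columns: list[dict]) -> list[dict]:
    """Force is_active to True wherever it is missing or null, so updates validate."""
    for column in columns:
        if column.get("is_active") is None:
            column["is_active"] = True
    return columns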

Handle derived metrics related to multiple models

Current scenario

Currently, trying to sync a derived metric that relies on metrics from multiple models fails with the error below:

Metric {metricName} cannot be calculated because it depends on multiple models

This error prevents the sync from finishing successfully.

Suggested solution

Handle this scenario - some suggestions:

  • Notify the user with a warning that the metric wasn't created for this reason, but finish the operation properly;
  • Create the metric on all target/involved datasets;
  • Allow specifying the dataset that should receive this metric on Preset, using the metric's meta field.

Support dbt-core 1.3

dbt-core 1.3 changes the metric spec as follows:

  • type --> calculation_method
  • sql --> expression
  • type: expression --> calculation_method: derived

superset-cli should:

  1. Either detect which version of dbt-core is in use and pick the correct marshmallow MetricSchema version based on the version number
  2. Or dynamically infer the MetricSchema at runtime (with Schema.from_dict)

From looking at the dbt-cloud API, there seems to be no way to dynamically detect the dbt-core version of a particular dbt-cloud environment.
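One way to support both specs without version detection is to normalize the old keys into the 1.3 names before validation. A sketch under that assumption (normalize_metric and OLD_TO_NEW are hypothetical):

OLD_TO_NEW = {"type": "calculation_method", "sql": "expression"}

def normalize_metric(metric: dict) -> dict:
    """Map pre-1.3 metric keys to the dbt-core 1.3 spec."""
    normalized = {OLD_TO_NEW.get(key, key): value for key, value in metric.items()}
    if normalized.get("calculation_method") == "expression":
        normalized["calculation_method"] = "derived"
    return normalized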

Companion YAML templates

In this case my companion file would be named dashboards/Jaffle_Shop_8.overrides.yaml and would only contain dashboard_title: Jaffle Shop {{ " (staging)" if env.get("SUPERSET_ENV") == "staging" else "" }}
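A sketch of how the CLI could apply such a companion file, assuming a shallow top-level merge is enough for keys like dashboard_title (apply_overrides is hypothetical; any Jinja in the override values would still be rendered afterwards):

import yaml

def apply_overrides(asset_path: str, overrides_path: str) -> dict:
    """Merge a companion .overrides.yaml on top of an exported asset file."""
    with open(asset_path) as fh:
        asset = yaml.safe_load(fh)
    with open(overrides_path) as fh:
        overrides = yaml.safe_load(fh) or {}
    asset.update(overrides)  # top-level keys from the companion file win
    return asset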

Ability to sync only exposures from dbt, without syncing models

Users would like to be able to sync all exposures, but not all models from dbt to Preset. So, rather than determining which models exist in Preset, selecting those models, and syncing and showing exposures for those models, they would like to skip the first two steps and just show exposures.

The import-db flag is not functioning as expected

Current Problem:

If you run a dbt sync with the preset-cli, then regardless of whether the --import-db flag is set, a database found in Preset will have its connection overwritten. The problem is that the database user used during the sync may not be the same user that should be used in Preset.

Solution:

If the --import-db flag is not present, skip overwriting the database connection.

Import only dbt resources that match passed conditions

Motivation

The current implementation imports all resources in manifest.json. As dbt provides the model selection syntax, it would be great to filter imported models, sources, and metrics based on a passed condition. As far as I know, the model selection syntax is not applied when generating manifest.json, and it might not be realistic to implement the same in the CLI. So, it would be OK to start with passing tags to select imported targets.

Proposal

The command would be used to select targets which have all the passed tags.

% preset-cli --workspaces=https://abcdef12.us1a.app.preset.io/ \
> superset sync dbt /path/to/dbt/my_project/target/manifest.json \
> --project=my_project --target=dev --profile=${HOME}/.dbt/profiles.yml \
> --exposures=/path/to/dbt/my_project/models/exposures.yaml \
> --import-db \
> --external-url-prefix=http://localhost:8080/ \
> --tags tag_a --tags tag_b

Syncing dbt with BigQuery fails when using a GCP service account

When I tried to sync dbt models on BigQuery to a Superset instance with the following command:
superset-cli -u USER_NAME -p USER_PASS SUPERSET_URL sync dbt-core PATH/TO/manifest.json --project PROJECT_NAME --profiles PATH/TO/profiles.yml --exposures PATH/TO/exposures.yaml --import-db
it returned the error message {"message":"Connection failed, please check your connection settings"}.

However, everything works well when running dbt commands directly in the same project.

Support for dbt-core 1.3: Missing fields in `MetricSchema` Class

In api/clients/dbt.py there's a class called MetricSchema that's missing two fields needed to save metrics to Preset.

Please add the following two lines at lines 564 and 565:

calculation_method = fields.String()
expression = fields.String()

This prevents the following error when syncing with dbt:

% preset-cli --workspaces=https://****.****.app.preset.io/ superset sync dbt-core target/manifest.json --target=local-prod-service-account --profiles=profiles.yml --project=metamap --import-db

https://****.****.app.preset.io/
[15:50:59] INFO     [[15:50:59]] INFO: preset_cli.cli.superset.sync.dbt.databases: Found an existing database, updating databases.py:57
                    it                                                                                                                 
[15:51:00] INFO     [[15:51:00]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Updating dataset                       datasets.py:96
                    model.metamap.verifications 
                                                                                                           
Traceback (most recent call last):
  File "/Users/***/Documents/GitHub/dbt/.venv/bin/preset-cli", line 8, in <module>
    sys.exit(preset_cli())
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/main.py", line 118, in new_command
    ctx.invoke(command, *args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/command.py", line 171, in dbt_core
    datasets = sync_datasets(
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 122, in sync_datasets
    "expression": get_metric_expression(name, model_metrics),
  File "/Users/***/Documents/GitHub/dbt/.venv/lib/python3.8/site-packages/preset_cli/cli/superset/sync/dbt/metrics.py", line 36, in get_metric_expression
    type_ = metric["type"]
KeyError: 'type'

I'm unsure if this has any effect on the rest of the package (as I'm new to using it), but this was a quick fix for me.

(Btw, I would love to contribute to this project if possible, since my team will have a large stake in it going forward. I love what you all are doing with this to make it easy to sync dbt with Preset. One of the best features I've seen in a BI tool. 😄)

The CLI export function doesn't work when the source tenant has a lot of resources

Description

In order to export resources, the CLI sends a GET request to /api/v1/$resourceType, and then uses all returned IDs in /api/v1/$resourceType/export/?q=!($ResourceIDs). However, if the instance has a lot of resources, the URL becomes too long to be handled, resulting in a 414 Request-URI Too Large, which causes the error below:

File "/Github/backend-sdk/src/preset_cli/cli/superset/export.py", line 42, in export
    export_resource(resource, root, client, overwrite)
  File "/Github/backend-sdk/src/preset_cli/cli/superset/export.py", line 58, in export_resource
    with ZipFile(buf) as bundle:
  File "/.pyenv/versions/3.9.1/lib/python3.9/zipfile.py", line 1257, in __init__
    self._RealGetContents()
  File "/.pyenv/versions/3.9.1/lib/python3.9/zipfile.py", line 1324, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Suggested Solution

Check the URL length and paginate the IDs as needed, performing the export in smaller batches.
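A sketch of the batching step (chunk_ids is hypothetical, and the thresholds are illustrative, since the real URL limit depends on the server):

def chunk_ids(ids: list[int], max_url_len: int = 2000, base_len: int = 100) -> list[list[int]]:
    """Split export IDs into batches whose rison list keeps the URL short."""
    batches, current, current_len = [], [], base_len
    for i in ids:
        added = len(str(i)) + 1  # digits plus a separating comma
        if current and current_len + added > max_url_len:
            batches.append(current)
            current, current_len = [], base_len
        current.append(i)
        current_len += added
    if current:
        batches.append(current)
    return batches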

Allow selection of dbt models based on file name

Currently the preset-cli allows selecting dbt models using the graph selectors and tags. However, it would be nice to also allow selecting models by file name.

An example would be:

preset-cli ... --select models/datamarts/my_model.sql                                                  #sync only "my_model"
preset-cli ... --select models/datamarts/my_model.sql  --select models/datamarts/my_other_model.sql    #sync "my_model" and "my_other_model"

Infer exposures from datasets

We currently read injected metadata to determine exposures, requiring the user to have synced datasets from dbt models. We should be able to do that by reading the metadata in physical datasets.

Sync dbt issues with superset-cli commands

Hi @betodealmeida ,

I have tried the sync dbt command with a database / dataset on Superset, using the following syntax:

superset-cli -u USER_NAME -p USER_PASS SUPERSET_URL sync dbt-core PATH/TO/manifest.json --project PROJECT_NAME --profiles PATH/TO/profiles.yml --exposures PATH/TO/exposures.yaml --import-db

There are two issues I'm currently encountering:

  • After successfully creating datasets from dbt models (no error messages appear), the datasets are not shown in Superset.

Sample log info:

[15:34:30] INFO     [[15:34:30]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.jaffle_shop.my_first_dbt_model                                                      datasets.py:84
           INFO     [[15:34:30]] INFO: preset_cli.cli.superset.sync.dbt.datasets: Creating dataset model.jaffle_shop.my_second_dbt_model                                                     datasets.py:84

Still figuring out why the datasets don't show up...

  • It fails when there is already a BigQuery database connection.

The error could come from here:

masked_encrypted_extra=connection_params.get("encrypted_extra"),

Changing connection_params.get("encrypted_extra") to connection_params.get("masked_encrypted_extra") should fix this issue.

Is this project active and alive?

Hi. I was very excited to find this project via the conversation in apache/superset#18098.
Is this a WIP and/or being kept under wraps for now?

I know people in the dbt slack community are still using dbt-superset-lineage, which can do about 10% of what is described in this project's README.

Is backend-sdk stable enough to use for internal projects today?

Unable to replace the API token when it's no longer valid

Description

If the API credentials previously configured with the preset-cli are no longer valid, preset-cli auth --overwrite won't work properly, preventing users from entering new valid values.

How to reproduce the bug

  1. Generate an API key.
  2. Configure the preset-cli to use it.
  3. Delete/deactivate this API key on Superset/Preset.
  4. Run preset-cli auth --overwrite.

Expected Results

User can update the key using this command.

Actual Results

The exception below is thrown:

Traceback (most recent call last):
  File "/home/vavila/.pyenv/versions/preset-cli/bin/preset-cli", line 8, in <module>
    sys.exit(preset_cli())
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1654, in invoke
    super().invoke(ctx)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/cli/main.py", line 144, in preset_cli
    jwt_token = get_access_token(manager_url, api_token, api_secret)
  File "/home/vavila/.pyenv/versions/3.9.0/envs/preset-cli/lib/python3.9/site-packages/preset_cli/auth/lib.py", line 29, in get_access_token
    return payload["payload"]["access_token"]
KeyError: 'payload'

Handle Ownership re-mapping on Migrations

Description

The CLI can be a very powerful tool to migrate from Superset (on-prem) to Preset. However, the current export flow doesn't include ownership data in the ZIP file, so when the files are imported on the destination, everything gets mapped to the user performing the operation.

Suggested Solution

Implement in the CLI the ability to automatically re-map content ownership on the destination, based on the user info.

Syncing to different Snowflake databases produces an error

When syncing models that are in different databases, I get the following error:

[[09:32:48]] ERROR: preset_cli.cli.superset.sync.dbt.datasets: Unable to create dataset                                                                                                           datasets.py:99
                    Traceback (most recent call last):                                                                                                                                                                              
                      File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 97, in sync_datasets                                                                                         
                        dataset = create_dataset(client, database, model)                                                                                                                                                           
                      File "/usr/local/lib/python3.9/site-packages/preset_cli/cli/superset/sync/dbt/datasets.py", line 63, in create_dataset                                                                                        
                        return client.create_dataset(**kwargs)                                                                                                                                                                      
                      File "/usr/local/lib/python3.9/site-packages/preset_cli/api/clients/superset.py", line 534, in create_dataset                                                                                                 
                        elif column["type"].lower() == "string":                                                                                                                                                                    
                    AttributeError: 'NoneType' object has no attribute 'lower'                                  

The models in the same database sync OK.

I'm wondering if the fact that we have Snowflake, and that the profiles.yml file requires a database key, is throwing things off?

For example, on BigQuery, the profiles.yml file does not require a database to be defined. But in Snowflake, you must have this defined. However, you can override this database in the dbt_project.yml file.

It seems like the preset-cli is using the database from the profiles.yml file and not what's defined in the dbt_project.yml file.

Setup is as follows:
OS: MacOS 12.6
dbt version: 1.2.1
preset-cli version: 0.1.1.post1.dev74+g77b5cfd

Expression metrics are not imported to Superset

Description

Users are able to specify metrics in the schema.yml file that should be created on Superset as dataset metrics. However, if a metric is an expression, since it doesn't have a model key it doesn't get associated with any dataset on Superset.

How to reproduce the issue

  1. Create expression metrics on your schema.yml file. For example:
metrics:
  - name: total_revenue
    label: SUM of all price_each
    model: ref('modeled_sales')
    description: "The SUM of all price_each"
    type: sum
    sql: price_each
    timestamp: date
    time_grains: [day, week, month, year]
  - name: revenue_multiplied
    label: "Revenue calculated differently"
    description: "Let's try to use an expression"
    type: expression
    sql: "{{metric('total_revenue')}} * 1.25"
    timestamp: date
    time_grains: [day, week, month, year]
  2. Sync data to Superset.

Expected Results

Both total_revenue and revenue_multiplied should be created on the modeled_sales dataset in Superset.

Actual Results

Only total_revenue is created on the modeled_sales dataset in Superset.

Discussion: DBT Metrics Label vs Name

Hi Beto,

First off, big kudos to you for the last three weeks' work on this project. I hadn't been paying attention to the project in the last three weeks and was just catching up with all the updates. This is really quickly turning into an indispensable tool for me!

Question:

I was wondering about the choice of using the DBT metric's 'name' field as the value for the Metric Label in Superset, instead of the DBT metric's 'label' field value.

Is there a specific reason for that? At first glance through the preset-cli codebase I couldn't tell whether this is something that's happening in the preset-cli codebase, or if it might just be a bug in the Superset API.

Exported files should only have git diffs when assets change

The CLI is working as expected, but the assets that are produced sometimes have diffs without any changes being made. It's challenging for the developer to have a clean development + deployment workflow using git. I'm not totally sure which pieces of this are CLI related or Superset related, so I can make other issues in the Superset repo if that's needed. I'm also happy to give more feedback, contribute, or test any changes out!

The ideal workflow that I'm hoping to implement:

  1. clone git repo containing Superset assets
  2. spin up local Superset instance
  3. import Superset assets to local instance
  4. make changes in the UI
  5. export Superset assets to local directory
  6. optionally make edits to the exported assets if custom behavior is needed like using Jinja templating
  7. commit changes to git branch
  8. get code reviews from teammates. Potentially have staging instances to share with reviewers.
  9. merge changes, which kicks off a CI/CD pipeline that pushes the updated assets to the production instance of open source Superset or Preset

Challenges:

  1. Chart assets have an ID in their file name and that ID changes sometimes.
  2. Related to number 1, when importing charts their IDs look like they might be changing. Maybe they get imported in a different order. It ends up causing all the chart assets to have diffs when nothing has changed in them. Also the dashboards that reference those charts get diffs, because they have chart IDs that need updating to match the new IDs.
  3. Dashboard assets have a list of charts by position that doesn't seem to be consistently ordered. Git shows diffs in the charts but really it's just a reordering.
  4. When renaming a chart it becomes a totally new asset, so the diff looks like a file was deleted and another was added. This isn't ideal when you update the name and also make a change to the contents, because the code reviewer won't be able to view the changes, since git thinks the whole file is new. It might be challenging since the file name is the chart name, but ideally a chart name update would just be an update to a line in the YAML rather than a file deletion and addition.
  5. [Nice to have] If I cloned my repo, imported my assets into a local instance, deleted a few charts in my dashboard using the UI, and then exported again, the stale assets would still persist in my directory. It would be up to the developer to make sure they remove the stale assets. Having a way to detect a deletion would be ideal, so the developer would have less of a chance of leaving stale assets in the repo.
  6. [Nice to have] When importing my assets to the production instance, I would want to delete any assets that no longer exist in the git repo. Having a way to detect deletions on import would be great.

cc @betodealmeida

The export process doesn't include all assets from the Workspace/tenant

Description

The CLI allows users to export resources from a Preset Workspace/Superset tenant. However, it currently doesn't handle pagination, so only 20 items of each type are exported.

How to reproduce the issue

  1. Spin up a Superset installation/Preset workspace.
  2. Run the appropriate command to export resources from it.

Expected behavior

The operation should export all assets from the Workspace.

Actual behavior

The operation only exports up to 20 items per type.

More details

This happens because the CLI initially sends a GET request to /api/v1/{{asset_type}}/?q=(filters:!()) to get the IDs to be exported, which by default returns only 20 results. This can be increased to 100 by sending ?q=(page_size:100), but if there are more than 100 items, pagination support is required.

Regardless of the page_size, the payload always includes a count, so if count > page_size a new request should be sent (?q=(page_size:100,page:2)), and so on until every page has been fetched (page_size * pages >= count).
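A sketch of that pagination loop (get_all_ids is a hypothetical helper; it assumes the list payload exposes count and ids, as described above):

import requests

def get_all_ids(base_url: str, token: str, asset_type: str, page_size: int = 100) -> list[int]:
    """Page through /api/v1/{asset_type}/ until all IDs have been collected."""
    headers = {"Authorization": f"Bearer {token}"}
    ids: list[int] = []
    page = 0
    while True:
        query = f"(page_size:{page_size},page:{page})"
        response = requests.get(
            f"{base_url}/api/v1/{asset_type}/",
            params={"q": query},
            headers=headers,
            timeout=30,
        )
        response.raise_for_status()
        payload = response.json()
        ids.extend(payload["ids"])
        if len(ids) >= payload["count"]:
            return ids
        page += 1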

Declaring a temporal column in the schema.yml file removes the Is Temporal flag from Superset

Description

Users are able to create a schema.yml file to declare/explain/specify columns from their models. However, if a temporal column is included in the file, the Is Temporal flag is removed from that column on Superset.

How to reproduce the issue

  1. Create a dbt model (make sure it has a DATETIME/TIMESTAMP column).
  2. Declare this column on the schema.yml.
  3. Run/compile your project.
  4. Sync it to Superset using the CLI.

Expected Results

The Is Temporal flag is kept on Superset side.

Actual results

The Is Temporal flag is removed on Superset.

Version

This is only reproduced on version 0.1.1.post1.dev146+g1905869.

Workaround

Users are able to manually flag the column as Is Temporal.

sync_database doesn't find existing database

In the Preset UI (the database connections front end), the rule for the name of a database is 'Copy the name of the database you are trying to connect to.' Because of that, the database where the final models are stored in Snowflake becomes the default name of the database connection in Preset. For example, in Snowflake, the database that I read my models from is called 'ANALYTICS_PROD'.

In sync_database, it seems that the database_name being searched for is:
database_name = meta.pop("database_name", f"{project_name}_{target_name}")

When I run dbt-core, it's not finding the database, and I'm curious whether it's because it's looking for that database_name. I think it should be looking at dbt_project --> profile --> target --> outputs --> database.

Handle authentication (JWT generation) when interacting with Superset (local)

Description

Currently, the superset-cli allows users to interact with a local Superset installation. However, it gets the csrf_token from the login HTML content, and then works on getting a session/cookie to perform future requests.

Suggested Improvement

Implement support for handling the authentication directly through the API (a sketch of the first option follows the list):

  • Sending a POST request to /api/v1/security/login; and/or
  • Accept a --jwt-token parameter to be specified on the command.
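A sketch of the first option (get_jwt is a hypothetical helper; /api/v1/security/login with a db provider is the standard Superset login endpoint):

import requests

def get_jwt(base_url: str, username: str, password: str) -> str:
    """Authenticate against the Superset API and return a bearer token."""
    response = requests.post(
        f"{base_url}/api/v1/security/login",
        json={"username": username, "password": password, "provider": "db", "refresh": True},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["access_token"]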
