Metaphor Connectors

This repository contains a collection of Python-based "connectors" that extract metadata from various sources to ingest into the Metaphor platform.

Installation

This package requires Python 3.8 or later. You can verify the version on your system by running the following command:

python -V  # or python3 on some systems
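The same check can be done from inside Python, which is handy in a setup script. This is a minimal sketch: the helper name meets_minimum is an illustration, not part of the package, and the (3, 8) minimum is taken from the requirement stated above.

```python
import sys
from typing import Tuple

# Minimum interpreter version, taken from the installation note above.
MINIMUM_VERSION: Tuple[int, int] = (3, 8)


def meets_minimum(version: Tuple[int, ...],
                  minimum: Tuple[int, int] = MINIMUM_VERSION) -> bool:
    """Return True when the (major, minor) part of version is at least minimum."""
    return tuple(version[:2]) >= minimum


if meets_minimum(sys.version_info):
    print("Python version is new enough for metaphor-connectors")
else:
    print("Python %d.%d or later is required" % MINIMUM_VERSION)
```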

Once verified, you can install the package using pip:

pip install "metaphor-connectors[all]"  # or pip3 on some systems

This installs all the connectors and their required dependencies. To install only a subset of the dependencies, specify the corresponding extra, for example:

pip install "metaphor-connectors[snowflake]"

Similarly, you can also install the package using requirements.txt or pyproject.toml.

Docker

We automatically push a Docker image to Docker Hub as part of our CI/CD pipeline. See this page for more details.

GitHub Action

You can also run the connectors in your CI/CD pipeline using the Metaphor Connectors GitHub Action.

Connectors

Each connector lives in its own directory under metaphor and extends the metaphor.common.BaseExtractor class.

Connector Name             Metadata
azure_data_factory         Lineage, Pipeline
bigquery                   Schema, description, statistics, queries
bigquery.lineage           Lineage
bigquery.profile           Data profile
confluence                 Document embeddings
custom.data_quality        Data quality
custom.governance          Ownership, tags, description
custom.lineage             Lineage
custom.metadata            Custom metadata
custom.query_attributions  Query attributions
datahub                    Description, tag, ownership
dbt                        dbt model, test, lineage
dbt.cloud                  dbt model, test, lineage
fivetran                   Lineage, Pipeline
glue                       Schema, description
informatica                Lineage, Pipeline
looker                     Looker view, explore, dashboard, lineage
kafka                      Schema, description
metabase                   Dashboard, lineage
monte_carlo                Data monitor
mssql                      Schema
mysql                      Schema, description
oracle                     Schema, description, queries
notion                     Document embeddings
postgresql                 Schema, description, statistics
postgresql.profile         Data profile
postgresql.usage           Usage
power_bi                   Dashboard, lineage
redshift                   Schema, description, statistics, queries
redshift.profile           Data profile
sharepoint                 Document embeddings
snowflake                  Schema, description, statistics, queries
snowflake.lineage          Lineage
snowflake.profile          Data profile
static_web                 Document embeddings
synapse                    Schema, queries
tableau                    Dashboard, lineage
thought_spot               Dashboard, lineage
trino                      Schema, description, queries
unity_catalog              Schema, description
unity_catalog.profile      Data profile, statistics

Development

See Development Environment for more instructions on how to set up your local development environment.

Custom Connectors

See Adding a Custom Connector for instructions and a full example of creating your own custom connector.
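As a rough illustration of the "one directory per connector, one extractor subclass" pattern described under Connectors, the sketch below uses a local stand-in for the base class. The real metaphor.common.BaseExtractor is not reproduced here; the stand-in's async extract method, its return type, and the InMemoryExtractor class are all assumptions made for this example, so refer to the linked guide for the actual interface.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseExtractor(ABC):
    """Stand-in for metaphor.common.BaseExtractor; the real interface may differ."""

    @abstractmethod
    async def extract(self) -> List[Dict[str, Any]]:
        """Return the metadata entities collected from the source."""


class InMemoryExtractor(BaseExtractor):
    """Hypothetical connector that 'extracts' table names from a fixed list."""

    def __init__(self, tables: List[str]) -> None:
        self._tables = tables

    async def extract(self) -> List[Dict[str, Any]]:
        # A real connector would query the source system here.
        return [{"name": table, "platform": "example"} for table in self._tables]


# Run the hypothetical connector once and collect its output.
entities = asyncio.run(InMemoryExtractor(["db.schema.orders"]).extract())
print(entities)
```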

Issues

Tableau Dataclass Use Incompatible with Python 3.11

Running the Tableau crawler in a Python 3.11 environment with metaphor-connectors==0.13.158 yields the following error on startup:

Traceback (most recent call last):
  File "/usr/local/bin/metaphor", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/metaphor/__main__.py", line 30, in main
    package_main = getattr(import_module(f"metaphor.{args.name}"), "main", None)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.11/site-packages/metaphor/tableau/__init__.py", line 2, in <module>
    from metaphor.tableau.extractor import TableauExtractor
  File "/usr/local/lib/python3.11/site-packages/metaphor/tableau/extractor.py", line 49, in <module>
    from metaphor.tableau.config import PERSONAL_SPACE_PROJECT_NAME, TableauRunConfig
  File "/usr/local/lib/python3.11/site-packages/metaphor/tableau/config.py", line 43, in <module>
    @dataclass(config=ConnectorConfig)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pydantic/dataclasses.py", line 225, in create_dataclass
    cls = dataclasses.dataclass(  # type: ignore[call-overload]
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dataclasses.py", line 1230, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/dataclasses.py", line 1220, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dataclasses.py", line 958, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dataclasses.py", line 815, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'metaphor.tableau.config.TableauProjectConfig'> for field projects_filter is not allowed: use default_factory

Switching to Python 3.10 works without error. I have deployed using Python 3.10 since that currently works, but figured I would post this issue as the pyproject.toml file targets support for Python versions < 3.12.

I did not attempt running any newer versions than 0.13.158 since that is the most recent with any changes to the Tableau crawler specifically.
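The error message itself points at the usual remedy: supply the default through default_factory rather than as a shared instance. Python 3.11 tightened the standard dataclasses check so that any unhashable default is rejected, and an ordinary dataclass instance is unhashable, which is why the same code imports cleanly on 3.10. The sketch below reproduces the pattern with standard-library dataclasses instead of pydantic's, and the fields shown (includes, excludes, server_url) are placeholders rather than the connector's real config.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableauProjectConfig:
    # Placeholder fields; the real config class may look different.
    includes: List[str] = field(default_factory=list)
    excludes: List[str] = field(default_factory=list)


@dataclass
class TableauRunConfig:
    server_url: str

    # The failing pattern would be:
    #   projects_filter: TableauProjectConfig = TableauProjectConfig()
    # On Python 3.11+ that raises
    #   ValueError: mutable default ... is not allowed: use default_factory
    # because a regular dataclass instance is unhashable.
    projects_filter: TableauProjectConfig = field(
        default_factory=TableauProjectConfig
    )


a = TableauRunConfig(server_url="https://tableau.example.com")
b = TableauRunConfig(server_url="https://tableau.example.com")

# Each instance gets its own filter object, so mutating one does not leak.
a.projects_filter.excludes.append("Personal Space")
print(b.projects_filter.excludes)
```

A side benefit of default_factory is that every config object gets a fresh filter, which avoids accidental sharing of state between instances.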

Breaking changes in 0.6

  • Rename PostgresqlExtractor & PostgresqlRunConfig to PostgreSQLExtractor & PostgreSQLRunConfig
  • Replace single db filtering with multiple for Snowflake connector
  • Drop dataset/virtual view dual writing in Looker connector
  • Drop dual writing in dbt connector

Breaking changes in 0.15

  • Python 3.9+
  • Rename all snowflake_account config to account
  • Drop max_results config in unity_catalog
  • Rename unity_catalog connector to databricks
  • Drop where_clauses, case_clauses and when_not_matched_insert_clauses fields from RedactPIILiteralsConfig
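For the snowflake_account rename in the list above, existing configs need the key changed wherever it appears. The helper below is a hypothetical migration aid, not something shipped with the package; it assumes the config has already been parsed into plain dicts and lists, and it renames the key at any nesting depth.

```python
from typing import Any


def rename_key(node: Any, old: str = "snowflake_account", new: str = "account") -> Any:
    """Return a copy of node with every dict key named old renamed to new."""
    if isinstance(node, dict):
        return {
            (new if key == old else key): rename_key(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    # Scalars (strings, numbers, None) are returned unchanged.
    return node


# Illustrative config; the surrounding keys are made up for the example.
old_config = {
    "snowflake_account": "acme",
    "user": "svc_metaphor",
    "nested": [{"snowflake_account": "acme-eu"}],
}
new_config = rename_key(old_config)
print(new_config)
```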

Breaking changes in 0.10

  • Use pydantic to verify configs
  • Drop support for JSON-based config files
  • Output file path specifies only a directory
  • Use new sampling config for all profilers
  • Use new filter config for all connectors

Upgrade dbt integration to support 1.8

Hi folks!

A customer of dbt Cloud & Metaphor has tried our new "keep on latest version" feature, which sounds like a nightmare for y'all, but we are actually going to do a much better job in the future of not creating breaking changes between point releases.

https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud#keep-on-latest-version

It does mean that you'll need to look into supporting our new manifest:
https://schemas.getdbt.com/dbt/manifest/v12
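One way a consumer could detect which manifest version it has been handed is to read the schema URL that dbt records under metadata.dbt_schema_version in manifest.json. The sketch below is only an illustration of that check; the function names and the supported set are hypothetical and do not describe how the dbt connector actually handles versions.

```python
import re
from typing import Any, Dict, Optional, Set


def manifest_schema_version(manifest: Dict[str, Any]) -> Optional[int]:
    """Extract the integer schema version from a parsed dbt manifest, if present."""
    url = manifest.get("metadata", {}).get("dbt_schema_version", "")
    match = re.search(r"/manifest/v(\d+)(?:\.json)?$", url)
    return int(match.group(1)) if match else None


def is_supported(manifest: Dict[str, Any], supported: Set[int]) -> bool:
    """Check the manifest version against a caller-supplied set of versions."""
    version = manifest_schema_version(manifest)
    return version is not None and version in supported


# Minimal stand-in for a parsed manifest.json.
manifest = {
    "metadata": {
        "dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v12.json"
    }
}
print(manifest_schema_version(manifest))
```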
