Comments (3)

sam-goodwin commented on May 31, 2024

This also needs to be updated:

VALID_DATAFRAME_CLASSES = (pd.DataFrame,)

sam-goodwin commented on May 31, 2024

Here's a monkey patch that seems to work:

import dagster_pandera
import dask.dataframe as dd
import pandas as pd
import pandera as pa
from dagster import DagsterType, MetadataValue
from dagster_pandera import (
    _extract_name_from_pandera_schema,
    _pandera_schema_to_table_schema,
    _pandera_schema_to_type_check_fn,
)

dagster_pandera.VALID_DATAFRAME_CLASSES = (pd.DataFrame, dd.DataFrame)


# works around dagster_pandera's lack of support for dask https://github.com/dagster-io/dagster/issues/21017
def custom_pandera_schema_to_dagster_type(
    schema: pa.DataFrameSchema | type[pa.SchemaModel] | type[pa.DataFrameModel],
) -> DagsterType:
    name = _extract_name_from_pandera_schema(schema)
    norm_schema = schema.to_schema() if isinstance(schema, type) and issubclass(schema, pa.SchemaModel) else schema
    tschema = _pandera_schema_to_table_schema(norm_schema)
    type_check_fn = _pandera_schema_to_type_check_fn(norm_schema, tschema)

    return DagsterType(
        type_check_fn=type_check_fn,
        name=name,
        description=norm_schema.description,
        metadata={
            "schema": MetadataValue.table_schema(tschema),
        },
        typing_type=pd.DataFrame | dd.DataFrame,
    )


dagster_pandera.pandera_schema_to_dagster_type = custom_pandera_schema_to_dagster_type
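
A note on why a patch like this takes effect, and where it may not. The sketch below uses a hypothetical stand-in module (`fake_lib`, with made-up names `VALID_CLASSES`, `is_valid`, and `make_type`), not dagster_pandera itself. It assumes the library reads its module-level tuple at call time, and it shows that a name imported with `from module import name` before the patch keeps pointing at the original function.

```python
import types

# Hypothetical stand-in for a library module (not dagster_pandera itself).
lib = types.ModuleType("fake_lib")
exec(
    "VALID_CLASSES = (int,)\n"
    "def is_valid(value):\n"
    "    # The module global is looked up each time the function runs.\n"
    "    return isinstance(value, VALID_CLASSES)\n"
    "def make_type(value):\n"
    "    return 'original'\n",
    lib.__dict__,
)

# Simulates `from fake_lib import make_type` executed before the patch.
early_import = lib.make_type

# Apply the patch, mirroring the two module-attribute assignments above.
lib.VALID_CLASSES = (int, str)
lib.make_type = lambda value: "patched"

print(lib.is_valid("text"))  # True: is_valid sees the widened tuple
print(lib.make_type(1))      # patched
print(early_import(1))       # original: the early binding is unaffected
```

So the patch probably needs to run before any other module does `from dagster_pandera import pandera_schema_to_dagster_type`; code that already holds the original function will keep using it.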

sam-goodwin commented on May 31, 2024

My use case also required the original pandera schema to remain on the DagsterType so the IO manager can use it to generate a pyarrow schema. So I also added a custom class to tunnel that information through to the IO manager.

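from typing import Any, Callable, Mapping

import pandera as pa
from dagster import DagsterType, TypeCheck, TypeCheckContext

# These aliases were not shown in the original snippet; the definitions below are assumptions.
Schema = pa.DataFrameSchema
TypeCheckFn = Callable[[TypeCheckContext, Any], TypeCheck | bool]
RawMetadataValue = Any  # stand-in for dagster's raw metadata value union


# Piggybacks the pandera schema through to our IO manager,
# which needs it to convert to PyArrow.
class PanderaDagsterType(DagsterType):
    def __init__(
        self,
        pandera_schema: Schema,
        type_check_fn: TypeCheckFn,
        name: str,
        description: str | None,
        metadata: Mapping[str, RawMetadataValue] | None,
        typing_type: Any = None,
    ):
        super().__init__(
            type_check_fn=type_check_fn,
            name=name,
            description=description,
            metadata=metadata,
            typing_type=typing_type,
        )
        # Kept as a plain attribute so downstream code can read it back.
        self.pandera_schema = pandera_schema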
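
For the reading side, here is a minimal sketch of how an IO manager might pull the schema back out. It uses hypothetical stand-ins (`BaseType`, `SchemaCarryingType`, `FakeOutputContext`, `schema_for_output`) rather than real Dagster classes, and assumes the IO manager reaches the type through a `dagster_type` attribute on its context. The schema here is a plain dict purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


class BaseType:
    """Hypothetical stand-in for DagsterType."""

    def __init__(self, name: str) -> None:
        self.name = name


class SchemaCarryingType(BaseType):
    """Stand-in for PanderaDagsterType: keeps the schema as an attribute."""

    def __init__(self, name: str, pandera_schema: Any) -> None:
        super().__init__(name)
        self.pandera_schema = pandera_schema


@dataclass
class FakeOutputContext:
    """Hypothetical stand-in for the context handed to an IO manager."""

    dagster_type: BaseType


def schema_for_output(context: FakeOutputContext) -> Optional[Any]:
    # Plain types carry no schema, so fall back to None instead of raising.
    return getattr(context.dagster_type, "pandera_schema", None)


with_schema = FakeOutputContext(SchemaCarryingType("Trades", {"price": "float64"}))
without_schema = FakeOutputContext(BaseType("Any"))

print(schema_for_output(with_schema))     # {'price': 'float64'}
print(schema_for_output(without_schema))  # None
```

Using `getattr` with a default keeps the IO manager usable for outputs whose type was not built from a pandera schema.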
