Comments (3)
This also needs to be updated:
from dagster.
Here's a monkey patch that seems to work:
import dagster_pandera
import dask.dataframe as dd
import pandas as pd
import pandera as pa
from dagster import DagsterType, MetadataValue
from dagster_pandera import (
    _extract_name_from_pandera_schema,
    _pandera_schema_to_table_schema,
    _pandera_schema_to_type_check_fn,
)

dagster_pandera.VALID_DATAFRAME_CLASSES = (pd.DataFrame, dd.DataFrame)


# works around dagster_pandera's lack of support for dask https://github.com/dagster-io/dagster/issues/21017
def custom_pandera_schema_to_dagster_type(
    schema: pa.DataFrameSchema | type[pa.SchemaModel] | type[pa.DataFrameModel],
) -> DagsterType:
    name = _extract_name_from_pandera_schema(schema)
    norm_schema = (
        schema.to_schema()
        if isinstance(schema, type) and issubclass(schema, pa.SchemaModel)
        else schema
    )
    tschema = _pandera_schema_to_table_schema(norm_schema)
    type_check_fn = _pandera_schema_to_type_check_fn(norm_schema, tschema)
    return DagsterType(
        type_check_fn=type_check_fn,
        name=name,
        description=norm_schema.description,
        metadata={
            "schema": MetadataValue.table_schema(tschema),
        },
        typing_type=pd.DataFrame | dd.DataFrame,
    )


dagster_pandera.pandera_schema_to_dagster_type = custom_pandera_schema_to_dagster_type
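One caveat with this kind of patch, sketched here with a stand-in module rather than the real dagster_pandera (the module name `fake_lib` and its functions are hypothetical): rebinding a module attribute only affects code that looks the function up through the module afterwards. A name bound earlier with `from module import name` keeps pointing at the original, so the patch presumably needs to run before any such import.

```python
import types

# Hypothetical stand-in for a third-party module such as dagster_pandera.
fake_lib = types.ModuleType("fake_lib")


def original_convert(schema):
    return f"original:{schema}"


fake_lib.convert = original_convert

# Equivalent of `from fake_lib import convert` executed BEFORE the patch.
early_binding = fake_lib.convert


def patched_convert(schema):
    return f"patched:{schema}"


# The monkey patch: rebind the attribute on the module object.
fake_lib.convert = patched_convert

# Lookups through the module see the patch ...
via_module = fake_lib.convert("s")
# ... but the earlier direct binding still calls the original.
via_early_binding = early_binding("s")
```

In practice this suggests applying the patch in a module that is imported first, and calling `dagster_pandera.pandera_schema_to_dagster_type(...)` through the module rather than importing the function by name.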
My use case also required the original pandera schema to remain on the DagsterType for use within the IO manager, where I use it to generate a pyarrow schema. So I also added a custom class to tunnel that information through to the IO manager.
from collections.abc import Mapping
from typing import Any

# Schema, TypeCheckFn, and RawMetadataValue are not imported in the original
# comment; import or define them to match your setup.


# needed to piggyback the pandera_schema through to our IO manager
# the pandera_schema is needed because we use it to convert to PyArrow
class PanderaDagsterType(DagsterType):
    def __init__(
        self,
        pandera_schema: Schema,
        type_check_fn: TypeCheckFn,
        name: str,
        description: str | None,
        metadata: Mapping[str, RawMetadataValue] | None,
        typing_type: Any = None,
    ):
        super().__init__(
            type_check_fn=type_check_fn,
            name=name,
            description=description,
            metadata=metadata,
            typing_type=typing_type,
        )
        # keep the original schema so the IO manager can read it back
        self.pandera_schema = pandera_schema
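On the IO manager side, the schema can then be read back off the type. A minimal sketch, assuming the context object exposes the type as `dagster_type` (as dagster's `OutputContext` and `InputContext` do); the helper name `get_pandera_schema` is hypothetical, and the stand-in objects at the bottom only imitate a real context for illustration.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_pandera_schema(context: Any) -> Optional[Any]:
    """Return the pandera schema tunnelled through the DagsterType, if any."""
    dagster_type = getattr(context, "dagster_type", None)
    # Plain DagsterType instances have no `pandera_schema`, so fall back to None.
    return getattr(dagster_type, "pandera_schema", None)


# Stand-ins imitating a context whose type carries a schema, and one that does not.
with_schema = SimpleNamespace(dagster_type=SimpleNamespace(pandera_schema="my_schema"))
without_schema = SimpleNamespace(dagster_type=SimpleNamespace())

found = get_pandera_schema(with_schema)
missing = get_pandera_schema(without_schema)
```

Using `getattr` with a default keeps the IO manager usable for outputs whose type is an ordinary DagsterType without the extra attribute.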