Comments (4)
Hi @noklam, I think there are several subtasks in this ticket, but they do not all have the same priority.
- I think (to be verified) that if we create a custom resolver which can parse a `DataFrameModel`, it will be enough for the hook to work "as is" with the exact same syntax. Something like:
    my_data:
      type: ...
      filepath: ...
      metadata:
        pandera:
          schema: ${pa.python: my_kedro_package.schemas.my_data.MyDataSchema}  # we should "just" create the resolver, which will import and instantiate the class
Does the design look OK to you? Do you have time to work on this one?
- We can add a CLI to infer this schema and generate a `my_kedro_package.schemas.my_data.py` file (pseudo code below):
    # my_kedro_package.schemas.my_data.py
    from pandera import DataFrameModel, Field


    class MyDataSchema(DataFrameModel):
        var1: str = Field()
        var2: <var_type> = Field()
        ...
- This second step is optional and low priority.
- I don't think we should create default tests. My goal is to have a helper that generates the file and, ideally, loops over the variables (it is cumbersome to write dozens of entries by hand if the dataset has many variables).
- This may be a template with a jinja loop over variables?
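
To make the first bullet concrete, here is a minimal sketch of what the resolver function could look like. It is an illustration, not the plugin's implementation: the name `resolve_python_object` is hypothetical, and the `to_schema()` call assumes the imported object is a pandera `DataFrameModel` (for any other object, the function simply returns what it imported).

```python
import importlib
from typing import Any


def resolve_python_object(dotted_path: str) -> Any:
    """Import and return the object named by ``package.module.Attribute``.

    Hypothetical helper: the name and behaviour are assumptions for illustration.
    """
    # Split "pkg.module.Attr" into the module path and the attribute name.
    module_path, _, attr_name = dotted_path.strip().rpartition(".")
    if not module_path:
        raise ValueError(f"Expected 'module.attribute', got {dotted_path!r}")

    module = importlib.import_module(module_path)
    obj = getattr(module, attr_name)

    # Assumption: a pandera DataFrameModel exposes `to_schema()`, which turns the
    # class into a schema object the hook can validate against. Other objects
    # are returned unchanged.
    to_schema = getattr(obj, "to_schema", None)
    return to_schema() if callable(to_schema) else obj


# Demonstration with a standard-library class, so the sketch runs without pandera.
ordered_dict_cls = resolve_python_object("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

With OmegaConf, a function like this could be registered under the `pa.python` name via `OmegaConf.register_new_resolver`, so that the catalog entry above resolves to the imported schema. Where that registration should live in the plugin is left open here.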
from kedro-pandera.
How is 2. different from the current `infer` CLI? I'll work on 1.
from kedro-pandera.
The current CLI has a `--python` flag for this, but it is not implemented. The small difference is that the `infer` CLI for YAML uses a built-in pandera function which creates basic tests and the file, but there is no such function for Python, so we have to create it on our own. That is why I want to keep it really simple: the goal is to avoid writing boilerplate code by hand, not really to infer advanced tests.
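
A rough sketch of the "keep it really simple" generator described above: it only loops over column names and types and writes one bare `Field()` per column, with no inferred checks. The function name `render_schema_module` and the column-to-type mapping are hypothetical, and plain `str.format` is used here instead of a Jinja template to keep the example dependency-free.

```python
from typing import Mapping

# Template for the generated module; only the class name and fields vary.
SCHEMA_TEMPLATE = '''\
from pandera import DataFrameModel, Field


class {class_name}(DataFrameModel):
{fields}
'''


def render_schema_module(class_name: str, columns: Mapping[str, str]) -> str:
    """Render the source code of a DataFrameModel with one bare Field per column.

    Hypothetical helper: `columns` maps a column name to a type written as text.
    """
    for name in columns:
        # Column names become class attributes, so they must be valid identifiers.
        if not name.isidentifier():
            raise ValueError(f"Column {name!r} is not a valid Python identifier")

    fields = "\n".join(
        f"    {name}: {dtype} = Field()" for name, dtype in columns.items()
    )
    # An empty mapping still has to produce a syntactically valid class body.
    return SCHEMA_TEMPLATE.format(class_name=class_name, fields=fields or "    pass")


source = render_schema_module("MyDataSchema", {"var1": "str", "var2": "int"})
print(source)
```

Columns whose names are not valid identifiers (spaces, leading digits) would need an alias mechanism, which this sketch deliberately leaves out.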
from kedro-pandera.
> The current CLI has a flag --python for this but it is not implemented
I am not sure what you mean; I thought this function exists already? Isn't it using the `schema.to_script()` method?
Pandera natively supports converting DataFrameModel -> DataFrameSchema, but not the other way round.
from kedro-pandera.