Comments (6)
Hi @parulML, the easiest approach would be to create a custom dataset (guide provided in the link). That way you can use whatever approach you are already using to load and save images.
I think there are many ways to read and interpret images for learning, so it would be interesting to learn what you are using now and how it could contribute to Kedro.
from kedro.
@penguinpompom if the Excel example in the tutorial is confusing, you could look at the AbstractDataSet implementation in core.py.
I've removed all the parts you can ignore:
class AbstractDataSet(abc.ABC):
    """``AbstractDataSet`` is the base class for all data set implementations.
    All data set implementations should extend this abstract class
    and implement the methods marked as abstract.

    Example:
    ::

        >>> from kedro.io import AbstractDataSet
        >>> import pandas as pd
        >>>
        >>> class MyOwnDataSet(AbstractDataSet):
        >>>     def __init__(self, param1, param2):
        >>>         self._param1 = param1
        >>>         self._param2 = param2
        >>>
        >>>     def _load(self) -> pd.DataFrame:
        >>>         print("Dummy load: {}".format(self._param1))
        >>>         return pd.DataFrame()
        >>>
        >>>     def _save(self, df: pd.DataFrame) -> None:
        >>>         print("Dummy save: {}".format(self._param2))
        >>>
        >>>     def _describe(self):
        >>>         return dict(param1=self._param1, param2=self._param2)
    """

    @abc.abstractmethod
    def _load(self) -> Any:
        raise NotImplementedError(
            "`{}` is a subclass of AbstractDataSet and"
            "it must implement the `_load` method".format(self.__class__.__name__)
        )

    @abc.abstractmethod
    def _save(self, data: Any) -> None:
        raise NotImplementedError(
            "`{}` is a subclass of AbstractDataSet and"
            "it must implement the `_save` method".format(self.__class__.__name__)
        )

    @abc.abstractmethod
    def _describe(self) -> Dict[str, Any]:
        raise NotImplementedError(
            "`{}` is a subclass of AbstractDataSet and"
            "it must implement the `_describe` method".format(self.__class__.__name__)
        )

    def _exists(self) -> bool:
        logging.getLogger(__name__).warning(
            "`exists()` not implemented for `%s`. Assuming output does not exist.",
            self.__class__.__name__,
        )
        return False
All you need to do is make a new class that inherits from AbstractDataSet and implements all the abstract methods (_load, _save and _describe), with _exists being optional. You get to specify your own arguments: Kedro leans on the I/O routines available in other libraries, so a data set is really just a wrapper around that functionality.
A simple example is the pickle local example: here.
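To make the above concrete, here is a minimal sketch of such a class. It is only an illustration under a few assumptions: RawImageDataSet and its filepath argument are hypothetical names, the image is handled as raw bytes so the sketch needs no image library (swap in whatever you already use to read and write images), and a small stand-in base class mirroring the excerpt above is defined so the code runs without Kedro installed. In a real project you would import AbstractDataSet from kedro.io instead.

```python
import abc
import tempfile
from pathlib import Path
from typing import Any, Dict


class AbstractDataSet(abc.ABC):
    """Stand-in for kedro.io.AbstractDataSet (assumption: only here so the
    sketch runs without Kedro; use the real class in a Kedro project)."""

    @abc.abstractmethod
    def _load(self) -> Any:
        raise NotImplementedError

    @abc.abstractmethod
    def _save(self, data: Any) -> None:
        raise NotImplementedError

    @abc.abstractmethod
    def _describe(self) -> Dict[str, Any]:
        raise NotImplementedError

    # Public entry points that delegate to the private methods.
    def load(self) -> Any:
        return self._load()

    def save(self, data: Any) -> None:
        self._save(data)


class RawImageDataSet(AbstractDataSet):
    """Hypothetical data set that reads and writes one image file as bytes."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _load(self) -> bytes:
        # Replace with your own image-reading routine if you need arrays.
        return self._filepath.read_bytes()

    def _save(self, data: bytes) -> None:
        # Create the target folder on first save, then write the bytes.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_bytes(data)

    def _describe(self) -> Dict[str, Any]:
        return dict(filepath=str(self._filepath))

    def _exists(self) -> bool:
        # Optional: report whether the file is already on disk.
        return self._filepath.is_file()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_set = RawImageDataSet(str(Path(tmp) / "images" / "cat.png"))
        data_set.save(b"placeholder image bytes")
        print(data_set.load())
```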
Some tips as you do that:
- make sure you use the same arguments for the __init__ function
- make sure load uses load_args and save uses save_args
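As a rough sketch of the load_args / save_args tip, the example below forwards both dictionaries to the underlying I/O calls. It uses the standard-library json module purely for illustration; JSONFileDataSet is a hypothetical name, and the stand-in base class is an assumption that only exists so the code runs without Kedro installed.

```python
import abc
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class AbstractDataSet(abc.ABC):
    """Stand-in for kedro.io.AbstractDataSet (assumption: only here so the
    sketch runs without Kedro)."""

    @abc.abstractmethod
    def _load(self) -> Any:
        raise NotImplementedError

    @abc.abstractmethod
    def _save(self, data: Any) -> None:
        raise NotImplementedError

    @abc.abstractmethod
    def _describe(self) -> Dict[str, Any]:
        raise NotImplementedError

    def load(self) -> Any:
        return self._load()

    def save(self, data: Any) -> None:
        self._save(data)


class JSONFileDataSet(AbstractDataSet):
    """Hypothetical data set wrapping json.load and json.dump."""

    def __init__(
        self,
        filepath: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ):
        self._filepath = Path(filepath)
        # Copy the dictionaries so later changes by the caller have no effect.
        self._load_args = dict(load_args or {})
        self._save_args = dict(save_args or {})

    def _load(self) -> Any:
        with self._filepath.open("r", encoding="utf-8") as handle:
            # load_args go straight to the wrapped reader.
            return json.load(handle, **self._load_args)

    def _save(self, data: Any) -> None:
        with self._filepath.open("w", encoding="utf-8") as handle:
            # save_args go straight to the wrapped writer.
            json.dump(data, handle, **self._save_args)

    def _describe(self) -> Dict[str, Any]:
        return dict(
            filepath=str(self._filepath),
            load_args=self._load_args,
            save_args=self._save_args,
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_set = JSONFileDataSet(
            str(Path(tmp) / "labels.json"),
            load_args={"parse_int": float},
            save_args={"indent": 2, "sort_keys": True},
        )
        data_set.save({"width": 640, "height": 480})
        print(data_set.load())
```

Keeping the arguments as plain dictionaries means the data set does not need to know every option of the library it wraps.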
If it's still confusing, don't hesitate to reach out.
Hi @parulML @Pet3ris, thanks for the amazing package! I have a similar question to @parulML's and I am lost even after reading the guide. I would appreciate it if anyone could give a rough idea of how to write the function. Thanks in advance!
@Pet3ris Got it thanks!
@parulML @penguinpompom I will close this issue as answered. Feel free to re-open if you still have trouble with this answer. Thank you!
The issue I still see here is how to handle a dataset where individual rows should only be loaded from disk as they are used. Where do I need to build in this logic? I can imagine having a JSON dataset and then allowing the pipeline functions to open the data from a filesystem path, but where does the actual data go in that case? How is the actual image data stored and handled by the data catalog?
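One pattern that might fit this, sketched here purely as an assumption since the thread does not settle the question, is to have _load return lightweight handles instead of the data itself: a dictionary mapping each file name to a zero-argument callable that reads the file only when called. LazyImageFolderDataSet, folder and pattern are hypothetical names, files are read as raw bytes, and the stand-in base class only exists so the code runs without Kedro installed.

```python
import abc
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


class AbstractDataSet(abc.ABC):
    """Stand-in for kedro.io.AbstractDataSet (assumption: only here so the
    sketch runs without Kedro)."""

    @abc.abstractmethod
    def _load(self) -> Any:
        raise NotImplementedError

    @abc.abstractmethod
    def _save(self, data: Any) -> None:
        raise NotImplementedError

    @abc.abstractmethod
    def _describe(self) -> Dict[str, Any]:
        raise NotImplementedError

    def load(self) -> Any:
        return self._load()

    def save(self, data: Any) -> None:
        self._save(data)


class LazyImageFolderDataSet(AbstractDataSet):
    """Hypothetical data set exposing a folder of images as lazy loaders."""

    def __init__(self, folder: str, pattern: str = "*.png"):
        self._folder = Path(folder)
        self._pattern = pattern

    def _load(self) -> Dict[str, Callable[[], bytes]]:
        # No file content is read here; each value is a bound method that
        # reads its file from disk only when the caller invokes it.
        return {
            path.name: path.read_bytes
            for path in sorted(self._folder.glob(self._pattern))
        }

    def _save(self, data: Dict[str, bytes]) -> None:
        self._folder.mkdir(parents=True, exist_ok=True)
        for name, content in data.items():
            (self._folder / name).write_bytes(content)

    def _describe(self) -> Dict[str, Any]:
        return dict(folder=str(self._folder), pattern=self._pattern)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_set = LazyImageFolderDataSet(tmp)
        data_set.save({"a.png": b"first", "b.png": b"second"})
        loaders = data_set.load()
        for name, read_image in loaders.items():
            # The bytes are pulled from disk here, one file at a time.
            print(name, len(read_image()))
```

In this sketch the object returned by load holds only file names and callables; the image bytes exist in memory only while the node that calls a loader is using them.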