Comments (6)

Pet3ris commented on May 18, 2024

Hi @parulML, the easiest approach would be to create a custom dataset (guide provided in the link). That way you can use whatever approach you are already using to load and save images.

There are many ways to read and interpret images for learning, so it would be interesting to hear what you are using now and how it could contribute to Kedro.

from kedro.

Pet3ris commented on May 18, 2024

@penguinpompom if the Excel example in the tutorial is confusing, you could look at the AbstractDataSet implementation in core.py.

I've removed all the parts you can ignore:

import abc
import logging
from typing import Any, Dict


class AbstractDataSet(abc.ABC):
    """``AbstractDataSet`` is the base class for all data set implementations.
    All data set implementations should extend this abstract class
    and implement the methods marked as abstract.
    Example:
    ::
        >>> from kedro.io import AbstractDataSet
        >>> import pandas as pd
        >>>
        >>> class MyOwnDataSet(AbstractDataSet):
        >>>     def __init__(self, param1, param2):
        >>>         self._param1 = param1
        >>>         self._param2 = param2
        >>>
        >>>     def _load(self) -> pd.DataFrame:
        >>>         print("Dummy load: {}".format(self._param1))
        >>>         return pd.DataFrame()
        >>>
        >>>     def _save(self, df: pd.DataFrame) -> None:
        >>>         print("Dummy save: {}".format(self._param2))
        >>>
        >>>     def _describe(self):
        >>>         return dict(param1=self._param1, param2=self._param2)
    """

    @abc.abstractmethod
    def _load(self) -> Any:
        # The trailing space after "and" keeps the two string pieces from running together.
        raise NotImplementedError(
            "`{}` is a subclass of AbstractDataSet and "
            "it must implement the `_load` method".format(self.__class__.__name__)
        )

    @abc.abstractmethod
    def _save(self, data: Any) -> None:
        raise NotImplementedError(
            "`{}` is a subclass of AbstractDataSet and "
            "it must implement the `_save` method".format(self.__class__.__name__)
        )

    @abc.abstractmethod
    def _describe(self) -> Dict[str, Any]:
        raise NotImplementedError(
            "`{}` is a subclass of AbstractDataSet and "
            "it must implement the `_describe` method".format(self.__class__.__name__)
        )

    def _exists(self) -> bool:
        logging.getLogger(__name__).warning(
            "`exists()` not implemented for `%s`. Assuming output does not exist.",
            self.__class__.__name__,
        )
        return False

All you need to do is create a new class that inherits from AbstractDataSet and implements all of the abstract methods (_load, _save and _describe); _exists is optional. You get to specify your own arguments, because Kedro leans on the I/O routines available in other libraries and a data set is really just a wrapper around that functionality.
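As a rough illustration of the paragraph above, here is a minimal sketch of such a subclass. Everything in it is an assumption made for the example: the name RawImageDataSet, the choice to treat an image as raw bytes, and the small stand-in base class, which only mirrors the three abstract methods so the sketch runs without Kedro installed. In a real project you would inherit from kedro.io.AbstractDataSet instead and call your own image reader inside _load.

```python
import abc
import tempfile
from pathlib import Path
from typing import Any, Dict


class AbstractDataSet(abc.ABC):
    """Stand-in for kedro.io.AbstractDataSet so the sketch runs without Kedro."""

    @abc.abstractmethod
    def _load(self) -> Any: ...

    @abc.abstractmethod
    def _save(self, data: Any) -> None: ...

    @abc.abstractmethod
    def _describe(self) -> Dict[str, Any]: ...


class RawImageDataSet(AbstractDataSet):
    """Hypothetical data set that reads and writes one image file as raw bytes."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _load(self) -> bytes:
        # Replace this with whatever image reader you already use.
        return self._filepath.read_bytes()

    def _save(self, data: bytes) -> None:
        # Create the parent folder if needed, then write the bytes.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_bytes(data)

    def _describe(self) -> Dict[str, Any]:
        # Returns the arguments that define this data set.
        return dict(filepath=str(self._filepath))

    def _exists(self) -> bool:
        # Optional override: report whether the file is already on disk.
        return self._filepath.exists()


def roundtrip_demo() -> bytes:
    """Save a few placeholder bytes to a temporary file and load them back."""
    with tempfile.TemporaryDirectory() as tmp:
        data_set = RawImageDataSet(str(Path(tmp) / "example.png"))
        data_set._save(b"\x89PNG fake image bytes")
        return data_set._load()
```

The demo calls _save and _load directly only because the stand-in has no public methods; with the real base class you would go through the public load() and save() wrappers.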

A simple example is the pickle local example: here.

Some tips as you do that:

  • make sure you use the same arguments for the __init__ function
  • make sure _load uses load_args and _save uses save_args
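To make the two tips concrete, here is a hedged sketch of a pickle-backed wrapper that forwards load_args and save_args to the underlying library calls. The class name LocalPickleSketch and its exact argument list are assumptions made for illustration, not Kedro's actual implementation; a real version would inherit from kedro.io.AbstractDataSet.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class LocalPickleSketch:
    """Hypothetical wrapper around pickle; not Kedro's own pickle data set."""

    def __init__(
        self,
        filepath: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ):
        self._filepath = Path(filepath)
        # Copy the dicts so later changes by the caller do not leak in.
        self._load_args = dict(load_args or {})
        self._save_args = dict(save_args or {})

    def _load(self) -> Any:
        # load_args are passed straight through to pickle.load.
        with self._filepath.open("rb") as f:
            return pickle.load(f, **self._load_args)

    def _save(self, data: Any) -> None:
        # save_args are passed straight through to pickle.dump.
        with self._filepath.open("wb") as f:
            pickle.dump(data, f, **self._save_args)

    def _describe(self) -> Dict[str, Any]:
        return dict(
            filepath=str(self._filepath),
            load_args=self._load_args,
            save_args=self._save_args,
        )


def pickle_demo() -> Any:
    """Save with an explicit pickle protocol, then load the object back."""
    with tempfile.TemporaryDirectory() as tmp:
        data_set = LocalPickleSketch(
            str(Path(tmp) / "data.pkl"), save_args={"protocol": 2}
        )
        data_set._save({"a": [1, 2, 3]})
        return data_set._load()
```

Keeping the extra options in two dictionaries means the wrapper does not need to know every keyword the underlying library accepts.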

If it is still confusing, don't hesitate to reach out.

penguinpompom commented on May 18, 2024

Hi @parulML @Pet3ris, thanks for the amazing package! I have a similar question to @parulML's and I am lost even after reading the guide. I'd appreciate it if anyone could give a rough idea of how to write the function. Thanks in advance!

penguinpompom commented on May 18, 2024

@Pet3ris Got it, thanks!

lorenabalan commented on May 18, 2024

@parulML @penguinpompom I will close this issue as answered. Feel free to re-open if you still have trouble with this answer. Thank you!

dasturge commented on May 18, 2024

The issue I still see here is how to handle a dataset where individual rows should only be loaded from disk as they are used. Where do I need to build in this logic? I can imagine having a JSON dataset and then letting the pipeline functions open the data from a filesystem path, but where does the actual data go in that case? How is the actual image data stored and handled by the data catalog?
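One possible way to picture the idea in this question, offered as a sketch rather than an answer from the maintainers: _load can return a mapping from file names to zero-argument loader functions, so that no image bytes are read until a pipeline function calls a loader. The class name LazyImageFolderSketch, the pattern argument, and the total_size function are all hypothetical; a real version would inherit from kedro.io.AbstractDataSet and also implement _save and _describe.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


class LazyImageFolderSketch:
    """Hypothetical data set whose _load defers reading each file."""

    def __init__(self, folder: str, pattern: str = "*.png"):
        self._folder = Path(folder)
        self._pattern = pattern

    def _load(self) -> Dict[str, Callable[[], bytes]]:
        # Only paths are collected here; each value is a bound method
        # that reads its file from disk when it is eventually called.
        return {
            path.name: path.read_bytes
            for path in sorted(self._folder.glob(self._pattern))
        }


def total_size(images: Dict[str, Callable[[], bytes]]) -> int:
    """Example pipeline function: reads one image at a time, on demand."""
    return sum(len(load_image()) for load_image in images.values())


def lazy_demo() -> int:
    """Write two placeholder files and measure them through the lazy loaders."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.png").write_bytes(b"aa")
        (Path(tmp) / "b.png").write_bytes(b"bbb")
        loaders = LazyImageFolderSketch(tmp)._load()
        return total_size(loaders)
```

Under this assumption the catalog entry only describes where the images live, and the image data exists in memory only inside the function that calls a loader.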
