I am moving my image segmentation projects to kedro but kedro does not support this da


Creating custom dataset for image segmentation. about kedro HOT 6 CLOSED

kedro-org commented on May 18, 2024 2

Creating custom dataset for image segmentation.


Comments (6)

Pet3ris commented on May 18, 2024 2

Hi @parulML, the easiest approach would be to create a custom dataset (guide provided in the link). That way you can use whatever approach you are already using to load and save images.

I think there are many ways to read and interpret images for learning, so it would be interesting to learn what you are using now and how it could contribute to Kedro.


Pet3ris commented on May 18, 2024 1

@penguinpompom if the Excel example in the tutorial is confusing, you could look at the AbstractDataSet implementation in core.py.

I've removed all the parts you can ignore:

```python
import abc
import logging
from typing import Any, Dict


class AbstractDataSet(abc.ABC):
    """``AbstractDataSet`` is the base class for all data set implementations.
    All data set implementations should extend this abstract class
    and implement the methods marked as abstract.
    Example:
    ::
        >>> from kedro.io import AbstractDataSet
        >>> import pandas as pd
        >>>
        >>> class MyOwnDataSet(AbstractDataSet):
        >>>     def __init__(self, param1, param2):
        >>>         self._param1 = param1
        >>>         self._param2 = param2
        >>>
        >>>     def _load(self) -> pd.DataFrame:
        >>>         print("Dummy load: {}".format(self._param1))
        >>>         return pd.DataFrame()
        >>>
        >>>     def _save(self, df: pd.DataFrame) -> None:
        >>>         print("Dummy save: {}".format(self._param2))
        >>>
        >>>     def _describe(self):
        >>>         return dict(param1=self._param1, param2=self._param2)
    """

    # Required: return the loaded data.
    @abc.abstractmethod
    def _load(self) -> Any:
        raise NotImplementedError(
            "`{}` is a subclass of AbstractDataSet and "
            "it must implement the `_load` method".format(self.__class__.__name__)
        )

    # Required: persist the given data.
    @abc.abstractmethod
    def _save(self, data: Any) -> None:
        raise NotImplementedError(
            "`{}` is a subclass of AbstractDataSet and "
            "it must implement the `_save` method".format(self.__class__.__name__)
        )

    # Required: return a dict describing the data set (used for printing and logging).
    @abc.abstractmethod
    def _describe(self) -> Dict[str, Any]:
        raise NotImplementedError(
            "`{}` is a subclass of AbstractDataSet and "
            "it must implement the `_describe` method".format(self.__class__.__name__)
        )

    # Optional: the default warns and assumes the output does not exist.
    def _exists(self) -> bool:
        logging.getLogger(__name__).warning(
            "`exists()` not implemented for `%s`. Assuming output does not exist.",
            self.__class__.__name__,
        )
        return False
```

All you need to do is make a new class that inherits from AbstractDataSet and implements the three abstract methods (_load, _save and _describe); _exists is optional. You get to specify your own arguments, because Kedro leans on I/O routines available in other libraries and a data set is really just a wrapper around that functionality.
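To illustrate why every abstract method has to be implemented, here is a minimal sketch. It uses a cut-down stand-in for the base class so that it runs without Kedro installed; `HalfDoneDataSet`, `MinimalDataSet` and `can_instantiate` are hypothetical names made up for this example.

```python
import abc
from typing import Any, Dict


class AbstractDataSet(abc.ABC):
    """Cut-down stand-in for Kedro's base class (assumption: only the hooks quoted above)."""

    @abc.abstractmethod
    def _load(self) -> Any:
        ...

    @abc.abstractmethod
    def _save(self, data: Any) -> None:
        ...

    @abc.abstractmethod
    def _describe(self) -> Dict[str, Any]:
        ...

    def _exists(self) -> bool:
        # Optional hook: the default simply reports that nothing exists yet.
        return False


class HalfDoneDataSet(AbstractDataSet):
    """Hypothetical subclass that forgets `_save` and `_describe`."""

    def _load(self) -> Any:
        return None


class MinimalDataSet(AbstractDataSet):
    """Hypothetical subclass that implements all three abstract methods."""

    def __init__(self) -> None:
        self._data: Any = None

    def _load(self) -> Any:
        return self._data

    def _save(self, data: Any) -> None:
        self._data = data

    def _describe(self) -> Dict[str, Any]:
        return dict(kind="in-memory")


def can_instantiate(cls: type) -> bool:
    """Return True if `cls()` succeeds, False if abc rejects it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Python's `abc` machinery refuses to instantiate `HalfDoneDataSet` because two abstract methods are still missing, while `MinimalDataSet` instantiates fine and simply inherits the default `_exists`.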

A simple example is the pickle local example: here.

Some tips as you do that:

- make sure you use the same arguments for the __init__ function
- make sure load uses load_args and save uses save_args
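Putting the tips above together, a custom image data set could look roughly like the sketch below. It is only an illustration under several assumptions: the base class is a stand-in so the example runs without Kedro, `ImageDataSet`, `read_raw`, `write_raw` and `demo_round_trip` are hypothetical names, and the default reader and writer just move raw bytes. In a real project you would pass in whatever image library calls you already use as `reader` and `writer`.

```python
import abc
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class AbstractDataSet(abc.ABC):
    """Stand-in for Kedro's base class so the sketch runs on its own."""

    @abc.abstractmethod
    def _load(self) -> Any:
        ...

    @abc.abstractmethod
    def _save(self, data: Any) -> None:
        ...

    @abc.abstractmethod
    def _describe(self) -> Dict[str, Any]:
        ...


def read_raw(path: Path, **load_args: Any) -> bytes:
    """Placeholder reader: returns the file's raw bytes and ignores load_args."""
    return path.read_bytes()


def write_raw(path: Path, data: bytes, **save_args: Any) -> None:
    """Placeholder writer: writes raw bytes and ignores save_args."""
    path.write_bytes(data)


class ImageDataSet(AbstractDataSet):
    """Hypothetical data set wrapping a single image file."""

    def __init__(
        self,
        filepath: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
        reader: Callable[..., Any] = read_raw,
        writer: Callable[..., None] = write_raw,
    ) -> None:
        self._filepath = Path(filepath)
        self._load_args = dict(load_args or {})
        self._save_args = dict(save_args or {})
        self._reader = reader
        self._writer = writer

    def _load(self) -> Any:
        # load_args are forwarded to the reader, as the tip suggests.
        return self._reader(self._filepath, **self._load_args)

    def _save(self, data: Any) -> None:
        # save_args are forwarded to the writer.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._writer(self._filepath, data, **self._save_args)

    def _describe(self) -> Dict[str, Any]:
        return dict(
            filepath=str(self._filepath),
            load_args=self._load_args,
            save_args=self._save_args,
        )

    def _exists(self) -> bool:
        return self._filepath.is_file()


def demo_round_trip() -> bytes:
    """Save a few bytes to a temporary file and load them back."""
    with tempfile.TemporaryDirectory() as tmp:
        dataset = ImageDataSet(str(Path(tmp) / "masks" / "mask_001.bin"))
        dataset._save(b"\x00\x01\x02")
        return dataset._load()
```

Keeping the reader and writer as plain callables means the data set stays a thin wrapper: swapping image libraries changes the two functions you pass in, not the class.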

If it is still confusing, don't hesitate to reach out.


penguinpompom commented on May 18, 2024

Hi @parulML @Pet3ris, thanks for the amazing package! I have a similar question to @parulML's and I am lost even after reading the guide. I would appreciate it if anyone could give a rough idea of how to write the function. Thanks in advance!


penguinpompom commented on May 18, 2024

@Pet3ris Got it thanks!


lorenabalan commented on May 18, 2024

@parulML @penguinpompom I will close this issue as answered. Feel free to re-open if you still have trouble with this answer. Thank you!


dasturge commented on May 18, 2024

The issue I still see here is how to handle a dataset where individual rows should only be loaded from disk as-they-are-used. Where do I need to build in this logic? I can imagine having a json dataset and then allowing the pipeline functions to open the data from a filesystem path, but where does the actual data go in that case? How is the actual image data stored and handled by the data catalog?
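One possible way to approach this, offered as a sketch and not as Kedro's prescribed answer: let `_load` return a lightweight handle that only knows the file paths and reads each file when it is indexed. The catalog then holds the handle, and the image data itself stays on disk until a node asks for a specific item. This is similar in spirit to how Kedro's `PartitionedDataSet` hands back load callables instead of loaded data. `LazyImageFolder`, `LazyImageFolderDataSet` and `demo_lazy_reads` are hypothetical names, and the class below is kept free of Kedro imports so the example runs on its own; in a project it would inherit from AbstractDataSet.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List


def read_raw(path: Path) -> bytes:
    """Placeholder reader: swap in your own image-decoding call."""
    return path.read_bytes()


class LazyImageFolder:
    """Handle over a folder of files; nothing is read until an item is requested."""

    def __init__(self, paths: List[Path], reader: Callable[[Path], Any]) -> None:
        self._paths = paths
        self._reader = reader

    def __len__(self) -> int:
        return len(self._paths)

    def __getitem__(self, index: int) -> Any:
        # The disk read happens here, one file at a time.
        return self._reader(self._paths[index])


class LazyImageFolderDataSet:
    """Hypothetical data set whose `_load` returns a lazy handle, not the images."""

    def __init__(
        self,
        folder: str,
        pattern: str = "*",
        reader: Callable[[Path], Any] = read_raw,
    ) -> None:
        self._folder = Path(folder)
        self._pattern = pattern
        self._reader = reader

    def _load(self) -> LazyImageFolder:
        # Only the directory listing is touched at load time.
        paths = sorted(p for p in self._folder.glob(self._pattern) if p.is_file())
        return LazyImageFolder(paths, self._reader)

    def _save(self, data: Any) -> None:
        raise NotImplementedError("This sketch is read-only.")

    def _describe(self) -> Dict[str, Any]:
        return dict(folder=str(self._folder), pattern=self._pattern)


def demo_lazy_reads() -> List[int]:
    """Return [item count, reads after load, reads after one access, first byte]."""
    reads: List[str] = []

    def counting_reader(path: Path) -> bytes:
        reads.append(path.name)
        return path.read_bytes()

    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            (Path(tmp) / f"img_{i}.bin").write_bytes(bytes([i]))
        images = LazyImageFolderDataSet(tmp, "*.bin", counting_reader)._load()
        reads_after_load = len(reads)
        first = images[0]
        reads_after_access = len(reads)
    return [len(images), reads_after_load, reads_after_access, first[0]]
```

In this sketch loading the data set triggers no file reads at all, and indexing one item triggers exactly one, which is the as-they-are-used behaviour described above.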
