airctic / icedata
IceData: Datasets Hub for the *IceVision* Framework
Home Page: https://airctic.github.io/icedata/
License: Apache License 2.0
Clicking the Contributing Guide link in the side-bar on https://airctic.github.io/icedata/ directs to https://airctic.github.io/icedata/readme.md. The body of the page displays a 404 - Not found error.
Is your feature request related to a problem? Please describe.
Presently, load_data() automatically saves the downloaded data to the /root/.icevision/data directory. The new feature allows the user to choose the destination directory.
Describe the solution you'd like
Add a new argument called dest_dir to load_data() to let the user choose the destination directory. dest_dir should be initialized to None in order to preserve compatibility with the existing API. If the user does not provide the dest_dir argument (dest_dir=None), load_data() automatically saves the downloaded data to the /root/.icevision/data directory.
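A minimal sketch of the proposed behavior, assuming nothing about icedata's internals beyond what is described above; the download step is elided and the signature is hypothetical:

```python
from pathlib import Path
from typing import Optional, Union


def load_data(url: str, name: str, dest_dir: Optional[Union[str, Path]] = None) -> Path:
    # Fall back to the current default location when dest_dir is not given,
    # so existing callers keep working unchanged.
    if dest_dir is None:
        dest_dir = Path.home() / ".icevision" / "data"
    save_dir = Path(dest_dir) / name
    save_dir.mkdir(parents=True, exist_ok=True)
    # ... download and extract `url` into `save_dir`, as load_data does today ...
    return save_dir
```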
What part of documentation was unclear or wrong?
The goal is to create a resource that will help both beginners and advanced users easily create their parsers by providing frequently used code snippets and best practices. The code snippets are divided into sections:
Section 1: file-related code snippets
Section 2: parsing-related code snippets
The first step is to create a class that inherits from these smaller building blocks:
```python
class WheatParser(parsers.FasterRCNN, parsers.FilepathMixin, parsers.SizeMixin):
    pass
```
We use a method called generate_template
that will print out all the necessary methods we have to implement.
WheatParser.generate_template()
Output:
```python
def __iter__(self) -> Any:
def height(self, o) -> int:
def width(self, o) -> int:
def filepath(self, o) -> Union[str, Path]:
def bboxes(self, o) -> List[BBox]:
def labels(self, o) -> List[int]:
def imageid(self, o) -> Hashable:
```
If, for example, all the images are .jpg and located in the data_dir folder, the image_paths attribute will be set as follows:
```python
def __init__(self, data_dir):
    # get_files globs files in data_dir by extension
    self.image_paths = get_files(data_dir, extensions=[".jpg"])
```
Let's suppose we have the following fname variable:
```python
from pathlib import Path

fname = Path("PennFudanPed/PNGImages/FudanPed00002.png")
```
| Expression | Result |
|---|---|
| fname | PennFudanPed/PNGImages/FudanPed00002.png |
| fname.exists() | True |
| fname.with_suffix('.txt') | PennFudanPed/PNGImages/FudanPed00002.txt |
| fname.stem | FudanPed00002 |
```python
import pandas as pd

df = pd.read_csv("path/to/csv/file")
df.head()  # or df.sample()
```
Then, the bboxes attribute will be created this way:
```python
import numpy as np

bbox = "[834.0, 222.0, 56.0, 36.0]"
xywh = np.fromstring(bbox[1:-1], sep=",")
# Output: array([834., 222., 56., 36.])
```
label = "2 0.527267 0.702972 0.945466 0.467218"
xywh = np.fromstring(label, sep=" ")[1:]
Output: array([0.527267, 0.702972, 0.945466, 0.467218])
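Putting these snippets together, here is an illustration-only sketch of how the generated template could be filled in. It is not code from the repo: the CSV columns, the single-class labels, and the BBox.from_xywh call are all assumptions.

```python
from icevision.all import *  # assumed to provide parsers, BBox, Path and the typing helpers used below
import numpy as np
import pandas as pd


class WheatParser(parsers.FasterRCNN, parsers.FilepathMixin, parsers.SizeMixin):
    def __init__(self, data_dir):
        self.data_dir = data_dir
        # Hypothetical CSV with image_id, width, height and a "[x, y, w, h]" bbox string per row.
        self.df = pd.read_csv(data_dir / "train.csv")

    def __iter__(self):
        # Each item `o` passed to the methods below is one annotation row.
        yield from self.df.itertuples()

    def imageid(self, o) -> Hashable:
        return o.image_id

    def filepath(self, o) -> Union[str, Path]:
        return self.data_dir / "train" / f"{o.image_id}.jpg"

    def width(self, o) -> int:
        return o.width

    def height(self, o) -> int:
        return o.height

    def labels(self, o) -> List[int]:
        return [1]  # assuming a single foreground class

    def bboxes(self, o) -> List[BBox]:
        # Parse the "[x, y, w, h]" string exactly as in the snippet above.
        xywh = np.fromstring(o.bbox[1:-1], sep=",")
        return [BBox.from_xywh(*xywh)]
```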
What part of documentation was unclear or wrong?
Add the COCO, VOC, and Birds README, and create their corresponding documentation
We recently discovered a typo in Pennfundan and fixed the occurrences to Pennfudan (without the n), but the colab badge links in the tutorials still contain the typo and end up pointing to a non-existent notebook.
We should update the badge links to the correct name.
Referred file: https://github.com/airctic/icedata/blob/master/icedata/datasets/pennfudan/README.md
Add Fridge README as well as its corresponding documentation
Add the documentation first draft using MkDocs
Is this a request for a tutorial or for an example?
Tutorial.
What is the task?
Filter predictions of object detection model according to specified classes.
Is this example for a specific model?
FasterRCNN
Is this example for a specific dataset?
Dataset is COCO.
Don't remove
Main issue for examples: #39
We need a README template that will be added when generating a new dataset using the template.
A simple test to check if the given url is reachable would suffice.
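A hedged sketch of such a test using only the standard library; the example URL and test name are placeholders:

```python
import urllib.request


def url_is_reachable(url: str, timeout: int = 10) -> bool:
    # Send a lightweight HEAD request and treat any non-error response as reachable.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except Exception:
        return False


def test_download_url_is_reachable():
    # Placeholder URL; each dataset would test its own download link.
    assert url_is_reachable("https://example.com/dataset.zip")
```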
Dataset names must be unique: first check whether a dataset name already exists before creating a new one.
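For the uniqueness check, something like this could run inside the generator before any files are created (the datasets-folder path is an assumption):

```python
from pathlib import Path


def check_dataset_name_is_free(name: str, datasets_dir: Path = Path("icedata/datasets")) -> None:
    # Refuse to generate a new dataset skeleton if a folder with that name already exists.
    if (datasets_dir / name).exists():
        raise ValueError(f"A dataset named '{name}' already exists in {datasets_dir}")
```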
Update the README by replacing icevision references with icedata ones.
PennFudan - replace labels with label_ids
imageid has been replaced by record id. Update the PennFudan dataset.
Fix links on the badges
Install icedata
```bash
pip install icedata
# or
pip install git+https://github.com/airctic/icedata.git@master
```
Expected behavior
voc should be imported properly.
Env: colab
Update both plantdoc and bccd notebooks
The OCHuman Colab Badge points to Francesco's branch. We just need to update it to point to icedata/notebooks/dev.
Add PETS README as well as its corresponding documentation
When generating the dataset, also add sample_data and tests folders.
Maybe we can also add a readme file to explain what to do there.
Automatically create/update the necessary documentation files when using the generator.
autogen.py
Automate the README creation; currently we have to copy each file individually:
```python
# Copy Birds README
shutil.copyfile(icedata_dir / "icedata/datasets/birds/README.md", dest_dir / "birds.md")

# Copy COCO README
shutil.copyfile(icedata_dir / "icedata/datasets/coco/README.md", dest_dir / "coco.md")
```
mkdocs.yml
Update the Datasets section:
```yaml
- Datasets:
    - Birds: birds.md
    - COCO: coco.md
```
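A possible sketch for autogen.py that copies every dataset README in one loop instead of one call per dataset; the source and destination paths below are assumptions:

```python
import shutil
from pathlib import Path

icedata_dir = Path(".")   # repo root (assumed)
dest_dir = Path("docs")   # where mkdocs reads the .md files from (assumed)

for readme in sorted((icedata_dir / "icedata/datasets").glob("*/README.md")):
    dataset_name = readme.parent.name  # e.g. "birds", "coco"
    shutil.copyfile(readme, dest_dir / f"{dataset_name}.md")
```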
lock files should not be committed to libraries
Describe the bug
This can be seen in the icevision quickstart tutorial, where we had to pass force_download=True to download the dataset.
Probably the check for whether the dataset directory exists is wrong.
Describe the bug
The documentation is not deployed.
To Reproduce
1- Make some changes to the documentation (.md files)
2- Merge a pull request
3- Check if the documentation is built and deployed
Expected behavior
The documentation should be built and deployed.
Automate the process for adding a new dataset.
All datasets follow a very similar structure, so a simple script can be made to automatically generate the initial skeleton for implementing a new dataset.
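A rough sketch of what such a generator could do; the skeleton file names follow the conventions mentioned in these issues (parser.py, dataset.py, sample_data, tests), everything else is a placeholder:

```python
from pathlib import Path

SKELETON_FILES = ["__init__.py", "parser.py", "dataset.py", "README.md"]
SKELETON_DIRS = ["sample_data", "tests"]


def generate_dataset_skeleton(name: str, datasets_dir: Path = Path("icedata/datasets")) -> Path:
    root = datasets_dir / name
    root.mkdir(parents=True, exist_ok=False)  # fail early if the dataset already exists
    for dirname in SKELETON_DIRS:
        (root / dirname).mkdir()
    for filename in SKELETON_FILES:
        (root / filename).touch()
    # Re-export the dataset module so `import icedata.datasets.<name>` just works.
    (root / "__init__.py").write_text(f"from icedata.datasets.{name}.dataset import *\n")
    return root
```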
Describe the bug
icedata cannot be pip installed.
Solution
Add setup.py and settings.ini
We need to add a link to our IceData repo
When creating a new dataset it is currently necessary to manually add data to sample_data; can we somehow create a script that automates the process?
I'm not sure this is possible for all annotation formats, but we can start with the most common ones: COCO and VOC.
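For COCO, a sketch like this could carve a small sample annotation file out of the full one by keeping the first few images and their annotations; the keys are the standard COCO ones, the function itself is hypothetical:

```python
import json
from pathlib import Path


def make_coco_sample(annotations_file: Path, out_file: Path, num_images: int = 5) -> None:
    coco = json.loads(annotations_file.read_text())
    images = coco["images"][:num_images]
    image_ids = {img["id"] for img in images}
    sample = {
        "images": images,
        "annotations": [a for a in coco["annotations"] if a["image_id"] in image_ids],
        "categories": coco["categories"],
    }
    out_file.write_text(json.dumps(sample))
```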
Icedata and Icevision are developed hand-in-hand; currently it's tricky to get an editable "master" installation of icevision while also installing icedata. This use case is also being discussed here.
Currently, what we have to do is manually modify pyproject.toml to use a path dependency instead of a PyPI dependency for local installation.
What part of documentation was unclear or wrong?
I think the docs for icedata are incorrect, based on the version that is installed when icevision is installed from master with the bash script installer.
When I try this from the main front page https://airctic.github.io/icedata/coco/:
```python
# COCO parser: provided out-of-the-box
parser = icedata.coco.parser(data_dir=path, class_map=class_map)
```
I get this error:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_15591/3116397168.py in <module>
      1 import icedata
      2
----> 3 icedata.coco.parser(data_dir=f'{path}/instances_slick_train.json', class_map=class_map)

TypeError: 'module' object is not callable
```
Describe the solution you'd like
I think the correct way to load annotations is defined in this test file: https://github.com/airctic/icevision/blob/f20a938956663d1aa8a17320caed815830dc4cd0/tests/parsers/test_coco_parser.py
Some datasets still need to be finished and tested:
We need to update the notebooks found in the current version of the icedata documentation
Update the PennFudan dataset to 0.7.0 and add a dataset() method.
Add other interesting annotation tools to the list
When running the starter code for VOC, it runs into an error at the parser step.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Mac
Additional context
Add any other context about the problem here.
We will eventually have automatic tests to check the validity of the data structure, whether the download link is working correctly, etc.
But let's start with simple tests: simply checking that the newly added dataset can be imported and that all the necessary functions are present is enough for now.
Add pennfudan README as well as its corresponding documentation
Update the installation instructions and other relevant parts of the documentation with poetry.
For this library, it's important to use a poetry version > 1.1.x, which is currently only in pre-release.
To update poetry to this version, run:
```bash
poetry self update --preview
```
icedata.load_data() calls the get_data_dir() function from icevision, which always returns Path.home()/".icevision"/"data". It would be helpful if we could choose a different location to store the datasets.
What part of documentation was unclear or wrong?
We need to update the examples found in the current version of the icedata documentation
For each dataset we can also add a dataset function that returns train and valid datasets with default transforms included.
This would also reduce the amount of repeated code in icevision tutorials.
Follow the structure of datasets/fridge/dataset.py or datasets/pennfudan/dataset.py: add a dataset.py file and add from icedata.datasets.<DATASET_FOLDER>.dataset import * to datasets/<DATASET_FOLDER>/__init__.py.
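An illustration-only sketch of what such a dataset function might look like, using icevision's albumentations adapter for the default transforms; the fridge module, transform sizes and parser call are assumptions, not the actual fridge/pennfudan implementations:

```python
from icevision.all import *
import icedata


def dataset(data_dir=None, size=384, presize=512):
    # Download and parse the data with the dataset's own helpers (assumed API).
    data_dir = data_dir or icedata.fridge.load_data()
    parser = icedata.fridge.parser(data_dir=data_dir)
    train_records, valid_records = parser.parse()

    # Default transforms: augmentation for training, resize-and-pad for validation.
    train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=size, presize=presize), tfms.A.Normalize()])
    valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(size), tfms.A.Normalize()])

    train_ds = Dataset(train_records, train_tfms)
    valid_ds = Dataset(valid_records, valid_tfms)
    return train_ds, valid_ds
```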
Add Faster RCNN pretrained models
Importing icedata takes a lot of time; this happens because internally we are importing unnecessary stuff from icevision (heavy libraries like pytorch).
I think the problem mostly comes from icevision.imports. It might be a good idea to curate our own imports in icedata.imports to alleviate this issue.
Icevision and icedata versions go hand in hand; some changes made in icedata will depend on changes in icevision that are not yet released, so we also need to test against icevision master.
Describe the bug
SSL error while trying to download the weights of the mmdet.fcos model like so:
```
SSLError: HTTPSConnectionPool(host='openmmlab.oss-cn-hangzhou.aliyuncs.com', port=443): Max retries exceeded with url: /mmdetection/v2.0/fcos/fcos_r101_caffe_fpn_gn-head_1x_coco/fcos_r101_caffe_fpn_gn-head_1x_coco-0e37b982.pth (Caused by SSLError(SSLError(1, '[SSL] unknown error (_ssl.c:1123)')))
```
To Reproduce
Steps to reproduce the behavior:
```python
model_type = models.mmdet.fcos
backbone = model_type.backbones.resnet101_caffe_fpn_gn_head_1x_coco
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args)
```
Expected behavior
Downloading of the weights and instantiation of the model
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Additional context
Add any other context about the problem here.
The old datasets use the parsers.py convention but the template uses parser.py (which I believe is a better name).
So we need to rename the files to parser.py and update their respective __init__.py files.