airctic / icedata

49 stars · 13 forks · 163.12 MB

IceData: Datasets Hub for the *IceVision* Framework

Home Page: https://airctic.github.io/icedata/

License: Apache License 2.0

Python 100.00%
annotation-parsers annotations-formats coco coco-dataset coco-parser computer-vision-datasets custom-parser dataset deep-learning fastai object-detection pycoco pycocotools pytorch pytorch-lightning voc-dataset voc-parser

icedata's People

Contributors

adamfarquhar, ai-fast-track, buckley-w-david, burntcarrot, davanstrien, frapochetti, fstroth, ganesh3, jpoberhauser, lgvaz, ribenamaplesyrup, yrodriguezmd


icedata's Issues

Add a destination directory to `load_data()`

🚀 Feature

Is your feature request related to a problem? Please describe.
Presently, load_data() automatically saves the downloaded data to the /root/.icevision/data directory. The new feature allows the user to choose the destination directory.

Describe the solution you'd like
Add a new argument called dest_dir to load_data() to let the user choose the destination directory. dest_dir should be initialized to None in order to preserve compatibility with the existing API. If the user does not provide the dest_dir argument (dest_dir=None), load_data() automatically saves the downloaded data to the /root/.icevision/data directory.
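A minimal sketch of the proposed signature. The download logic is elided and the parameter names (url, name) are illustrative, not the actual icevision internals:

```python
from pathlib import Path
from typing import Optional, Union

DEFAULT_DATA_DIR = Path.home() / ".icevision" / "data"  # current hard-coded location

def load_data(url: str, name: str, dest_dir: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical sketch: fall back to the default directory when dest_dir is None."""
    root = Path(dest_dir) if dest_dir is not None else DEFAULT_DATA_DIR
    save_dir = root / name
    save_dir.mkdir(parents=True, exist_ok=True)
    # ... download and extract `url` into save_dir here ...
    return save_dir
```

Because the default is None, existing callers that never pass dest_dir keep their current behavior.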

Add a Parser CheatSheet Document

📓 Documentation Update

What part of documentation was unclear or wrong?
The goal is to create a resource that helps both beginners and advanced users easily create their parsers by providing frequently used code snippets and best practices. The code snippets are divided into sections:

Section 1: file-related code snippets
Section 2: parsing-related code snippets

Useful example

Custom Parser

Use template generator

The first step is to create a class that inherits from these smaller building blocks:

This is just an example, choose the mixins that are relevant to your use case

class WheatParser(parsers.FasterRCNN, parsers.FilepathMixin, parsers.SizeMixin):
    pass

We use a method called generate_template that will print out all the necessary methods we have to implement.

WheatParser.generate_template()

Output:

def __iter__(self) -> Any:
def height(self, o) -> int:
def width(self, o) -> int:
def filepath(self, o) -> Union[str, Path]:
def bboxes(self, o) -> List[BBox]:
def labels(self, o) -> List[int]:
def imageid(self, o) -> Hashable:

If, for example, all the images are .jpg and located in the data_dir folder, the image_paths attribute will be set as follows:

def __init__(self, data_dir):
    self.image_paths = get_files(data_dir, extensions=[".jpg"])

File-related code snippets

Let's suppose we have the following fname variable:
fname = Path("PennFudanPed/PNGImages/FudanPed00002.png")

fname                      → PennFudanPed/PNGImages/FudanPed00002.png
fname.exists()             → True
fname.with_suffix('.txt')  → PennFudanPed/PNGImages/FudanPed00002.txt
fname.stem                 → FudanPed00002

Parsing-related code snippets

Read a CSV file using pandas

import pandas as pd
df = pd.read_csv("path/to/csv/file")
df.head() # or df.sample()

Then the bboxes attribute will be created this way:

import numpy as np
bbox = "[834.0, 222.0, 56.0, 36.0]"
xywh = np.fromstring(bbox[1:-1], sep=",")

Output: array([834., 222., 56., 36.])

Coordinates as text with separators

label = "2 0.527267 0.702972 0.945466 0.467218"
xywh = np.fromstring(label, sep=" ")[1:]
Output: array([0.527267, 0.702972, 0.945466, 0.467218])
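Both parsing snippets above, consolidated into one runnable example:

```python
import numpy as np

# Bracketed string, e.g. read from a CSV column
bbox = "[834.0, 222.0, 56.0, 36.0]"
xywh = np.fromstring(bbox[1:-1], sep=",")   # strip "[" and "]" first

# Space-separated label line: class id followed by normalized coordinates
label = "2 0.527267 0.702972 0.945466 0.467218"
coords = np.fromstring(label, sep=" ")[1:]  # drop the leading class id
```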

Add COCO, VOC, and Birds README

📓 Documentation Update

What part of documentation was unclear or wrong?
Add the COCO, VOC, and Birds README, and create their corresponding documentation

Add Fridge README

📓 Documentation Update

Add Fridge README as well as its corresponding documentation

Add documentation

📓 Documentation Update

Add the documentation first draft using MkDocs

How to filter predictions of specified classes

📓 New <Tutorial/Example>

Is this a request for a tutorial or for an example?
Tutorial.

What is the task?
Filter predictions of object detection model according to specified classes.

Is this example for a specific model?
FasterRCNN

Is this example for a specific dataset?
Dataset is COCO.


Don't remove
Main issue for examples: #39
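A framework-agnostic sketch of the filtering task described above. The function name and the parallel-lists prediction format are illustrative assumptions, not the icevision API:

```python
def filter_predictions(labels, scores, boxes, keep_classes):
    """Keep only the detections whose label is in keep_classes."""
    kept = [(l, s, b) for l, s, b in zip(labels, scores, boxes) if l in keep_classes]
    if not kept:
        return [], [], []
    out_labels, out_scores, out_boxes = map(list, zip(*kept))
    return out_labels, out_scores, out_boxes
```

The same idea applies to any model output once labels, scores, and boxes are extracted from the prediction object.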

Add a README template

📓 Documentation Update

We need a README template that will be added when generating a new dataset using the template generator.

Update README

📓 Documentation Update

Update the README by replacing icevision references with icedata ones.

Fix OCHuman Colab Badge

📓 Documentation Update

The OCHuman Colab Badge points to Francesco's branch. We just need to update it to point to icedata/notebooks/dev.

Add PETS README

📓 Documentation Update

Add PETS README as well as its corresponding documentation

Add docs to dataset template generator

🚀 Feature

Automatically create/update the necessary documentation files when using the generator.

autogen.py
Automate the README creation; currently we have to copy each file individually:

    # Copy Birds README
    shutil.copyfile(icedata_dir / "icedata/datasets/birds/README.md", dest_dir / "birds.md")

    # Copy COCO README
    shutil.copyfile(icedata_dir / "icedata/datasets/coco/README.md", dest_dir / "coco.md")

mkdocs.yml
Update the Datasets section:

  - Datasets:
    - Birds: birds.md
    - COCO: coco.md
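A sketch of how autogen.py could loop over a dataset list instead of copying each file by hand. The DATASETS list and directory layout mirror the snippet above; treat them as assumptions:

```python
import shutil
from pathlib import Path

DATASETS = ["birds", "coco"]  # extend as new datasets are added

def copy_readmes(icedata_dir: Path, dest_dir: Path) -> None:
    """Copy each dataset's README.md into the docs dir as <name>.md."""
    for name in DATASETS:
        src = icedata_dir / "icedata" / "datasets" / name / "README.md"
        if src.exists():
            shutil.copyfile(src, dest_dir / f"{name}.md")
```

The mkdocs.yml Datasets entries could be generated from the same list, keeping the two in sync.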

Fix CI workflow: mk-docs-build.yml

๐Ÿ› Bug

Describe the bug
The documentation is not deployed.

To Reproduce
1- Make some changes to the documentation (.md files)
2- Merge a pull request
3- Check if the documentation is built and deployed

Expected behavior
The documentation should be built and deployed.

Template generation for new datasets

🚀 Feature

Automate the process for adding a new dataset.

All datasets follow a very similar structure, a simple script can be made to automatically generate the initial skeleton for implementing a new dataset.
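One way such a script could look. The file list is a guess at the shared structure, not the actual template:

```python
from pathlib import Path

# Files most datasets appear to share; adjust to the real template as needed
SKELETON_FILES = ["__init__.py", "data.py", "parser.py", "README.md"]

def generate_skeleton(repo_root: Path, dataset_name: str) -> Path:
    """Create an empty skeleton for a new dataset under icedata/datasets/."""
    ds_dir = repo_root / "icedata" / "datasets" / dataset_name
    ds_dir.mkdir(parents=True, exist_ok=True)
    for fname in SKELETON_FILES:
        (ds_dir / fname).touch(exist_ok=True)
    return ds_dir
```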

pip install is not working

๐Ÿ› Bug

Describe the bug
icedata cannot be pip installed.

Solution
Add setup.py and settings.ini

Automatically generates sample data

🚀 Feature

When creating a new dataset it shouldn't be necessary to manually add data to sample_data. Can we create a script that automates the process?

I'm not sure this is possible for all annotation formats, but we can start with the most common ones: COCO and VOC.

[dev-install] Co-developing with icevision

Icedata and icevision are developed hand in hand; currently it's tricky to get an editable "master" installation of icevision while also installing icedata. This use case is also being discussed here.

Currently, what we have to do is manually modify pyproject.toml to use a path dependency instead of a PyPI dependency for local installation.
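For reference, the manual change looks roughly like this in pyproject.toml, using poetry's path-dependency syntax. The relative path and the commented-out version constraint are assumptions about the local checkout layout:

```toml
[tool.poetry.dependencies]
# PyPI dependency (what normally ships in the repo), e.g.:
# icevision = "*"

# Local path dependency for co-development:
icevision = { path = "../icevision", develop = true }
```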

icedata.coco.parser is a module not a function

📓 Documentation Update

What part of documentation was unclear or wrong?
I think the docs for icedata are incorrect, based on the version that is installed when icevision is installed from master with the bash script installer.

When I try this from the main front page https://airctic.github.io/icedata/coco/:

# COCO parser: provided out-of-the-box
parser = icedata.coco.parser(data_dir=path, class_map=class_map)

I get this error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_15591/3116397168.py in <module>
      1 import icedata
      2 
----> 3 icedata.coco.parser(data_dir=f'{path}/instances_slick_train.json', class_map=class_map)

TypeError: 'module' object is not callable

Describe the solution you'd like
I think the correct way to load annotations is defined in this test file: https://github.com/airctic/icevision/blob/f20a938956663d1aa8a17320caed815830dc4cd0/tests/parsers/test_coco_parser.py

Add remaining tests

🚀 Feature

Some datasets still need to be finished and tested:

  • coco
  • voc
  • birds

Update notebooks in docs

📓 Documentation Update

We need to update the notebooks found in the current version of the icedata documentation

voc parser

๐Ÿ› Bug

When running the starter code for VOC, an error occurs at the parser step.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://airctic.github.io/icedata/examples/voc_exp/
  2. See error

Expected behavior

Screenshots
Screen Shot 2021-08-19 at 11 50 55 AM
Screen Shot 2021-08-19 at 11 51 05 AM

Desktop (please complete the following information):
Mac


Simple automatic tests for templates

🚀 Feature

We will eventually have automatic tests that check the validity of the data structure, whether the download link works correctly, etc.

But let's start with simple tests: checking that a newly added dataset can be imported and that all the necessary functions are present is enough for now.
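A minimal import check along these lines. The module and attribute names are parameters, so the snippet is generic; a real test would pass e.g. a dataset module path and its expected function names:

```python
import importlib

def check_module(module_name: str, required_attrs) -> bool:
    """Return True if the module imports and exposes all required attributes."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return all(hasattr(module, attr) for attr in required_attrs)
```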

Fastcore 1.1 breaking changes

๐Ÿ› Bug

Describe the bug
Fastcore 1.1 contains a breaking change described here:

  • Remove Path.{read,write} (use Path.{read_text,write_text} instead) and change Path.{load,save} to functions load_pickle and save_pickle (#121)

We need to update our calls to Path.read to fix this issue.
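The fix is mechanical: the stdlib pathlib methods replace the removed fastcore patches.

```python
from pathlib import Path
import tempfile

p = Path(tempfile.mkdtemp()) / "example.txt"

# fastcore < 1.1 monkey-patched Path.write / Path.read;
# with fastcore >= 1.1, use the stdlib methods instead:
p.write_text("annotations")
content = p.read_text()
```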

Add pennfudan README

📓 Documentation Update

Add pennfudan README as well as its corresponding documentation

Update install guide with poetry

📓 Documentation Update

Update the installation instructions and other relevant parts of the documentation with poetry.

For this library, it's important to use a poetry version > 1.1.x which is currently only in pre-release.

To update poetry to this version do:

poetry self update --preview

Allowing custom path for `icedata.load_data()`

🚀 Feature

  • Currently, icedata.load_data() calls the get_data_dir() function from icevision, which always returns Path.home()/".icevision"/"data". It would be helpful if we could choose a different location to store the datasets.

Update examples in docs

📓 Documentation Update

What part of documentation was unclear or wrong?
We need to update the examples found in the current version of the icedata documentation

Add `dataset` function for each dataset

🚀 Feature

For each dataset we can also add a dataset function that returns train and valid datasets with default transforms included.

This would also reduce the amount of repeated code on icevision tutorials.

HOW TO

Follow the structure of datasets/fridge/dataset.py or datasets/pennfudan/dataset.py:

  1. Create dataset.py file
  2. Add from icedata.datasets.<DATASET_FOLDER>.dataset import * to datasets/<DATASET_FOLDER>/__init__.py

TODO (no specific order)

  • fridge
  • pennfudan
  • birds
  • biwi
  • coco (In progress by: @jpoberhauser)
  • ochuman
  • pets (in progress by: @ganesh3)
  • voc
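Shape-wise, each dataset.py would expose something like the following. This is a pure-Python stand-in: the real version would build icevision Datasets with the dataset's default transforms, which is omitted here:

```python
import random

def dataset(records, valid_pct=0.2, seed=42):
    """Hypothetical stand-in: split parsed records into (train, valid) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]
```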

Slow import times

🚀 Feature

Importing icedata takes a long time because internally we import unnecessary things from icevision (heavy libraries like pytorch).

I think the problem mostly comes from icevision.imports

It might be a good idea to curate our own imports in icedata.imports to alleviate this issue.
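One common remedy is to defer heavy imports until first use. A tiny helper sketch (the name lazy_import is hypothetical):

```python
import importlib
import sys

def lazy_import(name: str):
    """Return the module, importing it only the first time it's requested."""
    if name in sys.modules:  # already imported: no extra cost
        return sys.modules[name]
    return importlib.import_module(name)
```

With this pattern, a call like lazy_import("torch") inside the function that needs it keeps `import icedata` itself fast.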

Test package with icevision master

Icevision and icedata versions go hand in hand: some changes made in icedata will depend on icevision changes not yet released, so we also need to test against icevision master.

ssl error when trying to download the model weights for mmdet.fcos

๐Ÿ› Bug

Describe the bug
SSL error while trying to download the weights of the mmdet.fcos model like so:

SSLError: HTTPSConnectionPool(host='openmmlab.oss-cn-hangzhou.aliyuncs.com', port=443): Max retries exceeded with url: /mmdetection/v2.0/fcos/fcos_r101_caffe_fpn_gn-head_1x_coco/fcos_r101_caffe_fpn_gn-head_1x_coco-0e37b982.pth (Caused by SSLError(SSLError(1, '[SSL] unknown error (_ssl.c:1123)')))

To Reproduce
Steps to reproduce the behavior:
1. model_type = models.mmdet.fcos
2. backbone = model_type.backbones.resnet101_caffe_fpn_gn_head_1x_coco
3. model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args)

Expected behavior
Downloading of the weights and instantiation of the model


Desktop (please complete the following information):

  • OS: [e.g. ubuntu 18.04]


Rename parsers.py to parser.py

The old datasets use the parsers.py convention, but the template uses parser.py (which I believe is a better name).

So we need to rename the files to parser.py and update their respective __init__.py file.
