airctic / icedata
IceData: Datasets Hub for the *IceVision* Framework
Home Page: https://airctic.github.io/icedata/
License: Apache License 2.0
Clicking the Contributing Guide link in the side-bar on https://airctic.github.io/icedata/ directs to https://airctic.github.io/icedata/readme.md. The body of the page displays a 404 - Not found error.
Is your feature request related to a problem? Please describe.
Presently, load_data() automatically saves the downloaded data to the /root/.icevision/data directory. The new feature allows the user to choose the destination directory.
Describe the solution you'd like
Add a new argument called dest_dir to load_data() to let the user choose the destination directory. dest_dir should be initialized to None in order to preserve compatibility with the existing API. If the user does not provide the dest_dir argument (dest_dir=None), load_data() automatically saves the downloaded data to the /root/.icevision/data directory.
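A minimal sketch of the proposed behavior, assuming nothing about icedata's internals beyond what is described above; the download step is elided and the signature is hypothetical:

```python
from pathlib import Path
from typing import Optional, Union


def load_data(url: str, name: str, dest_dir: Optional[Union[str, Path]] = None) -> Path:
    # Fall back to the current default location when dest_dir is not given,
    # so existing callers keep working unchanged.
    if dest_dir is None:
        dest_dir = Path.home() / ".icevision" / "data"
    save_dir = Path(dest_dir) / name
    save_dir.mkdir(parents=True, exist_ok=True)
    # ... download and extract `url` into `save_dir`, as load_data does today ...
    return save_dir
```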
What part of documentation was unclear or wrong?
The goal is to create a resource that will help both beginners and advanced users easily create their parsers by providing frequently used code snippets and best practices. The code snippets are divided into sections:
Section 1: file-related code snippets
Section 2: parsing-related code snippets
The first step is to create a class that inherits from these smaller building blocks:
```python
class WheatParser(parsers.FasterRCNN, parsers.FilepathMixin, parsers.SizeMixin):
    pass
```
We use a method called generate_template
that will print out all the necessary methods we have to implement.
WheatParser.generate_template()
Output:
```python
def __iter__(self) -> Any:
def height(self, o) -> int:
def width(self, o) -> int:
def filepath(self, o) -> Union[str, Path]:
def bboxes(self, o) -> List[BBox]:
def labels(self, o) -> List[int]:
def imageid(self, o) -> Hashable:
```
If, for example, all the images are .jpg and located in the data_dir folder, the image_paths attribute will be set as follows:
```python
def __init__(self, data_dir):
    # get_files globs files in data_dir by extension
    self.image_paths = get_files(data_dir, extensions=[".jpg"])
```
Let's suppose we have the following fname variable:
```python
from pathlib import Path

fname = Path("PennFudanPed/PNGImages/FudanPed00002.png")
```
| Expression | Result |
|---|---|
| fname | PennFudanPed/PNGImages/FudanPed00002.png |
| fname.exists() | True |
| fname.with_suffix('.txt') | PennFudanPed/PNGImages/FudanPed00002.txt |
| fname.stem | FudanPed00002 |
```python
import pandas as pd

df = pd.read_csv("path/to/csv/file")
df.head()  # or df.sample()
```
Then, the bboxes attribute will be created this way:
```python
import numpy as np

bbox = "[834.0, 222.0, 56.0, 36.0]"
xywh = np.fromstring(bbox[1:-1], sep=",")
# Output: array([834., 222., 56., 36.])
```
label = "2 0.527267 0.702972 0.945466 0.467218"
xywh = np.fromstring(label, sep=" ")[1:]
Output: array([0.527267, 0.702972, 0.945466, 0.467218])
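Putting these snippets together, here is an illustration-only sketch of how the generated template could be filled in. It is not code from the repo: the CSV columns, the single-class labels, and the BBox.from_xywh call are all assumptions.

```python
from icevision.all import *  # assumed to provide parsers, BBox, Path and the typing helpers used below
import numpy as np
import pandas as pd


class WheatParser(parsers.FasterRCNN, parsers.FilepathMixin, parsers.SizeMixin):
    def __init__(self, data_dir):
        self.data_dir = data_dir
        # Hypothetical CSV with image_id, width, height and a "[x, y, w, h]" bbox string per row.
        self.df = pd.read_csv(data_dir / "train.csv")

    def __iter__(self):
        # Each item `o` passed to the methods below is one annotation row.
        yield from self.df.itertuples()

    def imageid(self, o) -> Hashable:
        return o.image_id

    def filepath(self, o) -> Union[str, Path]:
        return self.data_dir / "train" / f"{o.image_id}.jpg"

    def width(self, o) -> int:
        return o.width

    def height(self, o) -> int:
        return o.height

    def labels(self, o) -> List[int]:
        return [1]  # assuming a single foreground class

    def bboxes(self, o) -> List[BBox]:
        # Parse the "[x, y, w, h]" string exactly as in the snippet above.
        xywh = np.fromstring(o.bbox[1:-1], sep=",")
        return [BBox.from_xywh(*xywh)]
```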
What part of documentation was unclear or wrong?
Add the COCO, VOC, and Birds README, and create their corresponding documentation
We recently discovered a typo in Pennfundan and fixed the occurrences to Pennfudan (without the n), but the colab badge links in the tutorials still contain the typo and end up pointing to a non-existent notebook.
We should update the badge links to the correct name.
Referred file: https://github.com/airctic/icedata/blob/master/icedata/datasets/pennfudan/README.md
Add Fridge README as well as its corresponding documentation
Add the documentation first draft using MkDocs
Is this a request for a tutorial or for an example?
Tutorial.
What is the task?
Filter predictions of object detection model according to specified classes.
Is this example for a specific model?
FasterRCNN
Is this example for a specific dataset?
Dataset is COCO.
Don't remove
Main issue for examples: #39
We need a README template that will be added when generating a new dataset using the template.
A simple test to check if the given url is reachable would suffice.
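A hedged sketch of such a test using only the standard library; the example URL and test name are placeholders:

```python
import urllib.request


def url_is_reachable(url: str, timeout: int = 10) -> bool:
    # Send a lightweight HEAD request and treat any non-error response as reachable.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except Exception:
        return False


def test_download_url_is_reachable():
    # Placeholder URL; each dataset would test its own download link.
    assert url_is_reachable("https://example.com/dataset.zip")
```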
Dataset names must be unique: first check whether a dataset name already exists before creating a new one.
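For the uniqueness check, something like this could run inside the generator before any files are created (the datasets-folder path is an assumption):

```python
from pathlib import Path


def check_dataset_name_is_free(name: str, datasets_dir: Path = Path("icedata/datasets")) -> None:
    # Refuse to generate a new dataset skeleton if a folder with that name already exists.
    if (datasets_dir / name).exists():
        raise ValueError(f"A dataset named '{name}' already exists in {datasets_dir}")
```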
Update the README by replacing icevision references with icedata ones.
PennFudan - replace labels with label_ids
imageid has been replaced by record id. Update the PennFudan dataset.
Fix links on the badges
Install icedata
```bash
pip install icedata
# or
pip install git+https://github.com/airctic/icedata.git@master
```
Expected behavior
voc should be imported properly.
Env: colab
Update both plantdoc and bccd notebooks
The OCHuman Colab Badge points to Francesco's branch. We just need to update it to point to icedata/notebooks/dev.
Add PETS README as well as its corresponding documentation
When generating the dataset, also add sample_data and tests folders.
Maybe we can also add a readme file to explain what to do there.
Automatically create/update the necessary documentation files when using the generator.
autogen.py
Automate the README creation; currently we have to copy each file individually:
```python
# Copy Birds README
shutil.copyfile(icedata_dir / "icedata/datasets/birds/README.md", dest_dir / "birds.md")

# Copy COCO README
shutil.copyfile(icedata_dir / "icedata/datasets/coco/README.md", dest_dir / "coco.md")
```
mkdocs.yml
Update the Datasets section:
```yaml
- Datasets:
    - Birds: birds.md
    - COCO: coco.md
```
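A possible sketch for autogen.py that copies every dataset README in one loop instead of one call per dataset; the source and destination paths below are assumptions:

```python
import shutil
from pathlib import Path

icedata_dir = Path(".")   # repo root (assumed)
dest_dir = Path("docs")   # where mkdocs reads the .md files from (assumed)

for readme in sorted((icedata_dir / "icedata/datasets").glob("*/README.md")):
    dataset_name = readme.parent.name  # e.g. "birds", "coco"
    shutil.copyfile(readme, dest_dir / f"{dataset_name}.md")
```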
lock files should not be committed to libraries
Describe the bug
This can be seen in the icevision quickstart tutorial, where we had to pass force_download=True to download the dataset.
Probably the check for whether the dataset directory exists is wrong.
Describe the bug
The documentation is not deployed.
To Reproduce
1- Make some changes to the documentation (.md files)
2- Merge a pull request
3- Check if the documentation is built and deployed
Expected behavior
The documentation should be built and deployed.
Automate the process for adding a new dataset.
All datasets follow a very similar structure, so a simple script can be made to automatically generate the initial skeleton for implementing a new dataset.
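A rough sketch of what such a generator could do; the skeleton file names follow the conventions mentioned in these issues (parser.py, dataset.py, sample_data, tests), everything else is a placeholder:

```python
from pathlib import Path

SKELETON_FILES = ["__init__.py", "parser.py", "dataset.py", "README.md"]
SKELETON_DIRS = ["sample_data", "tests"]


def generate_dataset_skeleton(name: str, datasets_dir: Path = Path("icedata/datasets")) -> Path:
    root = datasets_dir / name
    root.mkdir(parents=True, exist_ok=False)  # fail early if the dataset already exists
    for dirname in SKELETON_DIRS:
        (root / dirname).mkdir()
    for filename in SKELETON_FILES:
        (root / filename).touch()
    # Re-export the dataset module so `import icedata.datasets.<name>` just works.
    (root / "__init__.py").write_text(f"from icedata.datasets.{name}.dataset import *\n")
    return root
```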
Describe the bug
icedata cannot be pip installed.
Solution
Add setup.py and settings.ini
We need to add a link to our IceData repo
When creating a new dataset it is currently necessary to manually add data to sample_data; can we somehow create a script that automates the process?
I'm not sure this is possible for all annotation formats, but we can start with the most common ones: COCO and VOC.
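For COCO, a sketch like this could carve a small sample annotation file out of the full one by keeping the first few images and their annotations; the keys are the standard COCO ones, the function itself is hypothetical:

```python
import json
from pathlib import Path


def make_coco_sample(annotations_file: Path, out_file: Path, num_images: int = 5) -> None:
    coco = json.loads(annotations_file.read_text())
    images = coco["images"][:num_images]
    image_ids = {img["id"] for img in images}
    sample = {
        "images": images,
        "annotations": [a for a in coco["annotations"] if a["image_id"] in image_ids],
        "categories": coco["categories"],
    }
    out_file.write_text(json.dumps(sample))
```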
Icedata and Icevision are developed hand-in-hand; currently it's tricky to get an editable "master" installation of icevision while also installing icedata. This use case is also being discussed here.
Currently, what we have to do is manually modify pyproject.toml to use a path dependency instead of a PyPI dependency for local installation.
What part of documentation was unclear or wrong?
I think the docs for icedata are incorrect, based on the version that is installed when icevision is installed from master with the bash script installer.
When I try this from the main front page https://airctic.github.io/icedata/coco/:
```python
# COCO parser: provided out-of-the-box
parser = icedata.coco.parser(data_dir=path, class_map=class_map)
```
I get this error:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_15591/3116397168.py in <module>
      1 import icedata
      2
----> 3 icedata.coco.parser(data_dir=f'{path}/instances_slick_train.json', class_map=class_map)

TypeError: 'module' object is not callable
```
Describe the solution you'd like
I think the correct way to load annotations is defined in this test file: https://github.com/airctic/icevision/blob/f20a938956663d1aa8a17320caed815830dc4cd0/tests/parsers/test_coco_parser.py
Some datasets still need to be finished and tested:
We need to update the notebooks found in the current version of the icedata documentation
Update the PennFudan dataset to 0.7.0 and add a dataset() method.
Add other interesting annotation tools to the list
When running the starter code for VOC, it runs into an error at the parser step.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Mac
Additional context
Add any other context about the problem here.
We will eventually have automatic tests to check the validity of the data structure, whether the download link is working correctly, etc.
But let's start with simple tests: simply checking that the newly added dataset can be imported and that all the necessary functions are present is enough for now.
Add pennfudan README as well as its corresponding documentation
Update the installation instructions and other relevant parts of the documentation with poetry.
For this library, it's important to use a poetry version > 1.1.x, which is currently only in pre-release.
To update poetry to this version, run:
```bash
poetry self update --preview
```
icedata.load_data() calls the get_data_dir() function from icevision, which always returns Path.home()/".icevision"/"data". It would be helpful if we could choose a different location to store the datasets.
What part of documentation was unclear or wrong?
We need to update the examples found in the current version of the icedata documentation
For each dataset we can also add a dataset function that returns train and valid datasets with default transforms included.
This would also reduce the amount of repeated code in icevision tutorials.
Follow the structure of datasets/fridge/dataset.py or datasets/pennfudan/dataset.py: add a dataset.py file and add from icedata.datasets.<DATASET_FOLDER>.dataset import * to datasets/<DATASET_FOLDER>/__init__.py.
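An illustration-only sketch of what such a dataset function might look like, using icevision's albumentations adapter for the default transforms; the fridge module, transform sizes and parser call are assumptions, not the actual fridge/pennfudan implementations:

```python
from icevision.all import *
import icedata


def dataset(data_dir=None, size=384, presize=512):
    # Download and parse the data with the dataset's own helpers (assumed API).
    data_dir = data_dir or icedata.fridge.load_data()
    parser = icedata.fridge.parser(data_dir=data_dir)
    train_records, valid_records = parser.parse()

    # Default transforms: augmentation for training, resize-and-pad for validation.
    train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=size, presize=presize), tfms.A.Normalize()])
    valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(size), tfms.A.Normalize()])

    train_ds = Dataset(train_records, train_tfms)
    valid_ds = Dataset(valid_records, valid_tfms)
    return train_ds, valid_ds
```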
Add Faster RCNN pretrained models
Importing icedata takes a lot of time; this happens because internally we are importing unnecessary stuff from icevision (heavy libraries like pytorch).
I think the problem mostly comes from icevision.imports. It might be a good idea to curate our own imports in icedata.imports to alleviate this issue.
Icevision and icedata versions go hand in hand; some changes made in icedata will depend on changes in icevision that are not yet released, so we also need to test against icevision master.
Describe the bug
SSL error while trying to download the weights of the mmdet.fcos model like so:
```
SSLError: HTTPSConnectionPool(host='openmmlab.oss-cn-hangzhou.aliyuncs.com', port=443): Max retries exceeded with url: /mmdetection/v2.0/fcos/fcos_r101_caffe_fpn_gn-head_1x_coco/fcos_r101_caffe_fpn_gn-head_1x_coco-0e37b982.pth (Caused by SSLError(SSLError(1, '[SSL] unknown error (_ssl.c:1123)')))
```
To Reproduce
Steps to reproduce the behavior:
```python
model_type = models.mmdet.fcos
backbone = model_type.backbones.resnet101_caffe_fpn_gn_head_1x_coco
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args)
```
Expected behavior
Downloading of the weights and instantiation of the model
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Additional context
Add any other context about the problem here.
The old datasets use the parsers.py convention but the template uses parser.py (which I believe is a better name).
So we need to rename the files to parser.py and update their respective __init__.py files.