This project forked from issresearch/dataset-converters

License: MIT License

Dataset Converters

Dataset Converters is a conversion toolset between different object detection and instance segmentation annotation formats.
It was written and is maintained by deep learning developers at Intelligent Security Systems to simplify research.

Introduction

There are many dataset annotation formats for object detection and instance segmentation.
This repository contains a set of scripts that simplify conversion between those formats. We rely on the COCO format as the main representation.

Installation

Please cd to the DatasetConverters folder and run

pip install -r requirements.txt

This will install the required dependencies.

Usage

Conversion

To convert between dataset formats, we provide a script called convert.py.

For example, suppose you have the ADE20K dataset and want to convert it into COCO format.
To do that, run

python convert.py --input-folder <path_to_folder_ADE20K_2016_07_26> --output-folder <output_path> \
                  --input-format ADE20K --output-format COCO --copy

Note. The same command can be written more briefly as

python convert.py -i <path_to_folder_ADE20K_2016_07_26> -o <output_path> -I ADE20K -O COCO --copy

Note. The --copy argument copies the images into the output folder. On Linux, you can pass --symlink instead to create symbolic links.
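
If you want to call the converter from another Python script, one option is to assemble the same command line programmatically. The sketch below uses only the flags shown above; the helper name build_convert_command and the example folder names are hypothetical and not part of this repository.

```python
import shlex


def build_convert_command(input_folder, output_folder, input_format,
                          output_format, symlink=False):
    """Return the argument list for a convert.py call (hypothetical helper)."""
    command = [
        "python", "convert.py",
        "--input-folder", input_folder,
        "--output-folder", output_folder,
        "--input-format", input_format,
        "--output-format", output_format,
    ]
    # --copy copies the images; --symlink creates symbolic links (Linux).
    command.append("--symlink" if symlink else "--copy")
    return command


# Example folder names below are placeholders.
command = build_convert_command("ADE20K_2016_07_26", "ade20k_coco",
                                "ADE20K", "COCO")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) from the folder that contains convert.py.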

The ADE20K dataset is now ready to use in frameworks that accept COCO-format input.

For the full list of supported conversions, please refer to the Supported conversions section.

Merging

If you have several annotation sets converted to COCO format, you can merge them with the script merge_json_datasets.py.
Suppose you have COCO and Pascal VOC segmentations in COCO format and want to merge the dog and horse annotations from both. This is how merge_json_datasets.py serves that purpose:


python merge_json_datasets.py -d <coco_images_folder> -a <coco_annotations.json> --ids 18 19 \
                              -d <vocsegm_images_folder> -a <vocsegm_annotations.json> --ids 12 13 \
                              --output-ids 1 2 -o <output_dir> -n dog horse

In this example two datasets are merged, but the number is not limited. You can merge as many datasets and classes in COCO format as you need.
For each dataset in COCO format, provide the following arguments

  • -d for images;
  • -a for json file of annotations;
  • --ids for list of ids of goal classes in the dataset.

After all datasets have been specified with this pattern, describe the output with the following arguments

  • --output-ids for list of output ids of the goal classes;
  • -o for output directory for the merged dataset;
  • -n for names of the goal classes in the merged dataset.
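
To make the role of --ids, --output-ids and -n more concrete, here is a conceptual sketch of such a merge on in-memory COCO dictionaries. It is not the implementation of merge_json_datasets.py: the function merge_coco is hypothetical, and dropping images without kept annotations and renumbering image and annotation ids are assumptions made for this illustration.

```python
def merge_coco(datasets, output_ids, names):
    """Merge COCO dicts, keeping only the goal classes (illustrative sketch).

    datasets: list of (coco_dict, ids) pairs; ids[k] maps to output_ids[k].
    """
    merged = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for i, n in zip(output_ids, names)],
    }
    next_image_id = 1
    next_annotation_id = 1
    for coco, ids in datasets:
        category_map = dict(zip(ids, output_ids))
        image_map = {}  # old image id -> new image id, per dataset
        for annotation in coco["annotations"]:
            if annotation["category_id"] not in category_map:
                continue  # not a goal class
            old_image_id = annotation["image_id"]
            if old_image_id not in image_map:
                image_map[old_image_id] = next_image_id
                next_image_id += 1
            merged["annotations"].append(dict(
                annotation,
                id=next_annotation_id,
                image_id=image_map[old_image_id],
                category_id=category_map[annotation["category_id"]],
            ))
            next_annotation_id += 1
        # Assumption: keep only images that still have annotations.
        for image in coco["images"]:
            if image["id"] in image_map:
                merged["images"].append(dict(image, id=image_map[image["id"]]))
    return merged


coco_part = {
    "images": [{"id": 7, "file_name": "a.jpg"}, {"id": 8, "file_name": "b.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 7, "category_id": 18, "bbox": [0, 0, 10, 10]},
        {"id": 2, "image_id": 8, "category_id": 1, "bbox": [5, 5, 20, 20]},
    ],
}
voc_part = {
    "images": [{"id": 7, "file_name": "c.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 7, "category_id": 13, "bbox": [1, 1, 30, 30]},
    ],
}
merged = merge_coco([(coco_part, [18, 19]), (voc_part, [12, 13])],
                    [1, 2], ["dog", "horse"])
print([a["category_id"] for a in merged["annotations"]])
```

Because each source dataset has its own id space, the sketch builds a separate category map per dataset; this mirrors the way the command line pairs every -d and -a with its own --ids list.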

Supported conversions

In this section we list all of the supported formats and their conversions.

  • ADE20K

    Can be directly converted to

    • COCO
  • CITYSCAPES

    Can be directly converted to

    • COCO
  • COCO

    Can be directly converted to

    • TDG
    • TDGSEGM
    • VOCCALIB
    • YOLO

    Note.
    We expect the names of the json annotation files to correspond to the names of their image folders. If an annotation file is called XYZ.json, the corresponding folder is expected to be called XYZ.
    To convert the original COCO dataset, please rename the folders
    train2017 to instances_train2017;
    val2017 to instances_val2017
    and leave only the two corresponding files in the annotations folder: instances_train2017.json and instances_val2017.json.
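
The naming rule above can be checked before running a conversion. The following is a minimal sketch; the helper unmatched_annotations is hypothetical and only compares names, so it makes no assumption about where the files live on disk.

```python
from pathlib import Path


def unmatched_annotations(annotation_names, folder_names):
    """Return annotation files that have no image folder with the same stem."""
    folders = set(folder_names)
    return [name for name in annotation_names
            if Path(name).stem not in folders]


# val2017 has not been renamed yet, so its annotation file is reported.
print(unmatched_annotations(
    ["instances_train2017.json", "instances_val2017.json"],
    ["instances_train2017", "val2017"],
))
```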

  • CVAT

    Can be directly converted to
    • COCO

    Note.
    For the CVAT input format, we expect the xml annotation file and the images to be placed in the same folder. Pass that folder as the input_folder argument of the convert function.

  • OID

    Stands for Open Images Dataset V4.
    Can be directly converted to
    • COCO
  • TDG

    Custom format for bounding box annotation.
    You can use it to train Faster R-CNNs and SSDs in their Caffe branches.
    Can be directly converted to
  • TDGSEGM

    Custom format for instance segmentation.
    Can be directly converted to
    • COCO
  • VOC

    Stands for bounding box annotations from Pascal VOC datasets.
    Can be directly converted to
    • COCO
  • VOCCALIB

    Stands for bounding box annotations used by the OpenVINO calibration tool.
    They are supposed to be "VOC-like". Convert to this format to use the result in the OpenVINO calibration tool.
    No conversions from this format are available.
  • VOCSEGM

    Stands for instance segmentation annotations from Pascal VOC datasets.
    Can be directly converted to
    • COCO
  • YOLO

    Can be directly converted to

    • COCO

    Note.
    We expect an obj.data file in the input folder. It must contain the following lines, in any order:

    train  = <relative path to the file with list of train images>
    valid  = <relative path to the file with list of valid images>
    names = <relative path to the file with list of classes>
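
As an illustration of the obj.data requirements, the sketch below parses the key = value lines and checks that the three required keys are present. The function parse_obj_data and the example paths are hypothetical; the converter in this repository may read the file differently.

```python
REQUIRED_KEYS = ("train", "valid", "names")


def parse_obj_data(text):
    """Parse 'key = value' lines and verify the required keys exist."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in entries]
    if missing:
        raise ValueError("obj.data is missing: " + ", ".join(missing))
    return entries


# Example content; the paths are placeholders.
example = """
names = data/obj.names
train  = data/train.txt
valid  = data/valid.txt
"""
print(parse_obj_data(example))
```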
    

    The list of images consists of lines with relative paths to each image:

    path/to/image1.jpg
    path/to/image2.jpg
    ...
    path/to/imageN.jpg
    

    Next to every imageM.jpg there must be an imageM.txt file with annotations in YOLO format.
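
For orientation, the sketch below shows how one YOLO annotation line relates to a COCO bounding box. It assumes the common Darknet layout, where each line is "class x_center y_center width height" with coordinates normalized by the image size, and the COCO convention of [x_min, y_min, width, height] in pixels. The function name is hypothetical, and the converter in this repository may handle details such as rounding differently.

```python
def yolo_line_to_coco_bbox(line, image_width, image_height):
    """Convert one YOLO annotation line to (class_id, COCO bbox in pixels)."""
    class_id, x_center, y_center, width, height = line.split()
    box_width = float(width) * image_width
    box_height = float(height) * image_height
    # YOLO stores the box center; COCO stores the top-left corner.
    x_min = float(x_center) * image_width - box_width / 2
    y_min = float(y_center) * image_height - box_height / 2
    return int(class_id), [x_min, y_min, box_width, box_height]


print(yolo_line_to_coco_bbox("0 0.5 0.5 0.5 0.5", 100, 200))
```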

How to contribute

We welcome community contributions to the Dataset Converters.

If you want to add a new dataset converter, please note that we expect

  • The new dataset is free and open, so we are able to download it and test your code;
  • Your code is written from scratch and does not have parts copied from other repositories.

The core files, which are key to understanding how a converter is implemented, are the following

Converter.py
ConverterBase.py
converters.py
formats.py

A new converter is a subclass of the ConverterBase class that overrides the _run method; its conversion format must also be added to the formats list.

Contributors

armaxik, arutyunovg, kolua0901, slavanap
