V3Det: Vast Vocabulary Visual Detection Dataset

Jiaqi Wang*, Pan Zhang*, Tao Chu*, Yuhang Cao*,
Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin
(* equal contribution)
Accepted to ICCV 2023 (Oral)

Codebase

Object Detection

Open Vocabulary Detection (OVD)

Data Format

The data includes a Train Set, a Val Set, and a Test Set, comprising 13,204 categories.

Split                        Images    BBoxes
Train Set                    183,354   1,357,377
Val Set                      29,821    220,429
Test Set                     29,863    219,012
Train Set OVD (Base Class)   132,437   836,203

The 13,204 categories are split into 6,709 base classes and 6,495 novel classes for OVD tasks. For each category, we provide an exemplar image and detailed descriptions from several sources (human experts, ChatGPT, GPT4, and GPT4V).

Base Class   Novel Class   All Class
6,709        6,495         13,204

The Train Set OVD (Base Class) is a subset of the train set that keeps only the annotations of base classes. It is prepared for OVD (Open-Vocabulary Detection) tasks. Images left without any annotations after the novel-class annotations are filtered out are removed.

Split                        Images    BBoxes
Train Set                    183,354   1,357,377
Train Set OVD (Base Class)   132,437   836,203
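
The filtering that produces the base-class subset can be sketched as follows. This is a minimal illustration, not the script used to build the released file. It assumes the `novel` field of a category is truthy for novel classes, and it leaves the category list unchanged, which may differ from the released annotation file. `filter_to_base_classes` is a hypothetical helper name.

```python
def filter_to_base_classes(coco):
    """Keep base-class annotations and drop images left without any annotation."""
    # Assumption: `novel` is truthy (e.g. 1 or True) for novel categories.
    base_ids = {c["id"] for c in coco["categories"] if not c.get("novel")}
    annotations = [a for a in coco["annotations"] if a["category_id"] in base_ids]
    # Only images that still have at least one annotation are kept.
    kept_image_ids = {a["image_id"] for a in annotations}
    images = [im for im in coco["images"] if im["id"] in kept_image_ids]
    return {
        "images": images,
        "categories": coco["categories"],  # assumption: category list kept as-is
        "annotations": annotations,
    }


# Toy data: category 2 is novel, so image 11 loses its only annotation.
toy = {
    "images": [{"id": 10}, {"id": 11}],
    "categories": [{"id": 1, "novel": 0}, {"id": 2, "novel": 1}],
    "annotations": [
        {"image_id": 10, "category_id": 1, "bbox": [0, 0, 5, 5]},
        {"image_id": 11, "category_id": 2, "bbox": [1, 1, 4, 4]},
    ],
}
ovd = filter_to_base_classes(toy)
print(len(ovd["images"]), len(ovd["annotations"]))
```

In the toy data, one image and one annotation survive, mirroring how the image count drops from the full train set to the base-class subset.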

The data organization is:

V3Det/
    images/
        <category_node>/
            |────<image_name>.png
            ...
        ...
    test/
        |────<image_name>.png
        ...
    exemplar_images/
        |────<category_id>.jpg
        ...
    annotations/
        |────v3det_2023_v1_category_tree.json       # Category tree
        |────category_name_13204_v3det_2023_v1.txt  # Category name
        |────v3det_2023_v1_train.json               # Train set
        |────v3det_2023_v1_train_ovd_base.json      # Open vocabulary detection train set
        |────v3det_2023_v1_val.json                 # Validation set
        |────v3det_2023_v1_test_image_info.json     # Image information of test set
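
After downloading, it can help to confirm that the folder matches the layout above. The sketch below checks only the top-level directories and the annotation files listed above. `missing_entries` is a hypothetical helper, and `./V3Det` is assumed to be the dataset root.

```python
from pathlib import Path

# Directory and file names copied from the layout above.
EXPECTED_DIRS = ["images", "test", "exemplar_images", "annotations"]
EXPECTED_ANNOTATIONS = [
    "v3det_2023_v1_category_tree.json",
    "category_name_13204_v3det_2023_v1.txt",
    "v3det_2023_v1_train.json",
    "v3det_2023_v1_train_ovd_base.json",
    "v3det_2023_v1_val.json",
    "v3det_2023_v1_test_image_info.json",
]


def missing_entries(root):
    """Return the expected directories and annotation files not found under `root`."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [
        "annotations/" + name
        for name in EXPECTED_ANNOTATIONS
        if not (root / "annotations" / name).is_file()
    ]
    return missing


# Assumption: the dataset root is ./V3Det (the default download location).
for entry in missing_entries("./V3Det"):
    print("missing:", entry)
```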

Annotation Files

Train/Val

The annotation files are dictionaries with three top-level keys: "images", "categories", and "annotations".

  • images : stores a list of image information, where each element is a dictionary representing an image.
    file_name            # The relative image path, e.g., images/n07745046/21_371_29405651261_633d076053_c.jpg.
    height               # The height of the image
    width                # The width of the image
    id                   # Unique identifier of the image.
  • categories : stores a list of category information, where each element is a dictionary representing a category.
    name                 # English name of the category.
    name_zh              # Chinese name of the category.
    cat_info             # Description information of the category, as a list.
    cat_info_gpt         # Description information of the category generated by ChatGPT, as a list.
    cat_info_gpt4        # Description information of the category generated by GPT4.
    cat_info_gpt4v       # Description information of the category generated by GPT4-V.
    novel                # For open-vocabulary detection, indicates whether the category is a novel class.
    id                   # Unique identifier of the category.
    exemplar_image       # Exemplar image of the category.
  • annotations : stores a list of annotation information, where each element is a dictionary representing a bounding box annotation.
    image_id             # The unique identifier of the image where the bounding box is located.
    category_id          # The unique identifier of the category corresponding to the bounding box.
    bbox                 # The coordinates of the bounding box, in the format [x, y, w, h], representing the top-left corner coordinates and the width and height of the box.
    iscrowd              # Whether the bounding box is a crowd box.
    area                 # The area of the bounding box
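
The fields above can be read with the standard `json` module. The sketch below groups boxes by image and converts each `[x, y, w, h]` box to corner format. The helper names are hypothetical, and the toy dictionary only stands in for a real annotation file such as v3det_2023_v1_val.json.

```python
import json
from collections import defaultdict


def load_annotations(path):
    """Load an annotation file into a dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def xywh_to_xyxy(bbox):
    """Convert [x, y, w, h] (top-left corner plus size) to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def boxes_by_image(coco):
    """Map each image id to a list of (category name, corner-format box) pairs."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        name = id_to_name[ann["category_id"]]
        grouped[ann["image_id"]].append((name, xywh_to_xyxy(ann["bbox"])))
    return dict(grouped)


# Toy data in the same shape as the annotation files (values are made up).
toy = {
    "images": [{"id": 1, "file_name": "images/a/b.jpg", "height": 100, "width": 200}],
    "categories": [{"id": 7, "name": "apple"}],
    "annotations": [{"image_id": 1, "category_id": 7, "bbox": [10, 20, 30, 40]}],
}
print(boxes_by_image(toy))
```

For a real file, `boxes_by_image(load_annotations(path))` would be the equivalent call, with `path` pointing at one of the train or val annotation files.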

Category Tree

  • The category tree stores information about dataset category mappings and relationships in dictionary format.
    categoryid2treeid    # Unique identifier of node in the category tree corresponding to the category identifier in dataset
    id2name              # English name corresponding to each node in the category tree
    id2name_zh           # Chinese name corresponding to each node in the category tree
    id2desc              # English description corresponding to each node in the category tree
    id2desc_zh           # Chinese description corresponding to each node in the category tree
    id2synonym_list      # List of synonyms corresponding to each node in the category tree
    id2center_synonym    # Center synonym corresponding to each node in the category tree
    father2child         # All direct child categories corresponding to each node in the category tree
    child2father         # All direct parent categories corresponding to each node in the category tree
    ancestor2descendant  # All descendant nodes corresponding to each node in the category tree
    descendant2ancestor  # All ancestor nodes corresponding to each node in the category tree
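
To illustrate how the direct and transitive relations relate, the sketch below derives all ancestors of a node by following `child2father` links upward, which is the information `descendant2ancestor` is described as storing. It assumes each `child2father` value is a list of parent node identifiers; the exact value types in the released file are not specified here, and the node identifiers in the toy data are made up.

```python
def ancestors(node, child2father):
    """Return every node reachable by following parent links upward from `node`."""
    seen = []
    stack = list(child2father.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            # Continue upward through the parents of this parent.
            stack.extend(child2father.get(parent, []))
    return seen


# Toy tree with hypothetical node identifiers: apple -> fruit -> food.
toy_child2father = {"apple": ["fruit"], "fruit": ["food"], "food": []}
print(ancestors("apple", toy_child2father))  # ['fruit', 'food']
```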

Image Download

  • Run the command to crawl the train and val images. By default, the images will be stored in the './V3Det/' directory.
python v3det_image_download.py
  • If you want to change the storage location, you can specify the desired folder by adding the option '--output_folder' when executing the script.
python v3det_image_download.py --output_folder our_folder
  • Run the command to crawl the test images.
python v3det_test_image_download.py [--output_folder our_folder]
  • Run the command to crawl the exemplar images.
python v3det_exemplar_image_download.py [--output_folder our_folder]

Category Tree Visualization

  • Run the following command, then select the dataset path (path/to/V3Det) to visualize the category tree.
python v3det_visualize_tree.py

Please refer to the TreeUI Operation Guide for more information.

Evaluation

  • We provide evaluation code. To evaluate a model, follow the steps below.

Step 1. Install Requirements

pip install pycocotools tqdm
pip install openmim
mim install mmengine

Step 2. Format Results

Please format your detection results in COCO JSON format.
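
In the COCO detection results format, the file is a JSON list with one entry per detected box, each holding `image_id`, `category_id`, `bbox` as `[x, y, w, h]`, and `score`. The sketch below converts corner-format detections into that shape. The input tuple layout and the helper names are assumptions for illustration, and the output file name is up to you.

```python
import json


def to_coco_results(detections):
    """Convert (image_id, category_id, (x1, y1, x2, y2), score) tuples to COCO results."""
    results = []
    for image_id, category_id, (x1, y1, x2, y2), score in detections:
        results.append({
            "image_id": image_id,
            "category_id": category_id,
            # COCO boxes are [x, y, width, height] from the top-left corner.
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": float(score),
        })
    return results


def save_results(results, path):
    """Write the results list to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f)


# Toy detection (values are made up): one box on image 1 for category 7.
toy_detections = [(1, 7, (10.0, 20.0, 40.0, 60.0), 0.9)]
results = to_coco_results(toy_detections)
print(json.dumps(results))
```

The `image_id` and `category_id` values should match the identifiers used in the annotation files, and the file written by `save_results` is what gets passed to the evaluation script as `dt_json_path`.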

Step 3. Evaluate

Run the Python script:
python eval_v3det.py dt_json_path

License:

Citation

@inproceedings{wang2023v3det,
      title = {V3Det: Vast Vocabulary Visual Detection Dataset}, 
      author = {Wang, Jiaqi and Zhang, Pan and Chu, Tao and Cao, Yuhang and Zhou, Yujie and Wu, Tong and Wang, Bin and He, Conghui and Lin, Dahua},
      booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
      month = {October},
      year = {2023}
}
