Fine-Tuning DETR on Custom Dataset

Environment

  • System Information
    OS: Ubuntu 18.04
    CPU: Intel Xeon Silver 4110 (32) @ 1.7GHz
    GPU: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller
    GPU: NVIDIA Tesla V100 PCIe 16GB
    Memory: 20343MiB / 385656MiB
    GPU Driver: NVIDIA 460.91.03

How to run my code

First, clone the repository locally:

git clone https://github.com/loijilai/Fine-Tuning-DETR.git

Then, install PyTorch and torchvision:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

Data Augmentation Pipeline

  • Use this file to filter the train set and produce a new file called for_blip2.py for the later steps. (Every filtered image contains exactly one category and one bounding box, except jellyfish images, which may have up to 6 bounding boxes.)
  • Create a separate conda environment called blip2
  • Run image captioning on all images listed in for_blip2.py by running blip2.py; the captions are written to an output file called for_gligen.py
  • Create a separate conda environment called gligen
  • Run image generation with three different strategies:
    bash ./GLIGEN/run_gen.sh
    
  • Once 7 categories * 20 images * 3 strategies = 420 images have been generated, add them to the train set annotations with add_train.py
  • Add the generated images to the train set with move_pictures.py
  • Check the quality of the generated images with FID scores: manually select 140 real images, resize them using this script, and run
    python -m pytorch_fid path/to/dataset1 path/to/dataset2
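The filtering rule from the first step can be sketched as below. This is a minimal sketch, not the repository's filter script: it assumes the train annotations are COCO-style JSON (`images`, `annotations`, `categories`) and that the category is literally named `jellyfish`; `select_filtered_images` and its parameters are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def select_filtered_images(
    coco: dict,
    multi_box_category: str = "jellyfish",  # assumed category name
    max_multi_boxes: int = 6,
) -> List[int]:
    """Return ids of images with exactly one category and one box
    (or up to `max_multi_boxes` boxes for `multi_box_category`)."""
    name_of: Dict[int, str] = {c["id"]: c["name"] for c in coco["categories"]}
    categories: Dict[int, Set[int]] = defaultdict(set)
    box_count: Dict[int, int] = defaultdict(int)
    for ann in coco["annotations"]:
        categories[ann["image_id"]].add(ann["category_id"])
        box_count[ann["image_id"]] += 1

    keep: List[int] = []
    for img in coco["images"]:
        img_id = img["id"]
        cats = categories.get(img_id, set())
        if len(cats) != 1:
            continue  # unannotated, or more than one category present
        (cat_id,) = cats
        limit = max_multi_boxes if name_of[cat_id] == multi_box_category else 1
        if box_count[img_id] <= limit:
            keep.append(img_id)
    return keep
```

For example, `select_filtered_images(json.load(open("train.json")))` returns the kept image ids, where `train.json` is a placeholder path.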
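For reference when reading the FID step: pytorch_fid fits a Gaussian to the Inception-v3 activations of each image folder and compares the two fits. With means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$ for the real and generated folders:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower is better; two identical distributions give 0.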
    

Training

  • Refer to this document for how to fine-tune DETR on a custom dataset. Use this script to get the pretrained model.

  • Train without data augmentation

    CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
    python ./detr/main.py \
    --dataset_file your_dataset \
    --coco_path <PATH_TO_DATASET> \
    --epochs 350 \
    --lr=1e-4 \
    --batch_size=2 \
    --num_workers=4 \
    --output_dir=./outputs \
    --resume=<PATH_TO_CHECKPOINT>
    
  • Train with data augmentation (text grounding template2 only)

    bash ./run_text2.sh
    
  • Train with data augmentation (text and image grounding)

    bash ./run_text_image.sh
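The pretrained-model script is not reproduced here. A common approach when fine-tuning DETR on a dataset with a different number of classes is to start from an official COCO checkpoint but drop the final classification layer (`class_embed`), whose shape depends on the class count. The sketch below assumes a checkpoint laid out like DETR's released ones (a dict holding a `model` state dict); `strip_class_head` and `convert_checkpoint` are hypothetical names. Whether `main.py --resume` then accepts the missing keys depends on how it calls `load_state_dict`, so check that before relying on it.

```python
from typing import Any, Dict

# Parameter names of the final class-prediction layer in the reference DETR model.
CLASS_HEAD_KEYS = ("class_embed.weight", "class_embed.bias")


def strip_class_head(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new checkpoint whose model state dict lacks the class head."""
    state = dict(checkpoint["model"])  # shallow copy: the input dict is not modified
    for key in CLASS_HEAD_KEYS:
        state.pop(key, None)
    # Keep only the weights so a resumed run does not inherit
    # optimizer state, scheduler state, or an epoch counter.
    return {"model": state}


def convert_checkpoint(src: str, dst: str) -> None:
    """Load `src`, strip the class head, and save the result to `dst`."""
    import torch  # imported lazily; only needed for real checkpoint files

    # weights_only=False allows non-tensor entries; only load files you trust.
    ckpt = torch.load(src, map_location="cpu", weights_only=False)
    torch.save(strip_class_head(ckpt), dst)
```

For example, `convert_checkpoint("detr-r50.pth", "detr-r50_no-class-head.pth")` with placeholder file names; the output path would then go to `--resume`.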
    

Inference and Evaluation

To generate output.json:

CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/infer_json.py \
--data_path <PATH_TO_DATASET> \
--resume <PATH_TO_CHECKPOINT> \
--output_dir <PATH_TO_OUTPUT_DIR>
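The exact layout of output.json is defined by infer_json.py and is not shown here. Assuming it follows the standard COCO detection-results format (one record per box with `image_id`, `category_id`, `bbox` as pixel `[x, y, width, height]`, and `score`), DETR's raw outputs need two conversions: drop the trailing "no object" class from the softmaxed logits, and turn the normalized `(cx, cy, w, h)` boxes into absolute `[x, y, w, h]`. A minimal plain-Python sketch; `detr_to_coco_records` is a hypothetical name, and using the class index directly as `category_id` is an assumption that only holds if the dataset's category ids match the model's class indices.

```python
from typing import Dict, List, Sequence


def detr_to_coco_records(
    image_id: int,
    probs: Sequence[Sequence[float]],  # per query: softmax over classes + trailing "no object"
    boxes: Sequence[Sequence[float]],  # per query: normalized (cx, cy, w, h) in [0, 1]
    img_w: float,
    img_h: float,
    score_threshold: float = 0.5,
) -> List[Dict]:
    """Convert one image's DETR predictions into COCO-style result records."""
    records = []
    for p, (cx, cy, w, h) in zip(probs, boxes):
        class_probs = p[:-1]  # drop DETR's "no object" class
        label = max(range(len(class_probs)), key=lambda i: class_probs[i])
        score = class_probs[label]
        if score < score_threshold:
            continue  # low-confidence query, skip it
        bw, bh = w * img_w, h * img_h
        records.append({
            "image_id": image_id,
            "category_id": label,  # assumes class index == COCO category id
            "bbox": [cx * img_w - bw / 2, cy * img_h - bh / 2, bw, bh],
            "score": float(score),
        })
    return records
```

Collecting the records for all images and writing them with `json.dump` gives a list that pycocotools' `COCO.loadRes` can read.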

To visualize the predictions:

CUDA_VISIBLE_DEVICES=<YOUR_GPU_NUM> \
python ./detr/infer_visualize.py \
--data_path <PATH_TO_DATASET> \
--resume <PATH_TO_CHECKPOINT> \
--output_dir <PATH_TO_OUTPUT_DIR>

To compute mAP scores:

python evaluate.py ./outputs/json/output.json ./hw1_dataset/annotations/val.json 
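The internals of evaluate.py are not shown here. Assuming it follows the COCO convention (which pycocotools, installed above, implements), detections are matched in descending score order, a detection counts as a true positive when its IoU with a not-yet-matched ground-truth box of the same category is at least the threshold, and mAP averages AP over IoU thresholds 0.50 to 0.95 in steps of 0.05. The IoU itself, for COCO `[x, y, width, height]` boxes, is small enough to sketch; `iou_xywh` is a hypothetical helper, not part of the repository.

```python
from typing import Sequence


def iou_xywh(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]  # bottom-right corner of a
    bx2, by2 = b[0] + b[2], b[1] + b[3]  # bottom-right corner of b
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0
```

For two 10x10 boxes offset by 5 pixels in each direction, the overlap is 25 and the union 175, so the IoU is about 0.14: not a match at any COCO threshold.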

Utilities

To draw bounding boxes on images, use this script.
