detr-torch

Object Detection using Transformers

Usage:

  • git clone https://github.com/gittygupta/detr-torch.git
  • cd detr-torch && mkdir saved_models
  • Download any of the models from drive
  • Model naming: detr_{epoch_number}.pth
  • Experimental results: detr_4.pth and detr_6.pth work best
  • Save the model to the folder saved_models
  • python inference.py --model detr_{epoch_number}.pth --folder {path/to/images}
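The naming scheme above can also be handled programmatically. The following is only a minimal sketch: `checkpoint_name` and `parse_epoch` are hypothetical helpers that are not part of the repository, and they assume nothing beyond the detr_<epoch>.pth convention described in the list.

```python
import re
from pathlib import Path

# Hypothetical helpers (not part of detr-torch); they only encode the
# checkpoint naming convention detr_<epoch>.pth.
CHECKPOINT_PATTERN = re.compile(r"^detr_(\d+)\.pth$")


def checkpoint_name(epoch: int) -> str:
    """Build the checkpoint file name for a given epoch number."""
    return f"detr_{epoch}.pth"


def parse_epoch(filename: str) -> int:
    """Extract the epoch number from a checkpoint file name or path."""
    match = CHECKPOINT_PATTERN.match(Path(filename).name)
    if match is None:
        raise ValueError(f"not a DETR checkpoint name: {filename!r}")
    return int(match.group(1))
```

For example, `checkpoint_name(4)` gives the file name to pass to `--model`, and `parse_epoch` recovers the epoch from a file stored under saved_models.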

Single instance usage:

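import cv2
import torch

# config and inference are assumed to provide num_classes, num_queries,
# transform and the helper functions used below
from config import *
from inference import *
from model import DETR

# Fall back to the CPU when no CUDA device is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Build the model and load the trained weights
model_path = 'path/to/model.pth'
model = DETR(num_classes=num_classes, num_queries=num_queries)
model.load_state_dict(torch.load(model_path, map_location=device))
model.eval()  # switch layers such as dropout to inference mode

# Read the image; OpenCV returns an array of shape (height, width, channels)
image = cv2.imread('path/to/image.jpg')
transformed_image = transform(image)

# Assumption: the helper expects the output of transform(), not the raw image
confidences, bboxes = run_inference_for_single_image(transformed_image, model, device)
# Rescale the predicted boxes to the original image width and height
bboxes = scale_bbox(image.shape[1], image.shape[0], bboxes)

# Draw the detections using a confidence threshold of 0.5 and save the result
output_image = draw(image, confidences, bboxes, 0.5)
cv2.imwrite('path/to/save/image.jpg', output_image)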
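
As an illustration of the scaling step in the snippet above, here is a minimal sketch of what a function like scale_bbox might do. It assumes the model emits boxes normalized to [0, 1] in [x_min, y_min, width, height] form; the repository's actual box format is not shown here and may differ. `scale_boxes_sketch` is a hypothetical name, not a function from the repository.

```python
def scale_boxes_sketch(width, height, boxes):
    """Convert normalized [x_min, y_min, w, h] boxes to integer pixel values.

    Assumption: every coordinate is a fraction of the image width or height.
    """
    scaled = []
    for x, y, w, h in boxes:
        scaled.append([
            int(round(x * width)),   # left edge in pixels
            int(round(y * height)),  # top edge in pixels
            int(round(w * width)),   # box width in pixels
            int(round(h * height)),  # box height in pixels
        ])
    return scaled
```

The argument order (width first, then height) mirrors the call `scale_bbox(image.shape[1], image.shape[0], bboxes)`, since an OpenCV image has shape (height, width, channels).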

Comparison:

At the time of writing, the state of the art in object detection was Google's EfficientDet. Due to hardware constraints, EfficientDet-D1, which has 6.6M parameters, was used for this comparison. The Transformer (roughly 17M parameters), on the other hand, uses ResNet50 as its backbone (roughly 23M parameters), for a total of about 41M parameters. The results are as follows:

[Images: Transformer output (left), EfficientDet output (right)]

The image on the left is the output of the Transformer and the one on the right is from EfficientDet-D1. EfficientDet produces overlapping bounding boxes, whereas the Transformer does not, because of how its attention layers work: the object queries attend to one another, and the model is trained to emit a single prediction per object. EfficientDet and other conventional object detectors (MobileNet-based detectors, YOLO) need Non-Max Suppression (NMS) to remove the overlaps, because they produce several candidate boxes with unstable confidence values for the same object. The Transformer does not produce such duplicates and hence does not require NMS.
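
To make the NMS step concrete, here is a minimal pure-Python sketch of greedy NMS. It is a generic illustration, not the implementation used by EfficientDet or any particular library; the function names, the [x1, y1, x2, y2] box format and the default IoU threshold of 0.5 are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the kept boxes, best score first."""
    # Visit boxes from the highest to the lowest score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

Each kept box is compared against all remaining candidates, so this post-processing costs extra time per frame for detectors that emit many overlapping candidates. A detector that emits one box per object can skip it.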

Also, when tested on an NVIDIA GTX 1650 Max-Q (4GB) GPU, the EfficientDet-D1 model runs at 4-5 FPS, whereas DETR runs at 12-15 FPS despite having a much higher number of parameters. This speed-up is attributed to the elimination of NMS.
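
FPS figures like these can be reproduced with a simple timing loop. The sketch below is one possible way to measure them and is not the script used for the numbers above; `measure_fps` and its parameters are hypothetical names, and `infer` stands for any callable that runs a model on a single frame.

```python
import time


def measure_fps(infer, frames, warmup=1):
    """Return the average frames per second of `infer` over `frames`."""
    # Warm-up calls are excluded from the timing
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

When timing a model on a GPU with PyTorch, `infer` should call `torch.cuda.synchronize()` before returning, because CUDA kernels run asynchronously and the clock would otherwise stop before the work has finished.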

Thus, the transformer architecture provides a boost in speed as well as more stable prediction confidences.

More Comparisons:

[Images: Transformer output (left), EfficientDet output (right)]

  • In the example above, the transformer is clearly more accurate: EfficientDet fails to detect the object at all

[Images: Transformer output (left), EfficientDet output (right)]

[Images: Transformer output (left), EfficientDet output (right)]

  • In all the above comparisons, the confidence threshold for both models was set to 0.5
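
Applying a confidence threshold such as 0.5 amounts to discarding every detection that scores below it. The sketch below shows the idea; `filter_by_confidence` is a hypothetical helper, and whether the repository's draw function uses `>=` or `>` for the comparison is an assumption.

```python
def filter_by_confidence(confidences, bboxes, threshold=0.5):
    """Keep only the detections whose confidence is at least `threshold`."""
    kept = [(c, b) for c, b in zip(confidences, bboxes) if c >= threshold]
    kept_confidences = [c for c, _ in kept]
    kept_bboxes = [b for _, b in kept]
    return kept_confidences, kept_bboxes
```

Raising the threshold removes low-confidence boxes at the cost of possibly missing objects; lowering it does the opposite.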
