
prannaykaul / mm-ovod

Official repo for our ICML 2023 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection"


mm-ovod's Introduction

Multi-Modal Classifiers for Open-Vocabulary Object Detection

Multi-Modal Classifiers for Open-Vocabulary Object Detection,
Prannay Kaul, Weidi Xie, Andrew Zisserman
ICML 2023 (arXiv 2306.05493)

Updates

June 2023: Code and checkpoints for the LVIS models in the main paper are released. Training code for the visual aggregator will follow soon.

Installation

See installation instructions.

Benchmark evaluation and training

Please first prepare the datasets, then check our MODEL ZOO to reproduce the results in our paper.

License

Our code is based on Detic; see the Detic repository for license details.

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{Kaul2023,
  title={Multi-Modal Classifiers for Open-Vocabulary Object Detection},
  author={Kaul, Prannay and Xie, Weidi and Zisserman, Andrew},
  booktitle={ICML},
  year={2023}
}


mm-ovod's Issues

The file “exemplar_dict.json” is missing.

Thank you for your excellent work. It seems that the file 'datasets/metadata/exemplar_dict.json' might have been omitted. Could you please provide a copy of this file? I look forward to your reply. Thank you.

A question about the feature dimension

I was impressed by your paper, thanks.
I have a question about the feature dimension in the proposed architecture.
I see that both the 'vision-based classifier' and the 'text-based classifier' have dimension 512.
But in many cases, after the RoI-pooling layer (e.g. in Faster R-CNN), the feature dimension is 2048 or 1024.
Did you change some configuration for this, or add a layer?

Thanks,
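For context, Detic-style zero-shot classification heads (this code base builds on Detic) typically bridge such a gap with a learned linear projection. The RoI feature (e.g. 1024-d) is mapped into the classifier's embedding space (e.g. 512-d), L2-normalized, and scored against each normalized classifier vector by a temperature-scaled cosine similarity. The plain-Python sketch below illustrates only that idea. The function names, the random weights, and the temperature of 50 are illustrative assumptions, not values read from this repository.

```python
import math
import random


def matvec(weight, x):
    """Multiply an (out_dim x in_dim) matrix, stored as a list of rows, by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / max(norm, eps) for a in v]


def classify_roi(roi_feature, proj_weight, proj_bias, class_embeddings, temperature=50.0):
    """Hypothetical sketch: project an RoI feature and score every class.

    roi_feature:      list of length in_dim (e.g. 1024 after the box head)
    proj_weight:      out_dim x in_dim matrix (assumed learned projection)
    proj_bias:        list of length out_dim
    class_embeddings: one out_dim vector per class (text-, vision- or multi-modal)
    temperature:      assumed logit scale, not taken from this repo's configs
    """
    projected = [p + b for p, b in zip(matvec(proj_weight, roi_feature), proj_bias)]
    projected = l2_normalize(projected)
    # Cosine similarity with each normalized classifier, scaled by the temperature.
    return [temperature * sum(p * c for p, c in zip(projected, l2_normalize(emb)))
            for emb in class_embeddings]


if __name__ == "__main__":
    rng = random.Random(0)
    in_dim, out_dim, num_classes = 1024, 512, 3
    roi = [rng.gauss(0, 1) for _ in range(in_dim)]
    weight = [[rng.gauss(0, 1 / math.sqrt(in_dim)) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    classifiers = [[rng.gauss(0, 1) for _ in range(out_dim)] for _ in range(num_classes)]
    logits = classify_roi(roi, weight, bias, classifiers)
    print(len(logits))  # one logit per class -> 3
```

Because both sides are normalized, every logit lies in [-temperature, temperature], so the classifier vectors can be swapped between text-based, vision-based, or combined without retraining the projection's output scale.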

Request for Code Release


Dear Authors,

I recently came across your paper titled "Multi-Modal Classifiers for Open-Vocabulary Object Detection" and found it to be extremely insightful. Your work on open-vocabulary object detection (OVOD), which detects objects beyond the set of categories seen during training, is truly innovative. I am particularly impressed by your three contributions: text-based classifiers, vision-based classifiers, and multi-modal classifiers.

I am currently working on a project that could greatly benefit from your approach. I believe that your model could significantly enhance our research and potentially lead to further advancements in this field.

I noticed that the code for your paper has not been released on GitHub. I understand that preparing code for release can be a time-consuming process, but I would greatly appreciate it if you could consider making it available. Having access to your code would allow me and others in the community to fully understand your methodology and potentially build upon your work.

Thank you for considering my request. I look forward to the possibility of further exploring your work.

Best regards,
Rotem

Looking forward to the training code!

Great work! Can I ask when the code for training will be released?

When will the training code for the visual aggregator be released?

Many thanks to the authors for the excellent work! I'm also very interested in the training code for the visual aggregator. When will this code be released?
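Until that training code is available, a very simple stand-in can help with experiments. The sketch below is a hypothetical baseline and not the authors' learned visual aggregator. It averages the L2-normalized embeddings of a class's image exemplars into a single vision-based classifier. It also fuses that vector with a text-based classifier by taking the normalized sum of the two normalized vectors, which is one plausible combination rule assumed here for illustration. The function names `mean_aggregate` and `fuse` are invented.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / max(norm, eps) for a in v]


def mean_aggregate(exemplar_embeddings):
    """Hypothetical baseline aggregator: mean of normalized exemplar embeddings."""
    if not exemplar_embeddings:
        raise ValueError("need at least one exemplar embedding")
    normed = [l2_normalize(e) for e in exemplar_embeddings]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def fuse(text_embedding, vision_embedding):
    """Hypothetical multi-modal classifier: normalized sum of both normalized vectors."""
    t = l2_normalize(text_embedding)
    v = l2_normalize(vision_embedding)
    return l2_normalize([a + b for a, b in zip(t, v)])


if __name__ == "__main__":
    exemplars = [[2.0, 0.0], [0.0, 5.0]]  # two toy image embeddings for one class
    vision = mean_aggregate(exemplars)
    text = [1.0, 0.0]
    print(vision, fuse(text, vision))
```

Normalizing each exemplar before averaging keeps one high-norm image from dominating the class vector. A learned aggregator can instead weight exemplars by quality.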

ImageNet version unavailable.

Thank you for your interesting work! Your model requires the Fall 2011 version of ImageNet, which has long been unavailable for download. I downloaded the Winter 21 version, but it seems that some images (and classes that your model requires) have been removed. Could you please upload just the small set of image exemplars that were removed from the new version?

I hope to get a reply soon!
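Before requesting files, it can help to list exactly which required classes are absent locally. The sketch below assumes two things. First, the required classes are given as WordNet IDs such as `n01440764`. Second, the local ImageNet copy stores one sub-directory or one `.tar` archive per WordNet ID. Both that layout and the helper name `missing_synsets` are assumptions for illustration, not part of this repository.

```python
from pathlib import Path


def missing_synsets(required_wnids, imagenet_root):
    """Hypothetical helper: return required WordNet IDs with no local folder or .tar.

    Assumes each class is stored under imagenet_root either as a directory named
    by its WordNet ID (e.g. n01440764/) or as an archive (e.g. n01440764.tar).
    """
    available = set()
    for entry in Path(imagenet_root).iterdir():
        if entry.is_dir():
            available.add(entry.name)
        elif entry.suffix == ".tar":
            available.add(entry.stem)
    return sorted(set(required_wnids) - available)
```

For example, `missing_synsets(["n01440764", "n01443537"], "/data/imagenet21k")` (placeholder path) returns the IDs that would need to be obtained separately. A short list like this makes a request for the missing exemplars precise.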
