mm-ovod's Introduction

Multi-Modal Classifiers for Open-Vocabulary Object Detection

Multi-Modal Classifiers for Open-Vocabulary Object Detection,
Prannay Kaul, Weidi Xie, Andrew Zisserman
ICML 2023 (arXiv 2306.05493)

Updates

  • June 2023: Code and checkpoints for the LVIS models in the main paper are released. Training code for the visual aggregator will follow soon.

Installation

See installation instructions.

Benchmark evaluation and training

Please first prepare datasets, then check our MODEL ZOO to reproduce results in our paper.

License

See Detic; our code is based on that repository.

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{Kaul2023,
  title={Multi-Modal Classifiers for Open-Vocabulary Object Detection},
  author={Kaul, Prannay and Xie, Weidi and Zisserman, Andrew},
  booktitle={ICML},
  year={2023}
}


mm-ovod's Issues

The file “exemplar_dict.json” is missing.

Thank you for your excellent work. The file 'datasets/metadata/exemplar_dict.json' appears to be missing from the repository. Could you please provide a copy of it? I look forward to your reply. Thank you.

A question about the feature dimension

I was impressed by your paper, thanks.
I have a question about the feature dimension in the proposed architecture.
I see that both the 'vision-based classifier' and the 'text-based classifier' have dimension 512.
But in many detectors (such as Faster R-CNN), the feature dimension after the RoI-pooling layer is 2048 or 1024.
Did you change a configuration option, or add a layer, to match the two?

Thanks,
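
For readers with the same question, the sketch below shows one common way a detector head can bridge the two dimensions: a learned linear projection maps each RoI feature to the classifier's embedding size, both sides are L2-normalised, and the scaled cosine similarity gives the class logits. This is an illustration under assumptions, not the authors' confirmed implementation: the function names, the 1024 and 512 sizes, the random weights, and the temperature of 50 are all placeholders chosen for the example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit L2 norm."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def classify_rois(roi_feats, proj_w, proj_b, class_embeds, temperature=50.0):
    """Score RoI features against fixed class embeddings.

    roi_feats:    (N, D_roi) features from the RoI head, e.g. D_roi = 1024
    proj_w:       (D_roi, D_emb) learned projection weights (hypothetical)
    proj_b:       (D_emb,) learned projection bias (hypothetical)
    class_embeds: (C, D_emb) text-, vision-, or multi-modal classifier vectors
    """
    projected = roi_feats @ proj_w + proj_b           # (N, D_emb)
    projected = l2_normalize(projected)
    class_embeds = l2_normalize(class_embeds)
    return temperature * (projected @ class_embeds.T)  # (N, C) cosine logits


# Random stand-ins for real features and weights.
rng = np.random.default_rng(0)
roi_feats = rng.normal(size=(4, 1024))
proj_w = rng.normal(size=(1024, 512)) * 0.01
proj_b = np.zeros(512)
class_embeds = rng.normal(size=(10, 512))

logits = classify_rois(roi_feats, proj_w, proj_b, class_embeds)
print(logits.shape)  # (4, 10)
```

Because the class embeddings are only compared after projection, they can be swapped at inference time without retraining the projection, provided the new embeddings live in the same 512-dimensional space.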

Request for Code Release


Dear Authors,

I recently came across your paper titled "Multi-Modal Classifiers for Open-Vocabulary Object Detection" and found it to be extremely insightful. Your work on open-vocabulary object detection (OVOD), which can detect objects beyond the set of categories seen at training, is truly innovative. I am particularly impressed by the three contributions you made: text-based classifiers, vision-based classifiers, and multi-modal classifiers.

I am currently working on a project that could greatly benefit from your approach. I believe that your model could significantly enhance our research and potentially lead to further advancements in this field.

I noticed that the code for your paper has not been released on GitHub. I understand that preparing code for release can be a time-consuming process, but I would greatly appreciate it if you could consider making it available. Having access to your code would allow me and others in the community to fully understand your methodology and potentially build upon your work.

Thank you for considering my request. I look forward to the possibility of further exploring your work.

Best regards,
Rotem

ImageNet version unavailable.

Thank you for your interesting work! Your model requires the Fall 2011 version of ImageNet, which has long been unavailable for download. I downloaded the Winter 2021 version, but some images (and classes that your model requires) seem to have been removed. Could you please upload the small set of image exemplars that have been removed from the new version?

I hope to get a reply soon!
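
When reporting a gap like this, it helps to list exactly which classes are absent from the local copy. The sketch below assumes a layout with one folder per WordNet ID under a root directory; that layout, the function name `missing_synsets`, and the placeholder IDs in the usage note are assumptions for illustration and are not taken from the repository.

```python
from pathlib import Path


def missing_synsets(imagenet_root, required_wnids):
    """Return the required WordNet IDs that have no non-empty folder under the root."""
    root = Path(imagenet_root)
    missing = []
    for wnid in required_wnids:
        class_dir = root / wnid
        # A class counts as present only if its folder exists and holds at least one entry.
        has_images = class_dir.is_dir() and any(class_dir.iterdir())
        if not has_images:
            missing.append(wnid)
    return missing
```

For example, `missing_synsets("path/to/imagenet", ["n00000001", "n00000002"])` would return whichever of the two placeholder IDs lacks images locally; the resulting list can be pasted directly into the issue.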
