parham1998 / enhancing-high-vocabulary-ia-with-a-novel-attention-based-pooling

Official PyTorch implementation of "Enhancing High-Vocabulary Image Annotation with a Novel Attention-Based Pooling"

Enhancing-High-Vocabulary-IA-with-a-Novel-Attention-Based-Pooling

Datasets

Three well-known datasets are commonly used in automatic image annotation (AIA) tasks: Corel 5k, ESP Game, and IAPR TC-12. In addition, we use VG-500, a dataset with significantly more images and a 500-word vocabulary, which makes it considerably more complex. The table below gives details on these datasets. They can be downloaded from the given links. (After downloading a dataset, use its 'images' folder to replace the corresponding 'images' folder in the 'datasets' folder.)

| Dataset | Images | Training images | Testing images | Vocabulary size | Labels per image | Images per label |
|---|---|---|---|---|---|---|
| Corel 5k | 5,000 | 4,500 | 500 | 260 | 3.4 | 58.6 |
| ESP Game | 20,770 | 18,689 | 2,081 | 268 | 4.7 | 362.7 |
| IAPR TC-12 | 19,627 | 17,665 | 1,962 | 291 | 5.7 | 347.7 |
| VG-500 | 92,904 | 82,904 | 10,000 | 500 | 13.6 | 2,256.6 |

For the VG-500 dataset we follow the SSGRL settings: images are selected from the 500 most common categories, and the data are then split into training and testing subsets. We also attempted to identify the label names (vocabulary) for this dataset. Please let us know if you find any errors.
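
The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: annotations are taken to be a dictionary from image id to a list of label names, the split is assumed to be a seeded random shuffle, and the helper names `select_top_categories` and `split_train_test` are hypothetical. The actual SSGRL split may be a fixed list of images.

```python
from collections import Counter
import random


def select_top_categories(image_labels, k):
    """Keep only the k most frequent labels; drop images left with none."""
    counts = Counter(label for labels in image_labels.values() for label in labels)
    top = {label for label, _ in counts.most_common(k)}
    filtered = {}
    for image_id, labels in image_labels.items():
        kept = [label for label in labels if label in top]
        if kept:
            filtered[image_id] = kept
    return filtered, top


def split_train_test(image_ids, num_test, seed=0):
    """Shuffle the ids reproducibly and hold out num_test of them for testing."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    return ids[num_test:], ids[:num_test]


# Toy annotations; k=2 stands in for the 500 categories used for VG-500.
toy = {
    "img1": ["person", "tree"],
    "img2": ["person", "car"],
    "img3": ["kite"],
    "img4": ["tree", "person"],
}
filtered, top = select_top_categories(toy, k=2)
train_ids, test_ids = split_train_test(filtered, num_test=1)
print(sorted(top), sorted(filtered), len(train_ids), len(test_ids))
```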

Model

[Figure: model]

Attention Maps

[Figure: attention maps]

Training and Evaluation

To train the model in the Spyder IDE, use the command below:

run main.py --data {select training dataset} --loss-function {select loss function}

Please note that:

  1. Replace {select training dataset} with Corel-5k, ESP-Game, IAPR-TC-12, or VG-500.

  2. Replace {select loss function} with proposedLoss.

  3. When using the VG-500 dataset, change "image-size" to 576, change "gamma_neg" in proposedLoss to 2, and set the batch size to 128.
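
To give a feel for what "gamma_neg" controls, here is a minimal pure-Python sketch of the standard asymmetric loss (ASL) for multi-label classification, where a parameter of that name is the focusing exponent applied to negative labels. This is an assumption about the role of the parameter: the repository's proposedLoss may differ from plain ASL, and the default values below (gamma_neg=4, clip=0.05) are the usual ASL defaults, not values confirmed by this README.

```python
import math


def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05, eps=1e-8):
    """Standard ASL summed over labels (illustrative, not the repo's proposedLoss)."""
    total = 0.0
    for logit, target in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid probability
        if target == 1:
            # Positive label: focal-style weighting with exponent gamma_pos.
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by `clip`, then
            # down-weight easy negatives with exponent gamma_neg.
            p_shifted = max(p - clip, 0.0)
            total -= p_shifted ** gamma_neg * math.log(max(1.0 - p_shifted, eps))
    return total


logits = [2.0, -1.0, 0.5]
targets = [1, 0, 0]
loss_gamma4 = asymmetric_loss(logits, targets, gamma_neg=4.0)
loss_gamma2 = asymmetric_loss(logits, targets, gamma_neg=2.0)
print(loss_gamma2 > loss_gamma4)
```

In this form, lowering gamma_neg from 4 to 2 suppresses the negative labels less strongly, so they contribute more to the loss.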

To evaluate the model in the Spyder IDE, use the command below:

run main.py --data {select training dataset} --loss-function {select loss function} --evaluate

Results

Proposed method:

| Dataset | Precision | Recall | F1-score | N+ | mAP |
|---|---|---|---|---|---|
| Corel 5k | 0.453 | 0.611 | 0.520 | 202 | - |
| IAPR TC-12 | 0.515 | 0.584 | 0.547 | 287 | - |
| ESP Game | 0.442 | 0.500 | 0.470 | 262 | - |
| VG-500 | 0.409 | 0.502 | 0.451 | 477 | 42.515 |
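
For readers unfamiliar with these metrics, the sketch below follows the convention commonly used in image annotation: precision and recall are averaged over labels, F1 is the harmonic mean of those two averages, and N+ counts the labels that are correctly predicted for at least one image. This is an illustrative assumption about the evaluation protocol, not the repository's evaluation code, and the function name is hypothetical. The F1 values in the table are consistent with this definition (for Corel 5k, 2 x 0.453 x 0.611 / (0.453 + 0.611) is approximately 0.520).

```python
def annotation_metrics(y_true, y_pred):
    """Per-label averaged precision and recall, their F1, and N+.

    y_true, y_pred: lists of per-image binary label vectors (0/1 ints).
    """
    num_labels = len(y_true[0])
    precisions, recalls = [], []
    n_plus = 0
    for j in range(num_labels):
        tp = sum(t[j] and p[j] for t, p in zip(y_true, y_pred))
        predicted = sum(p[j] for p in y_pred)
        actual = sum(t[j] for t in y_true)
        # Labels never predicted (or never present) contribute zero.
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
        if tp > 0:
            n_plus += 1  # label recalled at least once
    precision = sum(precisions) / num_labels
    recall = sum(recalls) / num_labels
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1, n_plus


# Toy example: 3 images, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
precision, recall, f1, n_plus = annotation_metrics(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3), n_plus)
```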

Citation

Contact

I would be happy to answer any questions you may have - Ali Salar ([email protected])
