The code of IJCAI22 paper "GL-RG: Global-Local Representation Granularity for Video Captioning".

GL-RG: Global-Local Representation Granularity for Video Captioning

framework.png

GL-RG exploits extensive vision representations from different video ranges to improve linguistic expression. We devise a novel global-local encoder to produce a rich semantic vocabulary. With our incremental training strategy, GL-RG leverages the global-local vision representation to achieve fine-grained captioning of video content.

Dependencies

This repo was tested with Python 2.7, PyTorch 1.0.1 (or 0.2.0), cuDNN 10.0 (or 6.0), and CUDA 8.0. It should also run with more recent PyTorch versions: >=1.0 (or >=0.2, <=1.0).

You can use anaconda or miniconda to install the dependencies:

conda create -n GL-RG-pytorch python=2.7 pytorch=1.0 scikit-image h5py requests
conda activate GL-RG-pytorch

or you can install the dependencies following this script:

conda env create -f environment.yaml
conda activate GL-RG-pytorch
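
After creating the environment, it can help to confirm that the packages named above actually import. The following is a minimal sketch and not part of the repository: the helper name `report_versions` is hypothetical, and the import names `torch`, `skimage`, `h5py` and `requests` are assumed to correspond to the conda packages listed above.

```python
import importlib


def report_versions(module_names):
    """Return {module_name: version string, or None if the import fails}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every module exposes __version__; fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Import names assumed for the conda packages pytorch, scikit-image, h5py, requests.
    found = report_versions(["torch", "skimage", "h5py", "requests"])
    for name in sorted(found):
        status = found[name] if found[name] is not None else "NOT INSTALLED"
        print("{}: {}".format(name, status))
```

Any package reported as NOT INSTALLED points to a step in the conda setup that did not complete.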

Installation

First, clone this repository to any location using --recursive:

git clone --recursive https://github.com/ylqi/GL-RG.git

Make sure the coco-caption/, cider/, data/ and model/ projects are checked out into your working directory. If they are not, see INSTALL.md for detailed installation and dataset preparation steps.

Then, run the following script to download the Stanford CoreNLP 3.6.0 models into coco-caption/:

cd coco-caption
./get_stanford_models.sh
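
A quick way to confirm that the recursive clone produced the expected layout is to check for the four directories named above. This is a minimal sketch, assuming the script is run from the repository root; `missing_directories` is a hypothetical helper, not something the repository provides.

```python
import os


def missing_directories(root, required=("coco-caption", "cider", "data", "model")):
    """Return the names in `required` that are not directories under `root`."""
    return [name for name in required
            if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    missing = missing_directories(".")
    if missing:
        print("Missing directories: {}".format(", ".join(missing)))
    else:
        print("All expected directories are present.")
```

If anything is reported missing, INSTALL.md is the place to look for the manual setup steps.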

Datasets

Model Zoo

Models      Dataset  Exp.  B@4   M     R     C      Links
GL-RG       MSR-VTT  XE    45.5  30.1  62.6  51.2   GL-RG_XE_msrvtt
GL-RG       MSR-VTT  DXE   46.9  30.4  63.9  55.0   GL-RG_DXE_msrvtt
GL-RG + IT  MSR-VTT  DR    46.9  31.2  65.7  60.6   GL-RG_DR_msrvtt
GL-RG       MSVD     XE    55.5  37.8  74.7  94.3   GL-RG_XE_msvd
GL-RG       MSVD     DXE   57.7  38.6  74.9  95.9   GL-RG_DXE_msvd
GL-RG + IT  MSVD     DR    60.5  38.9  76.4  101.0  GL-RG_DR_msvd

Exp. is the experiment setting. B@4, M, R and C denote BLEU@4, METEOR, ROUGE-L and CIDEr, respectively. IT denotes incremental training.
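
To make the effect of incremental training (IT) concrete, the gain of the DR models over the XE models can be computed directly from the table. The sketch below only redoes that arithmetic: the numbers are copied from the rows above, while the dictionary layout and the function name `gain` are illustrative choices.

```python
# Scores copied from the Model Zoo table, in the order (B@4, M, R, C).
RESULTS = {
    ("msrvtt", "XE"): (45.5, 30.1, 62.6, 51.2),
    ("msrvtt", "DXE"): (46.9, 30.4, 63.9, 55.0),
    ("msrvtt", "DR"): (46.9, 31.2, 65.7, 60.6),
    ("msvd", "XE"): (55.5, 37.8, 74.7, 94.3),
    ("msvd", "DXE"): (57.7, 38.6, 74.9, 95.9),
    ("msvd", "DR"): (60.5, 38.9, 76.4, 101.0),
}
METRICS = ("B@4", "M", "R", "C")


def gain(dataset, baseline, improved):
    """Absolute per-metric difference between two experiments, rounded to one decimal."""
    base = RESULTS[(dataset, baseline)]
    new = RESULTS[(dataset, improved)]
    return {m: round(n - b, 1) for m, b, n in zip(METRICS, base, new)}


if __name__ == "__main__":
    for dataset in ("msrvtt", "msvd"):
        print("{}: {}".format(dataset, gain(dataset, "XE", "DR")))
```

For DR over XE, this gives a C gain of 9.4 points on MSR-VTT and 6.7 points on MSVD.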

Test

Check out the trained model weights under the model/ directory (following Installation) and run:

./test.sh

Note: Modify MODEL_NAME, EXP_NAME and DATASET in test.sh if the experiment setting changes. For more details, refer to TEST.md.
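
The link names in the Model Zoo follow the pattern MODEL_NAME_EXP_NAME_DATASET (for example, GL-RG_DR_msrvtt). The sketch below composes such a name from the three variables mentioned in the note. Whether test.sh builds its paths in exactly this way is an assumption, and `checkpoint_name` is a hypothetical helper.

```python
VALID_EXPERIMENTS = ("XE", "DXE", "DR")  # experiment names listed in the Model Zoo
VALID_DATASETS = ("msrvtt", "msvd")      # dataset names used in the Model Zoo links


def checkpoint_name(model_name, exp_name, dataset):
    """Build a Model Zoo style name such as 'GL-RG_DR_msrvtt'."""
    if exp_name not in VALID_EXPERIMENTS:
        raise ValueError("unknown experiment: {!r}".format(exp_name))
    if dataset not in VALID_DATASETS:
        raise ValueError("unknown dataset: {!r}".format(dataset))
    return "{}_{}_{}".format(model_name, exp_name, dataset)
```

Rejecting unknown values early makes a typo in EXP_NAME or DATASET visible before a missing-file error appears further down the line.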

Train

For the Seeding Phase (e.g., using DXE):

./train.sh 1  # | 0 - using XE | 1 - using DXE |

For the Boosting Phase (e.g., using DR with b1):

./train.sh 3  # | 2 - with SCST baseline | 3 - with b1 baseline | 4 - with b2 baseline |
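
The numeric argument to train.sh selects the training setting. As a reading aid, the mapping given in the two command comments can be written out as a lookup table. This is a sketch of the documented options only, not the script's actual implementation, and `describe_mode` is a hypothetical helper.

```python
# Mode numbers and labels as listed in the train.sh command comments.
TRAIN_MODES = {
    0: ("Seeding", "XE"),
    1: ("Seeding", "DXE"),
    2: ("Boosting", "SCST baseline"),
    3: ("Boosting", "b1 baseline"),
    4: ("Boosting", "b2 baseline"),
}


def describe_mode(mode):
    """Return a one-line description of a train.sh mode number."""
    if mode not in TRAIN_MODES:
        raise ValueError("mode must be one of {}".format(sorted(TRAIN_MODES)))
    phase, setting = TRAIN_MODES[mode]
    return "./train.sh {}: {} Phase, {}".format(mode, phase, setting)


if __name__ == "__main__":
    for mode in sorted(TRAIN_MODES):
        print(describe_mode(mode))
```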

Note: For higher performance, increase the batch size using --batch_size in train.sh. For more variants, set --start_from in train.sh to choose the entrance model for Incremental Training, and set --use_long_range, --use_short_range and --use_local to enable different global-local features:

  • --use_long_range: enable long-range features.
  • --use_short_range: enable short-range features.
  • --use_local: enable local-keyframe features.

Modify DATASET (choices: 'msrvtt', 'msvd') in train.sh when switching to the MSR-VTT or MSVD benchmark.
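
As an illustration of how these options might be consumed on the Python side, here is a minimal argparse sketch. The flag names come from the note above. Their types and defaults (integer 0/1 switches, a batch size of 64) are assumptions made for illustration and may differ from the repository's actual option definitions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical GL-RG option sketch")
    parser.add_argument("--batch_size", type=int, default=64)  # default is assumed
    parser.add_argument("--start_from", type=str, default=None,
                        help="entrance model for Incremental Training")
    # Assumed to be 0/1 integer switches; the real script may define them differently.
    parser.add_argument("--use_long_range", type=int, choices=(0, 1), default=1)
    parser.add_argument("--use_short_range", type=int, choices=(0, 1), default=1)
    parser.add_argument("--use_local", type=int, choices=(0, 1), default=1)
    return parser


def enabled_features(args):
    """List the global-local feature types switched on in `args`."""
    names = ("long_range", "short_range", "local")
    return [n for n in names if getattr(args, "use_" + n)]


# Example: disable short-range features and raise the batch size.
args = build_parser().parse_args(["--use_short_range", "0", "--batch_size", "128"])
print(enabled_features(args))  # prints ['long_range', 'local']
```

Passing an explicit argument list to `parse_args` keeps the example independent of the command line it is run from.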

License

GL-RG is released under the MIT license.

Acknowledgements

We are truly thankful for the following prior efforts, both for their knowledge contributions and for their open-source repos.

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{yan2018GL-RG,
    title={GL-RG: Global-Local Representation Granularity for Video Captioning},
    author={Liqi Yan and Qifan Wang and Yiming Cui and Fuli Feng and Xiaojun Quan and Xiangyu Zhang and Dongfang Liu},
    booktitle={IJCAI},
    year={2022}
}
