Stacked Multimodal Attention Network for Context-Aware Video Captioning

This repository includes the implementation for Stacked Multimodal Attention Network (SMAN) for Context-Aware Video Captioning.

Requirements

  • Python 3.6.0
  • PyTorch 1.1.0
  • Java 1.8.0
  • h5py 2.7.1

Data Preparation

The processed data are provided here. The processed video feature data are provided here.

Download all required data and arrange the files as follows:

|-- coco-caption
|-- cider
|-- log
|   |-- allgru_rl
|   |   |-- infos_-best.pkl
|   |   |-- optimizer-best.pth
|   |   |-- model-best.pth
|-- data
|   |-- msrvtttalk.json
|   |-- msrvtttalk_label.h5
|   |-- msrvtt-train-words.p
|   |-- msrvtt-train-idxs.p
|   |-- dataset.json
|   |-- msrvtt_c3d_features.h5
|   |-- msrvtt_appearance_features.h5
|   |-- msrvtt_box_features.h5
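
Before launching training or evaluation, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: the names `EXPECTED_DIRS`, `EXPECTED_FILES`, and `missing_files` are illustrative, and the paths are copied from the tree above.

```python
from pathlib import Path

# Directories and files copied from the tree above, relative to the repo root.
EXPECTED_DIRS = ["coco-caption", "cider"]
EXPECTED_FILES = [
    "log/allgru_rl/infos_-best.pkl",
    "log/allgru_rl/optimizer-best.pth",
    "log/allgru_rl/model-best.pth",
    "data/msrvtttalk.json",
    "data/msrvtttalk_label.h5",
    "data/msrvtt-train-words.p",
    "data/msrvtt-train-idxs.p",
    "data/dataset.json",
    "data/msrvtt_c3d_features.h5",
    "data/msrvtt_appearance_features.h5",
    "data/msrvtt_box_features.h5",
]


def missing_files(root="."):
    """Return the expected directories and files that are absent under `root`."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected entries are present.")
```

Run the script from the repository root. The files under log/allgru_rl are only needed for evaluating the provided model.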

Training

Training with Cross-Entropy Loss
python train.py --learning_rate 2e-4 --learning_rate_decay_start 0 --learning_rate_decay_every 2 --learning_rate_decay_rate 0.8 --max_epochs 10 --batch_size 5 --save_checkpoint_every 300 --checkpoint_path log/model --self_critical_after -1 --input_json data/msrvtttalk.json --input_label_h5 data/msrvtttalk_label.h5 --input_c3d_feature data/msrvtt_c3d_features.h5 --input_app_feature data/msrvtt_appearance_features.h5 --input_box_feature data/msrvtt_box_features.h5 --cached_tokens data/msrvtt-train-idxs 
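
The learning-rate flags in the command above describe a step decay. The sketch below shows how such a schedule is typically computed, assuming train.py follows the convention of self-critical.pytorch (decay begins after `learning_rate_decay_start`, and a negative start disables it). The function name `decayed_lr` is hypothetical and this is not the repository's code.

```python
def decayed_lr(base_lr, epoch, decay_start, decay_every, decay_rate):
    """Step-decayed learning rate for a 0-indexed epoch.

    Assumption: mirrors the self-critical.pytorch convention, where a
    negative decay_start turns the decay off entirely.
    """
    if decay_start < 0 or epoch <= decay_start:
        return base_lr
    # Number of completed decay intervals since decay_start.
    steps = (epoch - decay_start) // decay_every
    return base_lr * decay_rate ** steps


if __name__ == "__main__":
    # Values taken from the cross-entropy command above.
    for epoch in range(10):
        print(epoch, decayed_lr(2e-4, epoch, 0, 2, 0.8))
```

Under this assumption, the rate stays at 2e-4 for the first two epochs and is then multiplied by 0.8 every two epochs.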
Training with Self-Critical Loss
python train.py --learning_rate 2e-5 --learning_rate_decay_start -1 --max_epochs 40 --batch_size 5 --save_checkpoint_every 300 --checkpoint_path log --self_critical_after 0 --input_json data/msrvtttalk.json --input_label_h5 data/msrvtttalk_label.h5 --input_c3d_feature data/msrvtt_c3d_features.h5 --input_app_feature data/msrvtt_appearance_features.h5 --input_box_feature data/msrvtt_box_features.h5 --cached_tokens data/msrvtt-train-idxs --start_from log/model --reduce_on_plateau
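
For readers unfamiliar with the self-critical objective, the sketch below illustrates the general idea: the reward of a sampled caption is baselined by the reward of the greedily decoded caption, and the difference weights the log-probabilities of the sampled tokens. This is a generic pure-Python illustration with a hypothetical function name (`self_critical_loss`) and made-up reward values standing in for a metric such as CIDEr. It is not the code in train.py.

```python
def self_critical_loss(sample_logprobs, sample_rewards, greedy_rewards):
    """Average policy-gradient loss over all sampled tokens.

    sample_logprobs: per caption, the log-probabilities of its sampled tokens.
    sample_rewards:  per caption, the score of the sampled caption.
    greedy_rewards:  per caption, the score of the greedy caption (baseline).
    """
    total, n_tokens = 0.0, 0
    for logprobs, r_sample, r_greedy in zip(
        sample_logprobs, sample_rewards, greedy_rewards
    ):
        # Positive when the sample beats the greedy baseline.
        advantage = r_sample - r_greedy
        for lp in logprobs:
            total += -advantage * lp
            n_tokens += 1
    return total / max(n_tokens, 1)


if __name__ == "__main__":
    # Illustrative numbers only.
    loss = self_critical_loss(
        sample_logprobs=[[-1.0, -2.0], [-0.5]],
        sample_rewards=[0.8, 0.3],
        greedy_rewards=[0.5, 0.5],
    )
    print(loss)
```

Minimizing this quantity raises the probability of sampled captions that score above the greedy baseline and lowers it for those that score below.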

Evaluation

python eval.py --model log/allgru_rl/model-best.pth --infos_path log/allgru_rl/infos_-best.pkl --input_json data/msrvtttalk.json --input_label_h5 data/msrvtttalk_label.h5 --input_c3d_feature data/msrvtt_c3d_features.h5 --input_app_feature data/msrvtt_appearance_features.h5 --input_box_feature data/msrvtt_box_features.h5 --cached_tokens data/msrvtt-train-idxs
BLEU-1  BLEU-2  BLEU-3  BLEU-4  CIDEr  METEOR  ROUGE
81.3    67.2    52.6    39.7    53.0   28.0    61.4

Acknowledgements

The implementation is based on self-critical.pytorch.
