ml-lab / query-objseg

This project forked from bcv-uniandes/dms

Dynamic Multimodal Instance Segmentation guided by natural language queries, ECCV 2018

Home Page: https://biomedicalcomputervision.uniandes.edu.co

License: MIT License

Python 95.67% Shell 4.33%

query-objseg's Introduction

dmn-pytorch

PyTorch code for Dynamic Multimodal Instance Segmentation guided by natural language queries, ECCV 2018

Figure: A dark horse between three lighter horses

Dependencies

To run this code, you need Python 3.6.*, PyTorch, Visdom, cupy, Cython, NumPy and Matplotlib. We recommend installing the Anaconda Python distribution and using conda to install the dependencies, as follows:

conda install matplotlib numpy cython
conda install pytorch torchvision cuda90 -c pytorch
conda install aria2 -c bioconda
pip install -U visdom opencv-python cupy-cuda90 pynvrtc tqdm

You will also need the ReferIt loader library, which you can clone from: https://github.com/andfoy/refer. To install it with pip, run:

pip install git+https://github.com/andfoy/refer.git

Finally, you will need to install the Simple Recurrent Unit (SRU):

pip install -U git+https://github.com/taolei87/sru.git@43c85ed --no-deps

Conda packages will be provided in future releases.
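
Once the installation steps above are done, it can help to confirm that the packages are importable from the active environment. The following is a minimal sketch that uses only the standard library. The helper name missing_modules is hypothetical, and the list holds the usual import names for the packages (for example, opencv-python is imported as cv2 and PyTorch as torch). The ReferIt loader is left out because its import name is not stated here.

```python
import importlib.util

# Usual import names for the packages installed above
# (opencv-python -> cv2, pytorch -> torch). Illustrative list,
# not taken from the project itself.
REQUIRED_MODULES = [
    "matplotlib", "numpy", "Cython", "torch", "torchvision",
    "visdom", "cv2", "cupy", "pynvrtc", "tqdm", "sru",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The check only locates each module without importing it, so it will not catch a CUDA or driver mismatch that shows up at import time.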

Dataset download

You must also download the ReferIt, UNC, UNC+ and GRef datasets. To do so, we provide the download_dataset.sh bash script, which takes care of the required downloads.

bash download_data --path $PATH_TO_STORE_THE_DATASETS

Datasets

Dataset Name   Original Name   Splits
referit        RefCLEF         train, val, trainval, test
unc            RefCOCO         train, val, testA, testB
unc+           RefCOCO+        train, val, testA, testB
gref           RefCOCOg        train, val
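
The table above can also be kept as a small lookup, so that a dataset and split pair is validated before a long run starts. This is only a sketch: the names DATASET_SPLITS and check_split are hypothetical and are not part of dmn_pytorch. The values are copied from the table.

```python
# Dataset names and their available splits, copied from the table above.
DATASET_SPLITS = {
    "referit": ("train", "val", "trainval", "test"),
    "unc": ("train", "val", "testA", "testB"),
    "unc+": ("train", "val", "testA", "testB"),
    "gref": ("train", "val"),
}


def check_split(dataset, split):
    """Raise ValueError if the dataset/split pair is not listed in the table."""
    if dataset not in DATASET_SPLITS:
        raise ValueError("unknown dataset: %r" % (dataset,))
    if split not in DATASET_SPLITS[dataset]:
        raise ValueError(
            "split %r is not available for %r (choose from: %s)"
            % (split, dataset, ", ".join(DATASET_SPLITS[dataset]))
        )
    return True
```

For example, check_split("gref", "testA") raises a ValueError, because GRef only provides the train and val splits.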

Training

To train the model, you need to provide the path to the directory that contains the aforementioned datasets, as well as the other parameters required for training. To train the model with the low-resolution setup described in the original paper, please execute:

python -u -m dmn_pytorch.train --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --val $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --save-folder $PATH_TO_STORE_WEIGHT_SNAPSHOTS --snapshot $PATH_TO_THE_SNAPSHOT_FILE --accum-iters 1

To train the model at high resolution, add the --high-res and --upsamp-amplification 32 flags to the previous command. Note: The snapshot file must correspond to the low-resolution weights.
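
If you launch training from a script, the two setups can be assembled from one helper. The sketch below only reuses the flags shown in this section. The function name build_train_command and the example paths are hypothetical.

```python
import shlex


def build_train_command(data, dataset, val_split, save_folder, snapshot,
                        high_res=False):
    """Build the argument list for the low- or high-resolution setup."""
    cmd = [
        "python", "-u", "-m", "dmn_pytorch.train",
        "--data", data,
        "--dataset", dataset,
        "--val", val_split,
        "--backend", "dpn92",
        "--num-filters", "10",
        "--lang-layers", "3",
        "--mix-we",
        "--save-folder", save_folder,
        "--snapshot", snapshot,
        "--accum-iters", "1",
    ]
    if high_res:
        # The snapshot must hold the low-resolution weights in this case.
        cmd += ["--high-res", "--upsamp-amplification", "32"]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_train_command("/data/referit", "unc", "testA",
                              "/data/weights", "/data/weights/low_res.pth",
                              high_res=True)
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths that contain spaces.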

To inspect all the available parameters and their description, please execute python -m dmn_pytorch.train --help. Please refer to the datasets table displayed above to get more information about the dataset names and their respective available splits.

Evaluation

To evaluate the model, pass the --eval-first and --epochs 0 flags to dmn_pytorch.train as follows:

python -u -m dmn_pytorch.train --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --val $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --save-folder $PATH_TO_STORE_WEIGHT_SNAPSHOTS --snapshot $PATH_TO_THE_SNAPSHOT_FILE --epochs 0 --eval-first

Results Visualization

You can also visualize the results of the DMN model on Visdom, using a set of pretrained weights. To do so, run the dmn_pytorch.visdom_display script as follows:

python -m dmn_pytorch.visdom_display --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --split $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --num-images $NUMBER_OF_EXAMPLES_TO_DISPLAY --snapshot $PATH_TO_THE_SNAPSHOT_FILE --no-eval --visdom http://$HOST:$PORT --env $NAME_OF_THE_VISDOM_ENV
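
The command above expects a Visdom server to be listening at the address given to --visdom. A quick way to rule out connection problems first is a plain TCP check, sketched below with the standard library only. The function name server_reachable is hypothetical, and the check only confirms that something accepts connections on that host and port, not that it is a Visdom server.

```python
import socket
from urllib.parse import urlparse


def server_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError("expected a URL of the form http://HOST:PORT")
    try:
        with socket.create_connection((parsed.hostname, parsed.port),
                                      timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8097 is the default Visdom port; adjust to match your server.
    print(server_reachable("http://localhost:8097"))
```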

Performance

The pretrained weights provided below were trained in two phases. In the low-resolution phase, the DMN was trained on UNC for 24 epochs with a constant learning rate, and the resulting weights were then fine-tuned on each of the remaining datasets for 10 epochs. In the high-resolution phase, training continued on every dataset from the low-resolution weights for a total of 4 epochs.

Dataset   Examples           High-Resolution Pretrained Weights   Split   Performance (mIoU)
Referit   Referit Examples   Link                                 val     0.5328
                                                                  test    0.5281
UNC       UNC Examples       Link                                 val     0.4978
                                                                  testA   0.5484
                                                                  testB   0.4520
UNC+      UNC+ Examples      Link                                 val     0.3888
                                                                  testA   0.4425
                                                                  testB   0.3249
GRef      GRef Examples      Link                                 val     0.3764
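
For readers unfamiliar with the metric, the sketch below shows one common way to compute a mean intersection over union (mIoU) between predicted and ground-truth binary masks with NumPy. It assumes that mIoU means the per-example IoU averaged over all examples, and that two empty masks count as a perfect match. The evaluation code in dmn_pytorch may threshold or aggregate differently, and the function names are hypothetical.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection) / float(union)


def mean_iou(preds, targets):
    """Average the per-example IoU over paired predictions and targets."""
    scores = [mask_iou(p, t) for p, t in zip(preds, targets)]
    return float(np.mean(scores))


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 1]]]
    targets = [[[1, 0], [0, 0]], [[1, 0], [0, 1]]]
    print(mean_iou(preds, targets))  # IoUs are 0.5 and 1.0, so the mean is 0.75
```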

External Installation

The DMN can be imported and used as a regular Python package in your own scripts. To install it, you can use pip:

pip install -U .

Then you can import it as follows:

from dmn_pytorch import DMN

Contribution Guidelines

We follow the PEP8 and PEP257 style guidelines. Feel free to send a PR or open an issue if you have any problems or questions.

Citation

@article{margffoy2018dmn,
  title={Dynamic Multimodal Instance Segmentation guided by natural language queries},
  author={{Margffoy-Tuay}, E.~A. and {P{\'e}rez}, J.~C. and {Botero}, E. and
	{Arbel{\'a}ez}, P.},
  journal={arXiv preprint arXiv:1807.02257},
  year={2018}
}
