ELASTIC

This repo contains the original PyTorch implementation of Elastic, introduced in the following paper:

ELASTIC: Improving CNNs with Dynamic Scaling Policies (CVPR 2019, Oral)

Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan Yuille, and Mohammad Rastegari

It is compatible with PyTorch 1.0-stable, PyTorch 1.0-preview and PyTorch 0.4.1. All released models are exactly the models evaluated in the paper.

Contents

  • ImageNet Classification
  • MSCOCO Multi-label Classification
  • PASCAL VOC Semantic Segmentation

ImageNet Classification

We prepare our data following https://github.com/pytorch/examples/tree/master/imagenet

Pretrained models can be downloaded with:

for a in resnext50 resnext50_elastic resnext101 resnext101_elastic dla60x dla60x_elastic dla102x se_resnext50_elastic densenet201 densenet201_elastic; do
   wget http://ai2-vision.s3.amazonaws.com/elastic/imagenet_models/"$a".pth.tar
done

Testing

python classify.py /path/to/imagenet/ --evaluate --resume /path/to/model.pth.tar

Training

python classify.py /path/to/imagenet/

Multi-processing distributed training in Docker (recommended):

We train all the models in docker containers: https://docs.nvidia.com/deeplearning/dgx/pytorch-release-notes/rel_18.07.html

You may need to follow the instructions at the link above to install Docker and nvidia-docker if you haven't already done so.

After pulling the docker image, we run a docker container:

nvidia-docker run -it -e NVIDIA_VISIBLE_DEVICES=0,1 --ipc=host --rm -v /path/to/code:/path/to/code -v /path/to/imagenet:/path/to/imagenet nvcr.io/nvidia/pytorch:18.07-py3

Then run this training script inside the docker container.

python -m apex.parallel.multiproc docker_classify.py /path/to/imagenet

MSCOCO Multi-label Classification

We extract the data into the structure below and use the Python COCO API (https://github.com/cocodataset/cocoapi) to load it:

/path/to/mscoco/annotations/instances_train2014.json
/path/to/mscoco/annotations/instances_val2014.json
/path/to/mscoco/train2014
/path/to/mscoco/val2014

Pretrained models can be downloaded with:

for a in resnext50 resnext50_elastic resnext101 resnext101_elastic dla60x dla60x_elastic densenet201 densenet201_elastic; do
   wget http://ai2-vision.s3.amazonaws.com/elastic/coco_models/coco_"$a".pth.tar
done

Testing

python multilabel_classify.py /path/to/mscoco --resume /path/to/model.pth.tar --evaluate

Finetuning or resume training

python multilabel_classify.py /path/to/mscoco --resume /path/to/model.pth.tar

PASCAL VOC Semantic Segmentation

We prepare PASCAL VOC data following https://github.com/chenxi116/DeepLabv3.pytorch

Pretrained models can be downloaded with:

for a in resnext50 resnext50_elastic resnext101 resnext101_elastic dla60x dla60x_elastic; do
   wget http://ai2-vision.s3.amazonaws.com/elastic/pascal_models/deeplab_"$a"_pascal_v3_original_epoch50.pth
done

Testing

Models should be placed at data/deeplab_*.pth

CUDA_VISIBLE_DEVICES=0 python segment.py --exp original

Finetuning or resume training

All PASCAL VOC semantic segmentation models are trained on one GPU.

CUDA_VISIBLE_DEVICES=0 python segment.py --exp my_exp --train --resume /path/to/model.pth.tar

Note

Distributed training maintains batchnorm statistics on each GPU/worker/process without synchronization, which leads to slightly different performance on different GPUs. At the end of each epoch, our distributed script reports the averaged performance (top-1, top-5) by evaluating the whole validation set on all GPUs, and saves the model on the first GPU (models on the other GPUs are discarded). As a result, evaluating the saved model after training gives slightly (<0.1%) different numbers, which could be either better or worse. In the paper we report the averaged performance for all models. Averaging the batchnorm statistics across GPUs before evaluation may lead to marginally better numbers.
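For reference, a minimal sketch of how the unsynchronized batchnorm running statistics from several per-GPU checkpoints could be averaged before evaluation. The checkpoint paths and the "state_dict" layout are assumptions for illustration, not necessarily the repository's actual saving format:

import torch

def average_bn_stats(checkpoint_paths):
    # Load the hypothetical per-worker checkpoints on the CPU.
    states = [torch.load(p, map_location="cpu")["state_dict"] for p in checkpoint_paths]
    merged = dict(states[0])
    for key in merged:
        # Only the BatchNorm running statistics differ across workers; average them.
        if key.endswith("running_mean") or key.endswith("running_var"):
            merged[key] = torch.stack([s[key].float() for s in states]).mean(dim=0)
    return merged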

Citation

Please consider citing this paper if you find this project useful in your research.

@inproceedings{wang2019elastic,
  title={ELASTIC: Improving CNNs with Dynamic Scaling Policies},
  author={Wang, Huiyu and Kembhavi, Aniruddha and Farhadi, Ali and Yuille, Alan and Rastegari, Mohammad},
  booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}

Credits

Contributors: anikem, csrhddlam


Issues

About the learning-rate adjustment function

The paper says that the learning rate is divided by 10 at the 24th epoch and at the 30th epoch. But in your code the conditions are epoch >= 24 and epoch >= 30, each multiplying the learning rate by 0.1. Because the check is >= rather than ==, the multiplication is applied again in every epoch after the 24th, so the learning rate becomes very small. I personally think this is a bug in the code, and the conditions should be epoch == 24 and epoch == 30:

if epoch >= 24:

if epoch >= 30:

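For reference, a minimal sketch (not the repository's actual learning-rate code) contrasting the two behaviors discussed in this issue: recomputing the rate from a fixed base value each epoch, where >= acts as an ordinary step schedule, versus decaying the current rate in place, where >= shrinks it every epoch:

def lr_recomputed_from_base(base_lr, epoch):
    # Recomputed from base_lr every epoch: with ">=" checks this is a normal
    # step schedule (x0.1 after epoch 24, x0.01 after epoch 30).
    lr = base_lr
    if epoch >= 24:
        lr *= 0.1
    if epoch >= 30:
        lr *= 0.1
    return lr

def lr_decayed_in_place(current_lr, epoch):
    # Multiplies the *current* rate every epoch: with ">=" checks the rate keeps
    # shrinking by 10x per epoch after the 24th, which is the behavior described above.
    if epoch >= 24:
        current_lr *= 0.1
    if epoch >= 30:
        current_lr *= 0.1
    return current_lr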

Explanation needed for improvement trend of small > medium > large objects.

Hello @csrhddlam,

Thanks for the code and paper. The idea of using up-sampling and down-sampling to handle scale is simple, elegant, and very interesting. In the scale-challenging images subsection and in Table 4 of the paper, we can see that the improvement is larger for small objects than for large objects. The explanation given is not clear to me. That large objects are captured by the low-resolution path makes sense from the point of view of receptive field size, but then how do you conclude that the high-resolution branches do not waste capacity on large objects? (They are not mutually exclusive, right?) That the Elastic block merges various scales and feeds scale-invariant information into the next block is also fine, but again, after that, how can you say it allows more capacity for small objects at high resolution? Please clarify.

Thanks and Regards

How to calculate CP/CR/CF1/OP/OR/OF1 for top-3 predictions

If I want to calculate CP/CR/CF1/OP/OR/OF1 for the top-3 predictions, is it correct to add the following code to the original program?

no_examples = target.shape[0]          # number of samples (the batch size)
output = torch.sigmoid(output)         # sigmoid activation
output = output.cpu().detach().numpy()
top3 = np.zeros_like(output)
for ind_example in range(no_examples):
    top_pred_inds = np.argsort(output[ind_example])[::-1]  # sort classes by probability, descending
    for k in range(3):
        top3[ind_example, top_pred_inds[k]] = 1            # set the 3 highest-probability classes to 1, the rest stay 0
pred = torch.from_numpy(top3).long().cuda()                # numpy -> tensor, to stay consistent with the code below
print("pred: ", pred)
print("target: ", target)
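For reference, a minimal sketch of the standard per-class and overall precision/recall/F1 definitions, computed from binary pred and target arrays like the ones built above. This is not code from the repository, and the function name and array layout are assumptions for illustration:

import numpy as np

def multilabel_metrics(pred, target, eps=1e-8):
    # pred, target: (num_examples, num_classes) binary {0, 1} arrays
    tp = (pred * target).sum(axis=0).astype(float)   # true positives per class
    n_pred = pred.sum(axis=0).astype(float)          # predicted positives per class
    n_gt = target.sum(axis=0).astype(float)          # ground-truth positives per class
    cp = np.mean(tp / np.maximum(n_pred, eps))       # per-class precision (CP)
    cr = np.mean(tp / np.maximum(n_gt, eps))         # per-class recall (CR)
    cf1 = 2 * cp * cr / (cp + cr + eps)               # per-class F1 (CF1)
    op = tp.sum() / max(n_pred.sum(), eps)            # overall precision (OP)
    o_r = tp.sum() / max(n_gt.sum(), eps)             # overall recall (OR)
    of1 = 2 * op * o_r / (op + o_r + eps)              # overall F1 (OF1)
    return cp, cr, cf1, op, o_r, of1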

Problem with the released code and publication

Hi, thank you for your code and idea. I have a question. As shown in Figure 1, when the "pop bottle" image goes through ResNeXt-Elastic, each block adopts a different resolution (from X to S), and, as you described in other issues, the selected path is the high-contribution/high-activation path. However, there is no operation that can recover the spatial resolution, so I cannot understand how "M" can turn into "L" in the third stage of the network.

Some questions about MSCOCO Multi-label Classification

Hi, thank you for sharing the good idea and code, but I have some questions about MSCOCO multi-label classification:

  1. What do the two files coco_resnext50.pth.tar and coco_resnext50_elastic.pth.tar each contain?

  2. Are these two files the result of ImageNet pre-training, or of training on MSCOCO?

  3. If I want to run the ResNeXt50 + Elastic experiment from scratch, which file should I use?

Instance specific?

Hi,

Thanks for sharing the good idea and code.

I do not understand how the network structure is instance-specific or learned from the data. What I see is that different data/images have different activation distributions.

For example, consider the DenseNet block with Elastic shown below; it seems to be a fixed, data-independent architecture.

[figure: DenseNet block with Elastic]

From the code, it also seems to be a fixed architecture:

class _DenseLayerElastic(nn.Module):

Would you please provide some explanation?

Thanks in advance!

Best
Yukang
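For context, a minimal sketch of the Elastic pattern described in the paper, under my own assumptions rather than the repository's actual _DenseLayerElastic: the wiring is fixed for every input (parallel high-resolution and low-resolution branches whose outputs are summed), and what varies per image is how strongly each branch is activated:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticBranchSketch(nn.Module):
    # Hypothetical illustration, not the repository's module: the same kind of
    # convolution runs at full resolution and at half resolution, and the two
    # outputs are added. The architecture itself does not change per input.
    def __init__(self, channels):
        super().__init__()
        self.conv_high = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_low = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        high = self.conv_high(x)
        low = F.avg_pool2d(x, 2)                      # downsample
        low = self.conv_low(low)
        low = F.interpolate(low, size=high.shape[-2:],
                            mode='bilinear', align_corners=False)  # upsample back
        return high + low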
