This project forked from zhiweichen0012/e2net

Official implementation of "E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization", ACM MM 2021.

E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization (ACM MM 2021)

TensorFlow implementation of "E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization".

📋 Table of contents

  1. 📎 Paper Link
  2. 💡 Abstract
  3. 📖 Method
  4. 📃 Requirements
  5. ✏️ Usage
    1. Start
    2. Download Datasets
    3. Training & Testing
  6. 🔍 Citation

📎 Paper Link

E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization (link)

  • Authors: Zhiwei Chen, Liujuan Cao, Yunhang Shen, Feihong Lian, Yongjian Wu, Rongrong Ji
  • Institutions: Xiamen University, Xiamen, China; Tencent Youtu Lab, Shanghai, China.

💡 Abstract

Weakly supervised object localization (WSOL), which seeks to train localizers with only image-level labels, has recently gained popularity. However, because they rely heavily on the classification objective for training, prevailing WSOL methods localize only the discriminative parts of an object, ignoring other useful information such as the wings of a bird, and suffer from severe rotation variations. Moreover, learning object localization under weak supervision forces CNNs to attend to non-salient regions, which may negatively influence image classification results. To address these challenges, this paper proposes a novel end-to-end Excitation-Expansion network, coined E2Net, to localize entire objects with only image-level labels, a task that serves as the basis of most multimedia tasks. The proposed E2Net consists of two key components: Maxout-Attention Excitation (MAE) and Orientation-Sensitive Expansion (OSE). First, the MAE module aims to activate non-discriminative localization features while simultaneously recovering discriminative classification cues. To this end, we efficiently couple an erasing strategy with maxout learning to facilitate entire-object localization without hurting classification accuracy. Second, to address rotation variations, the proposed OSE module expands less salient object parts along all possible orientations. In particular, the OSE module dynamically combines selective attention banks from variously oriented expansions of the receptive field, which introduces additional multi-parallel localization heads. Extensive experiments on ILSVRC 2012 and CUB-200-2011 demonstrate that the proposed E2Net outperforms the previous state-of-the-art WSOL methods and also significantly improves classification performance.

📖 Method


The architecture of our proposed network. There are two main components: Maxout-Attention Excitation (MAE) and Orientation-Sensitive Expansion (OSE). MAE is applied to intermediate feature maps of the backbone in a sequential way. The output maps of multi-parallel localization heads in OSE are fused during the test phase. Note that GAP refers to global average pooling.
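As a rough illustration of two terms in the caption, the sketch below shows global average pooling over a single-channel map and one possible way to fuse the maps of several localization heads at test time. The description above does not state the fusion rule, so the element-wise mean used here is an assumption, and the function names `global_average_pool` and `fuse_head_maps` are hypothetical and not taken from this repository.

```python
def global_average_pool(feature_map):
    """Average every spatial position of a single-channel 2D map (GAP)."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def fuse_head_maps(head_maps):
    """Fuse equally sized 2D maps by element-wise mean (an ASSUMED rule)."""
    count = len(head_maps)
    rows, cols = len(head_maps[0]), len(head_maps[0][0])
    return [
        [sum(m[r][c] for m in head_maps) / count for c in range(cols)]
        for r in range(rows)
    ]


# Two toy 2x2 localization maps standing in for two OSE heads.
head_a = [[0.0, 1.0], [0.5, 0.5]]
head_b = [[1.0, 1.0], [0.5, 0.0]]

fused = fuse_head_maps([head_a, head_b])
print(fused)                       # [[0.5, 1.0], [0.5, 0.25]]
print(global_average_pool(fused))  # 0.5625
```

Other fusion rules, such as an element-wise maximum, would fit the same interface; the training scripts in this repository remain the reference for what E2Net actually does.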

📃 Requirements

  • Python 3.3+
  • TensorFlow (≥ 1.12, < 2)
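The version bounds above can be checked before training starts. The following is a minimal sketch using only the standard library; the helper names `tensorflow_version_ok` and `python_version_ok` are hypothetical and are not part of this repository.

```python
import re
import sys


def parse_version(version):
    """Return (major, minor) from a version string such as '1.15.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def tensorflow_version_ok(version):
    """True when 1.12 <= version < 2, the range listed above."""
    return (1, 12) <= parse_version(version) < (2, 0)


def python_version_ok(version_info=sys.version_info):
    """True for Python 3.3 or newer."""
    return tuple(version_info[:2]) >= (3, 3)


print(python_version_ok())
print(tensorflow_version_ok("1.15.0"))  # True
print(tensorflow_version_ok("2.4.1"))   # False
```

With TensorFlow installed, passing `tensorflow.__version__` to `tensorflow_version_ok` checks the active environment.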

โœ๏ธ Usage

Start

git clone https://github.com/zhiweichen0012/E2Net.git
cd E2Net

Download Datasets

Run the following command from the repository root to download the original CUB dataset and extract the image files.

./dataset/prepare_cub.sh

The extracted image files are laid out as follows:

dataset
└── CUB
    ├── 001.Black_footed_Albatross
    │   ├── Black_Footed_Albatross_0001_796111.jpg
    │   ├── Black_Footed_Albatross_0002_55.jpg
    │   └── ...
    ├── 002.Laysan_Albatross
    └── ...

The corresponding annotation files can be found here.
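To confirm that extraction produced the layout shown above, a small check such as the following may help. It is a sketch that assumes the images sit under `dataset/CUB` with a `.jpg` extension, as in the tree; `summarize_cub` is a hypothetical helper, not a script shipped with this repository.

```python
from pathlib import Path


def summarize_cub(root="dataset/CUB"):
    """Map each class folder name to its number of .jpg images.

    Returns an empty dict when the root directory does not exist.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    return {
        class_dir.name: sum(1 for _ in class_dir.glob("*.jpg"))
        for class_dir in sorted(root_path.iterdir())
        if class_dir.is_dir()
    }


summary = summarize_cub()
print(len(summary), "class folders,", sum(summary.values()), "images")
```

A class folder that reports zero images usually points to an interrupted download or extraction.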

To prepare the ImageNet data, download the ImageNet "train" and "val" splits from here and save them as dataset/ILSVRC2012_img_train.tar and dataset/ILSVRC2012_img_val.tar. Then run the following command from the repository root to extract the images.

./dataset/prepare_imagenet.sh
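Before running the extraction command, it can be worth confirming that both archives are where the script expects them. The sketch below only checks the two paths named above and that each file opens as a tar archive; `missing_archives` is a hypothetical helper and does not verify archive contents or checksums.

```python
import tarfile
from pathlib import Path

# File names taken from the instructions above.
EXPECTED_ARCHIVES = ("ILSVRC2012_img_train.tar", "ILSVRC2012_img_val.tar")


def missing_archives(dataset_dir="dataset"):
    """Return expected archive names that are absent or not readable as tar."""
    missing = []
    for name in EXPECTED_ARCHIVES:
        path = Path(dataset_dir) / name
        if not (path.is_file() and tarfile.is_tarfile(str(path))):
            missing.append(name)
    return missing


# An empty list means both archives are in place.
print(missing_archives())
```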

The extracted image files are laid out as follows:

dataset
└── ILSVRC
    ├── train
    │   ├── n01440764
    │   │   ├── n01440764_10026.JPEG
    │   │   ├── n01440764_10027.JPEG
    │   │   └── ...
    │   ├── n01443537
    │   └── ...
    └── val
        ├── ILSVRC2012_val_00000001.JPEG
        ├── ILSVRC2012_val_00000002.JPEG
        └── ...

The corresponding annotation files can be found here.
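The ImageNet layout differs from CUB in that `train` is nested by synset folder while `val` is flat. A sketch of a check for that shape follows, assuming the `dataset/ILSVRC` root and `.JPEG` extension shown in the tree; `summarize_ilsvrc` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path


def summarize_ilsvrc(root="dataset/ILSVRC"):
    """Count synset folders and .JPEG files under train/ and val/."""
    train_dir = Path(root) / "train"
    val_dir = Path(root) / "val"

    synsets = []
    if train_dir.is_dir():
        synsets = sorted(d for d in train_dir.iterdir() if d.is_dir())

    val_images = 0
    if val_dir.is_dir():
        val_images = sum(1 for _ in val_dir.glob("*.JPEG"))

    return {
        "train_classes": len(synsets),
        "train_images": sum(1 for d in synsets for _ in d.glob("*.JPEG")),
        "val_images": val_images,
    }


print(summarize_ilsvrc())
```

If `val_images` is zero while `val` contains subfolders, the validation images were probably extracted into a nested layout rather than the flat one shown above.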

Training & Testing

First, download the pretrained models from here. Currently, ResNet50-SE and VGG-16 networks are provided. Then run the script for the chosen backbone from the repository root.

./run_train_vgg16.sh
./run_train_resnet50.sh

๐Ÿ” Citation

@inproceedings{chen2021e2net,
  title={E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization},
  author={Chen, Zhiwei and Cao, Liujuan and Shen, Yunhang and Lian, Feihong and Wu, Yongjian and Ji, Rongrong},
  booktitle={ACM MM},
  pages={573--581},
  year={2021}
}
