Equi-normalization of Neural Networks

ENorm is a fast and iterative method for minimizing the L2 norm of the weights of a given neural network that provably converges to a unique solution. Interleaving ENorm with SGD during training improves the test accuracy.

This repository contains the implementation of ENorm as detailed in our paper Equi-normalization of Neural Networks (ICLR 2019). The library is easy to use and requires adding only two lines of code to your usual training loop.

Matrices $W_k$ and $W_{k+1}$ are updated by multiplying the columns of the first matrix by rescaling coefficients. The rows of the second matrix are inversely rescaled so that the product of the two matrices is unchanged. The rescaling coefficients are strictly positive to ensure functional equivalence when the matrices are interleaved with ReLUs. This rescaling is applied iteratively to each pair of adjacent matrices.
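
The rescaling described above can be checked numerically. The following is a minimal pure-Python sketch, not code from this repository: it assumes a bias-free two-layer network computing relu(x W_1) W_2 with x as a row vector, and the helper names (matmul, forward, rescale_pair, max_abs_diff) are hypothetical. It compares the output before and after rescaling, once with strictly positive coefficients and once with one coefficient made negative.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]

def forward(x, w1, w2):
    """Two-layer bias-free network: relu(x @ w1) @ w2."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def rescale_pair(w1, w2, d):
    """Scale column j of w1 by d[j] and row j of w2 by 1 / d[j]."""
    new_w1 = [[v * d[j] for j, v in enumerate(row)] for row in w1]
    new_w2 = [[v / d[j] for v in row] for j, row in enumerate(w2)]
    return new_w1, new_w2

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))

random.seed(0)
n_in, n_hidden, n_out = 4, 5, 3
x = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(8)]
w1 = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_in)]
w2 = [[random.gauss(0, 1) for _ in range(n_out)] for _ in range(n_hidden)]
reference = forward(x, w1, w2)

# Strictly positive coefficients: the output matches up to rounding error.
d_pos = [random.uniform(0.5, 2.0) for _ in range(n_hidden)]
diff_pos = max_abs_diff(reference, forward(x, *rescale_pair(w1, w2, d_pos)))

# Flipping the sign of one coefficient breaks the equivalence,
# because relu(-z) is not -relu(z).
d_neg = [-d_pos[0]] + d_pos[1:]
diff_neg = max_abs_diff(reference, forward(x, *rescale_pair(w1, w2, d_neg)))

print(diff_pos, diff_neg)
```

The first difference should be at the level of floating-point rounding, while the second is clearly nonzero, which is why the coefficients are restricted to strictly positive values.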

Dependencies

ENorm works with Python 3.6 and newer. To run the code, you must have the packages listed in requirements.txt installed.

These dependencies can be installed with: pip install -r requirements.txt

Installation

Install ENorm by cloning the repository, changing into its directory, and running:

python setup.py install

How to use ENorm

The training procedure consists of performing one ENorm cycle (iterating ENorm on the entire network once) after each SGD step, as detailed below.

import torch
from enorm import ENorm


# defining model and optimizer
model = ...
criterion = ...
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# instantiating ENorm (here with asymmetric scaling coefficient c=1)
enorm = ENorm(model.named_parameters(), optimizer, c=1)

# training loop
for batch, target in train_loader:
  # forward pass
  output = model(batch)
  loss = criterion(output, target)

  # backward pass
  optimizer.zero_grad()
  loss.backward()

  # SGD and ENorm steps
  optimizer.step()
  enorm.step()
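
To make the notion of an ENorm cycle more concrete, here is a minimal pure-Python sketch of one pass over all adjacent weight pairs. It is an illustration under stated assumptions, not the code in enorm.py: the network is fully connected and bias-free, layers compute x W_k with x as a row vector, there is no asymmetric coefficient c, and the names forward, squared_norm and balance_cycle are hypothetical. For each hidden unit j, with a the norm of column j of W_k and b the norm of row j of W_{k+1}, the coefficient d = sqrt(b / a) minimizes d^2 a^2 + b^2 / d^2, so each step cannot increase the total squared L2 norm.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]

def forward(x, weights):
    """Bias-free ReLU network: ReLU after every layer except the last."""
    h = x
    for k, w in enumerate(weights):
        h = matmul(h, w)
        if k < len(weights) - 1:
            h = [[max(0.0, v) for v in row] for row in h]
    return h

def squared_norm(weights):
    """Sum of squared entries over all weight matrices."""
    return sum(v * v for w in weights for row in w for v in row)

def balance_cycle(weights):
    """One in-place pass over all adjacent pairs (W_k, W_{k+1})."""
    for w_left, w_right in zip(weights, weights[1:]):
        for j in range(len(w_right)):  # hidden unit j
            col_norm = math.sqrt(sum(row[j] ** 2 for row in w_left))
            row_norm = math.sqrt(sum(v ** 2 for v in w_right[j]))
            if col_norm == 0.0 or row_norm == 0.0:
                continue  # dead unit: leave it untouched
            d = math.sqrt(row_norm / col_norm)  # strictly positive
            for row in w_left:
                row[j] *= d
            w_right[j] = [v / d for v in w_right[j]]

random.seed(0)
sizes = [4, 6, 5, 3]
weights = [[[random.gauss(0, 1) for _ in range(n_out)] for _ in range(n_in)]
           for n_in, n_out in zip(sizes, sizes[1:])]
x = [[random.gauss(0, 1) for _ in range(sizes[0])] for _ in range(8)]

out_before = forward(x, weights)
norms = [squared_norm(weights)]
for _ in range(20):
    balance_cycle(weights)
    norms.append(squared_norm(weights))
out_after = forward(x, weights)

max_diff = max(abs(u - v) for ra, rb in zip(out_before, out_after)
               for u, v in zip(ra, rb))
print(norms[0], norms[-1], max_diff)
```

In this sketch the squared norm decreases over the cycles while the network output stays the same up to rounding error. In the training loop above, enorm.step() plays the role of balance_cycle and additionally handles the optimizer state.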

Some notes on using ENorm (for details, see our paper):

  • ENorm is compatible with feedforward fully connected and/or convolutional architectures, including ReLU and pooling layers. In particular, our current implementation supports the widely used ResNet architecture.
  • The asymmetric scaling coefficient c penalizes the layers exponentially according to their depth. Usually, values of c equal to or slightly above 1 give the best results.
  • Currently, only the SGD optimizer is supported, because ENorm must also update the momentum buffer. Indeed, when momentum is used, ENorm performs a jump in the parameter space, so we update the momentum accordingly.
  • Optionally, one can perform more ENorm cycles or apply ENorm every k SGD iterations (k > 1). In our experience, performing one ENorm cycle after each SGD iteration generally works best.
  • In practice, we have found the training to be more stable when not balancing the biases.
  • When applying ENorm to a network with BatchNorm layers, we simply ignore the BatchNorm weights and perform the ENorm cycle on the network as usual.
  • Use the documentation in the enorm.py file to adapt ENorm to your favourite network architecture.
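
The optional schedules mentioned above (several ENorm cycles per step, or ENorm only every k SGD iterations) can be sketched as a small wrapper around the training loop. This is an illustrative sketch only: the function train and its parameters every_k and n_cycles are hypothetical and not part of this library. In a real loop, sgd_step would wrap the forward pass, backward pass and optimizer.step(), and enorm_cycle would be enorm.step, assuming each call runs one cycle as in the usage example. Counting stand-ins are used here so the schedule can be inspected without PyTorch.

```python
def train(batches, sgd_step, enorm_cycle, every_k=1, n_cycles=1):
    """Run one SGD step per batch and n_cycles ENorm cycles every every_k steps."""
    for iteration, batch in enumerate(batches, start=1):
        sgd_step(batch)
        if iteration % every_k == 0:
            for _ in range(n_cycles):
                enorm_cycle()

# Stand-ins that only count calls, to show the resulting schedule.
calls = {"sgd": 0, "enorm": 0}

def fake_sgd_step(batch):
    calls["sgd"] += 1

def fake_enorm_cycle():
    calls["enorm"] += 1

# 10 batches, ENorm every 5 SGD steps, 2 cycles each time.
train(range(10), fake_sgd_step, fake_enorm_cycle, every_k=5, n_cycles=2)
print(calls)
```

With every_k=1 and n_cycles=1 this reduces to the default procedure of one ENorm cycle after each SGD iteration.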

Results

You can reproduce the results of our paper by running the following commands:

# fully connected network on CIFAR10 with 15 intermediary layers
python main.py --dataset cifar10 --model-type linear --n-layers 15 --enorm 0 --epochs 60 --lr 0.1 --weight-decay 1e-3 --momentum 0 --n-iter 5
python main.py --dataset cifar10 --model-type linear --n-layers 15 --enorm 1.2 --epochs 60 --lr 0.1 --weight-decay 1e-3 --momentum 0 --n-iter 5

# fully convolutional network on CIFAR10
python main.py --dataset cifar10 --model-type conv --enorm 0 --epochs 128 --lr 0.05 --weight-decay 1e-3 --momentum 0.9 --n-iter 5
python main.py --dataset cifar10 --model-type conv --enorm 1.1 --epochs 128 --lr 0.05 --weight-decay 1e-3 --momentum 0.9 --n-iter 5

License

ENorm is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, as found in the LICENSE file.

Bibliography

Please consider citing [1] if you found the resources in this repository useful.

[1] P. Stock, B. Graham, R. Gribonval and H. Jégou. Equi-normalization of Neural Networks. International Conference on Learning Representations (ICLR), 2019.

@inproceedings{stock2018enorm,
  title = {Equi-normalization of Neural Networks},
  author = {Stock, Pierre and Graham, Benjamin and Gribonval, Rémi and Jégou, Hervé},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2019}
}
