
Convolutional Neural Networks Reading List

A list of papers I used for my thesis about convolutional neural networks with a focus on batch normalization. The papers are ordered roughly chronologically by publication date. I divided them into the sections Early work, Pre-AlexNet, Post-AlexNet, and Batch Normalization.

Early work

Invention of the backpropagation algorithm.

First paper on convolutional neural networks trained with backpropagation.

Overview of training end-to-end systems such as convolutional neural networks with gradient-based optimization.

Efficient Backprop (LeCun et al 1998)

Gives many practical recommendations for training multi-layer (convolutional) neural networks; two of these recommendations are sketched in code after the list below.

  • Motivates stochastic gradient descent with mini-batches
  • Shows benefits of mean subtraction, normalization, and decorrelation
  • Shows drawbacks of sigmoid activation function and motivates hyperbolic tangent (tanh)
  • Proposes weight initialization scheme (LeCun initialization)
  • Motivates use of adaptive optimization techniques and momentum
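
A minimal NumPy sketch of two of these recommendations; the function names are my own, not from the paper:

```python
import numpy as np

def normalize_inputs(x):
    """Mean subtraction and normalization: zero-center each feature and scale to unit variance."""
    x = x - x.mean(axis=0)
    return x / (x.std(axis=0) + 1e-8)

def lecun_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    """LeCun initialization: zero-mean Gaussian weights with variance 1 / fan_in."""
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

x = normalize_inputs(np.random.default_rng(1).random((128, 20)))  # toy batch: 128 samples, 20 features
w = lecun_init(20, 50)                                            # weights for a 20 -> 50 layer
```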

Pre-AlexNet

Introduces unsupervised pre-training and shows significant improvements in convergence and generalization performance.

Shows why training deep neural networks is difficult and gives pointers for improvements.

  • Gradient propagation study with sigmoid, tanh, and softsign
  • New initialization scheme for these activations (Xavier initialization); a sketch follows this list
  • Motivates the cross entropy loss function instead of mean squared error (MSE)
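
A minimal sketch of the proposed initialization, assuming the commonly used uniform variant (the function name is mine):

```python
import numpy as np

# Xavier (Glorot) initialization, uniform variant: the limit sqrt(6 / (fan_in + fan_out))
# keeps activation and gradient variances roughly constant across layers.
def xavier_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

w = xavier_uniform(784, 256)  # e.g. the first layer of an MNIST-sized network
```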

Shows the advantages of rectified activation functions (ReLU) for convergence speed.

Introduces adagrad, an adaptive optimization technique.
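
A minimal sketch of the adagrad update for a single parameter tensor; the function name and default hyperparameters are illustrative:

```python
import numpy as np

# Adagrad: each parameter gets its own effective learning rate, scaled by the
# inverse square root of its accumulated squared gradients.
def adagrad_step(param, grad, cache, lr=0.01, eps=1e-8):
    cache = cache + grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache
```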

Practical recommendations for setting hyperparameters such as the learning rate, learning rate decay, batch size, momentum, weight decay, and nonlinearity.

Post-AlexNet

Breakthrough paper that popularized convolutional neural networks with the AlexNet architecture and made the following contributions.

  • The use of local response normalization
  • Extensive use of regularizers such as data augmentation and dropout

Describes dropout in detail.
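
A minimal sketch of dropout at training time, assuming the inverted-dropout convention (scaling by 1 / keep_prob during training so no rescaling is needed at test time):

```python
import numpy as np

# Randomly zero activations with probability 1 - keep_prob and rescale the
# survivors so the expected activation stays the same.
def dropout(activations, keep_prob=0.5, rng=np.random.default_rng(0)):
    mask = (rng.random(activations.shape) < keep_prob) / keep_prob
    return activations * mask
```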

Introduces adadelta, an improved version of the adagrad adaptive optimization technique.
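
A minimal sketch of the adadelta update; variable names are mine:

```python
import numpy as np

# Adadelta: running averages of squared gradients (eg2) and squared updates (edx2)
# replace a hand-tuned global learning rate.
def adadelta_step(param, grad, eg2, edx2, rho=0.95, eps=1e-6):
    eg2 = rho * eg2 + (1 - rho) * grad ** 2
    dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grad
    edx2 = rho * edx2 + (1 - rho) * dx ** 2
    return param + dx, eg2, edx2
```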

Maxout Networks (Goodfellow et al 2013)

Introduces the maxout neuron, a companion to dropout that can approximate activation functions such as ReLU and the absolute value.
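
A minimal sketch of a maxout unit for a single input vector; shapes and names are illustrative:

```python
import numpy as np

# A maxout unit computes k affine transformations of its input and returns the
# element-wise maximum, so it can learn shapes such as ReLU or the absolute value.
def maxout(x, w, b):
    # x: (fan_in,), w: (k, fan_in, fan_out), b: (k, fan_out)
    z = np.einsum('i,kio->ko', x, w) + b  # k affine pieces
    return z.max(axis=0)                  # element-wise max over the pieces
```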

Theoretical analysis of the dynamics in deep neural networks and proposal of the orthogonal initialization scheme.
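
A minimal sketch of orthogonal initialization, assuming fan_in >= fan_out:

```python
import numpy as np

# The Q factor of the QR decomposition of a random Gaussian matrix gives a
# weight matrix with orthonormal columns.
def orthogonal_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    a = rng.normal(size=(fan_in, fan_out))
    q, _ = np.linalg.qr(a)
    return q
```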

Shows why careful weight initialization and SGD with (Nesterov) momentum are crucial for training deep neural networks.
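
A minimal sketch of SGD with Nesterov momentum, assuming grad_fn returns the gradient of the loss at the given parameters (names are illustrative):

```python
# The gradient is evaluated at a look-ahead point, which gives better-behaved
# updates than classical momentum.
def nesterov_step(param, grad_fn, velocity, lr=0.01, mu=0.9):
    lookahead = param + mu * velocity
    velocity = mu * velocity - lr * grad_fn(lookahead)
    return param + velocity, velocity
```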

Introduces dropconnect, a generalization of dropout that drops random weights instead of entire neurons.
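
A minimal sketch of a DropConnect forward pass at training time (the paper's inference-time approximation is omitted):

```python
import numpy as np

# Individual weights, rather than whole activations as in dropout, are zeroed at random.
def dropconnect_forward(x, w, keep_prob=0.5, rng=np.random.default_rng(0)):
    mask = rng.random(w.shape) < keep_prob
    return x @ (w * mask)
```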

Introduces a novel visualization technique for convolutional filters using a method called deconvolution that maps layer activations back to the input pixel space.

Introduces adam and adamax, adaptive optimization techniques that combine momentum with adagrad/adadelta-style per-parameter learning rate scaling.
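
A minimal sketch of the adam update; t is the step count starting at 1:

```python
import numpy as np

# Exponential moving averages of the gradient (m) and its square (v), with bias
# correction for their zero initialization.
def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```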

Going Deeper with Convolutions (Szegedy et al 2014)

Describes the inception architecture (GoogLeNet) that reduces the number of learnable parameters significantly while improving accuracy.

Motivates the use of architectures with smaller convolutional filters such as 1 x 1 and 3 x 3 (VGGNet).
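
A back-of-the-envelope comparison behind this argument (the channel count is illustrative): two stacked 3 x 3 convolutions cover the same 5 x 5 receptive field as a single 5 x 5 convolution, but with fewer weights and an extra nonlinearity in between.

```python
C = 64                       # input and output channels (illustrative)
two_3x3 = 2 * 3 * 3 * C * C  # two stacked 3x3 conv layers: 73,728 weights
one_5x5 = 5 * 5 * C * C      # one 5x5 conv layer:          102,400 weights
```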

Introduces a novel parametric rectifier (PReLU) and a weight initialization scheme tailored to rectified activations (Kaiming initialization).
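
A minimal sketch of Kaiming initialization; the function name is mine:

```python
import numpy as np

# Weight variance 2 / fan_in compensates for ReLU zeroing out half of the activations.
def kaiming_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```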

Describes a network architecture with residual connections (ResNet) that enable deeper architectures and are easier to optimize.
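
A minimal sketch of a residual connection; residual_fn stands in for the block's convolutional layers:

```python
import numpy as np

# The block learns a residual function F(x) that is added to the identity
# shortcut, so the block computes F(x) + x.
def residual_block(x, residual_fn):
    return residual_fn(x) + x

y = residual_block(np.ones(8), lambda v: 0.1 * v)  # toy residual function, for illustration only
```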

Batch Normalization

Introduces batch normalization, a method to accelerate deep network training by reducing internal covariate shift; a minimal sketch of the normalization transform follows the list below. The authors claim batch normalization has the following properties.

  • Enables higher learning rates and faster learning rate decay without the risk of divergence
  • Regularizes the model by stabilizing the parameter growth
  • Reduces the need for dropout, weight regularization, and local response normalization
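
A minimal sketch of the batch normalization transform at training time (at test time, running estimates of the mean and variance are used instead); gamma and beta are the learnable scale and shift parameters:

```python
import numpy as np

# Normalize each feature over the mini-batch, then scale and shift.
def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # scale and shift
```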
