
NeuralNetwork

Neural Network implementation in Numpy & Keras.

The code consists of the following modules (the idea is that they can be combined in the future; as of now they are used independently):

  • deep_neural_network_base: Class that builds an MLP based on the number of nodes and the activation function given for each layer.

  • deep_neural_network_batch_normalization: Extension of the deep_neural_network_base class that implements batch normalization (needs work: the mean and variance used at inference are not properly coded).

  • deep_neural_network_dropout: Extension of deep_neural_network_base class that implements dropout for training and inference.

  • deep_neural_network_l2regularization: Extension of deep_neural_network_base class that implements L2 regularization.

  • deep_neural_network_optimized: Extension of the deep_neural_network_base class that offers the following options for the gradient update:

    • Momentum.
    • NAG (Nesterov accelerated gradient): pending.
    • Adagrad.
    • RMSprop.
    • Adadelta.
    • Adam.
    • Adamax.
    • Nadam: pending.
  • launch_nn_training: Code to test out the implementation on MNIST data.

  • deep_neural_network_keras: Keras implementation of an MLP for MNIST, used to do a comprehensive analysis on the explanations below.

This is a brief summary of my own understanding of regularization, optimizers, batch normalization and gradient updates.

Comparison between batch/SGD/mini-batch:

Batch:

  • It doesn’t allow on-line training.
  • Slow updates: the gradient is computed over the entire training set before each parameter update.

SGD:

  • Allows on-line training.
  • Updates on a single sample, which causes high variance in the cost function across weight updates.
  • Makes it harder to settle into a minimum, as it constantly overshoots.

Mini-batch:

  • With a proper batch size, it reduces the variance of the updates.
  • Allows faster convergence to a minimum, since it updates sooner than full batch.
  • A good batch size can take advantage of the hardware's vectorized matrix operations (see the sketch after this list).
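
    For reference, a minimal sketch of the mini-batch training loop described above; `forward`, `backward` and `update` are placeholders for the model-specific steps, not the actual methods of deep_neural_network_base:

    ```python
    import numpy as np

    def train_minibatch(X, Y, params, forward, backward, update,
                        batch_size=128, epochs=25):
        """Generic mini-batch loop; the three callables are placeholders."""
        n = X.shape[0]
        for _ in range(epochs):
            perm = np.random.permutation(n)              # reshuffle every epoch
            for start in range(0, n, batch_size):
                idx = perm[start:start + batch_size]     # indices of one mini-batch
                cache = forward(X[idx], params)          # forward pass on the batch
                grads = backward(Y[idx], cache, params)  # gradients averaged over the batch
                params = update(params, grads)           # one parameter update per batch
        return params
    ```

    Setting batch_size to 1 recovers SGD, and setting it to the full dataset size recovers batch gradient descent.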

Batch Normalization:

  • Limits how much the data distribution inside a layer can shift as the weights and biases of the previous layers are updated.
  • It weakens the coupling between one layer and the previous one.
  • It has the same effect as normalizing the inputs: it keeps the data distribution from reaching very high or very low values as it propagates through the network, which allows faster convergence to the minimum.
  • Includes a regularization effect: during training the normalization is done over the mini-batch, which introduces some noise into the layer outputs Z[l]. Similar to Dropout in effect.
  • Mathematical demonstration: Batch Normalization Gradient
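
Since the inference-time mean and variance are noted above as still needing work, here is a minimal NumPy sketch of the usual approach: batch statistics during training plus an exponential moving average used at inference. The variable names are illustrative, not taken from the repository:

```python
import numpy as np

def batchnorm_forward(Z, gamma, beta, running_mean, running_var,
                      training=True, momentum=0.9, eps=1e-5):
    """Batch-normalize a mini-batch of pre-activations Z with shape (n, d)."""
    if training:
        mu = Z.mean(axis=0)                    # per-feature mean over the mini-batch
        var = Z.var(axis=0)                    # per-feature variance over the mini-batch
        # Track running statistics so inference does not depend on the batch.
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        mu, var = running_mean, running_var    # use the stored statistics at inference
    Z_hat = (Z - mu) / np.sqrt(var + eps)      # normalize
    out = gamma * Z_hat + beta                 # learnable scale and shift
    return out, running_mean, running_var
```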

Regularization:

  • Techniques to prevent the neural network from overfitting the training data.

L2:

  • Forces the weights to take lower values, preventing individual weights from standing out.
  • Penalizes the cost function with the mean of the squared weight values.
  • Preventing the weights from taking large values helps the network generalize, so that some features are not given too much importance compared to others:
    • When the weights are updated, a certain weight decay is introduced by the gradient of the squared-weight term added to the cost function (see the sketch below).
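
A sketch of the weight-decay effect; the symbol names, in particular the regularization strength `lam` and batch size `m`, are illustrative:

```python
import numpy as np

def add_l2_penalty(cost, grads, weights, lam, m):
    """Add the L2 term (lam / (2*m)) * sum(||W||^2) to the cost and the matching
    (lam / m) * W weight-decay term to each weight gradient."""
    cost = cost + (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)
    grads = [dW + (lam / m) * W for dW, W in zip(grads, weights)]
    return cost, grads

# The update W := W - lr * (dW + (lam/m) * W) = (1 - lr*lam/m) * W - lr * dW
# shrinks every weight a little on each step, which is the "weight decay".
# L1 regularization instead adds (lam/m) * sign(W) to the gradient.
```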

L1:

  • It penalizes the weights with their absolute values.
  • Same intention as L2, but a lower impact on large weights, since the penalty grows linearly rather than quadratically.

Dropout:

  • Randomly drops activation outputs across the network, only during training and with a certain probability.
  • The objective is to balance the weight values, preventing them from fitting only the training-set samples rather than the general case (see the sketch below).
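
A minimal sketch of inverted dropout, one common way to implement it (it may differ from the exact implementation in deep_neural_network_dropout):

```python
import numpy as np

def dropout_forward(A, keep_prob=0.8, training=True):
    """Inverted dropout: zero random activations during training and rescale by
    keep_prob so the expected activation is unchanged at inference time."""
    if not training:
        return A, None                               # inference: use all activations
    mask = np.random.rand(*A.shape) < keep_prob      # which units survive this pass
    return A * mask / keep_prob, mask                # mask is reused in backprop
```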

Optimizers for converging to a minimum of the cost function (a compact sketch of these update rules follows the list):

  1. Momentum:

    • Uses previous gradients (cost-function slopes) to overcome possible local minima and flat valleys.
    • Computes a weighted average of previous and current gradient.
  2. Nesterov accelerated gradient:

    • TODO
  3. Adagrad:

    • Normalizes the weight updates, dividing by the square root of the accumulated squared gradients.
    • Parameters with larger past gradients effectively receive smaller updates.
    • The monotonically decreasing effective learning rate proves too aggressive and learning stops too early.
  4. RMSprop:

    • Adjusts the aggressiveness of Adagrad's monotonically decreasing learning rate by keeping a decaying average of squared gradients instead of a cumulative sum.
    • Unlike in Adagrad, the updates do not get monotonically smaller.
  5. Adadelta:

    • Based on Adagrad; fixes its problem of the updates converging to zero over time.
    • Based on two main ideas:
      • Scale the learning rate using a gradient history that only takes into account a given window of updates.
      • Use a component that serves as an acceleration term, as in momentum.
  6. Adam:

    • Combines Momentum and RMSprop.
    • Commonly used as a default optimization algorithm.
    • It also includes bias correction for the initial moment estimates, so the first updates are not too far off.
  7. Adamax:

    • Similar to Adam.
    • Changes the learning-rate scaling factor: small updates are effectively ignored, which makes it more robust to noisy gradients.
  8. Nadam:

    • TODO
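
A compact sketch of the update rules compared above; the hyperparameter values are conventional defaults, not necessarily the ones used in deep_neural_network_optimized:

```python
import numpy as np

def momentum_update(w, dw, v, lr=0.002, beta=0.9):
    v = beta * v + (1 - beta) * dw                 # weighted average of past gradients
    return w - lr * v, v

def adagrad_update(w, dw, cache, lr=0.002, eps=1e-8):
    cache = cache + dw ** 2                        # accumulated squared gradients (monotonic)
    return w - lr * dw / (np.sqrt(cache) + eps), cache

def rmsprop_update(w, dw, cache, lr=0.002, beta=0.9, eps=1e-8):
    cache = beta * cache + (1 - beta) * dw ** 2    # decaying average: updates can grow again
    return w - lr * dw / (np.sqrt(cache) + eps), cache

def adam_update(w, dw, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * dw               # Momentum-style first moment
    v = beta2 * v + (1 - beta2) * dw ** 2          # RMSprop-style second moment
    m_hat = m / (1 - beta1 ** t)                   # bias correction for early steps (t >= 1)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```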

Loss function over the different optimizers:

(Figure: training-loss curves for each of the optimizers above.)

  • Learning rate = 0.002
  • Epochs = 25
  • Batch size = 128
  • Dropout prob = 0.2
  • Weight Initialization = He over normal distribution.
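
The exact architecture of deep_neural_network_keras is not described here, so the following is only an assumed minimal Keras MLP that matches the hyperparameters above; the layer widths and the optimizer choice (Adam) are assumptions:

```python
from tensorflow import keras

# Assumed MLP for MNIST; layer widths are illustrative.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([
    keras.layers.Dense(512, activation="relu", kernel_initializer="he_normal",
                       input_shape=(784,)),               # He init over normal distribution
    keras.layers.Dropout(0.2),                            # Dropout prob = 0.2
    keras.layers.Dense(512, activation="relu", kernel_initializer="he_normal"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.002),  # Learning rate = 0.002
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=25, batch_size=128,
          validation_data=(x_test, y_test))
```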

neuralnetwork's Issues

mnist.loader Error

An `mnist.loader` error is reported. Do dependent packages need to be added? After adding "mnist", there is still no module named "mnist.loader". Thanks!
