
Direct-Feedback-Alignment

Understanding the general framework

In dfa-linear-net.ipynb, I show how a neural network without activation functions can learn a linear function (multiplication by a matrix) using direct feedback alignment (DFA), as in Nøkland, 2016. The notebook also contains some theory about why this works.
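To make the mechanism concrete, here is a minimal NumPy sketch of a DFA update for a two-layer linear network learning a random linear map (hypothetical dimensions and learning rate, not the notebook's exact code). In backpropagation the hidden error would be W2.T @ e; DFA replaces W2.T with a fixed random matrix B1:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy task: learn y = A @ x with a two-layer linear network.
    n_in, n_hidden, n_out = 20, 30, 10
    A = rng.standard_normal((n_out, n_in))             # target linear map
    W1 = 0.1 * rng.standard_normal((n_hidden, n_in))   # first layer
    W2 = 0.1 * rng.standard_normal((n_out, n_hidden))  # second layer

    # Fixed random feedback matrix: sends the output error straight to the
    # hidden layer, replacing W2.T in the backprop error formula.
    B1 = rng.standard_normal((n_hidden, n_out))

    lr = 1e-3
    for step in range(5001):
        x = rng.standard_normal((n_in, 64))            # batch of 64 column vectors
        y = A @ x
        h = W1 @ x                                     # no activation function
        e = W2 @ h - y                                 # output error

        delta_h = B1 @ e                               # DFA hidden error signal
        W2 -= lr * (e @ h.T) / 64
        W1 -= lr * (delta_h @ x.T) / 64

        if step % 1000 == 0:
            print(step, np.mean((W2 @ W1 @ x - y) ** 2))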

In dfa-mnist.ipynb, I show that a neural network trained with DFA achieves results very similar to one trained with backpropagation. The architecture is very simple: one hidden layer of 800 Tanh units, a sigmoid output layer, and binary cross-entropy loss.
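For reference, the backpropagation baseline with this architecture looks roughly like the Keras sketch below (assumed optimizer and hyperparameters, not the notebook's exact code; the DFA version needs a custom training loop, as in the sketch above):

    from keras.models import Sequential
    from keras.layers import Dense

    # 784 -> 800 Tanh -> 10 sigmoid, trained with binary cross-entropy
    # on one-hot targets.
    model = Sequential([
        Dense(800, activation='tanh', input_shape=(784,)),
        Dense(10, activation='sigmoid'),
    ])
    model.compile(optimizer='sgd', loss='binary_crossentropy',
                  metrics=['accuracy'])
    # model.fit(x_train, y_train, epochs=50, batch_size=64)  # hypothetical values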

See the last lines of mlp-torch-results.txt for the results of the same architecture obtained with the Torch code provided by Nøkland.

Stacking neural networks

Do networks with different feedback matrices learn different features, at least in the first few steps? Apparently yes. Stacking works by training many weak learners to recognize different features and using their outputs as inputs to a new model, which learns how to combine the weak learners and gives a performance boost.

The Stacking-dfa-nets folder contains the following files, which must be executed in this order:

  1. create_dataset.py: preprocesses the MNIST data loaded from Keras and saves it to a NumPy file mnist.npz, ready to be used (a rough sketch of this step follows the list).
  2. weak-learners.py or diff-weak-learners.py: train as many weak learners as you want (NNs with one hidden layer of 800 Tanh units). The difference is that the first trains all of them starting from the same initialization, while the second initializes each one in a different state. They generate, respectively, the files train_linouts.npz & test_linouts.npz and diff-train_linouts.npz & diff-test_linouts.npz.
  3. stacked-model.py or RD-stacked-model.py: train, respectively, a dense or an RD layer on top of the features extracted by the weak learners. The program takes as input the number of weak learners trained in the previous step and the names of the files it generated (see the sketch after the example calls below).
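A minimal sketch of what the preprocessing step amounts to (assumed normalization, one-hot encoding, and array names inside the .npz file; see create_dataset.py for the actual code):

    import numpy as np
    from keras.datasets import mnist
    from keras.utils import to_categorical

    # Load MNIST from Keras, flatten, scale to [0, 1], one-hot encode labels.
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
    x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
    y_train = to_categorical(y_train, 10)
    y_test = to_categorical(y_test, 10)

    # Array names here are assumptions, not necessarily the repo's.
    np.savez('mnist.npz', x_train=x_train, y_train=y_train,
             x_test=x_test, y_test=y_test)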

Example call to train 50 weak learners:

python weak_learners.py 50

Example call to train a stacked model on top of 50 weak learners:

python stacked-model.py 50 train_linouts.npz test_linouts.npz
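For intuition, the stacking step boils down to something like the sketch below (hypothetical array names inside the .npz files and assumed hyperparameters; the actual script may differ):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Load the concatenated outputs ("linouts") of the weak learners.
    # The array names are assumptions; check what weak-learners.py writes.
    train = np.load('train_linouts.npz')
    test = np.load('test_linouts.npz')
    x_train, y_train = train['linouts'], train['labels']
    x_test, y_test = test['linouts'], test['labels']

    # The stacked model is a single dense softmax layer that learns how to
    # combine the weak learners' outputs into a final prediction.
    stacked = Sequential([
        Dense(10, activation='softmax', input_shape=(x_train.shape[1],)),
    ])
    stacked.compile(optimizer='sgd', loss='categorical_crossentropy',
                    metrics=['accuracy'])
    stacked.fit(x_train, y_train, epochs=20, batch_size=128,
                validation_data=(x_test, y_test))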

RD Layers

Layers with a number of parameters linear in the layer width, vaguely inspired by ACDC. They essentially compute:

    y = D2 R D1 x

where D1 and D2 are diagonal matrices and R is a random matrix.
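A minimal NumPy sketch of such a layer's forward pass, assuming the composition above and that only the diagonals are trained (which is what keeps the parameter count linear; not the repo's actual code):

    import numpy as np

    class RDLayer:
        # Random-Diagonal layer: y = D2 @ R @ D1 @ x, with R fixed and random.
        # Only d1 and d2 are trainable, so parameters grow linearly with width.
        def __init__(self, dim, seed=0):
            rng = np.random.default_rng(seed)
            self.d1 = np.ones(dim)                  # trainable diagonal of D1
            self.d2 = np.ones(dim)                  # trainable diagonal of D2
            self.R = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # fixed

        def forward(self, x):
            # Multiplying by a diagonal matrix is an elementwise scaling.
            return self.d2 * (self.R @ (self.d1 * x))

    layer = RDLayer(800)
    y = layer.forward(np.random.randn(800))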

Requirements

  • numpy
  • matplotlib
  • scipy
  • keras
  • scikit-learn
