khurramjaved96 / mrcl

Code for the NeurIPS19 paper "Meta-Learning Representations for Continual Learning"

Home Page: https://arxiv.org/abs/1905.12588


mrcl's Introduction

(05 July, 2020) Major bug fix and refactoring log:

  1. Fixed a bug that resulted in incorrect meta-gradients.
  2. Refactored the code. It should be easier to understand and modify now.
  3. Significantly improved results on both the Omniglot and sine benchmarks as a result of the bug fix. By using a linear PLN layer -- as suggested by Beaulieu et al. (2020) -- it is possible to match the results of ANML (Beaulieu et al., 2020) without using any neuromodulation layers; a minimal sketch of this RLN/PLN split follows this list.
  4. The bug fix also makes the optimization more robust to hyper-parameter changes: the Omniglot results hold for a wide range of meta-learning and inner learning rates.
  5. Added new pretrained models to the Google Drive folder; check mrcl_trained_models/Omniglot_updated. There are eight pre-trained models trained with different hyper-parameters; the hyper-parameters for each run are listed in its metadata.json file. The old models will no longer work with the new code. If you want to use the old models, check out an older commit of the repo.
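
The RLN/PLN split referenced in item 3 can be sketched as follows. This is a minimal, hypothetical illustration in PyTorch -- the layer sizes and the pooling step are assumptions, not the repository's exact model (see the repo's model code for the real architecture):

import torch
import torch.nn as nn

class OMLNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        # Representation Learning Network (RLN): updated only in the meta (outer) step.
        self.rln = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveMaxPool2d((4, 4)), nn.Flatten(),
        )
        # Prediction Learning Network (PLN): a single linear layer,
        # updated in the inner loop on the incoming trajectory.
        self.pln = nn.Linear(64 * 4 * 4, num_classes)

    def forward(self, x):
        return self.pln(self.rln(x))

net = OMLNet()
logits = net(torch.randn(4, 1, 84, 84))  # -> shape [4, 1000]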

A discussion on the changes: #15

Reference: Beaulieu, Shawn, et al. "Learning to continually learn." ECAI (2020).

OML (Online-aware Meta-learning) ~ NeurIPS19

Paper : https://arxiv.org/abs/1905.12588

Overall system architecture for learning representations

Learning OML Representations

To learn representations for Omniglot, run the following command:

python oml_omniglot.py --update_lr 0.03 --meta_lr 1e-4 --name OML_Omniglot/ --tasks 3 --update_step 5 --steps 700000 --rank 0

This will store the learned model at ../results/DDMonthYYYY/Omniglot/0.0001/oml_omniglot.

Evaluating Representations learned by OML

We provide trained models at https://drive.google.com/drive/folders/1vHHT5kxtgx8D4JHYg25iA-C31O5OjAQz?usp=sharing which can be used to evaluate performance on the continual learning benchmarks.

To evaluate performance on the test trajectories of Omniglot, run:

python evaluate_omniglot.py --model-path path_to_model/learner.model --name Omniglot_evaluation/  --schedule 10:50:100:200:600

Exclude the --test argument to get results on the training trajectories (used to measure forgetting).

Results will be stored in a json file in "../results/DDMonthYYYY/Omniglot/eval/Omni_test_traj_0"
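
To sanity-check a downloaded model before running the full evaluation, it can be loaded directly with torch.load. This is a minimal sketch: the paths are placeholders, and the repository's model code must be importable (run this from the repo root) for unpickling to succeed:

import json
import torch

# Placeholder paths: point these at the folder downloaded from Google Drive.
with open("path_to_model/metadata.json") as f:
    print(json.load(f))  # hyper-parameters used for this pretrained run

learner = torch.load("path_to_model/learner.model", map_location="cpu")
print(sum(p.numel() for p in learner.parameters()), "parameters")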

Visualizing Representations

To visualize representations for different Omniglot models, run:

python visualize_representations.py --name OML_rep_study --model ./trained_models/split_omniglot_oml.model

Results

Classification Results

The accuracy curve, averaged over 50 runs, as we learn more classes sequentially. Error bars represent 95% confidence intervals computed from 1,000 bootstraps. We report results on both the training trajectory (left) and a held-out dataset containing the same classes as the training trajectory (right).

[Figure: classification accuracy vs. number of classes learned; training trajectory (left) and held-out set (right)]

Online updates starting from OML representations are capable of learning 200 classes with little to no forgetting, whereas other representations, such as pretraining and SR-NN, suffer from noticeable forgetting. OML also generalizes better than the other methods on the unseen held-out set. Note that the Oracle, trained using multiple IID passes over the trajectory, represents an upper bound on performance and reflects the inherent drop in accuracy when training on an increasing number of classes.

Regression Results

Mean squared error across all 10 regression tasks. The x-axis in (a) corresponds to seeing all samples of task 1, then task 2, and so on. The learning curves are averaged over 50 runs, with error bars representing 95% confidence intervals computed from 1,000 bootstraps.

[Figure: (a) MSE as tasks are learned sequentially; (b) final per-task error]

The representation trained on IID data---pretraining---is not effective for online updating. Note that in the final predictions in (b), the pretraining and SR-NN representations are accurate on task 10 but have high error on earlier tasks. OML, on the other hand, has a slight skew in error towards later tasks but is largely robust.

References

  1. Meta-learning code has been adapted from: https://github.com/dragen1860/MAML-Pytorch
  2. For the EWC, MER, and ER-Reservoir experiments, we modified the following implementation to load our models: https://github.com/mattriemer/MER

mrcl's People

Contributors: anon1efergwerfwer, dependabot[bot], khurramjaved96, kostis-s-z

mrcl's Issues

Question about the output layer for classification

Hi,

I have a question regarding the number of nodes/neurons in the output layer for the classification CLP problem. To learn the representation, you have access to the whole dataset, and the meta-objective is minimized using S_test sampled from \tau. Does that mean the size of the output layer is always fixed, regardless of how many classes the network has seen so far?

I see that the output size is always 1000 in the following code:

"config": {"out": 1000, "in": 9 * channels}

Thanks.

Trained models

Hello,

I was looking at the trained models you have provided in the Google Drive directory. I was wondering what the difference is between the old and new Omniglot models, and also what the difference is between the models provided in the New_omniglot_models folder, named 1_1 through 1_8. In short, which one should I use to evaluate the model?

Thanks!🙂

IndexError in Omniglot Experiments

I am getting an IndexError in the meta_learner.py file, MetaLearingClassification class, forward() method, lines 252-276.

When I run the command
python mrcl_classification.py --rln 6 --update_lr 0.03 --name mrcl_omniglot --update_step 20 --steps 15000

I receive the following error message:

IndexError: index 19 is out of bounds for dimension 0 with size 19

I have attempted to fix by:

  1. Initializing meta_losses and accuracy_meta_set as lists of floats rather than ints:
# Original
meta_losses = [0 for _ in range(self.update_step)]
accuracy_meta_set = [0 for _ in range(self.update_step)]

# My Change
meta_losses = [0.0 for _ in range(self.update_step)]
accuracy_meta_set = [0.0 for _ in range(self.update_step)]
  2. Reducing the for loop index by 1:
# Original
for k in range(1, self.update_step):

# My Change
for k in range(1, self.update_step-1):

Is this change a correct interpretation of the MRCL method?

Can you please comment on this issue?

Thank you.

Hyperparams for Imagenet

Hi Khurram, thanks for making the repo available to us. I have been trying to run the repo on the ImageNet dataset and haven't been able to replicate your results. I think it might be an issue with the hyperparameters I am using. Would it be possible for you to share the correct set of hyperparameters?

Fewer data for compared methods on Omniglot

Hello K. Javed,

I read your paper and was very excited about the idea you proposed because of its intuitiveness. Thank you for your work and for open-sourcing the code. One thing I found is that the baseline and SR-NN methods seem to use fewer samples to pretrain the models than MRCL / OML. Can you confirm whether this is indeed the case, whether there is a reason for it, or whether it is a bug? If it is a bug, it is fairly straightforward to fix.

Specifically, it seems that the number of classes selected to pretrain the models is hard-coded. This can be seen here:

baseline_pretraining.py -> line 40
iterator = torch.utils.data.DataLoader(utils.remove_classes_omni(dataset, list(range(900))), batch_size=256, shuffle=True, num_workers=1)

baseline_srnn.py -> line 37
iterator = torch.utils.data.DataLoader(utils.remove_classes_omni(dataset, list(range(900))), batch_size=256, shuffle=True, num_workers=1)

whereas in mrcl_classification.py -> line 27
args.classes = list(range(963))

This causes the baseline and the SR-NN to be trained with 900 * 15 = 13,500 samples, whereas MRCL / OML is trained with 963 * 20 = 19,260 samples. You can confirm this by adding a counter inside the for loop over the iterator (count_samples += list(img.shape)[0]) and observing the difference in sample counts; a minimal version of this check is sketched below.
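
A minimal sketch of that counter (assuming iterator is constructed exactly as in the quoted lines above, and yields (image, target) batches):

count_samples = 0
for img, target in iterator:
    count_samples += img.shape[0]  # number of images in this batch
print("samples seen in one pass:", count_samples)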

Not able to achieve the same accuracy as claimed

Hi,
I evaluated the pre-trained model provided by you. I got the following results.
10 classes: 0.8467
50 classes: 0.6677
75 classes: 0.5864
100 classes: 0.5251
150 classes: 0.4444
200 classes: 0.3642
These numbers are lower than those reported for every class size in the revised paper. In the older version too, the accuracies for 150 and 200 classes do not match. You have mentioned that you have not released the code for the latest version. Could you please provide the results for the current implementation?

Layer widths

Hi, thanks for making this code available! In the modelfactory file, the architectures for the sinusoidal regression problem have one extra-wide layer in them (for the 9-layer feedforward model, layer 6 is 3x as wide as the others). Is this intended? I couldn't find mention of this in the paper, but I might be misunderstanding something in the paper and/or the code.

omniglot meta testing using data similar to Rand_X

Hi guys, thank you for providing this code. It really helps me understand some implementation details much better. Nevertheless, there are some parts of the code that still puzzle me.
I am working on re-implementing the oml_omniglot experiment. If my understanding is correct, Traj_X is generated from class(481:963) and Rand_X from class(0:963). After training, the model should be able to do incremental testing, i.e. classification over class(481:963), and meta-testing, i.e. adapting to new classes, class(0:481), in a few shots. My question is: since during training you use Rand_X to update the RLN, the RLN already gains some knowledge about class(0:481). Is it then fair to do the meta-testing on those classes? (A sketch of the split as I understand it is shown below.)
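
A small sketch of that split as described in this question (illustrative only; whether it matches the repository exactly is part of the question -- the actual samplers live in the repo's data-loading code):

import random

meta_train_classes = list(range(481, 963))  # classes from which Traj_X is drawn
all_classes = list(range(0, 963))           # classes from which Rand_X is drawn

traj_classes = random.sample(meta_train_classes, k=3)  # tasks forming one trajectory
rand_classes = random.sample(all_classes, k=5)         # random batch for the meta-loss
print(traj_classes, rand_classes)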

Only half of classes used as query for outer loop in omniglot

Hi, when creating the complete omniglot iterator, this line

self.task_sampler.add_complete_iteraetor(list(range(0, int(len(self.tasks)/2))))

when creating the query sampler, if I understood correctly, takes a subset of only the first half of the Omniglot classes.
Is this correct? It may amount to only a small relaxation of the goal of not forgetting. If I want to train a stricter method, should I change this to use all classes?

P.S.: thanks for releasing the code, and cool work!

Discrepancy between code and paper

Good afternoon!

I have a question regarding your OML implementation. Based on the paper, parameters W are updated in the adaptation (inner) loop (Alg. 2, L. 9), while $\theta$ is updated in the meta-optimization step (Alg. 2, L. 12). However, in your code, both W and $\theta$ are passed to the meta-optimizer:

self.optimizer = optim.Adam(self.net.parameters(), lr=self.meta_lr)

which later updates these parameters in the meta step

mrcl/model/meta_learner.py

Lines 391 to 396 in 2855a6b

# Taking the meta gradient step
self.optimizer.zero_grad()
meta_loss = meta_losses[-1]
meta_loss.backward()
self.optimizer.step()

Hence, to reflect the code, line 12 in Algorithm 2 should be modified as:

[Screenshot: proposed modification to Algorithm 2, line 12]

Question:

Should W, in fact, be updated in the meta-optimization loop? Or should we simply use W = W_k to update W based on the adaptation step and only pass $\theta$ to the meta-optimizer?
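
For reference, a hedged sketch of that second option -- passing only the theta (RLN) parameters to the meta-optimizer -- might look like the following. The adaptation flag mirrors the per-parameter attribute used elsewhere in the repository (see model/learner.py), but this is an illustration, not a verified patch:

import torch.optim as optim

def make_meta_optimizer(net, meta_lr):
    # Keep only parameters NOT flagged for inner-loop adaptation (theta / the RLN);
    # W (the PLN) would then be updated exclusively by the inner loop.
    theta = [p for p in net.parameters() if not getattr(p, "adaptation", False)]
    return optim.Adam(theta, lr=meta_lr)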

Thanks!

Code for OML

While reading the revised version of your paper, I found that you updated some parts of the algorithm. Do you have any plans to release the new code?

Truncated backprop through time

One other question I have: the paper mentions that the computation graph is never unrolled for more than a fixed number of steps, similar to truncated backprop through time. I don't fully understand how this works, since some of the meta-parameters being optimized are the initial network weights, which by the time you are in the middle of the inner loop are no longer directly involved in the computation graph -- so I'm not clear on how you get a "truncated" gradient of the meta-loss with respect to these weights. Would you be able to clarify how this works, or point to the place in the code that does it?
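
For context, a common way to truncate such an inner-loop graph is to detach the fast weights every few steps; a minimal, generic sketch (not the repository's code) is below. After a detach, the meta-loss gradient no longer flows through earlier steps, which is what makes the resulting gradient with respect to the initial weights "truncated":

import torch

def inner_loop(fast_weights, batches, loss_fn, lr=0.01, truncate_every=5):
    # fast_weights: list of tensors with requires_grad=True
    # batches: iterable of (x, y) pairs; loss_fn(weights, x, y) returns a scalar loss
    for k, (x, y) in enumerate(batches):
        loss = loss_fn(fast_weights, x, y)
        grads = torch.autograd.grad(loss, fast_weights, create_graph=True)
        fast_weights = [w - lr * g for w, g in zip(fast_weights, grads)]
        if (k + 1) % truncate_every == 0:
            # Cut the graph: gradients of any later meta-loss stop here.
            fast_weights = [w.detach().requires_grad_(True) for w in fast_weights]
    return fast_weights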

Bugs in the code

Hi, thanks for your code! It really helps me! But when I reproduce the Omniglot training experiment, there seems to be an error, as follows:
Traceback (most recent call last):
File "oml_omniglot.py", line 100, in
main(args)
File "oml_omniglot.py", line 71, in main
accs, loss = maml(x_spt, y_spt, x_qry, y_qry)
File "/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/mrcl/model/meta_learner.py", line 254, in forward
fast_weights = self.inner_update(x_traj[k], fast_weights, y_traj[k], True)
IndexError: index 19 is out of bounds for dimension 0 with size 19

I cannot figure out the reason. Looking forward to your response.

Question on meta parameters

Hi @khurramjaved96.

Thanks for publishing your code. I have a question regarding the optimization of the meta parameters (theta) during meta-training. I have explored your code and, for example in oml_omniglot.py, I did not understand how you optimize those parameters. Before training, you freeze the weights of the early layers, which are actually the theta parameters, so based on this I believe the meta parameters would not be updated anymore. Am I correct?

Bug fix

Hi! I noticed the code was recently updated to (among other things) fix a bug in the meta-gradient computation. Could you explain what the bug was, and what would be the minimal change needed to fix it, starting from the earlier version of the code? Thanks a bunch!

Two issues in the latest version of the code (args['memory'] and bn_training)

Congratulations on the impressive work!

I am using the latest version of the code (05 July, 2020) for the Omniglot experiment. Training the model works fine. However, when I try to evaluate the trained model with the following command:

python evaluate_omniglot.py --model-path path_to_model/learner.model --name Omniglot_evaluation/ --schedule 10:50:100:200:600

I get the following error:

Traceback (most recent call last):
File "evaluate_omniglot.py", line 203, in
main()
File "evaluate_omniglot.py", line 151, in main
for mem_size in [args['memory']]:
KeyError: 'memory'

I fixed this by adding the following line to the file mrcl/configs/classification/class_parser_eval.py
self.add('--memory', type=int, help='I dont know what this argument is used for', default=0)

When I re-run the code with the above hack, it proceeds further but then gives a new error:

File "evaluate_omniglot.py", line 179, in main
logits_q = maml(img, vars=None, bn_training=False, feature=False)
Unknown argument: 'bn_training'

I removed 'bn_training' and 'feature' as arguments to fix the above errors.

Could you please confirm whether these fixes for the two errors above are correct?
Also, could you describe what args['memory'] is supposed to do?

Thanks,
Aditya Rawal

Code for EWC, MER, and ER-Reservoir Experiments

Hello, thank you very much for releasing this code! Would it be possible to make available the code used for the EWC, MER, and ER-Reservoir experiments? It might take a lot of effort to recreate these based on the link in the readme. Thank you!

Unable to run benchmark

Hi, I've set up an environment using the requirements.txt and downloaded your pretrained model but am unable to run the evaluate_omniglot.py script.

Calling python evaluate_omniglot.py --model-path pretrained_models/omniglot_oml.model --name Omniglot_evaluation/ --test
produces:

All args =  {'my_config': 'configs/regression/empty.ini', 'steps': 200000, 'gpus': 1, 'rank': 0, 'tasks': [1], 'meta_lr': [0.0001], 'update_lr': [0.01], 'update_step': [10], 'dataset': 'omniglot', 'no_reset': False, 'seed': [90], 'name': 'Omniglot_evaluation/', 'path': '/Users/sean/data/omniglot', 'schedule': '200', 'scratch': False, 'reset': False, 'test': True, 'iid': False, 'runs': 50, 'model_path': ['pretrained_models/omniglot_oml.model']}
{'seed': 90, 'my_config': 'configs/regression/empty.ini', 'steps': 200000, 'gpus': 1, 'rank': 0, 'tasks': 1, 'meta_lr': 0.0001, 'update_lr': 0.01, 'update_step': 10, 'dataset': 'omniglot', 'no_reset': False, 'name': 'Omniglot_evaluation/', 'path': '/Users/sean/data/omniglot', 'schedule': '200', 'scratch': False, 'reset': False, 'test': True, 'iid': False, 'runs': 50, 'model_path': 'pretrained_models/omniglot_oml.model'}
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Total classes =  658
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5]
Total classes =  658
/Users/sean/repositories/continual-learning/mrcl/.env/lib/python3.6/site-packages/torch/serialization.py:435: SourceChangeWarning: source code of class 'model.learner.Learner' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
Traceback (most recent call last):
  File "evaluate_omniglot.py", line 186, in <module>
    main()
  File "evaluate_omniglot.py", line 86, in main
    maml = load_model(args, config)
  File "evaluate_omniglot.py", line 30, in load_model
    net.reset_vars()
  File "/Users/sean/repositories/continual-learning/mrcl/model/learner.py", line 77, in reset_vars
    if var.adaptation is True:
AttributeError: 'Parameter' object has no attribute 'adaptation'

Also, the Omniglot download functionality didn't seem to work, but I was able to work around that by downloading manually from the same source.

Thanks
