
yaoyao-liu / class-incremental-learning

456 stars · 13 watchers · 70 forks · 109 KB

PyTorch implementation of AANets (CVPR 2021) and Mnemonics Training (CVPR 2020 Oral)

Home Page: https://class-il.mpi-inf.mpg.de

License: MIT License

Python 99.68% Shell 0.32%
continual-learning class-incremental-learning

class-incremental-learning's Introduction

Hey there, I'm Yaoyao! 👋

Welcome to my GitHub. I am currently a postdoc in Computer Science at Johns Hopkins University. My research lies at the intersection of computer vision and machine learning – with a special focus on building intelligent visual systems that are continual and data-efficient. If you have any questions on my projects, please feel free to send me an email.

class-incremental-learning's People

Contributors

yaoyao-liu


class-incremental-learning's Issues

torchvision version for CIFAR100

Hi,

Thanks for your innovative method and for sharing the code with the community.
When running python3 main.py --method=mnemonics --nb_cl=10, I face the following AttributeError:
'CIFAR100' object has no attribute 'train_labels' in mnemonics.py. This issue persisted for train_data as well, and I was able to solve it by replacing self.train with self.trainset.train_data.
However, for:

  • self.trainset.train_labels
  • self.testset.test_data
  • self.testset.test_labels

no replacement worked with my existing torchvision (0.5.0).
I would appreciate it if you could specify the torchvision version that works with your model.

Bests,
Nila
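
For reference, newer torchvision releases (including 0.5.0) expose the CIFAR samples as .data and .targets for both splits, replacing the old train_data / train_labels / test_data / test_labels attributes. A minimal sketch of the mapping (variable names are illustrative; please double-check against the installed torchvision version):

    from torchvision.datasets import CIFAR100

    trainset = CIFAR100(root='./data', train=True, download=True)
    testset = CIFAR100(root='./data', train=False, download=True)

    X_train, Y_train = trainset.data, trainset.targets   # replaces train_data / train_labels
    X_test, Y_test = testset.data, testset.targets       # replaces test_data / test_labels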

How to save the model of each stage

Hello, I want to save the model to do t-SNE. How can I save the model of each stage?
I found that you set the ckpt path, but torch.save() is never called to actually save the model.
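
For reference, a minimal way to dump a checkpoint at the end of each phase is to call torch.save() on the model's state_dict. The sketch below assumes a model variable named tg_model and a checkpoint path ckpt_name (both names are illustrative):

    import torch

    # at the end of each incremental phase
    torch.save(tg_model.state_dict(), ckpt_name)

    # later, to reload the phase model for t-SNE / evaluation
    tg_model.load_state_dict(torch.load(ckpt_name))
    tg_model.eval()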

About training time

Hello! I tried to train the model on ImageNet with epochs=1, and it still took 11 hours to finish training.
How long does it take to train AANets on ImageNet (N=5/10/25)?

some bugs

Hi Professors, there are some bugs when I run the code. Can you explain what they mean and how to solve them? Thank you very much!
Using gpu: 0
Files already downloaded and verified
Files already downloaded and verified
Order name:./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/seed_1993_cifar100_order_run_0.pkl
Generating orders
pickle into ./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/seed_1993_cifar100_order_run_0.pkl
[68, 56, 78, 8, 23, 84, 90, 65, 74, 76, 40, 89, 3, 92, 55, 9, 26, 80, 43, 38, 58, 70, 77, 1, 85, 19, 17, 50, 28, 53, 13, 81, 45, 82, 6, 59, 83, 16, 15, 44, 91, 41, 72, 60, 79, 52, 20, 10, 31, 54, 37, 95, 14, 71, 96, 98, 97, 2, 64, 66, 42, 22, 35, 86, 24, 34, 87, 21, 99, 0, 88, 27, 18, 94, 11, 12, 47, 25, 30, 46, 62, 69, 36, 61, 7, 63, 75, 5, 32, 4, 51, 48, 73, 93, 39, 67, 29, 49, 57, 33]
Out_features: 50
Batch of classes number 5 arrives
Max and min of train labels: 0, 49
Max and min of valid labels: 0, 49
Checkpoint name: ./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/run_0_iteration_4_model.pth
Incremental train

Epoch: 0, LR: [0.1]
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
(the same assertion failure is repeated for threads [4,0,0] through [31,0,0])
Traceback (most recent call last):
File "main.py", line 78, in
trainer.train()
File "/data1/22160073/project/incremental learning/class-incremental-learning-main/mnemonics-training/1_train/trainer/mnemonics.py", line 237, in train
tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)
File "/data1/22160073/project/incremental learning/class-incremental-learning-main/mnemonics-training/1_train/trainer/incremental.py", line 44, in incremental_train_and_eval
loss.backward()
File "/data1/22160073/anaconda3/envs/xxz/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/data1/22160073/anaconda3/envs/xxz/lib/python3.7/site-packages/torch/autograd/init.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
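
For reference, the repeated Assertion t >= 0 && t < n_classes message comes from the CUDA kernel of the NLL/cross-entropy loss and means some target labels fall outside [0, n_classes) for the current classifier head, which then surfaces as the device-side assert during backward. A minimal diagnostic sketch (the names outputs and targets are illustrative, taken from a generic training loop):

    # sanity check right before computing the classification loss
    num_classes = outputs.size(1)          # out_features of the current FC layer
    assert targets.min().item() >= 0
    assert targets.max().item() < num_classes, \
        "labels are not remapped into the current phase's class range"

    # alternatively, rerun with CUDA_LAUNCH_BLOCKING=1 (or on CPU)
    # to see exactly which call triggers the assert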

About accuracy

In your paper, you said you ran the experiments three times. Did you use different random seeds, or did you just use seed 1993 and run it three times? I found that different random seeds may affect the final accuracy, so I would like to know your configuration for a fair comparison.

Inquiries about the comparison between mnemonics and baseline

Hi, I am reading the code of mnemonics-training and want to verify the effectiveness of the algorithm, which is related to dataset distillation. However, while reading the code I noticed that the network implemented in mnemonics.py is different from the one in baseline.py.
I think it would be more reasonable to change "self.network_mtl" to "self.network" when comparing baseline and mnemonics. I would like to ask why mnemonics uses network_mtl while baseline uses a different model.

Another inquiry is about the "fusion_mode" option. I have seen some issues discussing it, but I have not figured out how this option influences the code. What is this option designed for?

Paper uses dynamic budget, but repository recommends fixed?

Hi,

The paper uses a dynamic budget of 20 exemplars per class in training the models associated with every result reported.

However, the bash commands that you provide do not contain the flag --dynamic_budget at all.

Does that mean the code uses a fixed budget? In that case, what does the flag indicate?

Thanks!

Some questions about the code.

Hi, thanks for your work.
There are some codes in mnemonics-training/1_train/trainer/mnemonics.py I am confused about.

  1. What is the difference between the two branches?
                if iteration > start_iter:
                    tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)
                else:
                    tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)

I can't find any difference between the if and else branches.
original code: see here.

  2. What does the function process_fp do? As far as I can tell, it just passes the input through the model, functioning the same as PyTorch's forward() method.
  3. What is the purpose of the variable tg_feature_model? Is it just an alias of tg_model without the final FC layer? (See the sketch below.)

It would be so kind of you if you could help me answer these questions.
Thanks again.
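
Regarding question 3, a common way to obtain a feature extractor that shares weights with the full model is to drop its last child module. A minimal sketch, assuming the final FC layer is the last child of the network (names are illustrative and this is not necessarily how the repository builds it):

    import torch.nn as nn

    # feature extractor that reuses tg_model's weights but stops before the FC layer
    tg_feature_model = nn.Sequential(*list(tg_model.children())[:-1])

    features = tg_feature_model(inputs)             # e.g. (batch, C, 1, 1) for a ResNet
    features = features.view(features.size(0), -1)  # flatten to (batch, C)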

Gradient of self.mnemonics is None after backward in mnemonics_train.py

Hi,

I ran the experiment with the command "python main.py --nb_cl_fg=50 --nb_cl=2 --nb_protos 20 --resume --imprint_weights". I found that the gradient of self.mnemonics in mnemonics_train.py is None after q_loss.backward(), which means it does not get updated.
I am wondering whether this is correct?

Thanks.
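
For reference, in PyTorch .grad is only populated for leaf tensors that have requires_grad=True and that actually take part in the graph that produced the loss; rebuilding a tensor through .data, .detach(), or torch.tensor(...) breaks that link. A minimal checklist sketch (self.mnemonics and q_loss follow the naming in the issue above):

    import torch

    # the tensor must be a leaf with gradients enabled
    print(self.mnemonics.is_leaf, self.mnemonics.requires_grad)   # expect: True True

    # check reachability from the loss without relying on .backward()
    grad = torch.autograd.grad(q_loss, self.mnemonics,
                               retain_graph=True, allow_unused=True)
    print(grad)   # (None,) here means q_loss does not depend on self.mnemonics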

Implementations of Mnemonics training

Hi @yaoyao-liu, thanks for your interesting work! I am interested in the BOP-based mnemonics training and would like to build further extensions on it! I found some problems when I checked the code of Mnemonics Training. It seems that trainer/mnemonics.py is not the complete version, and it does not follow the training strategies described in the paper.

For example, I couldn't find the bilevel optimization of the mnemonics exemplars. There seems to be only one level of optimization of the mnemonics, based on NCE classification, but there should be another level before that: training a temporary model on the exemplars (Eq. 8) and unrolling all of its training gradients. Also, I couldn't find the process of splitting exemplars and adjusting the mnemonics of old classes. Another issue is that some arguments, such as self.mnemonics_lrs, are defined but never used.

Could you please help to explain my doubts? I am really interested in the implementation of solving BOP and mnemonics training. I apologize if my understanding is wrong. Thank you very much!

Running errors

Hi Professors, there are bugs when I run the code. Can you explain what they mean and how to solve them? Thank you very much!

python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100

Traceback (most recent call last):
File "main.py", line 436, in
train(config, training_chunked_samples_dir, testing_chunked_samples_file)
File "main.py", line 238, in train
X_valid_ori, Y_valid_ori, X_valid_cumul, Y_valid_cumul, iteration, is_start_iteration, top1_acc_list_ori, top1_acc_list_cumul)
File "/data/gjj/code/hf2vad/cil/AANet/trainer/base_trainer.py", line 792, in compute_acc
order_list, is_start_iteration=is_start_iteration)
File "/data/gjj/code/hf2vad/cil/AANet/utils/incremental/compute_accuracy.py", line 78, in compute_accuracy
sqd_icarl = cdist(class_means[:,:,0].T, outputs_feature.cpu(), 'sqeuclidean')
File "/home/gjj/anaconda3/envs/py36/lib/python3.6/site-packages/scipy/spatial/distance.py", line 2710, in cdist
raise ValueError('XB must be a 2-dimensional array.')

runs the code in mini-imagenet

hi, professor,
I have read your paper and run the code on CIFAR-100,
but I can't find the data-processing code for mini-ImageNet. Could you tell me how you process the mini-ImageNet data?
I would really appreciate it if you could tell me, thanks!

Save model in PODNET repo

Hi @yaoyao-liu
Can you guide me on how to save the model after each phase, so that I can use those models to calculate per-class accuracies similar to what you have done, e.g. classes 1-10 (%), 11-20 (%), and so on?

How are the hyperparameters tuned?

Hi @yaoyao-liu,

Thanks for your wonderful work.

I have a question: how are the hyperparameters for your model set in general (not specifically for AANets; I mean parameters such as lambda that determine the stability-plasticity trade-off)?

Do you do multiple runs on the entire dataset with a lot of hyperparameter combinations sampled from a coarse grid?

Or do you determine the hyperparameters separately for each task?

Thank you!

What does the flag `--fusion_mode` represent?

Hi,
I notice that the target model in the sequential class-incremental training is set to network_mtl. Regarding your implementation for constructing an "mtl" network, it seems every convolution kernel weight is frozen and multiplied by a learnable mask (initialized to 1). My question is: why is the network architecture implemented in this way?

Thanks in advance!
Michael
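
For reference, a minimal sketch of the frozen-kernel-plus-learnable-scaling pattern described in the question above; this only illustrates the general idea and is not the repository's actual "mtl" implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScaledConv2d(nn.Conv2d):
        """Conv layer whose pretrained kernels are frozen; only per-kernel
        scaling weights (initialized to 1) are learned in later phases."""
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.weight.requires_grad = False  # freeze the base kernels
            self.scale = nn.Parameter(torch.ones(self.out_channels, 1, 1, 1))

        def forward(self, x):
            return F.conv2d(x, self.weight * self.scale, self.bias,
                            self.stride, self.padding, self.dilation, self.groups)

Keeping the kernels frozen preserves the knowledge learned in earlier phases, while the small number of scaling parameters lets the branch adapt to new classes.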

Code for T-SNE in the mnemonics paper

Hi @yaoyao-liu,

Thank you for the perfect codebase to experiment upon, and two very interesting papers.

I was captivated by Figure 1 of your mnemonics paper where you have plotted the data and highlighted the exemplars.

Could you please share the code that generated it (how you get such well-spaced clusters, and how you highlight the exemplars)? It would be of great use to my work.

Thank you very much.
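
The authors' plotting code is not part of the repository; below is a generic sketch of how such a figure is typically produced with scikit-learn's t-SNE on extracted features, drawing the exemplars on top. All names (features, labels, exemplar_idx) are illustrative numpy arrays, not variables from the repository:

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    # features: (N, D) array from the feature extractor; labels: (N,) integer class ids
    emb = TSNE(n_components=2, perplexity=30, init='pca', random_state=0).fit_transform(features)

    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap='tab10', alpha=0.5)
    # highlight the exemplars with larger, black-edged markers
    plt.scatter(emb[exemplar_idx, 0], emb[exemplar_idx, 1],
                c=labels[exemplar_idx], s=60, cmap='tab10', edgecolors='k')
    plt.axis('off')
    plt.savefig('tsne_exemplars.png', dpi=300)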

question about `modified_linear.py`

Hello,
I have a question regarding what the module modified_linear.py does.

Isn't the cosine linear layer a component for LUCIR?
I thought the layer was for cosine normalization and therefore should only be used for LUCIR, but your code doesn't seem to choose the ResNet model depending on the method (LUCIR/iCaRL), so I wonder if I'm missing something.

I'm sorry in advance that I might have totally misunderstood the concept and might be bothering you.

Thank you!!
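
For reference, a cosine-normalized linear layer of the kind used in LUCIR computes logits from L2-normalized features and L2-normalized weight vectors. A minimal sketch consistent with the forward pass that appears in a traceback further down this page (the learnable scale sigma is an assumption of this sketch):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CosineLinear(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.sigma = nn.Parameter(torch.ones(1))  # learnable scale on the cosine scores

        def forward(self, x):
            out = F.linear(F.normalize(x, p=2, dim=1),
                           F.normalize(self.weight, p=2, dim=1))
            return self.sigma * out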

About stable block

In your paper, you said you apply a small set of scaling weights in the stable block, but I can't find the corresponding code. Can you tell me where to find it?
Also, I read the code for the optimizer in base_trainer.py.
In the function set_optimizer, the parameters of b2_model are learnable if the 2nd branch is not fixed. But for the parameters of b1_model, the FC weights for old classes are frozen while all the others are put into the optimizer. Why? What about the scaling weights? If we optimize the parameters of b1_model just like b2_model, how can it be called a stable block? (Even though the FC weights for old classes are frozen.)

about ImageNet

Hi, yaoyao.
I directly followed the instructions and ran the code
"python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=imagenet".
I just changed the dataset option, downloaded the ImageNet data, and changed the "data_dir" option to the path where I stored the data. It works and shows results. Does that mean I successfully ran the code for training the model on ImageNet?
I remember you said in https://github.com/yaoyao-liu/class-incremental-learning/issues/12 that the code for ImageNet is not included in the current GitHub repository. Or have you already uploaded the code for ImageNet?

Can this program run successfully?

I ran this code in a Windows environment and there were some errors. After some modifications, there were still errors. Is this code best suited to run in Ubuntu?

Code for PODNet-CNN + AANets

Hi @yaoyao-liu,

Thank you for your amazing works!

I was trying to improve upon your work, and would like to replicate PODNet-CNN + AANets, because it has the best performance compared to the other methods and therefore qualifies as the AANets result to report for comparison.

It would be very kind of you if you could provide code for PODNet-CNN + AANets. It is okay even if it is not polished.

Thanks,
Joseph

ValueError: signal number 32 out of range

Here is my pip list after following the README.md:
Package Version


certifi 2016.2.28
cffi 1.10.0
joblib 1.1.0
mkl-fft 1.3.0
mkl-random 1.1.1
mkl-service 2.3.0
numpy 1.19.2
olefile 0.44
Pillow 6.2.2
pip 21.3.1
protobuf 3.19.1
pycparser 2.18
scikit-learn 0.24.2
scipy 1.5.4
setuptools 36.4.0
six 1.10.0
sklearn 0.0
tensorboardX 2.4.1
threadpoolctl 3.0.0
torch 1.2.0
torchvision 0.4.0a0+6b959ee
tqdm 4.62.3
wheel 0.29.0
I get the following error:
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/tokaka22/.conda/envs/AANets-PyTorch/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/tokaka22/.conda/envs/AANets-PyTorch/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/tokaka22/.conda/envs/AANets-PyTorch/lib/python3.6/multiprocessing/resource_sharer.py", line 139, in _serve
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
File "/home/tokaka22/.conda/envs/AANets-PyTorch/lib/python3.6/signal.py", line 60, in pthread_sigmask
sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range

I have to set num_workers to 0, which costs too much time. Another solution is to upgrade to Python 3.7 (but the README.md says "conda create --name AANets-PyTorch python=3.6").
How can I fix it?
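
For reference, the traceback shows the error is raised inside the DataLoader's worker machinery, so it only appears with num_workers > 0 on this Python 3.6 build; the two workarounds mentioned above (single-process loading, or a newer Python) can be combined in a small guard. A minimal sketch (the dataset variable trainset and the worker count are illustrative):

    import sys
    from torch.utils.data import DataLoader

    # fall back to single-process loading on Python builds that hit the
    # 'signal number 32 out of range' bug in multiprocessing
    safe_workers = 0 if sys.version_info < (3, 7) else 4

    trainloader = DataLoader(trainset, batch_size=128, shuffle=True,
                             num_workers=safe_workers, pin_memory=True)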

Difference between subset_ImageNet vs miniImageNet

Hello, thank you for sharing great project.
I wonder, what is difference between subset-ImageNet and miniImageNet?

They are both subsets of ImageNet and have 100 classes.
So, would there be any reason for using subset-ImageNet instead of miniImageNet?

About the use of free model

Hi, yaoyao
I'm quite confused about the use of the free model in your code, and I can't find an explanation in your paper. I would appreciate it if you could provide some details about the function of the 'free model' in your code.
Thanks.

ImageNet-100 split

Hi,
Can you please share the list of class directories used for the ImageNet-100 experiment, or the original sequence in ImageNet-1000 that was used for sampling and shuffling?

About initializing learnable parame φi and ηi

Hello! In Algorithm 1, line 3, I would like to know how φ_i and η_i are updated from φ_{i-1} and η_{i-1}. Could you explain the specific method?
Also, in the i-th phase the plastic blocks learn new parameters from the novel data, so how are these new parameters merged with the stable blocks in the (i+1)-th phase?

Thank you!!!

Training on ImageNet100/ImageNet1000

Hi, I was trying to figure out how to train the mnemonics incremental models under the ImageNet100 protocol. However, I do not see any code to train on ImageNet; the main.py under the 1_train directory lists only cifar100 as a choice.

Can you please point me to the training code for ImageNet100/1000 if it is already there in the repo, or let me know if you are providing the training code only for cifar100.

Thanks,
Touqeer

`BaseTrainer.init_current_phase_dataset` returning two `Y_valid_cumuls`

Hi,
I'm currently looking into your neat code!

I have a simple question regarding the function init_current_phase_dataset of BaseTrainer.
It returns Y_valid_cumuls twice, but I think it is supposed to return Y_train_cumuls instead of one of them, or maybe one of them should simply be removed.
(there seems to be no other function that uses Y_train_cumuls, but just wanted to make sure I am correctly following your code!)

And by the way, thank you very much for your work :))

Kindly explain a little about the results terms and accuracy matching

Hi there, I am running this experiment, " [LUCIR] w/ AANets"
" python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100"


Concerns:

  1. Which accuracy have you used in your Excel file?
  2. What do you mean by:
    I) Current Accuracy FC
    II) Current Accuracy (proto)
    III) Current Accuracy (Proto-UB)

Thank you very much; I look forward to hearing from you.

train on CIFAR100 with --nb_cl=10

Hi, yaoyao.
I directly followed the instructions and ran the code "python main.py --nb_cl_fg=50 --nb_cl=10 --nb_protos 20 --resume --imprint_weights".

I expected the accuracy to be over 55%, which would be consistent with the results in Fig. 4(a),
but the result was only about 52%.
[screenshot of the results]

The average accuracy I got was about 62%, but the result in Table 1 is 64.95%.

I am wondering whether I got something wrong or whether these are reasonable results?

Thanks~

ablation study for the "mtl" and "feature fusion" strategies

Hi, yaoyao,

Thanks for your interesting work.

I did an ablation study for these two components (mtl and feature fusion): 1) a plain network (resnet32_cifar) is constructed for tg_model in your code; 2) features and outputs are extracted without calling the function process_input_fp; 3) the herding exemplars are replaced by self.mnemonics (accumulated from phase 0 to i) under the LUCIR framework [1]. The experiments show that without the two components, the accuracy after the first incremental phase is 63.47 instead of 69.05 (obtained by directly running your training code).

In my scenario, I am only allowed to use a plain network (e.g., a ResNet-like network) and no feature fusion. I am wondering whether the Mnemonics method needs "mtl" and "feature fusion" to work.

[1] Hou, S., Pan, X., Loy, C. C., Wang, Z., & Lin, D. (2019). Learning a unified classifier incrementally via rebalancing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 831-839).

training problem

Hello, Professor! I have the following problem when running the code on Windows 11. Can you explain what it means and how to solve it? (My GPU memory is 8 GB.) Thank you very much!

python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100
Namespace(K=2, base_lr1=0.1, base_lr2=0.1, baseline='lucir', branch_1='ss', branch_2='free', branch_mode='dual', ckpt_dir_fg='-', ckpt_label='exp01', custom_momentum=0.9, custom_weight_decay=0.0005, data_dir=
'data/seed_1993_subset_100_imagenet/data', dataset='cifar100', disable_gpu_occupancy=True, dist=0.5, dynamic_budget=False, epochs=160, eval_batch_size=128, fusion_lr=1e-08, gpu='0', icarl_T=2, icarl_beta=0.25
, imgnet_backbone='resnet18', lr_factor=0.1, lw_mr=1, nb_cl=10, nb_cl_fg=50, nb_protos=20, num_classes=100, num_workers=1, random_seed=1993, resume=False, resume_fg=False, test_batch_size=100, the_lambda=5, train_batch_size=128)
Using gpu: 0
Total memory: 8192, used memory: 829
Occupy GPU memory in advance.
Files already downloaded and verified
Files already downloaded and verified
Order name:./logs/cifar100_nfg50_ncls10_nproto20_lucir_dual_b1ss_b2free_fixed_exp01\seed_1993_cifar100_order.pkl
Loading the saved class order
[68, 56, 78, 8, 23, 84, 90, 65, 74, 76, 40, 89, 3, 92, 55, 9, 26, 80, 43, 38, 58, 70, 77, 1, 85, 19, 17, 50, 28, 53, 13, 81, 45, 82, 6, 59, 83, 16, 15, 44, 91, 41, 72, 60, 79, 52, 20, 10, 31, 54, 37, 95, 14, 71, 96, 98, 97, 2, 64, 66, 42, 22, 35, 86, 24, 34, 87, 21, 99, 0, 88, 27, 18, 94, 11, 12, 47, 25, 30, 46, 62, 69, 36, 61, 7, 63, 75, 5, 32, 4, 51, 48, 73, 93, 39, 67, 29, 49, 57, 33]
Feature: 64 Class: 50
Setting the dataloaders ...
Check point name: ./logs/cifar100_nfg50_ncls10_nproto20_lucir_dual_b1ss_b2free_fixed_exp01\iter_4_b1.pth

Epoch: 0, learning rate: 0.1
Traceback (most recent call last):
File "main.py", line 88, in
trainer.train()
File "E:\AlgSpace\pycharm\AANets\trainer\trainer.py", line 171, in train
cur_lambda, self.args.dist, self.args.K, self.args.lw_mr)
File "E:\AlgSpace\pycharm\AANets\trainer\zeroth_phase.py", line 63, in incremental_train_and_eval_zeroth_phase
outputs = b1_model(inputs)
File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\modules\module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "E:\AlgSpace\pycharm\AANets\models\modified_resnet_cifar.py", line 109, in forward
x = self.fc(x)
File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\modules\module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "E:\AlgSpace\pycharm\AANets\models\modified_linear.py", line 37, in forward
F.normalize(self.weight, p=2, dim=1))
File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\functional.py", line 1371, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

Running errors

Hi Professors, there are bugs when I run the code. Can you explain what they mean and how to solve them?

Traceback (most recent call last):
File "main.py", line 78, in
trainer.train()
File "/home/gyc/class-incremental-learning/mnemonics-training/1_train/trainer/baseline.py", line 237, in train
tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)
File "/home/gyc/class-incremental-learning/mnemonics-training/1_train/trainer/incremental.py", line 32, in incremental_train_and_eval
for batch_idx, (inputs, targets) in enumerate(trainloader):
File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 286, in next
return self._process_next_batch(batch)
File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torchvision/datasets/cifar.py", line 90, in getitem
img, target = self.train_data[index], self.train_labels[index]
IndexError: index 46523 is out of bounds for axis 0 with size 25000
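
For reference, this IndexError means the DataLoader requested sample index 46523 while the dataset's underlying array holds only 25000 items, i.e. the data array and the index range the loader iterates over are out of sync. A minimal consistency check sketch (the trainset name and attributes follow the old torchvision API shown in the traceback):

    # these three quantities should agree before building the DataLoader
    print(len(trainset))               # what the DataLoader/sampler iterates over
    print(len(trainset.train_data))    # the image array actually indexed
    print(len(trainset.train_labels))  # the label list actually indexed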

PODNET-AAN Related experiment running issue!

One more thing, please:
I am trying to run the POD-AANets experiment. Could you verify that this is the right way to run it? I am getting an error and cannot find a way to execute it successfully. I have installed all the required libraries as well. Thank you!

I have cloned this repository: https://github.com/yaoyao-liu/POD-AANets

Installed all the required libraries which you mentioned from: https://github.com/arthurdouillard/incremental_learning.pytorch

I have attached a screenshot of the error that comes up while running it on an NVIDIA T4. When I execute the command "python run_exp.py", a DataLoader traceback error pops up.

Thank you once again, and looking forward to your reply.

size of trainloader

Dear author,
I have some questions about the training phases of LUCIR+AANets. When I run main.py on CIFAR-100, the size of the trainloader in the 0-th phase is only 156, and 55, 55, 55, 55, 56 for the remaining phases, respectively. I was wondering how many images are trained in each epoch, because the 0-th phase needs to train 50 classes, and in CIFAR-100, 50 classes means 50*500 = 25,000 images in total.

A request

Hello @yaoyao-liu,

If you have the numbers for each data point needed to redraw Figure S2 in the AANets paper (all the graphs with accuracy at each step), could you share them?

This would help to recreate the graph without having to rerun all the baselines.

It would be very kind of you if you could share it along.

Thanks,
Joseph
