yaoyao-liu / class-incremental-learning
PyTorch implementation of AANets (CVPR 2021) and Mnemonics Training (CVPR 2020 Oral)
Home Page: https://class-il.mpi-inf.mpg.de
License: MIT License
Hi @yaoyao-liu,
Thank you for the perfect codebase to experiment upon, and two very interesting papers.
I was captivated by Figure 1 of your mnemonics paper where you have plotted the data and highlighted the exemplars.
Can you please lend me your code that generated it (how you get such well spaced clusters, as well as how you highlight the exemplars)? It would be of great use to my work.
Thank you very much.
Hi @yaoyao-liu,
Could you guide me on how to save the model after each phase, so that I can use those models to calculate per-class accuracies similar to what you have done, e.g. classes 1-10 (%), 11-20 (%), and so on?
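Once predictions from a saved model are available, the per-group accuracies mentioned above (classes 1-10, 11-20, ...) can be computed independently of the training code. The following is a minimal sketch, assuming labels are 0-indexed integers already mapped to the incremental class order; the function name and the toy data are hypothetical and not part of the repository.

```python
from collections import defaultdict


def grouped_accuracy(labels, predictions, group_size=10):
    """Return {(first_class, last_class): accuracy in percent} per class group.

    Labels are assumed to be 0-indexed, so group 0 covers labels
    0..group_size-1 and is reported as classes 1..group_size.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, predictions):
        group = y // group_size
        total[group] += 1
        correct[group] += int(y == p)
    return {
        (g * group_size + 1, (g + 1) * group_size): 100.0 * correct[g] / total[g]
        for g in sorted(total)
    }


# Hypothetical toy data: three samples from classes 1-10, three from 11-20.
labels = [0, 3, 9, 10, 15, 19]
predictions = [0, 3, 1, 10, 12, 19]
acc = grouped_accuracy(labels, predictions)
for (first, last), value in acc.items():
    print(f"class {first}-{last}: {value:.2f}%")
```

Grouping by the true label, rather than the predicted one, keeps each group's denominator equal to the number of test samples of those classes.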
I ran this code in a Windows environment and there were some errors. After some modifications, there were still errors. Is this code best suited to run in Ubuntu?
Hi, I am reading the code of mnemonics-training, and want to verify the effectiveness of the algorithm, which is related to dataset distillation. However, when I read the code, I noticed that the network implemented here in mnemonics.py is different from that in baseline.py
I think it could be more reasonable to change the "self.network_mtl" to "self.network" when comparing baseline and mnemonics. I want to inquire why mnemonics uses network_mtl, while baseline uses a different model.
Another question is about "fusion_mode". I have seen some issues discussing it, but I have not figured out how this option influences the code. What is this option designed for?
Hi, @yaoyao-liu
Why can't I find the stable block and the plastic block in your modified_resnet file? Did I miss something?
Thanks.
Hi,
I ran the experiment with the following command: "python main.py --nb_cl_fg=50 --nb_cl=2 --nb_protos 20 --resume --imprint_weights". I find that the gradient of self.mnemonics in mnemonics_train.py is None after q_loss.backward(), which means it doesn't update.
I am wondering whether this is correct?
Thanks.
Hi, yaoyao.
I directly followed the instructions and ran the command "python main.py --nb_cl_fg=50 --nb_cl=10 --nb_protos 20 --resume --imprint_weights".
I expected the accuracy to be over 55%, consistent with the results in Fig. 4(a),
but the result was only about 52%.
The average accuracy I got was about 62%, but the result in Table 1 is 64.95%.
I am wondering whether I got something wrong or whether these are reasonable results?
Thanks~
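The two figures compared in this post are different metrics: the accuracy after the last phase, and the average over all phases. The sketch below shows how the two relate, assuming the average is the plain mean of the per-phase top-1 accuracies. The numbers are hypothetical placeholders, not results from the paper or from the code.

```python
def summarize_phases(phase_accuracies):
    """Return (last-phase accuracy, mean accuracy over all phases)."""
    if not phase_accuracies:
        raise ValueError("at least one phase accuracy is required")
    last = phase_accuracies[-1]
    average = sum(phase_accuracies) / len(phase_accuracies)
    return last, average


# Hypothetical per-phase accuracies (phase 0 first), in percent.
phase_accuracies = [80.0, 70.0, 60.0, 50.0]
last_acc, avg_acc = summarize_phases(phase_accuracies)
print(f"last phase: {last_acc:.2f}%, average over phases: {avg_acc:.2f}%")
```

Because accuracy usually drops as classes are added, the average is expected to be higher than the last-phase value.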
Hello, thank you for sharing this great project.
I wonder, what is the difference between subset-ImageNet and miniImageNet?
Both are subsets of ImageNet and have 100 classes.
So, is there any reason for using subset-ImageNet instead of miniImageNet?
Hi, thanks for your work.
There is some code in mnemonics-training/1_train/trainer/mnemonics.py that I am confused about.
if iteration > start_iter:
    tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)
else:
    tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)
I can't find any differences between the two branches, if and else.
original code: see here.
tg_model w/o the final fc layer? It would be so kind of you if you could help me answer these questions.
Thanks again.
Hi there, I am running the experiment "[LUCIR] w/ AANets":
"python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100"
Concerns:
Thank you very much; I look forward to hearing from you.
Hi, Professors. There are some bugs when I run the code. Can you explain what they mean and how to solve the problems? Thank you very much!
Using gpu: 0
Files already downloaded and verified
Files already downloaded and verified
Order name:./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/seed_1993_cifar100_order_run_0.pkl
Generating orders
pickle into ./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/seed_1993_cifar100_order_run_0.pkl
[68, 56, 78, 8, 23, 84, 90, 65, 74, 76, 40, 89, 3, 92, 55, 9, 26, 80, 43, 38, 58, 70, 77, 1, 85, 19, 17, 50, 28, 53, 13, 81, 45, 82, 6, 59, 83, 16, 15, 44, 91, 41, 72, 60, 79, 52, 20, 10, 31, 54, 37, 95, 14, 71, 96, 98, 97, 2, 64, 66, 42, 22, 35, 86, 24, 34, 87, 21, 99, 0, 88, 27, 18, 94, 11, 12, 47, 25, 30, 46, 62, 69, 36, 61, 7, 63, 75, 5, 32, 4, 51, 48, 73, 93, 39, 67, 29, 49, 57, 33]
Out_features: 50
Batch of classes number 5 arrives
Max and min of train labels: 0, 49
Max and min of valid labels: 0, 49
Checkpoint name: ./logs/cifar100_nfg50_ncls10_nproto20_mnemonics/run_0_iteration_4_model.pth
Incremental train
Epoch: 0, LR: [0.1]
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [6,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [12,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [19,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [21,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [22,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [23,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [26,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [28,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [29,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [30,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1603729138878/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "main.py", line 78, in <module>
    trainer.train()
  File "/data1/22160073/project/incremental learning/class-incremental-learning-main/mnemonics-training/1_train/trainer/mnemonics.py", line 237, in train
    tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)
  File "/data1/22160073/project/incremental learning/class-incremental-learning-main/mnemonics-training/1_train/trainer/incremental.py", line 44, in incremental_train_and_eval
    loss.backward()
  File "/data1/22160073/anaconda3/envs/xxz/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/data1/22160073/anaconda3/envs/xxz/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
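The assertion in this log, t >= 0 && t < n_classes, is the NLL-loss kernel reporting a target label that lies outside the range of the classifier's outputs. A minimal sketch of a check that could be run on a batch of targets before computing the loss is shown below. The function name and the toy values are hypothetical; the actual cause in this run is not confirmed here.

```python
def find_out_of_range_labels(labels, n_classes):
    """Return (index, label) pairs whose label is outside [0, n_classes)."""
    return [(i, t) for i, t in enumerate(labels) if not 0 <= t < n_classes]


# Hypothetical example: a classifier head with 50 outputs receives two
# labels that belong to classes it does not have yet.
n_classes = 50
labels = [0, 49, 50, 57, 12]
bad = find_out_of_range_labels(labels, n_classes)
if bad:
    print(f"{len(bad)} label(s) outside [0, {n_classes}): {bad}")
```

Setting the environment variable CUDA_LAUNCH_BLOCKING=1, or running a single batch on the CPU, usually makes the failing line easier to locate, because CUDA errors are otherwise reported asynchronously.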
Hi @yaoyao-liu,
One more clarification request, @yaoyao-liu.
Were the ImageNet images resized such that the smaller dimension is 256? It would be great if you could recall this.
Thanks,
Joseph
Hi,
I notice that the target model in the sequential class-incremental training is set to network_mtl. In your implementation for constructing an "mtl" network, it seems that every convolution kernel weight is frozen and multiplied by a learnable mask (initialized as 1). My question is: why is the network architecture implemented in this way?
Thanks in advance!
Michael
Hi, yaoyao
I'm quite confused about the use of the free model in your code, and I can't find an explanation in your paper. I would appreciate it if you could provide some details about the function of the 'free model' in your code.
Thanks.
Hi,
The paper uses a dynamic budget of 20 exemplars per class in training the models associated with every result reported.
However, the bash commands that you provide do not contain the flag --dynamic_budget at all.
Does that mean the code uses a fixed budget? In that case, what does the flag indicate?
Thanks!
Hi @yaoyao-liu, thanks for your interesting work! I am interested in the BOP mnemonics training and would like to extend it. I found some problems when I checked the code of Mnemonics Training. It seems that trainer/mnemonics.py is not the complete version, and it does not follow the training strategies described in the paper.
For example, I couldn't find the bilevel optimization of the mnemonics exemplars. There seems to be only a one-level optimization of the mnemonics based on NCE classification. But shouldn't there be another level before that, which trains a temporary model on the exemplars (Eq. 8) and unrolls all training gradients? Also, I couldn't find the process of splitting exemplars and adjusting the mnemonics of old classes. Another issue is that some arguments are defined but not used, such as self.mnemonics_lrs.
Could you please help to resolve my doubts? I am really interested in the implementation of solving the BOP and mnemonics training. I apologize if my understanding is wrong. Thank you very much!
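To make the two levels described above concrete, here is a toy scalar sketch of bilevel optimization with a single unrolled inner step. It is not the repository's implementation and not the paper's exact objective: the model is one number, the exemplar is one number, both losses are squared errors, and all names and values are hypothetical.

```python
def unrolled_inner_step(theta0, exemplar, inner_lr):
    """Inner level: one gradient step on the loss (theta - exemplar)**2."""
    return theta0 - inner_lr * 2.0 * (theta0 - exemplar)


def outer_gradient(theta0, exemplar, target, inner_lr):
    """Outer level: d/d(exemplar) of (theta1 - target)**2 via the chain rule.

    theta1 = theta0 - inner_lr * 2 * (theta0 - exemplar), therefore
    d(theta1)/d(exemplar) = 2 * inner_lr.
    """
    theta1 = unrolled_inner_step(theta0, exemplar, inner_lr)
    return 2.0 * (theta1 - target) * 2.0 * inner_lr


# Hypothetical values: initial weight, statistic of the real data, step sizes.
theta0, target = 0.0, 3.0
inner_lr, outer_lr = 0.1, 1.0
exemplar = 0.0

# Optimize the exemplar so that a model trained on it fits the target.
for _ in range(300):
    exemplar -= outer_lr * outer_gradient(theta0, exemplar, target, inner_lr)

theta1 = unrolled_inner_step(theta0, exemplar, inner_lr)
print(f"optimized exemplar: {exemplar:.4f}, model after inner step: {theta1:.4f}")
```

The point of the sketch is that the exemplar is updated through the inner training step: its gradient depends on how the temporary model changes when trained on it, which is the part the post reports as missing.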
Hello, I want to save the model in order to run t-SNE. How can I save the model of each stage?
I found that you set the ckpt name but do not call torch.save() to save the model.
How to install the dependencies required for the mnemonics training repository?
One more thing please,
I am trying to run the POD-AANets experiment. Could you please verify that this is the right way to run it? I am getting an error and cannot find a way to execute it successfully. I have installed all the required libraries as well. Thank you!
I have cloned this repository: https://github.com/yaoyao-liu/POD-AANets
I installed all the required libraries that you mentioned from: https://github.com/arthurdouillard/incremental_learning.pytorch
I have attached a screenshot of the error that appears when I run it on an NVIDIA T4. When I execute the command "python run_exp.py", a data loader traceback error pops up.
Thank you once again and looking forward.
Hi,
I'm currently looking into your neat code!
I have a simple question regarding the function init_current_phase_dataset of BaseTrainer.
It is returning Y_valid_cumuls twice, but I think it's supposed to return Y_train_cumuls instead of one of them, or maybe just erase it.
(There seems to be no other function that uses Y_train_cumuls, but I just wanted to make sure I am correctly following your code!)
And by the way, thank you very much for your work :))
Hello,
I have a question regarding what the module modified_linear.py does.
Isn't the cosine linear layer a component for LUCIR?
I thought the layer was for cosine normalization and therefore should only be used for LUCIR, but your code doesn't seem to choose the resnet model dependent on the method (LUCIR/ICARL) so I wonder if I'm missing something.
I'm sorry in advance that I might have totally misunderstood the concept and might be bothering you.
Thank you!!
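For reference, a cosine-normalized linear layer of the kind this question refers to outputs scaled cosine similarities between the input feature and each class weight vector. The pure-Python sketch below illustrates the computation only; the function names and the scale sigma are illustrative assumptions, not the contents of modified_linear.py.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def cosine_linear(feature, weights, sigma=1.0):
    """Return sigma * cos(feature, w) for every class weight vector w."""
    f = l2_normalize(feature)
    return [
        sigma * sum(a * b for a, b in zip(f, l2_normalize(w)))
        for w in weights
    ]


# Hypothetical 2-D feature and three class weight vectors.
feature = [3.0, 4.0]
weights = [[6.0, 8.0], [-4.0, 3.0], [0.0, -2.0]]
logits = cosine_linear(feature, weights, sigma=10.0)
print([round(x, 4) for x in logits])
```

Because both the feature and the weights are normalized, the logits depend only on direction, so classes with larger weight norms do not dominate the prediction.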
Dear author,
I have some questions about the training phase of LUCIR+AANets. When I run main.py on CIFAR100, the size of the training set in the 0-th phase is only 156, and 55, 55, 55, 55, 56 for the remaining phases, respectively. How many images are trained in each epoch? The 0-th phase needs to train 50 classes, and in CIFAR100, 50 classes means 50*500 images in total.
Hello! I tried to train the model on ImageNet with epochs=1, and it still takes 11 hours to finish training. How long does it take to train AANets on ImageNet (N=5/10/25)?
Hello, Professor! I have the following problem when running the code on Windows 11. Can you explain what it means and how to solve it? (My GPU memory is 8 GB.) Thank you very much!
python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100
Namespace(K=2, base_lr1=0.1, base_lr2=0.1, baseline='lucir', branch_1='ss', branch_2='free', branch_mode='dual', ckpt_dir_fg='-', ckpt_label='exp01', custom_momentum=0.9, custom_weight_decay=0.0005, data_dir='data/seed_1993_subset_100_imagenet/data', dataset='cifar100', disable_gpu_occupancy=True, dist=0.5, dynamic_budget=False, epochs=160, eval_batch_size=128, fusion_lr=1e-08, gpu='0', icarl_T=2, icarl_beta=0.25, imgnet_backbone='resnet18', lr_factor=0.1, lw_mr=1, nb_cl=10, nb_cl_fg=50, nb_protos=20, num_classes=100, num_workers=1, random_seed=1993, resume=False, resume_fg=False, test_batch_size=100, the_lambda=5, train_batch_size=128)
Using gpu: 0
Total memory: 8192, used memory: 829
Occupy GPU memory in advance.
Files already downloaded and verified
Files already downloaded and verified
Order name:./logs/cifar100_nfg50_ncls10_nproto20_lucir_dual_b1ss_b2free_fixed_exp01\seed_1993_cifar100_order.pkl
Loading the saved class order
[68, 56, 78, 8, 23, 84, 90, 65, 74, 76, 40, 89, 3, 92, 55, 9, 26, 80, 43, 38, 58, 70, 77, 1, 85, 19, 17, 50, 28, 53, 13, 81, 45, 82, 6, 59, 83, 16, 15, 44, 91, 41, 72, 60, 79, 52, 20, 10, 31, 54, 37, 95, 14, 71, 96, 98, 97, 2, 64, 66, 42, 22, 35, 86, 24, 34, 87, 21, 99, 0, 88, 27, 18, 94, 11, 12, 47, 25, 30, 46, 62, 69, 36, 61, 7, 63, 75, 5, 32, 4, 51, 48, 73, 93, 39, 67, 29, 49, 57, 33]
Feature: 64 Class: 50
Setting the dataloaders ...
Check point name: ./logs/cifar100_nfg50_ncls10_nproto20_lucir_dual_b1ss_b2free_fixed_exp01\iter_4_b1.pth
Epoch: 0, learning rate: 0.1
Traceback (most recent call last):
  File "main.py", line 88, in <module>
    trainer.train()
  File "E:\AlgSpace\pycharm\AANets\trainer\trainer.py", line 171, in train
    cur_lambda, self.args.dist, self.args.K, self.args.lw_mr)
  File "E:\AlgSpace\pycharm\AANets\trainer\zeroth_phase.py", line 63, in incremental_train_and_eval_zeroth_phase
    outputs = b1_model(inputs)
  File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\AlgSpace\pycharm\AANets\models\modified_resnet_cifar.py", line 109, in forward
    x = self.fc(x)
  File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\AlgSpace\pycharm\AANets\models\modified_linear.py", line 37, in forward
    F.normalize(self.weight, p=2, dim=1))
  File "E:\Anaconda\envs\aanets\lib\site-packages\torch\nn\functional.py", line 1371, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
Here is the pip list (the environment was set up by following README.md):
Package Version
certifi 2016.2.28
cffi 1.10.0
joblib 1.1.0
mkl-fft 1.3.0
mkl-random 1.1.1
mkl-service 2.3.0
numpy 1.19.2
olefile 0.44
Pillow 6.2.2
pip 21.3.1
protobuf 3.19.1
pycparser 2.18
scikit-learn 0.24.2
scipy 1.5.4
setuptools 36.4.0
six 1.10.0
sklearn 0.0
tensorboardX 2.4.1
threadpoolctl 3.0.0
torch 1.2.0
torchvision 0.4.0a0+6b959ee
tqdm 4.62.3
wheel 0.29.0
following error:
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/tokaka22/.conda/envs/AANets-PyTorch/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/tokaka22/.conda/envs/AANets-PyTorch/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/tokaka22/.conda/envs/AANets-PyTorch/lib/python3.6/multiprocessing/resource_sharer.py", line 139, in _serve
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
  File "/home/tokaka22/.conda/envs/AANets-PyTorch/lib/python3.6/signal.py", line 60, in pthread_sigmask
    sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
I have to set num_workers to 0, which may cost too much time. Another solution is to upgrade to Python 3.7 (but README.md says "conda create --name AANets-PyTorch python=3.6").
How can I fix it?
In your paper, you said you ran the experiment three times. Did you use different random seeds, or did you just use seed 1993 and run it three times? I found that different random seeds may affect the final accuracy, so I would like to know your configuration for a fair comparison.
Hi,
Can you please share the 'list of the class directories' used for the ImageNet-100 experiment, or the original sequence in ImageNet-1000 that was used for sampling and shuffling?
Hi @yaoyao-liu,
Loved your base_trainer.py and trainer.py. The abstractions are very nice. Thank you for your efforts in writing such clean code.
(This is a compliment and not an issue, but I don't think GitHub allows discussion other than by creating 'Issues'.)
Thanks,
Joseph
Hi, professor,
I have read your paper and run the code on CIFAR100,
but I can't find the data processing for mini-ImageNet. Could you tell me how you process the mini-ImageNet data?
I would really appreciate it if you could tell me. Thanks!
Hi, Professors. There are bugs when I run the code. Can you explain what they mean and how to solve the problems? Thank you very much!
python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=cifar100
Traceback (most recent call last):
  File "main.py", line 436, in <module>
    train(config, training_chunked_samples_dir, testing_chunked_samples_file)
  File "main.py", line 238, in train
    X_valid_ori, Y_valid_ori, X_valid_cumul, Y_valid_cumul, iteration, is_start_iteration, top1_acc_list_ori, top1_acc_list_cumul)
  File "/data/gjj/code/hf2vad/cil/AANet/trainer/base_trainer.py", line 792, in compute_acc
    order_list, is_start_iteration=is_start_iteration)
  File "/data/gjj/code/hf2vad/cil/AANet/utils/incremental/compute_accuracy.py", line 78, in compute_accuracy
    sqd_icarl = cdist(class_means[:,:,0].T, outputs_feature.cpu(), 'sqeuclidean')
  File "/home/gjj/anaconda3/envs/py36/lib/python3.6/site-packages/scipy/spatial/distance.py", line 2710, in cdist
    raise ValueError('XB must be a 2-dimensional array.')
Hi @yaoyao-liu ,
I understand the code for herding selection and how alpha_dr_herding is populated.
But I don't understand how the class means are calculated.
What does the line (np.dot(D,alph)+np.dot(D2,alph))/2 do?
Why can it not be just np.dot(D, alph)?
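In iCaRL-style code that this line resembles, D typically holds the normalized features of the exemplar images (one column per image), D2 the features of their horizontally flipped copies, and alph a weight vector that is 1/n for the n selected exemplars and 0 elsewhere. Under that assumption, which is not confirmed here by the author, the line averages the exemplar mean over the original and flipped views. A pure-Python sketch with hypothetical values:

```python
def weighted_mean(columns, weights):
    """Equivalent of np.dot(D, alph), with D given as a list of column vectors."""
    dim = len(columns[0])
    return [sum(w * col[i] for col, w in zip(columns, weights)) for i in range(dim)]


def class_mean(D, D2, alph):
    """Average the weighted means of the original and the flipped features."""
    mean_original = weighted_mean(D, alph)
    mean_flipped = weighted_mean(D2, alph)
    return [(a + b) / 2.0 for a, b in zip(mean_original, mean_flipped)]


# Hypothetical 2-D features for three images; the third one is not a
# selected exemplar, so its weight is zero and it does not affect the mean.
D = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
D2 = [[3.0, 2.0], [2.0, 3.0], [9.0, 9.0]]
alph = [0.5, 0.5, 0.0]
mean = class_mean(D, D2, alph)
print(mean)
```

Using only np.dot(D, alph) would give the mean of the original views alone; adding the flipped views is a form of test-time augmentation for the class prototype.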
Hi @yaoyao-liu,
Thanks for your wonderful work.
I have a question: how are the hyperparameters for your model set in general (not specifically for AANets; I mean parameters such as lambda that determine the stability-plasticity trade-off)?
Do you do multiple runs on the entire dataset with a lot of hyperparameter combinations sampled from a coarse grid?
Or do you determine the hyperparameters separately for each task?
Thank you!
Hi,
Thanks for your innovative method and sharing the code with community.
When running python3 main.py --method=mnemonics --nb_cl=10, I get the following AttributeError:
'CIFAR100' object has no attribute 'train_labels'
in mnemonics.py. This issue persisted for train_data as well, and I was able to solve it by replacing self.train with self.trainset.train_data.
However, for:
self.trainset.train_labels
self.testset.test_data
self.testset.test_labels
no replacement worked with the existing torchvision (0.5.0).
I would appreciate it if you could specify the torchvision version that worked with your model.
Best,
Nila
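Newer torchvision releases expose the CIFAR arrays as data and targets rather than train_data, train_labels, test_data and test_labels. A minimal sketch of a compatibility helper follows, assuming those are the only attribute names involved. The helper get_data_and_labels and the stand-in classes are hypothetical and only mimic the attribute layout; they are not torchvision classes.

```python
def get_data_and_labels(dataset):
    """Return (data, labels), trying the new attribute names before the old ones."""
    candidates = (
        ("data", "targets"),             # newer attribute names
        ("train_data", "train_labels"),  # older training-set names
        ("test_data", "test_labels"),    # older test-set names
    )
    for data_name, label_name in candidates:
        if hasattr(dataset, data_name) and hasattr(dataset, label_name):
            return getattr(dataset, data_name), getattr(dataset, label_name)
    raise AttributeError("no known data/label attributes found on dataset")


class NewStyleDataset:
    """Stand-in that mimics the newer attribute layout."""
    data = ["img0", "img1"]
    targets = [3, 7]


class OldStyleTrainSet:
    """Stand-in that mimics the older attribute layout."""
    train_data = ["img2"]
    train_labels = [5]


new_pair = get_data_and_labels(NewStyleDataset())
old_pair = get_data_and_labels(OldStyleTrainSet())
print(new_pair, old_pair)
```

Reading the arrays through one helper keeps the rest of the trainer independent of the installed torchvision version.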
Hi,
Can you share the code for training?
I haven't found the training part in your updated code.
Many thanks.
Hi, yaoyao.
I directly followed the instructions and ran the code
"python main.py --nb_cl_fg=50 --nb_cl=10 --gpu=0 --random_seed=1993 --baseline=lucir --branch_mode=dual --branch_1=ss --branch_2=free --dataset=imagenet".
I just changed the dataset option, downloaded the ImageNet data, and changed the "data_dir" option to the path where I stored the data. It works and shows results. Does this mean I successfully ran the code for training the model on ImageNet?
I remember that you said the code for ImageNet is not included in the current GitHub repository in
https://github.com/yaoyao-liu/class-incremental-learning/issues/12. Or have you already uploaded the code for ImageNet?
Hello @yaoyao-liu,
If you have the numbers for each datapoint to redraw Figure S2 in AAN paper (all the graphs with accuracy at each step), can you share that across?
This would help to recreate the graph without having to rerun all the baselines.
It would be very kind of you if you could share it along.
Thanks,
Joseph
In your paper, you said you apply a small set of scaling weights in the stable block, but I can't find the corresponding code. Can you tell me where to find it?
Also, I read the code for the optimizer in base_trainer.py.
In the function set_optimizer, the parameters of b2_model are learnable if the 2nd branch is not fixed. But for b1_model, the FC weights for old classes are frozen and all the other parameters are put into the optimizer. Why? What about the scaling weights? If we optimize the parameters of b1_model just like those of b2_model, how can it be called a stable block (even though the FC weights for old classes are frozen)?
Hi, Professors. There are bugs when I run the code. Can you explain what they mean and how to solve the problems?
Traceback (most recent call last):
  File "main.py", line 78, in <module>
    trainer.train()
  File "/home/gyc/class-incremental-learning/mnemonics-training/1_train/trainer/baseline.py", line 237, in train
    tg_model = incremental_train_and_eval(self.args.epochs, tg_model, ref_model, free_model, ref_free_model, tg_optimizer, tg_lr_scheduler, trainloader, testloader, iteration, start_iter, cur_lamda, self.args.dist, self.args.K, self.args.lw_mr)
  File "/home/gyc/class-incremental-learning/mnemonics-training/1_train/trainer/incremental.py", line 32, in incremental_train_and_eval
    for batch_idx, (inputs, targets) in enumerate(trainloader):
  File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/gyc/.conda/envs/tmp/lib/python3.6/site-packages/torchvision/datasets/cifar.py", line 90, in __getitem__
    img, target = self.train_data[index], self.train_labels[index]
IndexError: index 46523 is out of bounds for axis 0 with size 25000
Hi @yaoyao-liu,
Thank you for your amazing work!
I am trying to improve upon your work and would like to replicate PODNet-CNN + AANets, because it has the best performance compared to the other methods. It is therefore the result I would cite as the AANets result for comparison.
It would be very kind of you if you could provide code for PODNet-CNN + AANets. It is okay even if it is not polished.
Thanks,
Joseph
Hi, yaoyao,
Thanks for your interesting work.
I did an ablation study of these two components (mtl and feature fusion), i.e., 1. a plain network (resnet32_cifar) is constructed for tg_model in your code; 2. features or outputs are extracted without calling the function process_input_fp; 3. herding exemplars are replaced by self.mnemonics (accumulated from phase 0 to i) under the framework of LUCIR [1]. The experiments show that without the two components, the performance after the first incremental learning phase is 63.47 instead of 69.05 (obtained by directly running your training code).
In my scenario, I'm only allowed to use a plain network (e.g., resnet-like network) and also without feature fusion. I'm wondering if it is necessary for the Mnemonics method to work with "mtl" and "feature fusion".
[1] Hou, S., Pan, X., Loy, C. C., Wang, Z., & Lin, D. (2019). Learning a unified classifier incrementally via rebalancing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 831-839).
Hello, I would like to ask what the hyperparameter settings are for training on ImageNet1000. Are they the same as for ImageNet100?
Hi, I was trying to figure out how to train the mnemonics incremental models for ImageNet100 protocol. However, I do not see any code to train on ImageNet. The main.py under 1_train directory lists only cifar100 as the choice.
Can you please point me to the training code for ImageNet100/1000 if it is already there in the repo, or let me know if you are providing the training code only for cifar100.
Thanks,
Touqeer