thuml / cdan

Code release for "Conditional Adversarial Domain Adaptation" (NIPS 2018)

cdan's Issues

Hello, I did not find the ResNet model. Can you upload it? Thanks.

Cannot train pytorch version on multi-gpu

When trained on multiple GPUs, the transfer loss (CDAN + entropy loss) is very low and the test accuracy always decreases. Dataset: visda2017
Tested on Ubuntu 18.04, PyTorch version: 1.2.0 / 1.4.0, GPU: GTX 1080 / RTX 2080Ti

Do we need all 100000 iterations?

Although I have not run the code on all datasets, the results do not look much different after 20000 iterations.
What do you think about the number of iterations?

Questions about learning rate decay

In the paper section 4.1, it is said:
$\eta_p = \eta_0 (1 + \alpha p)^{-\beta}$, where p is the training progress changing from 0 to 1

I understand p = iter_num/max_iters

However, in the following PyTorch code (CDAN/pytorch/lr_schedule.py, line 3 in f788906):

lr = lr * (1 + gamma * iter_num) ** (-power)

and also in the tensorflow code:
https://github.com/thuml/CDAN/blob/master/tensorflow/train.py#L13

p = iter_num is used instead of p = iter_num/max_iters

Can you explain this? Thanks!
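
One way to reconcile the two forms: the code's schedule matches the paper's formula whenever gamma = α / max_iters, because then gamma * iter_num = α * p. A minimal sketch of that equivalence follows; `max_iters`, `alpha`, and all numeric values are illustrative assumptions, not the repository's settings.

```python
def lr_paper(lr0, p, alpha, beta):
    """Paper form: eta_p = eta_0 * (1 + alpha * p) ** (-beta), with p in [0, 1]."""
    return lr0 * (1 + alpha * p) ** (-beta)


def lr_code(lr0, iter_num, gamma, power):
    """Code form quoted above: lr * (1 + gamma * iter_num) ** (-power)."""
    return lr0 * (1 + gamma * iter_num) ** (-power)


# Illustrative values (assumptions, not taken from the repository's config).
max_iters = 10000
alpha, beta = 10.0, 0.75
gamma = alpha / max_iters  # makes gamma * iter_num equal alpha * p

for iter_num in (0, 2500, 10000):
    p = iter_num / max_iters
    a = lr_paper(0.01, p, alpha, beta)
    b = lr_code(0.01, iter_num, gamma, beta)
    print(iter_num, round(a, 6), round(b, 6))
```

If training runs for a different number of iterations than α / gamma, the two schedules no longer coincide, which may be the discrepancy the question points at.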

Office-31 Test Set And Val Set

From the code it seems like the Test Set = Val Set = Target Domain.

config["data"] = {"source":{"list_path":args.s_dset_path, "batch_size":36}, \
                          "target":{"list_path":args.t_dset_path, "batch_size":36}, \
                          "test":{"list_path":args.t_dset_path, "batch_size":4}}

And the model seems to be selected on the entire test set:

if i % config["test_interval"] == config["test_interval"] - 1:
    base_network.train(False)
    temp_acc = image_classification_test(dset_loaders, \
        base_network, test_10crop=prep_config["test_10crop"])
    temp_model = nn.Sequential(base_network)
    if temp_acc > best_acc:
        best_acc = temp_acc
        best_model = temp_model
    log_str = "iter: {:05d}, precision: {:.5f}".format(i, temp_acc)
    config["out_file"].write(log_str+"\n")
    config["out_file"].flush()
    print(log_str)

Could you clarify the following general questions about the evaluation on the Office-31 dataset used in the paper:
i) Is my understanding of the code above correct?
ii) Are the numbers in the paper reported using the logic above?
iii) I see that Importance Weighted Cross Validation is discussed in the paper. Is it used for the Office-31 dataset? If yes, could you a) explain how the splits are created and b) share your evaluation code? That would be of great help.

NotImplementedError

Please help to diagnose the error.
I have cuda installed, which I checked by
print(torch.version.cuda)
10.2
print(torch.cuda.device_count())
1
print(torch.cuda.is_available())
True

Full Error:

THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=47 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
File "train_image.py", line 277, in <module>
train(config)
File "train_image.py", line 110, in train
base_network = base_network.cuda()
File "C:\Users\aman\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 307, in cuda
return self._apply(lambda t: t.cuda(device))
File "C:\Users\aman\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 203, in _apply
module._apply(fn)
File "C:\Users\aman\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 225, in _apply
param_applied = fn(param)
File "C:\Users\aman\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 307, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "C:\Users\aman\anaconda3\lib\site-packages\torch\cuda\__init__.py", line 153, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at ..\aten\src\THC\THCGeneral.cpp:47
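
One possible cause, offered as a guess rather than a diagnosis: if the script restricts the visible devices through the `CUDA_VISIBLE_DEVICES` environment variable and the value passed does not name an existing GPU, CUDA can report that no device is detected even though an interactive check succeeds. A minimal sketch of validating the id before setting the variable follows; `validate_gpu_id` is a hypothetical helper, and the device count is supplied by hand.

```python
import os


def validate_gpu_id(gpu_id, device_count):
    """Return gpu_id if every comma-separated entry is an index below device_count."""
    entries = [e.strip() for e in gpu_id.split(",")]
    for entry in entries:
        if not entry.isdigit() or int(entry) >= device_count:
            raise ValueError(
                f"gpu id {entry!r} is not valid for {device_count} visible device(s)"
            )
    return ",".join(entries)


# With one GPU (device_count() printed 1 above), only "0" is valid.
os.environ["CUDA_VISIBLE_DEVICES"] = validate_gpu_id("0", device_count=1)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```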

question about visualization of the features

The parameters in the ad_net

Hi, I found that this parameter is set to 1024 and 500 in the two scenarios. How can I customize it for my own case?

ad_net = network.AdversarialNetwork(base_network.output_num() * class_num, 1024)

About the transfer loss in pytorch implementation

Hi, I'm confused about how you implement the entropy conditioning loss as presented in Equation (9) of the paper. I wonder whether you could describe the tricks you used in the implementation, since it is inconsistent with the paper. Specifically, in Line 34 below, I don't understand why 1 is added to the entropy.

Look forward to your reply. Thanks.
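
For context, a minimal sketch of an entropy-aware weight, assuming it has the form w = 1 + exp(-H), where H is the entropy of the classifier's prediction. Under that assumption the added 1 keeps every weight in (1, 2], so uncertain examples are down-weighted rather than dropped. The function names are hypothetical and this is not the repository's code.

```python
import math


def entropy(probs):
    """Shannon entropy of one probability vector (zero entries contribute 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weight(probs):
    """Assumed weight w = 1 + exp(-H): confident predictions weigh more."""
    return 1.0 + math.exp(-entropy(probs))


confident = [0.98, 0.01, 0.01]
uncertain = [1 / 3, 1 / 3, 1 / 3]
print(entropy_weight(confident), entropy_weight(uncertain))
```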

question about training process

Thanks for your impressive work! When I run your PyTorch code for "Amazon to Webcam" on my workstation, the printed line "iter: 00499, precision: 0.74214" means the test precision is only 0.74214, which is far from the precision you mention in "Conditional Adversarial Domain Adaptation". Even after training for a long time, the precision does not improve. Could you tell me how to train and test it?

SVHN -> MNIST Accuracy

Hi, I am trying to run the code for SVHN to MNIST.
However, with CDAN I only got ~84% accuracy (88.5% reported in the paper). Is there anything I should tune, or is the variance of the accuracy very high?
Also, there is svhn_balanced.txt. How is it generated from svhn.txt? The class distributions are not uniform in either version. Does the balanced version make a difference?
Thank you!
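
A minimal sketch for checking the class distribution of a list file, assuming each non-empty line ends with an integer label (an assumption about the file format; `label_counts` is a hypothetical helper and the sample lines are made up).

```python
from collections import Counter


def label_counts(lines):
    """Count labels, assuming each non-empty line ends with an integer label."""
    return Counter(int(line.split()[-1]) for line in lines if line.strip())


# Made-up sample lines; in practice pass open("svhn.txt").readlines().
sample = ["img_0.png 3", "img_1.png 3", "img_2.png 7", ""]
print(sorted(label_counts(sample).items()))
```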

Confusion about the data

I am using the provided Image-CLEF data. It is a little confusing what "b_list.txt, c_list.txt, i_list.txt, list, p_list.txt" mean.
From the paper I find that you use "c, i, p" for "Caltech-256, ImageNet ILSVRC 2012, Pascal VOC 2012", respectively, and I guess "b" is Bing. I just want to check that I understand correctly.
Another question: what is the difference between the files in the "list" folder and the other four list files? From your readme it seems that you are not using the files in the list folder. Is this the same setting as in your paper?

Is the 'lr_mult' argument of SGD actually used in PyTorch?

Question about test dataset.

Thanks for the impressive work and for sharing the code.

I notice in your code that the target dataset and the test dataset are the same.

config["data"] = {"source":{"list_path":args.s_dset_path, "batch_size":36}, \
                          "target":{"list_path":args.t_dset_path, "batch_size":36}, \
                          "test":{"list_path":args.t_dset_path, "batch_size":4}}

Is this a common way to deal with test data in domain adaptation?

AlexNet Pretrained model

Hi,
I got the following error.

FileNotFoundError: [Errno 2] No such file or directory: './alexnet.pth.tar'

Can you please provide the alexnet.pth.tar file? Thanks in advance.

Quick question about visda target-set

Thanks for releasing your code. Impressive work and a good paper. I see that in the paper you report CDAN's result as 70% on the VisDA dataset. Is it on VisDA's validation or test set? These two can be seen as two targets, and the labels for the test set are not released, right?
So is the 70% on the validation set, or was it obtained after submitting your predictions on the test set to their CodaLab site?

Thanks a lot

How to set parameters when training Office-Home with the PyTorch version

C:\Users\Alarak>python D:\桌面\CDAN-master\pytorch\train_image.py --gpu_id id --net ResNet50 --dset office-home --test_interval 2000 --s_dset_path D:\桌面\OfficeHomeDataset_10072016/Art.txt --t_dset_path D:\桌面\OfficeHomeDataset_10072016/Clipart.txt CDAN
Traceback (most recent call last):
File "D:\桌面\CDAN-master\pytorch\train_image.py", line 278, in <module>
train(config)
File "D:\桌面\CDAN-master\pytorch\train_image.py", line 84, in train
dsets["source"] = ImageList(open(data_config["source"]["list_path"]).readlines(),
FileNotFoundError: [Errno 2] No such file or directory: 'D:\桌面\OfficeHomeDataset_10072016/Art.txt'

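I followed the command format given by your group. After downloading the Office-Home dataset, I replaced ../data/office-home in the command with my actual path D:\桌面\OfficeHomeDataset_10072016 and added the path to train_image.py. Running it produced the error above. After I manually created Art.txt and clipart.txt in that directory as the error suggested, more errors appeared. How should I set the relevant parameters correctly?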
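
The FileNotFoundError indicates that the list file itself is missing, and an empty Art.txt is unlikely to help because the loader reads image entries from it. A minimal sketch of generating such a file follows, assuming each line holds an image path followed by an integer label and that the dataset is laid out as one sub-folder per class. Both points are assumptions, and `write_image_list` is a hypothetical helper.

```python
import os


def write_image_list(domain_dir, list_path, exts=(".jpg", ".jpeg", ".png")):
    """Write '<image path> <label>' lines, one label per class sub-folder."""
    classes = sorted(
        d for d in os.listdir(domain_dir)
        if os.path.isdir(os.path.join(domain_dir, d))
    )
    count = 0
    with open(list_path, "w", encoding="utf-8") as out:
        for label, cls in enumerate(classes):
            cls_dir = os.path.join(domain_dir, cls)
            for name in sorted(os.listdir(cls_dir)):
                if name.lower().endswith(exts):
                    out.write(f"{os.path.join(cls_dir, name)} {label}\n")
                    count += 1
    return count
```

Labels come from the sorted class-folder names, so source and target domains get matching labels only if they contain the same class folders. Paths containing spaces would also confuse a loader that splits each line on whitespace.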

question about pre-process of image for alexnet

I notice that the PyTorch version of your code does its pre-processing in pre_process.py, always calling transforms.ToTensor(), which scales the image range to [0, 1]. Your AlexNet is based on this repo, in which all images are read as RGB images in the range [0, 255] before being fed into AlexNet. Since AlexNet was pretrained on [0, 255] images and is then fine-tuned (used as the backbone) on [0, 1] images, will that hurt the performance of the algorithm?

Different results with multi-GPUs

I tested the results using one GPU and multi-GPUs in the same server on Office-31. The results are different.

For the CDAN+E on A->W task:
One GPU: around 75%
Multi-GPUs (>=2): 92%

I am still investigating the reason.

How to choose checkpoints?

Hi,
The total number of iterations for training is ~ 100000.
But how do you choose the checkpoints to report numbers in the paper?

Thanks!

Implementation of loss function

Thanks to Long for the implementation. Two points confuse me:

i) The total loss is defined as classification loss + transfer loss, which differs from Equation (3): classification loss - transfer loss.
ii) The domain discriminator is updated based on the total loss instead of the transfer loss.
Hoping for your help.
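
A possible explanation, assuming the implementation places a gradient reversal step between the feature extractor and the domain discriminator (an assumption about the code, not confirmed here): the losses are summed in the forward pass, the reversal flips the sign of the transfer-loss gradient that reaches the feature extractor, and the classification loss does not depend on the discriminator, so its gradient there is zero. A toy scalar sketch with hand-written gradients follows; all names and numbers are hypothetical.

```python
def combined_gradients(g_cls_feat, g_trans_feat, g_trans_disc, coeff=1.0):
    """Gradients seen by each module when total = cls_loss + transfer_loss.

    g_cls_feat:   d(cls_loss)/d(feature parameters)
    g_trans_feat: d(transfer_loss)/d(feature parameters), before reversal
    g_trans_disc: d(transfer_loss)/d(discriminator parameters)
    """
    # Gradient reversal: the feature extractor receives -coeff * gradient.
    grad_feature = g_cls_feat - coeff * g_trans_feat
    # cls_loss has no path to the discriminator, so only transfer_loss counts.
    grad_discriminator = 0.0 + g_trans_disc
    return grad_feature, grad_discriminator


gf, gd = combined_gradients(g_cls_feat=0.5, g_trans_feat=0.2, g_trans_disc=0.3)
print(gf, gd)
```

Under this assumption the feature extractor effectively descends on classification loss minus transfer loss, matching the minus sign in Equation (3), while the discriminator descends on the transfer loss alone even though the summed loss is backpropagated.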

path

May I ask which statement I should change to set the path for reading the data? Also, the test function in train_svhnmnist.py raises an error about args. How can I solve this?

How to use CDAN to train with my own dataset?

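I would like to use CDAN to train a model of my own, using the PyTorch version. My dataset has two parts, CT images of two types of pneumonia (already labeled), and I want to do transfer learning between them. How should I set the arguments when running the code?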

AlexNet pre-trained model

Hello!
Is alexnet.pth.tar a model pre-trained in Caffe? Could you provide a copy?
Thank you very much!

Alexnet’s based model performance in pytorch

Hi. Thanks for your excellent work. I am reproducing your results for my project. I have a question: when the base CNN is AlexNet, do you get the same performance on the Office-31 dataset in PyTorch as with your Caffe model?
I tried to run your PyTorch code for the Amazon to Webcam task (using AlexNet) but got around 71% accuracy after more than 25000 iterations (which is not as reported in the paper). It is worth noting that your ResNet version works perfectly fine on my computer.

How can I get the USPS dataset?

DANN

Hello, there was an error when I was running the loss function of DANN. The dimensions did not match. Did you use this code when you were running DANN? I can't find DANN's code online right now.

Learning rate setting

According to the paper, the classifier's learning rate is set to 10 times that of the feature extractor, but in the code they are set to the same value. Is that on purpose after many experiments, or just an error?
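
A minimal sketch of how a per-group multiplier could give the classifier ten times the feature extractor's rate, assuming a scheduler multiplies a base rate by each group's `lr_mult` key. The group layout and values are illustrative assumptions, not the repository's settings.

```python
def apply_lr(param_groups, base_lr):
    """Set each group's lr to base_lr * lr_mult (lr_mult defaults to 1)."""
    for group in param_groups:
        group["lr"] = base_lr * group.get("lr_mult", 1)
    return param_groups


# Plain dicts stand in for optimizer parameter groups (parameters omitted).
groups = [
    {"name": "feature_extractor", "lr_mult": 1},
    {"name": "classifier", "lr_mult": 10},
]
for g in apply_lr(groups, base_lr=0.001):
    print(g["name"], g["lr"])
```

In such a design the optimizer's own `lr` argument is overwritten at each step, so whether the multiplier takes effect depends on the scheduler being called, not on the optimizer.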

Access for ImageCLEF dataset

Hello, I have a question about dataset access.
I want to use the ImageCLEF dataset, but when I click the link I cannot open it because I do not have access rights.
Could I be granted access to the ImageCLEF dataset?

RuntimeError: size mismatch, m1: [64 x 640], m2: [1792 x 1792] at /opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THC/generic/THCTensorMathBlas.cu:266

Hello, I am trying to use this with my custom dataset. And I get a mismatch of tensors while training the network.

Traceback (most recent call last):
File "train_lidar.py", line 212, in <module>
main()
File "train_lidar.py", line 181, in main
train(args, model, ad_net, random_layer, train_loader, train_loader1, optimizer, optimizer_ad, epoch, 0, args.method)
File "train_lidar.py", line 84, in train
loss += loss_func.CDAN([feature, softmax_output], ad_net, None, None, random_layer)
File "/home/acharyad/Documents/CDAN/pytorch/loss.py", line 27, in CDAN
ad_out = ad_net(op_out.view(-1, softmax_output.size(1) * feature.size(1)))
File "/home/acharyad/anaconda2/envs/xynet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/acharyad/Documents/CDAN/pytorch/network.py", line 420, in forward
x = self.ad_layer1(x)
File "/home/acharyad/anaconda2/envs/xynet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/acharyad/anaconda2/envs/xynet/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 67, in forward
return F.linear(input, self.weight, self.bias)
File "/home/acharyad/anaconda2/envs/xynet/lib/python3.7/site-packages/torch/nn/functional.py", line 1352, in linear
ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
RuntimeError: size mismatch, m1: [64 x 640], m2: [1792 x 1792] at /opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THC/generic/THCTensorMathBlas.cu:266
(xynet) acharyad@acharyad-G752VY:~/Documents/CDAN/pytorch$
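
The message says the discriminator's first linear layer expects 1792 input features but receives 640. A minimal sketch of the size check follows, assuming the discriminator input is the flattened outer product of features and class predictions (so its width is feature_dim * class_num) or, when a randomized layer is used, that layer's output width. `expected_ad_input_dim` is a hypothetical helper, and the factorization 128 x 5 is only one of several that give 640.

```python
def expected_ad_input_dim(feature_dim, class_num, random_dim=None):
    """Input width the adversarial network should be built with."""
    if random_dim is not None:
        # Randomized multilinear map: width is fixed by the random layer.
        return random_dim
    # Plain multilinear map: flattened outer product of the two vectors.
    return feature_dim * class_num


# 640 could be, for example, 128 features x 5 classes (illustrative guess).
needed = expected_ad_input_dim(feature_dim=128, class_num=5)
built_with = 1792  # in_features reported in the error (m2: [1792 x 1792])
print(needed, built_with, needed == built_with)
```

Constructing the adversarial network with the width returned by such a check, instead of a hard-coded value, would avoid this kind of mismatch on a custom dataset.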

IWCV

Hello, in your paper "Conditional Adversarial Domain Adaptation", you mention "We conduct importance-weighted cross-validation (IWCV) [48] to select hyper-parameters for all methods". I do not understand what this sentence means. Could you please explain?
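
For readers unfamiliar with the term, a minimal sketch of the idea behind importance-weighted cross-validation: validation losses on labeled source data are reweighted by an estimated density ratio p_target(x) / p_source(x), so their average approximates the risk on the target domain. The weights below are made up for illustration, and how the paper estimated them is not stated here.

```python
def importance_weighted_risk(losses, weights):
    """Weighted mean of source validation losses with density-ratio weights."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# 0/1 losses on four held-out source samples and hypothetical weights.
losses = [0.0, 1.0, 0.0, 1.0]
weights = [0.5, 2.0, 1.0, 0.5]
print(importance_weighted_risk(losses, weights))  # 0.625
```

Hyper-parameters would then be chosen to minimize this weighted risk, which needs no target labels.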

some confusion about test results

Are test_10crop and best_model used for the final reported results?