We were very impressed by the results reported in the paper and wanted to replicate the work and try to apply it in our own area.
However, running train.py raises an UnpicklingError, which perhaps points to a mismatch between the environment specified in requirements.txt and the Dockerfile and the one the code actually expects?
> python3 scripts/train.py \
> --data 'data/downstream/ds002105' \
> --n-train-subjects-per-dataset 11 \
> --n-val-subjects-per-dataset 3 \
> --n-test-subjects-per-dataset 9 \
> --architecture 'GPT' \
> --pretrained-model 'results/models/upstream/GPT_lrs-4_hds-12_embd-768_train-CSM_lr-0005_bs-192_drp-01/model_final/pytorch_model.bin' \
> --training-style 'decoding' \
> --decoding-target 'task_label.pyd' \
> --num-decoding-classes 26 \
> --training-steps 10000 \
> --per-device-training-batch-size 64 \
> --learning-rate 1e-4 \
> --log-dir 'results/models/downstream/ds002105' \
> --log-every-n-steps 1000
/usr/local/lib/python3.8/dist-packages/nilearn/input_data/__init__.py:27: FutureWarning: The import path 'nilearn.input_data' is deprecated in version 0.9. Importing from 'nilearn.input_data' will be possible at least until release 0.13.0. Please import from 'nilearn.maskers' instead.
  warnings.warn(message, FutureWarning)
Saving tarfile split to results/models/downstream/ds002105/GPT_lrs-4_hds-12_embd-768_train-decoding_lr-0001_bs-64_drp-01_2022-10-13_16-43-26/tarfile_paths_split.json
Loading pretrained model from results/models/upstream/GPT_lrs-4_hds-12_embd-768_train-CSM_lr-0005_bs-192_drp-01/model_final/pytorch_model.bin
loading the following pre-trained path:
results/models/upstream/GPT_lrs-4_hds-12_embd-768_train-CSM_lr-0005_bs-192_drp-01/model_final/pytorch_model.bin
cudaisavailable is True
loading with cpu
Traceback (most recent call last):
  File "scripts/train.py", line 1077, in <module>
    trainer = train()
  File "scripts/train.py", line 237, in train
    trainer = make_trainer(
  File "/learningfrombrains/scripts/../src/trainer/make.py", line 207, in make_trainer
    trainer = Trainer(
  File "/learningfrombrains/scripts/../src/trainer/base.py", line 14, in __init__
    super().__init__(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 313, in __init__
    model = self.call_model_init()
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 980, in call_model_init
    model = self.model_init(trial)
  File "scripts/train.py", line 226, in model_init
    return make_model(model_config)
  File "scripts/train.py", line 366, in make_model
    model.from_pretrained(model_config["pretrained_model"])
  File "/learningfrombrains/scripts/../src/model.py", line 69, in from_pretrained
    pretrained = torch.load(pretrained_path, map_location=torch.device('cpu'))
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
Note: we tried both of the following versions of the CUDA check:

if next(self.parameters()).is_cuda:

and

if torch.cuda.is_available():

because with the first version, it didn't seem to recognize our CUDA device. However, the UnpicklingError occurs regardless of which version we use, i.e., whether the checkpoint is loaded via torch.load(pretrained_path) or via torch.load(pretrained_path, map_location=torch.device('cpu')).
I don't know why we're getting an UnpicklingError here. My first guess is that the pickle/torch versions pinned in the Dockerfile and requirements.txt don't match what the torch code actually expects. It would be easier to sort this out if you could check your own setup and confirm that the versions listed in the Dockerfile and requirements.txt are the ones your code actually runs on.
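In case it helps narrow this down, here's a quick sanity check we could also run on our side (the helper name `inspect_checkpoint` is our own, not part of the repo): `invalid load key, 'v'` means the very first byte pickle sees is the ASCII character 'v', so the file on disk may not be a torch checkpoint at all. For instance, a Git LFS pointer file that was cloned without the actual weights starts with the literal text `version https://git-lfs...`, while a torch >= 1.6 checkpoint is a zip archive starting with `PK`.

```python
def inspect_checkpoint(path):
    """Classify a checkpoint file by its leading bytes (rough heuristic)."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(b"PK"):
        # Zip magic bytes: the newer (torch >= 1.6) zip-based checkpoint format.
        return "zip archive (torch >= 1.6 checkpoint format)"
    if head.startswith(b"version https://git-lfs"):
        # Git LFS pointer text: the real weights were never downloaded.
        return "Git LFS pointer file (actual weights not downloaded)"
    # Anything else: print the leading bytes so they can be inspected manually.
    return "unknown format, first bytes: {!r}".format(head[:16])
```

Running this on pytorch_model.bin before calling torch.load would tell us whether the file itself is the problem rather than the library versions.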