We were very impressed by the results reported in the paper and wanted to replicate the work and try to apply it in our own area.
However, running train.py raises an UnpicklingError, which perhaps points to a mismatch between the environment specified in requirements.txt and the Dockerfile and the one the code actually expects?
> python3 scripts/train.py \
> --data 'data/downstream/ds002105' \
> --n-train-subjects-per-dataset 11 \
> --n-val-subjects-per-dataset 3 \
> --n-test-subjects-per-dataset 9 \
> --architecture 'GPT' \
> --pretrained-model 'results/models/upstream/GPT_lrs-4_hds-12_embd-768_train-CSM_lr-0005_bs-192_drp-01/model_final/pytorch_model.bin' \
> --training-style 'decoding' \
> --decoding-target 'task_label.pyd' \
> --num-decoding-classes 26 \
> --training-steps 10000 \
> --per-device-training-batch-size 64 \
> --learning-rate 1e-4 \
> --log-dir 'results/models/downstream/ds002105' \
> --log-every-n-steps 1000
/usr/local/lib/python3.8/dist-packages/nilearn/input_data/__init__.py:27: FutureWarning: The import path 'nilearn.input_data' is deprecated in version 0.9. Importing from 'nilearn.input_data' will be possible at least until release 0.13.0. Please import from 'nilearn.maskers' instead.
  warnings.warn(message, FutureWarning)
Saving tarfile split to results/models/downstream/ds002105/GPT_lrs-4_hds-12_embd-768_train-decoding_lr-0001_bs-64_drp-01_2022-10-13_16-43-26/tarfile_paths_split.json
Loading pretrained model from results/models/upstream/GPT_lrs-4_hds-12_embd-768_train-CSM_lr-0005_bs-192_drp-01/model_final/pytorch_model.bin
loading the following pre-trained path:
results/models/upstream/GPT_lrs-4_hds-12_embd-768_train-CSM_lr-0005_bs-192_drp-01/model_final/pytorch_model.bin
cudaisavailable is True
loading with cpu
Traceback (most recent call last):
  File "scripts/train.py", line 1077, in <module>
    trainer = train()
  File "scripts/train.py", line 237, in train
    trainer = make_trainer(
  File "/learningfrombrains/scripts/../src/trainer/make.py", line 207, in make_trainer
    trainer = Trainer(
  File "/learningfrombrains/scripts/../src/trainer/base.py", line 14, in __init__
    super().__init__(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 313, in __init__
    model = self.call_model_init()
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 980, in call_model_init
    model = self.model_init(trial)
  File "scripts/train.py", line 226, in model_init
    return make_model(model_config)
  File "scripts/train.py", line 366, in make_model
    model.from_pretrained(model_config["pretrained_model"])
  File "/learningfrombrains/scripts/../src/model.py", line 69, in from_pretrained
    pretrained = torch.load(pretrained_path, map_location=torch.device('cpu'))
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
Note: we tried both of the following versions of the CUDA check:

if next(self.parameters()).is_cuda:

and

if torch.cuda.is_available():

because with the first version, it didn't seem to recognize our CUDA device. However, the UnpicklingError occurs regardless of which version we use, i.e., whether the checkpoint is loaded via torch.load(pretrained_path) or via torch.load(pretrained_path, map_location=torch.device('cpu')).
I don't know why we're getting an UnpicklingError here. My first guess is that the pickle/torch versions pinned in the Dockerfile and requirements.txt don't match what the torch code actually expects. It would be easier to sort this out if you could check your own setup and confirm that the versions listed in the Dockerfile and requirements.txt are the ones your code actually runs on.
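In case it helps narrow this down, here's a quick sanity check we could also run on our side (the helper name `inspect_checkpoint` is our own, not part of the repo): `invalid load key, 'v'` means the very first byte pickle sees is the ASCII character 'v', so the file on disk may not be a torch checkpoint at all. For instance, a Git LFS pointer file that was cloned without the actual weights starts with the literal text `version https://git-lfs...`, while a torch >= 1.6 checkpoint is a zip archive starting with `PK`.

```python
def inspect_checkpoint(path):
    """Classify a checkpoint file by its leading bytes (rough heuristic)."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(b"PK"):
        # Zip magic bytes: the newer (torch >= 1.6) zip-based checkpoint format.
        return "zip archive (torch >= 1.6 checkpoint format)"
    if head.startswith(b"version https://git-lfs"):
        # Git LFS pointer text: the real weights were never downloaded.
        return "Git LFS pointer file (actual weights not downloaded)"
    # Anything else: print the leading bytes so they can be inspected manually.
    return "unknown format, first bytes: {!r}".format(head[:16])
```

Running this on pytorch_model.bin before calling torch.load would tell us whether the file itself is the problem rather than the library versions.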