
model-tools's Introduction


Brain-Score Model Tools

Utility for generic models to interact with brain data.

Environment variables

Environment variables are prefixed with MT_.

Variable          Description
MT_MULTITHREAD    whether or not to use multi-threading
MT_HOME           path to the framework root
MT_IMAGENET_PATH  path to the ImageNet file containing the validation image set
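
For orientation, a minimal sketch of how code might read these variables (illustrative only; the defaults shown are assumptions, not necessarily what model-tools does):

import os

multithread = os.getenv('MT_MULTITHREAD', '1') == '1'              # default assumed here
home = os.getenv('MT_HOME', os.path.expanduser('~/.model-tools'))  # default assumed here
imagenet_path = os.getenv('MT_IMAGENET_PATH')                      # no default; must point to the validation set file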

model-tools's People

Contributors

franzigeiger, jjpr-mit, mike-ferguson, mschrimpf, stothe2


model-tools's Issues

Cannot install model-tools

Collecting git+https://github.com/brain-score/model-tools.git
Cloning https://github.com/brain-score/model-tools.git to /tmp/pip-req-build-10va2fci
Running command git clone --filter=blob:none --quiet https://github.com/brain-score/model-tools.git /tmp/pip-req-build-10va2fci
Resolved https://github.com/brain-score/model-tools.git to commit 75365b5
Preparing metadata (setup.py) ... done
Collecting brainio@ git+https://github.com/brain-score/brainio (from model-tools==0.1.0)
Cloning https://github.com/brain-score/brainio to /tmp/pip-install-4lpjkv0w/brainio_a2958f4ac95540b0abc3f8fa26316e6c
Running command git clone --filter=blob:none --quiet https://github.com/brain-score/brainio /tmp/pip-install-4lpjkv0w/brainio_a2958f4ac95540b0abc3f8fa26316e6c
Resolved https://github.com/brain-score/brainio to commit 9bc00b21a82f4b3637117a6329b1f629df3170cd
Preparing metadata (setup.py) ... done
Collecting brain-score@ git+https://github.com/brain-score/brain-score (from model-tools==0.1.0)
Cloning https://github.com/brain-score/brain-score to /tmp/pip-install-4lpjkv0w/brain-score_3025ffe44f7448c480537b97c3a59fd2
Running command git clone --filter=blob:none --quiet https://github.com/brain-score/brain-score /tmp/pip-install-4lpjkv0w/brain-score_3025ffe44f7448c480537b97c3a59fd2
Resolved https://github.com/brain-score/brain-score to commit 25c9abde4479c1422cdffab7c0380dd05d21d125
Preparing metadata (setup.py) ... done
Collecting result_caching@ git+https://github.com/brain-score/result_caching (from model-tools==0.1.0)
Cloning https://github.com/brain-score/result_caching to /tmp/pip-install-4lpjkv0w/result-caching_fbb0aaa46ed94ae6a68a7d954ea8bd95
Running command git clone --filter=blob:none --quiet https://github.com/brain-score/result_caching /tmp/pip-install-4lpjkv0w/result-caching_fbb0aaa46ed94ae6a68a7d954ea8bd95
Resolved https://github.com/brain-score/result_caching to commit 27ace7e892a2cbfbcb654d027e8d108e168986d4
Preparing metadata (setup.py) ... done
Requirement already satisfied: h5py in /home/atuin/b112dc/b112dc10/software/privat/conda/envs/model-training/lib/python3.9/site-packages (from model-tools==0.1.0) (3.1.0)
Requirement already satisfied: Pillow in /home/hpc/b112dc/b112dc10/.local/lib/python3.9/site-packages (from model-tools==0.1.0) (10.0.0)
Requirement already satisfied: numpy in /home/atuin/b112dc/b112dc10/software/privat/conda/envs/model-training/lib/python3.9/site-packages (from model-tools==0.1.0) (1.26.1)
Requirement already satisfied: tqdm in /home/atuin/b112dc/b112dc10/software/privat/conda/envs/model-training/lib/python3.9/site-packages (from model-tools==0.1.0) (4.66.1)
Requirement already satisfied: torch in /home/hpc/b112dc/b112dc10/.local/lib/python3.9/site-packages (from model-tools==0.1.0) (2.0.1)
Requirement already satisfied: torchvision in /home/hpc/b112dc/b112dc10/.local/lib/python3.9/site-packages (from model-tools==0.1.0) (0.15.2)
INFO: pip is looking at multiple versions of model-tools to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement tensorflow==1.15 (from model-tools) (from versions: 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0, 2.11.1, 2.12.0rc0, 2.12.0rc1, 2.12.0, 2.12.1, 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0, 2.13.1, 2.14.0rc0, 2.14.0rc1, 2.14.0)
ERROR: No matching distribution found for tensorflow==1.15

StimulusSet in brainio does not have get_image

File "/home/allagash/miniconda3/envs/fuzzy-spoon/lib/python3.7/site-packages/pandas/core/generic.py", line 5487, in getattr
return object.getattribute(self, name)
AttributeError: 'StimulusSet' object has no attribute 'get_image'

stimuli_paths = [str(stimulus_set.get_stimulus(stimulus_id)) for stimulus_id in stimulus_set['stimulus_id']]

This line appears to be referencing a function that no longer exists in StimulusSet
https://github.com/brain-score/brainio/blob/main/brainio/stimuli.py#L10
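
A hedged compatibility shim (a sketch, not a committed fix) that handles both accessor names, with stimulus_set as in the line quoted above:

def stimulus_path(stimulus_set, stimulus_id):
    # newer brainio exposes get_stimulus; fall back to get_image for older versions
    if hasattr(stimulus_set, 'get_stimulus'):
        return str(stimulus_set.get_stimulus(stimulus_id))
    return str(stimulus_set.get_image(stimulus_id))

stimuli_paths = [stimulus_path(stimulus_set, stimulus_id)
                 for stimulus_id in stimulus_set['stimulus_id']]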

key error when layers have different flattened coordinates

I was running brainscore on some transformer models and ran into an issue with channel_x not being a key in a dictionary. It seems to be noted in the code:

# using these names/keys for all assemblies results in KeyError if the first layer contains flatten_coord_names

Log here for the failed model:
http://braintree.mit.edu:8080/job/run_benchmarks/3861/parsed_console/log.html

I also found that I could not include the final fc (logits) layer as one of the places where I grab activations, as this caused a KeyError ('embedding' was missing). I just removed these parts of the model from scoring, but it seems like something others might run into.
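
A minimal, self-contained illustration of the failure mode (not model-tools code): convolutional layers yield flatten coordinates such as channel_x/channel_y, while fc or embedding layers do not, so coordinate keys taken from the first layer processed are missing for later layers:

conv_coords = {'channel': [0, 1], 'channel_x': [0, 0], 'channel_y': [0, 1]}
fc_coords = {'channel': [0, 1]}  # fc/embedding layers have no channel_x / channel_y

coord_names = list(conv_coords)  # coordinate names fixed by the first layer processed
for layer_coords in (conv_coords, fc_coords):
    try:
        values = [layer_coords[name] for name in coord_names]
    except KeyError as error:
        print('missing coordinate:', error)  # -> missing coordinate: 'channel_x'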

FileNotFoundError with Majaj V4 but not Majaj IT

I have a script which fits models against Majaj IT and Majaj V4. IT runs fine, but when I try specifying V4 instead, I receive the following stack trace and error:

  File "python3.7/site-packages/model_tools/activations/core.py", line 79, in _from_paths_stored
    return self._from_paths(layers=layers, stimuli_paths=stimuli_paths)
  File "python3.7/site-packages/model_tools/activations/core.py", line 85, in _from_paths
    layer_activations = self._get_activations_batched(stimuli_paths, layers=layers, batch_size=self._batch_size)
  File "python3.7/site-packages/model_tools/activations/core.py", line 135, in _get_activations_batched
    batch_activations = hook(batch_activations)
  File "python3.7/site-packages/model_tools/activations/pca.py", line 23, in __call__
    self._ensure_initialized(batch_activations.keys())
  File "python3.7/site-packages/model_tools/activations/pca.py", line 40, in _ensure_initialized
    n_components=self._n_components)
  File "python3.7/site-packages/result_caching/__init__.py", line 231, in wrapper
    self.save(result, function_identifier)
  File "python3.7/site-packages/result_caching/__init__.py", line 125, in save
    os.rename(savepath_part, path)
FileNotFoundError: [Errno 2] No such file or directory: '/om2/user/rylansch/FieteLab-Reg-Eff-Dim/.result_caching/model_tools.activations.pca.LayerPCA._pcas/identifier=architecture:RF-100-cosine-bernoulli-b-ns|task:None|kind:Rand|source:RS|lyr:mlp|agg:pca|n_comp:1000,n_components=1000.pkl.filepart' -> '/om2/user/rylansch/FieteLab-Reg-Eff-Dim/.result_caching/model_tools.activations.pca.LayerPCA._pcas/identifier=architecture:RF-100-cosine-bernoulli-b-ns|task:None|kind:Rand|source:RS|lyr:mlp|agg:pca|n_comp:1000,n_components=1000.pkl'

I'm not familiar with result_caching. Could someone please help me understand why this problem emerges for V4 but not IT, and how to fix it?
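
For context, the traceback shows result_caching saving via a write-then-rename pattern. A simplified sketch of that pattern (not the library's actual code): the result is pickled to '<path>.filepart' and then renamed to '<path>'. Plausibly, if the pickle dump fails partway (e.g. disk quota) or a concurrent job renames or removes the part file first, the final os.rename raises exactly this FileNotFoundError:

import os
import pickle

def save(result, path):
    savepath_part = path + '.filepart'
    with open(savepath_part, 'wb') as f:
        pickle.dump(result, f)           # part file may be missing if this fails
    os.rename(savepath_part, path)       # FileNotFoundError if the part file vanished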

speed up activations retrieval

When we started writing model-tools, we were primarily thinking of neuroscience stimulus sets with only a couple thousand images. Speed was therefore less of an issue: even with a suboptimal implementation, a few thousand images pass through the network quickly.

We are now evaluating models on increasingly large ML benchmarks (e.g. brain-score/vision#232), and due to the slow activations retrieval, evaluation takes a very long time (days), sometimes timing out on the cluster. We therefore need to speed up the activations extraction.

Models already use CUDA when possible; I believe the main bottleneck is actually the loading of images, which is currently single-threaded. We should profile the code to confirm this and, if true, use multiple workers to load the images passed into the model, ideally using existing tools such as PyTorch's DataLoader (see the sketch below).
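
A minimal sketch of the proposed multi-worker loading with torch.utils.data.DataLoader (the class and function names here are illustrative, not existing model-tools API):

import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image

class ImagePathDataset(Dataset):
    """Loads and preprocesses one image per path; DataLoader workers parallelize this."""
    def __init__(self, paths, transform):
        self.paths, self.transform = paths, transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        image = Image.open(self.paths[index]).convert('RGB')
        return self.transform(image)

def iter_batches(stimuli_paths, transform, batch_size=64, num_workers=8):
    loader = DataLoader(ImagePathDataset(stimuli_paths, transform),
                        batch_size=batch_size, num_workers=num_workers,
                        pin_memory=torch.cuda.is_available())
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    for batch in loader:  # loading/preprocessing happens in worker processes
        yield batch.to(device)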

delete submitted models and score consistency

Hey,
I just realized that the average model score differs across the three places (profile, model page, competition leaderboard). I guess this is due to benchmarks that have been added in some places but not in others. It would be clearer and more comprehensive to have this unified across all tables.

Also, I was wondering whether you might want to implement a function to delete submitted models from the database, as the number of model submissions is likely to increase.

Thanks!

//EDIT wrong repo :( see here: brain-score/brain-score.web#129

Error encountered while testing a base model.

Hello,

I have encountered an error while testing a base_models.py implementation. Here is the log and the line where it breaks. There is a comment there that may be describing the issue, but I'm not sure how to interpret it.

It looks like this traces back to the behavioral benchmark, specifically here. Is this assuming the model has a layer named 'logits'? The model I am testing does not have this.

Any advice on how to debug this would be very appreciated.

Thank you,
Cory
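
A hedged sketch of one possible workaround: explicitly declaring which layer serves as the behavioral readout instead of relying on a layer named 'logits'. This assumes ModelCommitment accepts a behavioral_readout_layer argument; check the model-tools source for the exact signature, and the layer names here are examples for AlexNet:

import functools
import torchvision.models
from model_tools.activations.pytorch import PytorchWrapper, load_preprocess_images
from model_tools.brain_transformation import ModelCommitment

net = torchvision.models.alexnet(pretrained=True)
preprocessing = functools.partial(load_preprocess_images, image_size=224)
activations_model = PytorchWrapper(identifier='alexnet', model=net, preprocessing=preprocessing)
model = ModelCommitment(identifier='alexnet', activations_model=activations_model,
                        layers=['features.12'],
                        behavioral_readout_layer='classifier.6')  # instead of a layer named 'logits'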

model-tools 0.2.0 version and library dependency

In setup.py, the version number still says 0.1.0 rather than 0.2.0. Also, in the release tar.gz file, the result_caching dependency links to result_caching @ git+https://github.com/mschrimpf/result_caching rather than result_caching @ git+https://github.com/brain-score/result_caching.
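
For reference, a sketch of the corrected setup.py fields implied by this issue (a fragment only; the other arguments are omitted and assumed unchanged):

from setuptools import setup, find_packages

setup(
    name='model-tools',
    version='0.2.0',  # was still 0.1.0
    packages=find_packages(),
    install_requires=[
        # was: result_caching @ git+https://github.com/mschrimpf/result_caching
        "result_caching @ git+https://github.com/brain-score/result_caching",
    ],
)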

reduce memory requirements

Running large models like ResNet takes enormous amounts of memory (~450 GB). This is probably because we collect all the layer activations across batches in memory; it could be optimized by continuously writing batches to disk (see the sketch below).
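
A minimal sketch of the batch-to-disk idea using h5py (already a dependency); the names are assumptions, not a committed design:

import h5py

def append_batch(h5file, layer_name, batch_activations):
    """Append one batch of activations, flattened to (batch, features), to disk."""
    flat = batch_activations.reshape(len(batch_activations), -1)
    if layer_name not in h5file:
        h5file.create_dataset(layer_name, data=flat,
                              maxshape=(None, flat.shape[1]), chunks=True)
    else:
        dataset = h5file[layer_name]
        dataset.resize(dataset.shape[0] + flat.shape[0], axis=0)
        dataset[-flat.shape[0]:] = flat

# usage: with h5py.File('activations.h5', 'a') as f: append_batch(f, 'layer4', batch)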

OOM error caused by minimal memory requirements?

I'm getting an OOM error saying that 1.19 GiB cannot be allocated, even though I'm running SLURM jobs with ~80 GB.

How can I investigate the cause? Is it possible that previous layers' activations are consuming memory? If so, is there some flag or some mechanism to free that memory?

Traceback (most recent call last):
  File "scripts/compute_eigenspectra_and_fit_encoding_model.py", line 63, in <module>
    activations_extractor=model,
  File "/home/gridsan/rschaeffer/FieteLab-Reg-Eff-Dim/regression_dimensionality/custom_model_tools/eigenspectrum.py", line 44, in fit
    image_transform_name=transform_name)
  File "/home/gridsan/rschaeffer/FieteLab-Reg-Eff-Dim/regdim_venv/lib/python3.7/site-packages/result_caching/__init__.py", line 223, in wrapper
    result = function(**reduced_call_args)
  File "/home/gridsan/rschaeffer/FieteLab-Reg-Eff-Dim/regression_dimensionality/custom_model_tools/eigenspectrum.py", line 141, in _fit
    activations = self._extractor(image_paths, layers=[layer])
  File "/home/gridsan/rschaeffer/FieteLab-Reg-Eff-Dim/regdim_venv/lib/python3.7/site-packages/model_tools/activations/pytorch.py", line 41, in __call__
    return self._extractor(*args, **kwargs)
  File "/home/gridsan/rschaeffer/FieteLab-Reg-Eff-Dim/regdim_venv/lib/python3.7/site-packages/model_tools/activations/core.py", line 43, in __call__
    return self.from_paths(stimuli_paths=stimuli, layers=layers, stimuli_identifier=stimuli_identifier)
  File "/home/gridsan/rschaeffer/FieteLab-Reg-Eff-Dim/regdim_venv/lib/python3.7/site-packages/model_tools/activations/core.py", line 73, in from_paths
    activations = fnc(layers=layers, stimuli_paths=reduced_paths)
  File "/home/gridsan/rschaeffer/FieteLab-Reg-Eff-Dim/regdim_venv/lib/python3.7/site-packages/model_tools/activations/core.py", line 85, in _from_paths
    layer_activations = self._get_activations_batched(stimuli_paths, layers=layers, batch_size=self._batch_size)
  File "/home/gridsan/rschaeffer/FieteLab-Reg-Eff-Dim/regdim_venv/lib/python3.7/site-packages/model_tools/activations/core.py", line 141, in _get_activations_batched
    layer_activations[layer_name] = np.concatenate((layer_activations[layer_name], layer_output))
  File "<__array_function__ internals>", line 6, in concatenate
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 1.19 GiB for an array with shape (1216, 256, 32, 32) and data type float32 
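
The traceback shows one np.concatenate per batch (core.py line 141), which reallocates the full array each time and briefly holds two copies in memory. A minimal sketch of a common alternative, collecting batches in a list and concatenating once at the end (the stand-ins are hypothetical, not model-tools code):

import numpy as np
from collections import defaultdict

def get_activations(batch):  # hypothetical stand-in for the per-batch extractor
    return {'mlp': np.zeros((len(batch), 256, 32, 32), dtype=np.float32)}

batches = [np.zeros(64)] * 4  # hypothetical stand-in for batched stimuli
layer_batches = defaultdict(list)
for batch in batches:
    for layer_name, layer_output in get_activations(batch).items():
        layer_batches[layer_name].append(layer_output)
# a single concatenate at the end instead of one per batch:
layer_activations = {name: np.concatenate(outputs)
                     for name, outputs in layer_batches.items()}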

check_submission not working with Stochastic models

I get an error when testing a stochastic model for model submission. From what I could infer, the MockBenchmark does not average over trial presentations and then fails with a conflicting-size error. Check the log below.

Traceback (most recent call last):
  File "brain_models.py", line 50, in <module>
    check_models.check_brain_models(name)
  File "/braintree/home/tmarques/brainscore/model-tools/model_tools/check_submission/check_models.py", line 24, in check_brain_models
    check_brain_model_processing(model)
  File "/braintree/home/tmarques/brainscore/model-tools/model_tools/check_submission/check_models.py", line 30, in check_brain_model_processing
    score = benchmark(model, do_behavior=True)
  File "/braintree/home/tmarques/brainscore/model-tools/model_tools/check_submission/check_models.py", line 88, in __call__
    candidate.look_at(self.assembly.stimulus_set)
  File "/braintree/home/tmarques/brainscore/model-tools/model_tools/brain_transformation/__init__.py", line 53, in look_at
    return self.behavior_model.look_at(stimuli)
  File "/braintree/home/tmarques/brainscore/model-tools/model_tools/brain_transformation/behavior.py", line 22, in look_at
    return self.current_executor.look_at(stimuli, *args, **kwargs)
  File "/braintree/home/tmarques/brainscore/model-tools/model_tools/brain_transformation/behavior.py", line 50, in look_at
    dims=['choice', 'presentation'])
  File "/braintree/home/tmarques/anaconda3/envs/model-submission/lib/python3.6/site-packages/brainio_base/assemblies.py", line 24, in __init__
    super(DataAssembly, self).__init__(*args, **kwargs)
  File "/braintree/home/tmarques/anaconda3/envs/model-submission/lib/python3.6/site-packages/xarray/core/dataarray.py", line 230, in __init__
    coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
  File "/braintree/home/tmarques/anaconda3/envs/model-submission/lib/python3.6/site-packages/xarray/core/dataarray.py", line 81, in _infer_coords_and_dims
    'coordinate %r' % (d, sizes[d], s, k))
ValueError: conflicting sizes for dimension 'choice': length 1 on the data but length 20 on coordinate 'synset'
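
A hedged sketch of the averaging the issue describes as missing: collapsing repeated trial presentations per stimulus before constructing the assembly. The coordinate names follow Brain-Score conventions but are assumptions here, not MockBenchmark's actual code:

import numpy as np
import xarray as xr

assembly = xr.DataArray(
    np.random.rand(6, 3),  # 3 stimuli x 2 repeated presentations, 3 neuroids
    dims=['presentation', 'neuroid'],
    coords={'stimulus_id': ('presentation', ['a', 'a', 'b', 'b', 'c', 'c']),
            'neuroid_id': ('neuroid', [0, 1, 2])})
averaged = assembly.groupby('stimulus_id').mean('presentation')  # one row per stimulus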
