I tried to use Casanovo to make predictions on an MGF file containing 31,078 spectra, and it ran out of GPU memory. Is there anything I can do to mitigate this problem, other than breaking the input file into smaller pieces or switching to a different machine? Here is the command I ran and the resulting output:
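In case it helps with diagnosis: the crash happens partway through (batch 11/31), so I wondered whether a few unusually peak-rich spectra might be inflating the transformer's memory use. Here is a minimal stdlib sketch I used to check per-spectrum peak counts, assuming a standard MGF with `BEGIN IONS`/`END IONS` blocks (this is my own quick check, not part of Casanovo):

```python
def top_peak_counts(mgf_lines, n=10):
    """Return the n largest per-spectrum peak counts, in descending order.

    Assumes a standard MGF layout: each spectrum is delimited by
    BEGIN IONS / END IONS, header lines start with a letter (TITLE=,
    PEPMASS=, CHARGE=, ...), and peak lines start with a digit.
    """
    counts = []
    current = 0
    in_spectrum = False
    for line in mgf_lines:
        line = line.strip()
        if line == "BEGIN IONS":
            in_spectrum = True
            current = 0
        elif line == "END IONS":
            counts.append(current)
            in_spectrum = False
        elif in_spectrum and line and line[0].isdigit():
            current += 1  # a peak line: "m/z intensity"
    return sorted(counts, reverse=True)[:n]
```

For example, `top_peak_counts(open("20190227_231_15%_1/20190227_231_15%_1.mgf"))` prints the ten largest spectra; in my file nothing stood out as pathological, so the batches seem to be roughly uniform in size.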
casanovo --mode=denovo --model_path=/net/noble/vol1/home/noble/proj/2022_varun_ls-casanovo/data/22-07-02_weights/pretrained_excl_mouse.ckpt --test_data_path=20190227_231_15%_1 --output_path=20190227_231_15%_1 --config_path=config.yaml
INFO: De novo sequencing with Casanovo...
INFO: Created a temporary directory at /tmp/tmpzqps6s6h
INFO: Writing /tmp/tmpzqps6s6h/_remote_module_non_scriptable.py
INFO: Reading 1 files...
20190227_231_15%_1/20190227_231_15%_1.mgf: 31078spectra [00:08, 3647.09spectra/s]
/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:287: LightningDeprecationWarning: Passing `Trainer(accelerator='ddp')` has been deprecated in v1.5 and will be removed in v1.7. Use `Trainer(strategy='ddp')` instead.
f"Passing `Trainer(accelerator={self.distributed_backend!r})` has been deprecated"
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W socket.cpp:401] [c10d] The server socket cannot be initialized on [::]:55938 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:55938 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:558] [c10d] The client socket cannot be initialized to connect to [localhost]:55938 (errno: 97 - Address family not supported by protocol).
INFO: Added key: store_based_barrier_key:1 to store for rank: 0
INFO: Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Testing: 35% 11/31 [02:46<03:05, 9.26s/it]Traceback (most recent call last):
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/bin/casanovo", line 8, in <module>
sys.exit(main())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/casanovo.py", line 83, in main
denovo(test_data_path, model_path, config, output_path)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/train_test.py", line 246, in denovo
trainer.test(model_trained, loaders.test_dataloader())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in test
return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _test_impl
results = self._run(model, ckpt_path=self.tested_ckpt_path)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1275, in _dispatch
self.training_type_plugin.start_evaluating(self)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 206, in start_evaluating
self._results = trainer.run_stage()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1286, in run_stage
return self._run_evaluate()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1334, in _run_evaluate
eval_loop_results = self._evaluation_loop.run()
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
output = self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 213, in _evaluation_step
output = self.trainer.accelerator.test_step(step_kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 247, in test_step
return self.training_type_plugin.test_step(*step_kwargs.values())
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 450, in test_step
return self.lightning_module.test_step(*args, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 403, in test_step
pred_seqs, scores = self.predict_step(batch)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 188, in predict_step
return self(batch[0], batch[1])
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 163, in forward
scores, tokens = self.greedy_decode(spectra, precursors)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/casanovo/denovo/model.py", line 212, in greedy_decode
memories, mem_masks = self.encoder(spectra)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/depthcharge/components/transformers.py", line 105, in forward
return self.transformer_encoder(peaks, src_key_padding_mask=mask), mask
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/transformer.py", line 238, in forward
output = mod(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/net/noble/vol1/home/noble/miniconda3/envs/casanovo_env/lib/python3.7/site-packages/torch/nn/modules/transformer.py", line 456, in forward
src_mask if src_mask is not None else src_key_padding_mask,
RuntimeError: CUDA out of memory. Tried to allocate 714.00 MiB (GPU 0; 7.79 GiB total capacity; 2.46 GiB already allocated; 632.94 MiB free; 3.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
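I did notice the hint at the end of the error message about `max_split_size_mb`. If it matters, this is how I plan to try it on the next run (128 MiB is an arbitrary starting value I picked, not something the PyTorch docs recommend):

```shell
# Follow the allocator's own hint from the OOM message: capping the split size
# can reduce fragmentation when reserved memory is much larger than allocated.
# The value 128 (MiB) is an arbitrary first guess.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then rerun the same casanovo command as before.
```

Since reserved (3.65 GiB) is not hugely larger than allocated (2.46 GiB) here, I am not sure fragmentation is the real problem, which is why I am asking whether there is a better knob (e.g. a smaller prediction batch size) to turn.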