Comments (4)
(I'm not from Amazon).
The Environment object is most likely lmdb's Environment class: https://lmdb.readthedocs.io/en/release/#environment-class
Without the full stack trace, I can only guess. My guess is that something is trying to save the processor, which includes the preprocessor, which in turn includes the lookups into the lmdb tables. Perhaps the checkpointing code.
Nevertheless, I suggest using a machine with a GPU: this is research code and not battle-tested in other environments (e.g. pure-CPU training). I have tested CPU-only inference and it works, but I didn't try training or fine-tuning on CPU alone.
My guess is that the torch DataLoader is trying to spin up a number of worker processes to prepare the batches of data. The problem is likely here:
https://github.com/amazon-science/ReFinED/blob/main/src/refined/dataset_reading/entity_linking/wikipedia_dataset.py
You can look it up; I think others have run into similar lmdb pickling issues when using multiple workers:
pytorch/vision#689 (comment)
Cheers
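A common workaround for this class of lmdb pickling issue is to store only the database path on the dataset and open the handle lazily inside each worker. The sketch below is illustrative, not ReFinED's actual code: `LazyHandleDataset` is a made-up name, and an open file stands in for lmdb's Environment (both hold OS resources and refuse to pickle).

```python
import os
import pickle
import tempfile


class LazyHandleDataset:
    """Sketch of the lazy-open workaround: keep only the path, open the
    unpicklable handle on first access, and drop it before pickling so
    the dataset survives being shipped to DataLoader workers."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # opened lazily, never shipped to workers

    def _ensure_open(self):
        if self._handle is None:
            # In ReFinED this would be something like
            # lmdb.open(self.path, readonly=True, lock=False)
            self._handle = open(self.path, "rb")
        return self._handle

    def __getitem__(self, idx):
        handle = self._ensure_open()
        handle.seek(idx)
        return handle.read(1)

    def __getstate__(self):
        # Drop the live handle before pickling; each worker process
        # re-opens its own handle on first access.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


# Demo: the dataset pickles cleanly even after the handle was opened.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name

ds = LazyHandleDataset(path)
first = ds[0]                            # opens the handle lazily
clone = pickle.loads(pickle.dumps(ds))   # works: __getstate__ dropped it
second = clone[1]                        # the clone re-opens on demand
os.unlink(path)
```

Each spawned worker then pays one extra open on its first batch, but nothing unpicklable ever crosses the process boundary.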
This happens when there are no GPUs, but I am not sure how to work around it.
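One thing that may be worth trying before moving to a GPU machine: the pickling only happens when the DataLoader spawns worker processes, so forcing `num_workers=0` (assuming ReFinED's training code lets you reach that setting; I haven't checked) keeps loading in the main process and sidesteps it. A stdlib sketch of the two paths, with a `threading.Lock` standing in for the unpicklable lmdb Environment:

```python
import pickle
import threading


class Batcher:
    """Stand-in for a dataset dragging an unpicklable handle along;
    a threading.Lock plays the role of lmdb's Environment here."""

    def __init__(self):
        self.handle = threading.Lock()

    def __getitem__(self, idx):
        return idx * 2


batcher = Batcher()

# The num_workers=0 route: iterate in the main process, nothing is pickled.
in_process = [batcher[i] for i in range(3)]

# The num_workers>0 route on macOS: the 'spawn' start method must pickle
# the dataset to ship it to each worker, and fails just like the trace.
try:
    pickle.dumps(batcher)
    pickling_failed = False
except TypeError:
    pickling_failed = True
```

The trade-off is that single-process loading is slower, which matters less on a CPU-only box where the model itself is the bottleneck anyway.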
Thanks, it may be worth trying on a GPU machine. For reference, I put the full stack trace below.
/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn(
14:35:33 - __main__ - INFO - Fine-tuning end-to-end EL
14:36:00 - __main__ - INFO - Fine-tuning end-to-end EL
INFO:__main__:Fine-tuning end-to-end EL
0%| | 0/10 [00:00<?, ?it/s]14:36:02 - __main__ - INFO - Starting epoch number 0
INFO:__main__:Starting epoch number 0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
0%| | 0/10 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/Users/yayamamo/git/ReFinED/src/refined/training/fine_tune/fine_tune.py", line 207, in <module>
main()
File "/Users/yayamamo/git/ReFinED/src/refined/training/fine_tune/fine_tune.py", line 44, in main
start_fine_tuning_task(refined=refined,
File "/Users/yayamamo/git/ReFinED/src/refined/training/fine_tune/fine_tune.py", line 95, in start_fine_tuning_task
run_fine_tuning_loops(refined=refined, fine_tuning_args=fine_tuning_args,
File "/Users/yayamamo/git/ReFinED/src/refined/training/fine_tune/fine_tune.py", line 114, in run_fine_tuning_loops
for step, batch in tqdm(enumerate(training_dataloader), total=len(training_dataloader)):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 439, in __iter__
return self._get_iterator()
^^^^^^^^^^^^^^^^^^^^
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 387, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1040, in __init__
w.start()
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Environment' object
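One detail visible in the trace: it bottoms out in popen_spawn_posix and reduction.dump, i.e. the 'spawn' start method, which has been the default on macOS since Python 3.8 and must pickle the dataset to hand it to each worker. Under 'fork' (the usual Linux default) workers inherit memory and nothing is pickled, which may be why the same code can run on a Linux box but fail on a Mac. A quick stdlib check:

```python
import multiprocessing as mp

# The traceback runs through popen_spawn_posix -> reduction.dump, which is
# the 'spawn' start method pickling the worker's arguments (dataset included).
# 'fork' inherits memory instead and skips pickling entirely.
method = mp.get_start_method()
print(method)  # 'spawn' on macOS, typically 'fork' on Linux
```

Forcing 'fork' on macOS is possible via mp.set_start_method, but it has its own known crash risks with threaded libraries, so I wouldn't call it a clean fix.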