Comments (14)
Hi I'll work today on running a basic fine tuning example and looking at the memory footprint and get back to you!
from biomedlm.
Thanks!
from biomedlm.
Hi I'll get you a sample command in the next day or two, but this link here explains using deepspeed:
https://huggingface.co/docs/transformers/main_classes/deepspeed#deployment-with-one-gpu
You should be using the deepspeed
command and make sure you've installed deepspeed.
I think if you only have 1 GPU you're going to need to try ZeRO-offload ... this is explained in that link I provided.
But I'll try to get this working on my own and let you know ...
from biomedlm.
This is another good link:
huggingface/transformers#8771 (comment)
from biomedlm.
Over the next few days I'll try to get this working so we have a great working example of fine-tuning the model with 1 GPU !
from biomedlm.
Thank you, I will read up n try out also on my 1 GPU!
from biomedlm.
I've gotten the code running and it uses 20GB of GPU memory and 50GB of RAM. So as long as the machine with your A100 has plenty of RAM this could work with 1 GPU.
Set up environment:
# create conda environment
conda create -n biomedlm python=3.8.12 pytorch=1.12.1 torchdata cudatoolkit=11.6.0 -c pytorch -c nvidia
# activate conda environment
conda activate biomedlm
# install python dependencies
# note that flash attention can take 30m to install so it is normal for it to do nothing for 30m
pip install flash-attn
pip install numpy
pip install transformers==4.26.0 datasets=2.9.0 omegaconf wandb
pip install fairscale
pip install accelerate
DeepSpeed config: deepspeed_config.json
{
"optimizer": {
"type": "AdamW",
"params": {
"lr": 2e-06,
"betas": [
0.9,
0.999
],
"eps": 1e-8,
"weight_decay": 0.0
}
},
"scheduler": {
"type": "WarmupDecayLR",
"params": {
"total_num_steps": "auto",
"warmup_max_lr": 2e-06,
"warmup_num_steps": "auto"
}
},
"zero_optimization": {
"stage": 1,
"allgather_partitions": true,
"allgather_bucket_size": 5e8,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"overlap_comm": true,
"contiguous_gradients": true,
"cpu_offload": true
},
"train_batch_size": "auto",
"fp16": {
"enabled": true
}
}
Command I ran in seqcls directory:
task=pubmedqa_hf ; datadir=data/$task ; export WANDB_PROJECT=biomedical-nlp-eval
deepspeed --num_gpus 1 --num_nodes 1 run_seqcls_gpt.py --tokenizer_name stanford-crfm/pubmed_gpt_tokenizer --model_name_or_path /path/to/model --train_file $datadir/train.json --validation_file $datadir/dev.json --test_file $datadir/test.json --do_train --do_eval --do_predict --per_device_train_batch_size 1 --gradient_accumulation_steps 2 --learning_rate 2e-06 --warmup_ratio 0.5 --num_train_epochs 20 --max_seq_length 560 --logging_steps 100 --save_strategy no --evaluation_strategy no --output_dir pubmedqa-finetune-demo --overwrite_output_dir --fp16 --use_flash true --seed 1 --run_name pubmedqa-finetune-demo --deepspeed deepspeed_config.json
Please let me know if you can get this working!
from biomedlm.
So to summarize it looks like you can run the sequence classification with 1 GPU and 40GB GPU memory (maybe even 20GB GPU memory) ... but I do think you are going to need something like 50GB of machine RAM to take advantage of the CPU offloading
from biomedlm.
That's really great news! Thank you so much for your help! I will try out and let you know. This machine has plenty of RAM too.
from biomedlm.
Hi J38, I am happy to share that I am able to complete the training following your instructions, without using --use_flash true. If I include --use_flash true, it will give me the following error. Still trying to troubleshoot what could be the cause. If you have any clue, do let me know. Thanks.
(biomedlm) dro@dro-DGX-Station:~/guathwa/pubmedgpt/finetune/seqcls_tr_dro$ deepspeed --num_gpus 1 --num_nodes 1 run_seqcls_gpt.py --tokenizer_name stanford-crfm/pubmed_gpt_tokenizer --model_name_or_path /home/dro/guathwa/pubmedgpt/finetune/seqcls_tr_dro/stanford-crfm-pubmedgpt --train_file $datadir/train.csv --validation_file $datadir/dev.csv --test_file $datadir/test.csv --do_train --do_eval --do_predict --per_device_train_batch_size 1 --gradient_accumulation_steps 2 --learning_rate 2e-06 --warmup_ratio 0.5 --num_train_epochs 20 --max_seq_length 560 --logging_steps 100 --save_strategy no --evaluation_strategy no --output_dir tr-finetune-demo --overwrite_output_dir --fp16 --use_flash true --seed 1 --run_name tr-finetune-demo --deepspeed deepspeed_config.json
[2023-02-09 12:38:17,346] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-02-09 12:38:17,590] [INFO] [runner.py:548:main] cmd = /home/dro/anaconda3/envs/biomedlm/bin/python -u -m deepspeed.launcher.launch --world_info=xxx --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None run_seqcls_gpt.py --tokenizer_name stanford-crfm/pubmed_gpt_tokenizer --model_name_or_path /home/dro/guathwa/pubmedgpt/finetune/seqcls_tr_dro/stanford-crfm-pubmedgpt --train_file data/tr/train.csv --validation_file data/tr/dev.csv --test_file data/tr/test.csv --do_train --do_eval --do_predict --per_device_train_batch_size 1 --gradient_accumulation_steps 2 --learning_rate 2e-06 --warmup_ratio 0.5 --num_train_epochs 20 --max_seq_length 560 --logging_steps 100 --save_strategy no --evaluation_strategy no --output_dir tr-finetune-demo --overwrite_output_dir --fp16 --use_flash true --seed 1 --run_name tr-finetune-demo --deepspeed deepspeed_config.json
[2023-02-09 12:38:18,956] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]}
[2023-02-09 12:38:18,956] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-02-09 12:38:18,956] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-02-09 12:38:18,956] [INFO] [launch.py:162:main] dist_world_size=1
[2023-02-09 12:38:18,956] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-02-09 12:38:24,928] [INFO] [comm.py:657:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Traceback (most recent call last):
File "run_seqcls_gpt.py", line 634, in
main()
File "run_seqcls_gpt.py", line 221, in main
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
File "/home/dro/anaconda3/envs/biomedlm/lib/python3.8/site-packages/transformers/hf_argparser.py", line 341, in parse_args_into_dataclasses
raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--use_flash', 'true']
[2023-02-09 12:38:25,968] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 78542
[2023-02-09 12:38:25,969] [ERROR] [launch.py:324:sigkill_handler] ['/home/dro/anaconda3/envs/biomedlm/bin/python', '-u', 'run_seqcls_gpt.py', '--local_rank=0', '--tokenizer_name', 'stanford-crfm/pubmed_gpt_tokenizer', '--model_name_or_path', '/home/dro/guathwa/pubmedgpt/finetune/seqcls_tr_dro/stanford-crfm-pubmedgpt', '--train_file', 'data/tr/train.csv', '--validation_file', 'data/tr/dev.csv', '--test_file', 'data/tr/test.csv', '--do_train', '--do_eval', '--do_predict', '--per_device_train_batch_size', '1', '--gradient_accumulation_steps', '2', '--learning_rate', '2e-06', '--warmup_ratio', '0.5', '--num_train_epochs', '20', '--max_seq_length', '560', '--logging_steps', '100', '--save_strategy', 'no', '--evaluation_strategy', 'no', '--output_dir', 'tr-finetune-demo', '--overwrite_output_dir', '--fp16', '--use_flash', 'true', '--seed', '1', '--run_name', 'tr-finetune-demo', '--deepspeed', 'deepspeed_config.json'] exits with return code = 1
from biomedlm.
By the way I wasn't seeing any performance gain using flash attention, not sure if it just doesn't help or a bug in my system ... this bug you're reporting is because I forgot to push the updated code that has the flash attention option ... will try to push that soon !
from biomedlm.
Okay I pushed the updated code!
from biomedlm.
Saw the updated codes. I will close this issue. Thanks for the great help!
from biomedlm.
is there any code that works? e.g. colab? thanks!
from biomedlm.
Related Issues (20)
- BioMedLm for NER and sentiment analysis HOT 3
- 2pac style rap
- How to run the evaluator for MedQA-USMLE HOT 20
- Evaluate MedQA_USMLE on a saved model HOT 1
- Using a UMLS based retriever to enhance MedQA-USMLE performance HOT 1
- sentence embedding HOT 1
- How to cite BioMedLM HOT 3
- How can I try question answering ?
- Unexpected bug for generate function
- Seqcls for multi-label task HOT 2
- Finetuning BioMedLM for Medical QA HOT 10
- Running generation batch misses file HOT 4
- Max Input and Output length HOT 2
- Generation is suspiciously slow for long sequences HOT 4
- The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. HOT 2
- torch.distributed.launch on eight 40G A100, CUDA out of memory.
- Requesting for dataset used in the section "Free Response Question Answering"
- I set tokenizer.pad_token = tokenizer.eos_token and found tokenizer.pad_token_id==None, which leads to an error.
- can it be fine tuned in samller GPU HOT 10
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from biomedlm.