biomedlm's People

Contributors

j38


biomedlm's Issues

Using a UMLS based retriever to enhance MedQA-USMLE performance

We intended to supplement this MedQA-USMLE evaluation with a search against UMLS.
UMLS is a large biomedical knowledge source that can be queried through APIs. We expected accuracy to rise above the baseline when retrieved medical term descriptions are added to the prompt during evaluation.

The technique for adding additional context is:
a) For each answer choice, do a lookup against UMLS and find additional context based on the answer choice. Concatenate the new context with each question/answer pair.

For example, the prompt is the original question and the actual answer is c3 (Common iliac artery aneurysm).
The UMLS retriever returns searcher1 for c1, searcher2 for c2, searcher3 for c3, and searcher4 for c4:

prompt: A 68-year-old male comes to the physician for evaluation of right flank pain. He has a history of diabetes and peripheral artery disease. His blood pressure is 160/90 mm Hg. Physical examination shows abdominal tenderness and right flank tenderness. An ultrasound shows dilation of the right ureter and renal pelvis. Which of the following is the most likely underlying cause of this patient's condition? c1: Renal artery stenosis c2: Benign prostatic hyperplasia c3: Common iliac artery aneurysm c4: Urethral stricture
searcher1: Narrowing of a main artery in the kidney. searcher2: Obstructive nephropathy which has developed in a patient with evidence of bladder outflow obstruction caused by benign prostatic hypertrophy. searcher3: An artery arising from the bifurcation of the abdominal aorta which then bifurcates forming the internal and external iliac arteries. searcher4: Narrowing of the urethra associated with inflammation or scar tissue. [HPO:probinson]
predicted_label: 0 actual_label: 2

Our hypothesis is that the model will have better accuracy when the answers are supplemented with the definitions from searcher1, searcher2, etc.

Here is what the supplemented test data looks like with the searcher results prepended to each answer choice:
{"id": "test-00006", "sent1": "A 68-year-old male comes to the physician for evaluation of right flank pain. He has a history of diabetes and peripheral artery disease. His blood pressure is 160/90 mm Hg. Physical examination shows abdominal tenderness and right flank tenderness. An ultrasound shows dilation of the right ureter and renal pelvis. Which of the following is the most likely underlying cause of this patient's condition?", "sent2": "", "ending0": "Narrowing of a main artery in the kidney. Renal artery stenosis", "ending1": "Obstructive nephropathy which has developed in a patient with evidence of bladder outflow obstruction caused by benign prostatic hypertrophy. Benign prostatic hyperplasia", "ending2": "An artery arising from the bifurcation of the abdominal aorta which then bifurcates forming the internal and external iliac arteries. Common iliac artery aneurysm", "ending3": "Narrowing of the urethra associated with inflammation or scar tissue. [HPO:probinson] Urethral stricture", "label": 2}

However, the accuracy actually drops slightly when using the retriever.
Do we need to change any command-line parameters because the answers are longer? Any thoughts on why we are not seeing an improvement in results would be welcome.
Here is what we used:
deepspeed --num_gpus 1 --num_nodes 1 run_multiple_choice.py \
  --tokenizer_name stanford-crfm/pubmed_gpt_tokenizer \
  --model_name_or_path "/content/drive/MyDrive/Colab Notebooks/SavedModel300" \
  --train_file $datadir/train300.json \
  --validation_file $datadir/dev300.json \
  --test_file $datadir/test300newRetr.json \
  --do_predict \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 2 \
  --learning_rate 2e-06 \
  --warmup_ratio 0.5 \
  --num_train_epochs 20 \
  --max_seq_length 560 \
  --logging_steps 100 \
  --save_strategy no \
  --evaluation_strategy no \
  --output_dir medqa-finetune-demo \
  --overwrite_output_dir \
  --fp16 \
  --seed 1 \
  --run_name medqa-finetune-demo \
  --deepspeed deepspeed_config.json

Can it be fine-tuned on a smaller GPU?

Hi, could the model be fine-tuned on just a few smaller GPUs, e.g. 4x A40 with 48 GB of memory each? I am trying to use DeepSpeed but still hit OOM.
Thanks.
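
(For context, a minimal sketch of the memory-saving options in question, assuming half-precision loading and gradient checkpointing are acceptable for this fine-tuning setup; this is illustrative, not the repo's recommended recipe.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: load BioMedLM in fp16 and trade compute for activation
# memory with gradient checkpointing before fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("stanford-crfm/BioMedLM")
model = AutoModelForCausalLM.from_pretrained(
    "stanford-crfm/BioMedLM",
    torch_dtype=torch.float16,   # halves parameter memory at load time
)
model.gradient_checkpointing_enable()  # recompute activations during the backward pass
model.config.use_cache = False         # required when gradient checkpointing is enabled
```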

How to run the evaluator for MedQA-USMLE

We were able to run preprocess_medqa.py based on the steps in https://github.com/stanford-crfm/BioMedLM/tree/main/finetune/mc

Next we wanted to run the evaluator, as we had already downloaded the questions and answers.

We went here https://github.com/stanford-crfm/BioMedLM/tree/main/finetune and ran
task=medqa_usmle_hf
datadir=data/$task
outdir=runs/$task/GPT2
mkdir -p $outdir
python -m torch.distributed.launch --nproc_per_node={num_devices} --nnodes=1 --node_rank=0 \
  run_multiple_choice.py --tokenizer_name stanford-crfm/pubmed_gpt_tokenizer --model_name_or_path \
  {checkpoint} --train_file data/medqa_usmle_hf/train.json --validation_file data/medqa_usmle_hf/dev.json \
  --test_file data/medqa_usmle_hf/test.json --do_train --do_eval --do_predict --per_device_train_batch_size \
  {train_per_device_batch_size} --per_device_eval_batch_size 1 --gradient_accumulation_steps {grad_accum} \
  --learning_rate {lr} --warmup_ratio 0.5 --num_train_epochs {epochs} --max_seq_length 512 \
  --{numerical_format} --seed {seed} --data_seed {seed} --logging_first_step --logging_steps 20 \
  --save_strategy no --evaluation_strategy steps --eval_steps 500 --run_name {run_name} \
  --output_dir trash/ \
  --overwrite_output_dir

It asks for the various arguments left as placeholders, e.g. {num_devices}, {checkpoint}, {train_per_device_batch_size}, etc.

Can someone give us the exact command, with arguments filled in, to execute run_multiple_choice.py?

Unexpected bug in the generate function

Hello, when I use this model for a VQA task by passing the visual queries to the inputs_embeds argument without input_ids, the attention_mask and position_ids end up one element longer than expected in the last dimension (e.g., 442 vs. 441). How can I fix this? Thanks.
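(For reference, a minimal reproduction sketch of what I am calling; the shapes are made up, and it assumes a transformers version in which generate accepts inputs_embeds for decoder-only models.)

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("stanford-crfm/BioMedLM")

# Visual features are passed as inputs_embeds; an attention_mask with the same
# sequence length is supplied explicitly so generate does not have to infer one.
batch_size, seq_len = 1, 441
visual_embeds = torch.randn(batch_size, seq_len, model.config.n_embd)
attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)

out = model.generate(
    inputs_embeds=visual_embeds,
    attention_mask=attention_mask,
    max_new_tokens=20,
)
```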

Finetuning BioMedLM for Medical QA

Hi,
I'm trying to fine-tune BioMedLM for medical question answering on our custom dataset using Hugging Face's Transformers library. Since we're looking to optimize memory usage, we're using Low-Rank Adaptation (LoRA) as well.
I'm unsure of the format of the dataset that I need to use.
Below is the one I'm using currently:
{ 'instruction': 'xyz', 'output': 'test'}, where instruction is the question and output is the answer.
Below is my code:

```python
import logging
import gc

import torch
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

logging.basicConfig(level=logging.DEBUG)

#--------------------------------------------------------------------------------------------------------------
print("creating tokenizer from model")
model_name = "stanford-crfm/BioMedLM"

tokenizer = AutoTokenizer.from_pretrained(model_name, add_eos_token=True)
tokenizer.pad_token_id = 0
tokenizer.add_special_tokens({'eos_token': ''})
print('eos_token_id:', tokenizer.eos_token_id)

device_type = "cuda" if torch.cuda.is_available() else "cpu"
device = torch.device(device_type)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.tie_weights()  # todo: understand what this does

#--------------------------------------------------------------------------------------------------------------
peft_name = 'output/biomedLM-lora'
CUTOFF_LEN = 512

def tokenize(prompt, tokenizer, add_eos_token=True):
    result = tokenizer(
        prompt + "",  # add the end-of-stream token
        truncation=True,
        max_length=CUTOFF_LEN,
        padding="max_length",
    )
    return {
        "input_ids": result["input_ids"],
        "attention_mask": result["attention_mask"],
    }

print("loading data from csv")
df = pd.read_csv("dataset.csv")
dataset = Dataset.from_pandas(df)
dataset = dataset.select_columns(['instruction', 'output'])

print("splitting dataset")
dataset = dataset.train_test_split(test_size=0.33)
train_data = dataset["train"]
val_data = dataset["test"]

def generate_prompt(data_point):
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:
{data_point["instruction"]}

Response:
{data_point["output"]}"""

print("tokenizing train and val ds")
train_data = train_data.shuffle().map(lambda x: tokenize(generate_prompt(x), tokenizer))
val_data = val_data.shuffle().map(lambda x: tokenize(generate_prompt(x), tokenizer))

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

eval_steps = 50
save_steps = 50
logging_steps = 20

trainer = Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=TrainingArguments(
        num_train_epochs=1,
        learning_rate=1e-5,
        logging_steps=logging_steps,
        evaluation_strategy="steps",
        save_strategy="steps",
        eval_steps=eval_steps,
        save_steps=save_steps,
        output_dir="./models",  # where the model is saved
        report_to="none",
        save_total_limit=3,
        load_best_model_at_end=True,
        push_to_hub=False,
        per_device_train_batch_size=1,  # per-device batch size
        per_device_eval_batch_size=1,
    ),
    # collator for causal language modeling (no masked-LM objective)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!

print("training")
trainer.train()

print("saving model")
trainer.model.save_pretrained(peft_name)
tokenizer.save_pretrained(peft_name)

#--------------------------------------------------------------------------------------------------------------
print("cleanup")
model = None
tokenizer = None
trainer = None
gc.collect()
torch.cuda.empty_cache()

#--------------------------------------------------------------------------------------------------------------
```



When I run the above code, I get the following error: 

│ ❱ 118 trainer.train()                                                        │
│   119                                                                        │
│   120 print("saving model")                                                  │
│   121 trainer.model.save_pretrained(peft_name)                               │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/transformers/trainer.py:164 │
│ 5 in train                                                                   │
│                                                                              │
│   1642 │   │   inner_training_loop = find_executable_batch_size(             │
│   1643 │   │   │   self._inner_training_loop, self._train_batch_size, args.a │
│   1644 │   │   )                                                             │
│ ❱ 1645 │   │   return inner_training_loop(                                   │
│   1646 │   │   │   args=args,                                                │
│   1647 │   │   │   resume_from_checkpoint=resume_from_checkpoint,            │
│   1648 │   │   │   trial=trial,                                              │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/transformers/trainer.py:193 │
│ 8 in _inner_training_loop                                                    │
│                                                                              │
│   1935 │   │   │   │   │   self.control = self.callback_handler.on_step_begi │
│   1936 │   │   │   │                                                         │
│   1937 │   │   │   │   with self.accelerator.accumulate(model):              │
│ ❱ 1938 │   │   │   │   │   tr_loss_step = self.training_step(model, inputs)  │
│   1939 │   │   │   │                                                         │
│   1940 │   │   │   │   if (                                                  │
│   1941 │   │   │   │   │   args.logging_nan_inf_filter                       │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/transformers/trainer.py:275 │
│ 9 in training_step                                                           │
│                                                                              │
│   2756 │   │   │   return loss_mb.reduce_mean().detach().to(self.args.device │
│   2757 │   │                                                                 │
│   2758 │   │   with self.compute_loss_context_manager():                     │
│ ❱ 2759 │   │   │   loss = self.compute_loss(model, inputs)                   │
│   2760 │   │                                                                 │
│   2761 │   │   if self.args.n_gpu > 1:                                       │
│   2762 │   │   │   loss = loss.mean()  # mean() to average on multi-gpu para │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/transformers/trainer.py:278 │
│ 4 in compute_loss                                                            │
│                                                                              │
│   2781 │   │   │   labels = inputs.pop("labels")                             │
│   2782 │   │   else:                                                         │
│   2783 │   │   │   labels = None                                             │
│ ❱ 2784 │   │   outputs = model(**inputs)                                     │
│   2785 │   │   # Save past state if it exists                                │
│   2786 │   │   # TODO: this needs to be fixed and made cleaner later.        │
│   2787 │   │   if self.args.past_index >= 0:                                 │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py: │
│ 1501 in _call_impl                                                           │
│                                                                              │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                      │
│   1502 │   │   # Do not call functions when jit is used                      │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []         │
│   1504 │   │   backward_pre_hooks = []                                       │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/gpt2/mo │
│ deling_gpt2.py:1080 in forward                                               │
│                                                                              │
│   1077 │   │   """                                                           │
│   1078 │   │   return_dict = return_dict if return_dict is not None else sel │
│   1079 │   │                                                                 │
│ ❱ 1080 │   │   transformer_outputs = self.transformer(                       │
│   1081 │   │   │   input_ids,                                                │
│   1082 │   │   │   past_key_values=past_key_values,                          │
│   1083 │   │   │   attention_mask=attention_mask,                            │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py: │
│ 1501 in _call_impl                                                           │
│                                                                              │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                      │
│   1502 │   │   # Do not call functions when jit is used                      │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []         │
│   1504 │   │   backward_pre_hooks = []                                       │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/gpt2/mo │
│ deling_gpt2.py:903 in forward                                                │
│                                                                              │
│    900 │   │   │   │   │   encoder_attention_mask,                           │
│    901 │   │   │   │   )                                                     │
│    902 │   │   │   else:                                                     │
│ ❱  903 │   │   │   │   outputs = block(                                      │
│    904 │   │   │   │   │   hidden_states,                                    │
│    905 │   │   │   │   │   layer_past=layer_past,                            │
│    906 │   │   │   │   │   attention_mask=attention_mask,                    │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py: │
│ 1501 in _call_impl                                                           │
│                                                                              │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                      │
│   1502 │   │   # Do not call functions when jit is used                      │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []         │
│   1504 │   │   backward_pre_hooks = []                                       │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/gpt2/mo │
│ deling_gpt2.py:391 in forward                                                │
│                                                                              │
│    388 │   ) -> Union[Tuple[torch.Tensor], Optional[Tuple[torch.Tensor, Tupl │
│    389 │   │   residual = hidden_states                                      │
│    390 │   │   hidden_states = self.ln_1(hidden_states)                      │
│ ❱  391 │   │   attn_outputs = self.attn(                                     │
│    392 │   │   │   hidden_states,                                            │
│    393 │   │   │   layer_past=layer_past,                                    │
│    394 │   │   │   attention_mask=attention_mask,                            │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py: │
│ 1501 in _call_impl                                                           │
│                                                                              │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                      │
│   1502 │   │   # Do not call functions when jit is used                      │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []         │
│   1504 │   │   backward_pre_hooks = []                                       │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/gpt2/mo │
│ deling_gpt2.py:313 in forward                                                │
│                                                                              │
│    310 │   │   │   key, value = self.c_attn(encoder_hidden_states).split(sel │
│    311 │   │   │   attention_mask = encoder_attention_mask                   │
│    312 │   │   else:                                                         │
│ ❱  313 │   │   │   query, key, value = self.c_attn(hidden_states).split(self │
│    314 │   │                                                                 │
│    315 │   │   query = self._split_heads(query, self.num_heads, self.head_di │
│    316 │   │   key = self._split_heads(key, self.num_heads, self.head_dim)   │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py: │
│ 1501 in _call_impl                                                           │
│                                                                              │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                      │
│   1502 │   │   # Do not call functions when jit is used                      │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []         │
│   1504 │   │   backward_pre_hooks = []                                       │
│                                                                              │
│ /home/ubuntu/.local/lib/python3.10/site-packages/transformers/pytorch_utils. │
│ py:103 in forward                                                            │
│                                                                              │
│   100 │                                                                      │
│   101 │   def forward(self, x):                                              │
│   102 │   │   size_out = x.size()[:-1] + (self.nf,)                          │
│ ❱ 103 │   │   x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight │
│   104 │   │   x = x.view(size_out)                                           │
│   105 │   │   return x                                                       │
│   106                                                                        │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling 
`cublasCreate(handle)`

I also get this warning: 
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [472,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [472,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

How do I proceed?
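
(One adjustment I am experimenting with, on the assumption that the indexSelectLargeIndex assertion means a token id fell outside the embedding table after the special-token changes above; this is a guess, not a confirmed fix.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stanford-crfm/BioMedLM"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token           # reuse an existing token instead of forcing id 0

model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.resize_token_embeddings(len(tokenizer))       # keep the embedding table in sync with any added tokens
model.config.pad_token_id = tokenizer.pad_token_id
```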

Can you share preprocessed datasets for fine-tuning?

Thank you for open-sourcing this valuable resource. I am interested in reproducing the experiments in this repository and would like to follow the fine-tuning setup you used.

However, I noticed that the data folders for the fine-tuning tasks do not contain the datasets. Could you please share the preprocessed datasets or provide guidelines on how to preprocess the data for the reproduction of the results?

I would greatly appreciate any assistance you can provide.

torch.distributed.launch on eight 40G A100, CUDA out of memory.

I run:
export CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7'
task=gene
datadir=data/$task
outdir=runs/$task/GPT2
name=gene0913
checkpoint=/root/siton-glusterfs-eaxtsxdfs/xts/data/BioMedLM
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --use_env run_seqcls_gpt.py \
  --tokenizer_name $checkpoint --model_name_or_path $checkpoint --train_file \
  $datadir/train.json --validation_file $datadir/dev.json --test_file $datadir/test.json --do_train \
  --do_eval --do_predict --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 \
  --learning_rate 2e-6 --warmup_ratio 0.5 --num_train_epochs 5 --max_seq_length \
  32 --logging_steps 1 --save_strategy no --evaluation_strategy no --output_dir \
  $outdir --overwrite_output_dir --bf16 --seed 1000 --run_name %name

but I still get CUDA out of memory.
Does anyone know how many GPUs are needed to fine-tune the seqcls task?

code for pubmedgpt pre-training

Hi! I could not find the pre-training code that was mentioned in the blog post:

To train Pubmed GPT easily, quickly, and efficiently, we used the MosaicML Cloud for infrastructure and trained the model using MosaicML’s Composer and Streaming Dataset libraries. All model and training code is built off of PyTorch. See the code here!

https://www.mosaicml.com/blog/introducing-pubmed-gpt

Are you planning to make it public?
It would help to understand how the model was actually trained with MosaicML's Composer.
Another question: how was the model trained with FlashAttention converted to a Hugging Face-compatible GPT2LMHeadModel checkpoint?

zero-shot keyword extraction

Hello,
I am planning to use PubMedGPT for zero-shot keyword extraction on biomedical text. On my (proprietary) dataset, GPT-3 has demonstrated pretty decent performance for keyword extraction; I wanted to get your thoughts on the zero-shot generalization capabilities of PubMedGPT, especially for tasks such as keyword extraction.
Also, can you point to helpful prompt formats optimized for PubMedGPT?

Many thanks!

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior.

I run the "Example Usage":

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = torch.device("cuda")
tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")
model = GPT2LMHeadModel.from_pretrained("stanford-crfm/BioMedLM").to(device)

input_ids = tokenizer.encode(
    "Photosynthesis is ", return_tensors="pt"
).to(device)

sample_output = model.generate(input_ids, do_sample=True, max_length=50, top_k=50)

print("Output:\n" + 100 * "-")
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
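
(For reference, a variant of the same snippet that passes the attention mask and a pad_token_id explicitly, which is what the warning asks for; illustrative only, assuming reusing eos as pad is acceptable here.)

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = torch.device("cuda")
tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")
model = GPT2LMHeadModel.from_pretrained("stanford-crfm/BioMedLM").to(device)

inputs = tokenizer("Photosynthesis is ", return_tensors="pt").to(device)

# Supplying attention_mask and pad_token_id removes the need for generate to guess them.
sample_output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    max_length=50,
    top_k=50,
)
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
```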

Evaluate MedQA_USMLE on a saved model

Hello,

We followed your steps using DeepSpeed and were able to create a fine-tuned model, which was saved as a checkpoint by the run. We saved this model and then loaded it the next time using something like this:
tokenizer = GPT2Tokenizer.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel")
model = GPT2LMHeadModel.from_pretrained("/content/drive/MyDrive/Colab Notebooks/SavedModel")

Now we wanted to run a sample question inference on this model and were using this link
https://huggingface.co/docs/transformers/tasks/multiple_choice#inference

Here is our code:
prompt = ("A 20-year-old woman presents with menorrhagia for the past several years."
"She says that her menses “have always been heavy”, and she has experienced easy bruising for as long as she can remember."
"Family history is significant for her mother, who had similar problems with bruising easily. "
"The patient's vital signs include: heart rate 98/min, respiratory rate 14/min, temperature 36.1°C (96.9°F),"
" and blood pressure 110/87 mm Hg. Physical examination is unremarkable. "
" Laboratory tests show the following: platelet count 200,000/mm3, PT 12 seconds,"
" and PTT 43 seconds. Which of the following is the most likely cause of this patient’s symptoms?")
candidate1 = "Factor V Leiden"
candidate2 = "Hemophilia A"
candidate3 = "Lupus anticoagulant"
candidate4 = "Protein C deficiency"
candidate5 = "Von Willebrand disease"

inputs = tokenizer([[prompt, candidate1], [prompt, candidate2],[prompt, candidate3],[prompt, candidate4],[prompt, candidate5]], return_tensors="pt", padding=True)
labels = torch.tensor(0).unsqueeze(0)

outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
logits = outputs.logits

However we get this error:

ValueError                                Traceback (most recent call last)
in
      2
      3 #model = AutoModelForMultipleChoice.from_pretrained("my_awesome_swag_model")
----> 4 outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
      5 logits = outputs.logits

4 frames
/usr/local/lib/python3.9/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   2844     if size_average is not None or reduce is not None:
   2845         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
   2847
   2848

ValueError: Expected input batch_size (840) to match target batch_size (0).

Do you have a recommendation on how to run a sample question inference on this model?
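
(For reference, a minimal sketch of one possible way to score the candidates with a plain GPT2LMHeadModel, by comparing the language-model loss of each prompt+answer pair; this is an assumption on my part, not the repo's documented evaluation path. It reuses prompt and candidate1..candidate5 from above.)

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_dir = "/content/drive/MyDrive/Colab Notebooks/SavedModel"
tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
model = GPT2LMHeadModel.from_pretrained(model_dir).eval()

def candidate_loss(question: str, answer: str) -> float:
    """Average causal-LM loss of the question followed by one answer choice."""
    enc = tokenizer(question + " " + answer, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

candidates = [candidate1, candidate2, candidate3, candidate4, candidate5]  # as defined above
losses = [candidate_loss(prompt, c) for c in candidates]
predicted = int(torch.tensor(losses).argmin())  # lowest loss = most likely answer
print(predicted, candidates[predicted])
```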

Seqcls for multi-label task

I would like to use the seqcls script for a multi-label task with 41 labels. Please advise on the changes I need to make for this use case. Thanks.
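
(For context, the kind of change I have in mind, assuming the standard Hugging Face multi-label setup, problem_type="multi_label_classification", which switches the loss to BCEWithLogitsLoss over 41 sigmoid outputs, can be wired into the seqcls script; the label count and example labels are mine.)

```python
import torch
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")
tokenizer.pad_token = tokenizer.eos_token

model = GPT2ForSequenceClassification.from_pretrained(
    "stanford-crfm/BioMedLM",
    num_labels=41,
    problem_type="multi_label_classification",  # BCEWithLogitsLoss instead of cross-entropy
)
model.config.pad_token_id = tokenizer.pad_token_id

# Labels must then be float multi-hot vectors of length 41, e.g. classes 3 and 17:
enc = tokenizer("example sentence", return_tensors="pt")
labels = torch.zeros(1, 41)
labels[0, [3, 17]] = 1.0
out = model(**enc, labels=labels)
print(out.loss)
```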

Tokenizer does not have a padding token

Hi, thanks for sharing your model!

I am trying to use it to generate embeddings for batches of text sequences of different lengths (Gene Ontology annotations). However, when I try to do this with Hugging Face, I get the following error at the tokenization stage.

Code:

tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/pubmed_gpt_tokenizer")
inputs = tokenizer(sequences, padding=True, return_tensors="pt")

Error:

Asking to pad but the tokenizer does not have a padding token. Please select a token to use as 
`pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via 
`tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.

How should I resolve this?

Thanks!
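
(For reference, the two options the error message suggests, written out as code; I have not verified which one is intended for this tokenizer.)

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/pubmed_gpt_tokenizer")

# Option 1: reuse the end-of-text token as the padding token.
tokenizer.pad_token = tokenizer.eos_token

# Option 2: add a dedicated [PAD] token instead (the model's embeddings would then
# need model.resize_token_embeddings(len(tokenizer)) before use).
# tokenizer.add_special_tokens({"pad_token": "[PAD]"})

inputs = tokenizer(["first sequence", "a longer second sequence"], padding=True, return_tensors="pt")
print(inputs["input_ids"].shape, inputs["attention_mask"].shape)
```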

Fine-tuning on a seqcls task with DeepSpeed hits RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

Hi team,

I am trying to fine-tune on a seqcls task (using the provided script) on my own dataset but hit OOM on my GPU (DGX A100, 40 GB). So now I am trying to run it with DeepSpeed but encounter the following error.

Please advise. Thanks.

*I am a beginner in deep learning...

================
Try with deepspeed
deepspeed_config.json
{
  "zero_optimization": {
    "stage": 1,
    "reduce_bucket_size": 5e8
  },
  "train_batch_size": "auto"
}

================

cd /home/dro/guathwa/pubmedgpt/finetune/seqcls_tr
export task=tr
export datadir=data/$task
export outdir=runs/$task/GPT2
export seed=100
export name=test2
export lr=4e-5
export OMP_NUM_THREADS=1
export WANDB_DISABLED=True
export train_batch_size=4
export max_seq_length=128
export grad_accum=4

(pubmedgpt) dro@dro-DGX-Station-A100:~/guathwa/pubmedgpt/finetune/seqcls_tr$ python -m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --node_rank=0 run_seqcls_gpt_tr_v0.2_dro.py --model_name_or_path "stanford-crfm-pubmedgpt" --train_file $datadir/train5000.csv --validation_file $datadir/dev.csv --test_file $datadir/test.csv --do_train --do_eval --do_predict --per_device_train_batch_size $train_batch_size --learning_rate $lr --warmup_ratio 0.5 --num_train_epochs 1 --max_seq_length $max_seq_length --logging_steps 100 --save_strategy no --evaluation_strategy no --output_dir $outdir --overwrite_output_dir --fp16 --seed $seed --run_name $name --ddp_find_unused_parameters False --weight_decay 0.0 --deepspeed deepspeed_config.json

/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
01/30/2023 17:11:23 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
01/30/2023 17:11:23 - INFO - main - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=deepspeed_config.json,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=4e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=runs/tr/GPT2/runs/Jan30_17-11-23_dro-DGX-Station-A100
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=100,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_hf,
output_dir=runs/tr/GPT2,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=4,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=test2,
save_on_each_node=False,
save_steps=500,
save_strategy=no,
save_total_limit=None,
seed=100,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.5,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
01/30/2023 17:11:23 - INFO - main - load a local file for train: data/tr/train5000.csv
01/30/2023 17:11:23 - INFO - main - load a local file for validation: data/tr/dev.csv
01/30/2023 17:11:23 - INFO - main - load a local file for test: data/tr/test.csv
01/30/2023 17:11:23 - WARNING - datasets.builder - Using custom data configuration default-b4448ac955faff7e
01/30/2023 17:11:23 - INFO - datasets.info - Loading Dataset Infos from /home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/datasets/packaged_modules/csv
01/30/2023 17:11:23 - INFO - datasets.builder - Overwrite dataset info from restored data version.
01/30/2023 17:11:23 - INFO - datasets.info - Loading Dataset info from /home/dro/.cache/huggingface/datasets/csv/default-b4448ac955faff7e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317
01/30/2023 17:11:23 - WARNING - datasets.builder - Found cached dataset csv (/home/dro/.cache/huggingface/datasets/csv/default-b4448ac955faff7e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317)
01/30/2023 17:11:23 - INFO - datasets.info - Loading Dataset info from /home/dro/.cache/huggingface/datasets/csv/default-b4448ac955faff7e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317
100%|█████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1399.35it/s]

label_list [0, 1, 2, 3]
[INFO|configuration_utils.py:652] 2023-01-30 17:11:23,271 >> loading configuration file stanford-crfm-pubmedgpt/config.json
[INFO|configuration_utils.py:706] 2023-01-30 17:11:23,272 >> Model config GPT2Config {
"_name_or_path": "stanford-crfm-pubmedgpt",
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 28895,
"embd_pdrop": 0.1,
"eos_token_id": 28895,
"id2label": {
"0": "LABEL_0",
"1": "LABEL_1",
"2": "LABEL_2",
"3": "LABEL_3"
},
"initializer_range": 0.02,
"label2id": {
"LABEL_0": 0,
"LABEL_1": 1,
"LABEL_2": 2,
"LABEL_3": 3
},
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 2560,
"n_head": 20,
"n_inner": null,
"n_layer": 32,
"n_positions": 1024,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": true,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"torch_dtype": "float32",
"transformers_version": "4.24.0",
"use_cache": false,
"vocab_size": 28896
}

[INFO|tokenization_utils_base.py:1773] 2023-01-30 17:11:23,273 >> loading file vocab.json
[INFO|tokenization_utils_base.py:1773] 2023-01-30 17:11:23,273 >> loading file merges.txt
[INFO|tokenization_utils_base.py:1773] 2023-01-30 17:11:23,273 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:1773] 2023-01-30 17:11:23,273 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1773] 2023-01-30 17:11:23,273 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1773] 2023-01-30 17:11:23,273 >> loading file tokenizer_config.json
[INFO|modeling_utils.py:2155] 2023-01-30 17:11:23,302 >> loading weights file stanford-crfm-pubmedgpt/pytorch_model.bin
[WARNING|modeling_utils.py:2598] 2023-01-30 17:11:42,278 >> Some weights of the model checkpoint at stanford-crfm-pubmedgpt were not used when initializing GPT2ForSequenceClassification: ['lm_head.weight']

  • This IS expected if you are initializing GPT2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing GPT2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    [WARNING|modeling_utils.py:2610] 2023-01-30 17:11:42,279 >> Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at stanford-crfm-pubmedgpt and are newly initialized: ['classifier.weight']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    Adding [PAD] token to tokenizer and model word embeddings.
    [INFO|tokenization_utils_base.py:898] 2023-01-30 17:11:42,597 >> Assigning [PAD] to the pad_token key of the tokenizer
    01/30/2023 17:11:43 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dro/.cache/huggingface/datasets/csv/default-b4448ac955faff7e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317/cache-84075b8e214e8641.arrow
    01/30/2023 17:11:43 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dro/.cache/huggingface/datasets/csv/default-b4448ac955faff7e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317/cache-f2c30d0b5792242f.arrow
    01/30/2023 17:11:43 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dro/.cache/huggingface/datasets/csv/default-b4448ac955faff7e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317/cache-a8ef74afd83d72ba.arrow
    [INFO|trainer.py:557] 2023-01-30 17:11:43,480 >> Using cuda_amp half precision backend
    [INFO|trainer.py:725] 2023-01-30 17:11:43,480 >> The following columns in the training set don't have a corresponding argument in GPT2ForSequenceClassification.forward and have been ignored: sentence. If sentence are not expected by GPT2ForSequenceClassification.forward, you can safely ignore this message.
    /home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
    warnings.warn(
    [2023-01-30 17:11:43,488] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.8.0, git-hash=unknown, git-branch=unknown
    Traceback (most recent call last):
    File "run_seqcls_gpt_tr_v0.1.py", line 638, in
    main()
    File "run_seqcls_gpt_tr_v0.1.py", line 563, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/transformers/trainer.py", line 1501, in train
    return inner_training_loop(
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/transformers/trainer.py", line 1570, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/transformers/deepspeed.py", line 344, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/deepspeed/init.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 301, in init
    self._configure_distributed_model(model)
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1187, in _configure_distributed_model
    self._broadcast_model()
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1102, in _broadcast_model
    dist.broadcast(p,
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 127, in log_wrapper
    return func(*args, **kwargs)
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 232, in broadcast
    return cdb.broadcast(tensor=tensor, src=src, group=group, async_op=async_op)
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/deepspeed/comm/torch.py", line 70, in broadcast
    return torch.distributed.broadcast(tensor=tensor,
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1201, in broadcast
    work.wait()
    RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 48532) of binary: /home/dro/anaconda3/envs/pubmedgpt/bin/python
    Traceback (most recent call last):
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in
    main()
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in call
    return launch_agent(self._config, self._entrypoint, list(args))
    File "/home/dro/anaconda3/envs/pubmedgpt/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
    ============================================================
    run_seqcls_gpt_tr_v0.1.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-01-30_17:11:47
host : ...
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 48532)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

sentence embedding

Hi.
First of all, thank you for making such a model available to us.
I am trying to get vector embeddings of the abstracts of some PubMed articles, but somehow I couldn't get the sentence embeddings. More precisely, I wrote the code below, and the dimension of the vectors I obtain is 2560. But the Hugging Face page says the sequence length is 1024, so I understood that the dimension of an embedding vector should be 1024. Am I wrong?
Can you help with getting sentence embeddings?
Best wishes.
Orhan

import json
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BioMedLM")
model = AutoModel.from_pretrained("BioMedLM")
tokenizer.pad_token = tokenizer.eos_token

f = open('articles.json', "r")
data = json.loads(f.read())
data_abst = [data[i]['abstract'] for i in range(len(data))]
data_title = [data[i]['title'] for i in range(len(data))]

def normalizer(x):     
    normalized_vector = x / np.linalg.norm(x)
    return normalized_vector

class BioMedLM:    
    def __init__(self, model, tokenizer):
        # self.sentence = sentence
        self.model = model
        self.tokenizer = tokenizer

    def sentence_vectors(self,sentence):
        inputs = self.tokenizer(sentence, padding=True, truncation=True, return_tensors="pt")
        w_vectors = self.model(**inputs)

        # return w_vectors
        token_embeddings = w_vectors[0] #First element of model_output contains all token embeddings
        input_mask_expanded = inputs.attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        vec=torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
        return vec[0]

gpt_class = BioMedLM(model, tokenizer)

def sentence_encoder(data):
    vectors = []
    normalized_vectors = []
    for i in range(len(data)):
        sentence_vectors = gpt_class.sentence_vectors(data[i]).detach().numpy()
        vectors.append(sentence_vectors)
        normalized_vectors.append(normalizer(sentence_vectors))

    vectors = np.squeeze(np.array(vectors))
    normalized_vectors = np.squeeze(np.array(normalized_vectors))

    return vectors, normalized_vectors


abst_vectors, abst_vectors_norm = sentence_encoder(data_abst) 

Max Input and Output length

In finetune_for_summarization.py, why are max_source_length, train_max_target_length, and eval_max_target_length set to a default of 510? Is this the maximum BioMedLM can take as input, and can it only generate at most 510 tokens? As soon as I increase the value above this default, I get the error below.

max_source_length: Optional[int] = field(
    default=510, metadata={"help": "the max source length of summarization data. "}
)
train_max_target_length: Optional[int] = field(
    default=510, metadata={"help": "the max target length for training data. "}
)
eval_max_target_length: Optional[int] = field(
    default=510, metadata={"help": "the max target length for dev data. "}
)

Error:
Traceback (most recent call last):
  File "finetune_for_summarization.py", line 168, in <module>
    finetune()
  File "finetune_for_summarization.py", line 162, in finetune
    trainer.train()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1534, in train
    return inner_training_loop(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1807, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2649, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2674, in compute_loss
    outputs = model(**inputs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1769, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1075, in forward
    transformer_outputs = self.transformer(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 843, in forward
    position_embeds = self.wpe(position_ids)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
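
(My working assumption, based on the n_positions value of 1024 in the model config, is that source and target must fit together in a single 1024-token context window for this decoder-only model, which would explain the 510 defaults. A small check along those lines:)

```python
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("stanford-crfm/BioMedLM")
tokenizer = AutoTokenizer.from_pretrained("stanford-crfm/BioMedLM")

max_source_length = 510
max_target_length = 510

source = tokenizer("some article text ...", truncation=True, max_length=max_source_length)
target = tokenizer("a summary ...", truncation=True, max_length=max_target_length)

# Source and target share one context window in a decoder-only model.
total = len(source["input_ids"]) + len(target["input_ids"])
assert total <= config.n_positions, f"{total} tokens exceed n_positions={config.n_positions}"
```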

Distilling PubMedGPT

Thank you very much for this great work and for publishing the model!
Do you have any plans to train or publish a distilled version of the model, since the current size requires a lot of resources?

Generation is suspiciously slow for long sequences

I am trying to use BioMedLM for generation, but I find that it is very slow at generating long sequences. Training runs at a normal speed. I wrote a minimal program (below) to reproduce this, comparing BioMedLM to GPT-2 XL (1.5B parameters) and Flan-T5 XL (3B parameters). I varied the maximum generation length and estimated the ratio of the durations of the two decoder models (BioMedLM divided by GPT-2):

1024 tokens: 5.9
512 tokens: 3.2
256 tokens: 1.9
128 tokens: 1.3
64 tokens: 1.01

Anecdotally, the generation speed is similar to that of Flan UL2, a 20B parameter model.

I'd like to fix this, but I don't know whether the issue is in the BioMedLM code, my software/environment versions and settings, or my hardware (A100-80GB).

import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM
from datetime import datetime

# settings
max_length = 1024

# text
text = 'SRY1 phosphorylates'

# flan-t5-xl - 3B - encoder-decoder model
checkpoint = 'google/flan-t5-xl'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer(text, return_tensors = 'pt')

model = model.to('cuda')
inputs = inputs.to('cuda')

t0 = datetime.now()
output = model.generate(**inputs, max_length = min(512, max_length))
t1 = datetime.now()

print('flan-t5 generation length: ', len(output[0]))
print('flan-t5 duration: ', t1 - t0)

# gpt2 - 1.5B - decoder model
checkpoint = 'gpt2-xl'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer(text, return_tensors = 'pt')

model = model.to('cuda')
inputs = inputs.to('cuda')

t2 = datetime.now()
output = model.generate(**inputs, max_length = max_length)
t3 = datetime.now()

print('GPT-2 generation length: ', len(output[0]) - inputs['input_ids'].size(1))
print('GPT-2 duration: ', t3 - t2)

# BioMedLM - 2.7B - decoder model
checkpoint = 'stanford-crfm/BioMedLM'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer(text, return_tensors = 'pt')

model = model.to('cuda')
inputs = inputs.to('cuda')

t4 = datetime.now()
outputs = model.generate(**inputs, max_length = max_length)
t5 = datetime.now()

print('BioMedLM generation length: ', len(outputs[0]) - inputs['input_ids'].size(1))

print('BioMedLM duration: ', t5 - t4)
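
(One thing I plan to check, prompted by the use_cache: false setting that appears in some of the configs in this thread: whether key/value caching is enabled during generation, since without it every new token recomputes the full prefix and the slowdown would grow with length. Illustrative check only.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = 'stanford-crfm/BioMedLM'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to('cuda')

print('use_cache before:', model.config.use_cache)
model.config.use_cache = True  # make sure KV caching is on for generation

inputs = tokenizer('SRY1 phosphorylates', return_tensors='pt').to('cuda')
output = model.generate(**inputs, max_length=1024, use_cache=True)
print('generated tokens:', len(output[0]) - inputs['input_ids'].size(1))
```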

demo.py's unexpected behavior

Because the model is too big for my machine, I get

RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 7.80 GiB total capacity; 7.19 GiB already allocated; 76.00 MiB free; 7.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The first workaround that comes to mind is to use half precision

model = GPT2LMHeadModel.from_pretrained("stanford-crfm/pubmedgpt").half().to(device)

It runs, but the output is

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:28895 for open-end generation.
Output:
----------------------------------------------------------------------------------------------------
Photosynthesis is \~10-fold lower in *gabaculine-treated* ***spmh7*** **plants in comparison to** ***spmh7*** **

Which looks odd.

What have I done wrong?
How can I fix it?

My setting is:

OS: Ubuntu 20.04
GPU:  GeForce RTX 2070 Mobile - 8GiB
Python packages:
Package            Version     
------------------ ------------    
tokenizers         0.13.2      
torch              1.12.1+cu116
torchaudio         0.12.1+cu116
torchvision        0.13.1+cu116 
transformers       4.25.1      

How can I try question answering?

The demo is about text generation. Can you show me how to try question answering? Is there an API for batch processing content?
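
(For concreteness, here is roughly what I mean; the prompt format and batching approach are my own guesses, not something documented for this model.)

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")
tokenizer.pad_token = tokenizer.eos_token   # needed for batched padding
tokenizer.padding_side = "left"             # left-pad so generation continues from the real text
model = GPT2LMHeadModel.from_pretrained("stanford-crfm/BioMedLM").to(device)

questions = [
    "Question: What is the function of hemoglobin? Answer:",
    "Question: Which vitamin deficiency causes scurvy? Answer:",
]
inputs = tokenizer(questions, padding=True, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    pad_token_id=tokenizer.eos_token_id,
)
for o in outputs:
    print(tokenizer.decode(o, skip_special_tokens=True))
```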
