omar-mohamed / gpt2-chest-x-ray-report-generation

This is the implementation of the CDGPT2 model mentioned in our paper 'Automated Radiology Report Generation using Conditioned Transformers'

Home Page: https://doi.org/10.1016/j.imu.2021.100557

License: Other

Python 100.00%

deep-learning transformer gpt2 transfer-learning medical-imaging chest-xrays report-generation radiology

gpt2-chest-x-ray-report-generation's Issues

Could not find a version that satisfies the requirement opencv-python==4.1.0.25

I tried installing the requirements with pip install -r full_requirements.txt on macOS Ventura 13.5 and ran into the following issue:
ERROR: Could not find a version that satisfies the requirement opencv-python==4.1.0.25 (from versions: 3.4.0.14, 3.4.10.37, 3.4.11.41, 3.4.11.43, 3.4.11.45, 3.4.13.47, 3.4.15.55, 3.4.16.57, 3.4.16.59, 3.4.17.61, 3.4.17.63, 3.4.18.65, 4.3.0.38, 4.4.0.40, 4.4.0.42, 4.4.0.44, 4.4.0.46, 4.5.1.48, 4.5.3.56, 4.5.4.58, 4.5.4.60, 4.5.5.62, 4.5.5.64, 4.6.0.66, 4.7.0.68, 4.7.0.72, 4.8.0.74, 4.8.0.76, 4.8.1.78, 4.9.0.80)
ERROR: No matching distribution found for opencv-python==4.1.0.25
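One way to work around a pin that pip can no longer satisfy is to pick the closest newer release from the list in the error message. The sketch below is only an illustration: the helper names are hypothetical, and whether this repository actually works with a newer opencv-python has not been verified here.

```python
def parse_version(text):
    """Turn a dotted version such as '4.1.0.25' into a tuple of ints for ordering."""
    return tuple(int(part) for part in text.strip().split("."))


def nearest_newer(pinned, available):
    """Return the smallest available version that is >= the pinned one, or None."""
    target = parse_version(pinned)
    candidates = [v for v in available if parse_version(v) >= target]
    return min(candidates, key=parse_version) if candidates else None


# A subset of the versions listed in the pip error message above.
available = ["3.4.18.65", "4.3.0.38", "4.4.0.40", "4.5.5.64", "4.9.0.80"]
suggestion = nearest_newer("4.1.0.25", available)
print(f"opencv-python=={suggestion}")
```

The suggested line could then replace the pinned entry in full_requirements.txt, on the assumption that the code does not rely on behaviour specific to 4.1.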

Default eos token not working & gap in clinical performance in reproduced results

Hello, I'm currently doing research on medical report generation, and your work on CDGPT-2 has really caught my interest.

I'm currently facing two issues, which I elaborate on below: the default EOS token not working, and not being able to reproduce the exact predictions, with a large gap in clinical accuracy.

I've also attached ipynb files so that you can investigate and reproduce the problems if needed; each file is listed under its issue topic.

Default EOS Token Not Working

CDGPT_2_Reproduce.ipynb

This Python notebook contains the code for my experiments with different eos tokens and my attempt to reproduce the results.

My code modification

I've modified part of the code in test.py to generate predictions in batches, as follows:

import time

import pandas as pd
import tensorflow as tf
from tqdm import tqdm

def generate_batch(FLAGS, encoder, decoder, tokenizer_wrapper, images, eos_token_ids, no_repeat_ngram_size):
    """ This function was modified from evaluate_full in test.py to predict in batch
    """
    visual_features, tags_embeddings = encoder(images)
    dec_input = tf.convert_to_tensor([tokenizer_wrapper.GPT2_encode("startseq", pad=False)] * len(images))
    
    num_beams = FLAGS.beam_width

    visual_features = tf.tile(visual_features, [num_beams, 1, 1])
    tags_embeddings = tf.tile(tags_embeddings, [num_beams, 1, 1])
    start_time = time.time()
    tokens = decoder.generate(dec_input, max_length=FLAGS.max_sequence_length, num_beams=num_beams, min_length=3,
                              eos_token_ids=eos_token_ids, no_repeat_ngram_size=no_repeat_ngram_size,
                              visual_features=visual_features,
                              tags_embedding=tags_embeddings, do_sample=False, early_stopping=True)
    
    end_time = time.time() - start_time

    sentences = [tokenizer_wrapper.filter_special_words((tokenizer_wrapper.GPT2_decode(toks))) for toks in tokens]
    return sentences

def generate_all_batch(enqueuer, FLAGS, encoder, decoder, tokenizer_wrapper, 
                            test_steps, eos_token_ids, filename=None, no_repeat_ngram_size=None, verbose=False):

    """ This function was modified from evaluate_enqueuer in test.py to predict
        enqueuer data in batch.

    Parameters:
    test_steps (int): Number of test steps should predict
    filename (string): Directory to save predicted results csv file
    verbose (boolean): Set to true to print every predicted results for quick preview

    Returns:
    pandas.dataframe: Predicted results

   """

    tf.keras.backend.set_learning_phase(0)

    if not enqueuer.is_running():
        enqueuer.start(workers=FLAGS.generator_workers, max_queue_size=FLAGS.generator_queue_length)
    start = time.time()
    csv_dict = {"image_path": [], "real": [], "prediction": []}
    generator = enqueuer.get()
    for i in tqdm(range(test_steps)):
        
        
        images, target, img_path = next(generator)
        if verbose:
          print(f'\n({i+1}/{test_steps}) predicting {img_path}...')
        
        start_batch = time.time()
        predicted_sentences = generate_batch(FLAGS, encoder, decoder, tokenizer_wrapper,
                                           images, eos_token_ids, no_repeat_ngram_size)
        time_taken = time.time() - start_batch

        csv_dict["prediction"].extend(predicted_sentences)
        csv_dict["image_path"].extend(img_path)

        target_sentences = [tokenizer_wrapper.filter_special_words((tokenizer_wrapper.GPT2_decode(toks))) for toks in target]
        csv_dict["real"].extend(target_sentences)

        if verbose:
          print('predicted sentences: ')
          for sentence in predicted_sentences:
            print(f'Length: {len(sentence.split())}')
            print(sentence)
          print('')
        
        print(f'Time taken for this batch: {time_taken:.3f}s, ({time_taken/images.shape[0]:.3f}s/image)')

    enqueuer.stop()

    print('Time taken for evaluation {} sec\n'.format(time.time() - start))
    tf.keras.backend.set_learning_phase(1)
    df = pd.DataFrame(csv_dict)
    if filename is not None:
      print(f"Saving to {filename}")
      df.to_csv(filename, index=False)
    return df

So what's not working?

With the default eos token used in this code repository (test.py line 52), the generated sentences do not seem to end properly.

The eos token that the paper describes as the standard GPT2 end-of-sentence token, tokenizer_wrapper.GPT2_encode('<|endoftext|>', pad=False)[0], does not seem to work properly either.

Both eos token variants generated exactly the same sentences, as shown below.

Generated sentences of encoded "<|endoftext|>" as eos_token (tokenizer_wrapper.GPT2_encode('<|endoftext|>', pad=False)[0])

  0%|          | 0/3 [00:00<?, ?it/s]
(1/3) predicting ['CXR3247_IM-1538-1001.png']...
 33%|███▎      | 1/3 [00:14<00:28, 14.43s/it]predicted sentences: 
Length: 102
"no acute cardiopulmonary disease.
the heart, pulmonary xxxx and mediastinum are within normal limits. there is no pleural effusion or pneumothorax. there is no focal air space opacity to suggest a pneumonia. there are mild degenerative changes of the thoracic spine."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no

Time taken for this batch: 14.266s, (14.266s/image)

(2/3) predicting ['CXR3483_IM-1692-1001.png']...
 67%|██████▋   | 2/3 [00:28<00:14, 14.09s/it]predicted sentences: 
Length: 122
"no acute pulmonary disease.
the lungs are clear. there is no pleural effusion. the heart and mediastinum are normal. there are atherosclerotic changes of the thoracic aorta. arthritic changes of the skeletal structures are noted."  "1. no pneumothorax or pleural effusion. surgical clips are present in the arthritic changes of the skeletal structures. surgical clips are present in the arthritic changes of the skeletal structures."  "no pneumothorax or pleural effusion."  "no pleural surgical clips are present in the arthritic changes of the skeletal structures."  "no pneumothorax or pleural surgical clips are present in the arthritic changes of the skeletal structures."  "no pleural surgical clips are present in the arthritic changes of the skeletal structures."  "no surgical clips are present in the arthritic

Time taken for this batch: 13.843s, (13.843s/image)

(3/3) predicting ['CXR1353_IM-0230-2001.png']...
100%|██████████| 3/3 [00:42<00:00, 14.01s/it]predicted sentences: 
Length: 121
"right middle lobe infiltrate consistent with pneumonia.
the heart is normal in size. the pulmonary vascularity is within normal limits in the lungs are clear. a large hiatal hernia is noted. calcified left hilar lymph xxxx are noted. there are surgical clips in the left lung base. a hiatal hernia is noted."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line

Time taken for this batch: 13.740s, (13.740s/image)
Time taken for evaluation 42.03989219665527 sec

Generated sentences of default eos token (tokenizer_wrapper.GPT2_eos_token_id())

  0%|          | 0/3 [00:00<?, ?it/s]
(1/3) predicting ['CXR3247_IM-1538-1001.png']...
 33%|███▎      | 1/3 [00:14<00:28, 14.31s/it]predicted sentences: 
Length: 102
"no acute cardiopulmonary disease.
the heart, pulmonary xxxx and mediastinum are within normal limits. there is no pleural effusion or pneumothorax. there is no focal air space opacity to suggest a pneumonia. there are mild degenerative changes of the thoracic spine."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no radiographic evidence for thoracic injury."  "no

Time taken for this batch: 14.129s, (14.129s/image)

(2/3) predicting ['CXR3483_IM-1692-1001.png']...
 67%|██████▋   | 2/3 [00:28<00:14, 14.03s/it]predicted sentences: 
Length: 122
"no acute pulmonary disease.
the lungs are clear. there is no pleural effusion. the heart and mediastinum are normal. there are atherosclerotic changes of the thoracic aorta. arthritic changes of the skeletal structures are noted."  "1. no pneumothorax or pleural effusion. surgical clips are present in the arthritic changes of the skeletal structures. surgical clips are present in the arthritic changes of the skeletal structures."  "no pneumothorax or pleural effusion."  "no pleural surgical clips are present in the arthritic changes of the skeletal structures."  "no pneumothorax or pleural surgical clips are present in the arthritic changes of the skeletal structures."  "no pleural surgical clips are present in the arthritic changes of the skeletal structures."  "no surgical clips are present in the arthritic

Time taken for this batch: 13.827s, (13.827s/image)

(3/3) predicting ['CXR1353_IM-0230-2001.png']...
100%|██████████| 3/3 [00:41<00:00, 13.93s/it]predicted sentences: 
Length: 121
"right middle lobe infiltrate consistent with pneumonia.
the heart is normal in size. the pulmonary vascularity is within normal limits in the lungs are clear. a large hiatal hernia is noted. calcified left hilar lymph xxxx are noted. there are surgical clips in the left lung base. a hiatal hernia is noted."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line has been removed."  "left picc line

Time taken for this batch: 13.628s, (13.628s/image)
Time taken for evaluation 41.78707838058472 sec

Summary of what's wrong

Both eos token variants took around 40 seconds to generate 3 predictions with batch_size=1, and none of the predictions seemed to end properly.

The possible fix

I've discovered that tokenizer_wrapper.GPT2_encode("seq", pad=False)[0] works as a valid eos token, in that it ends the generated sentences properly.

Generated sentences of encoded "seq" as eos token (tokenizer_wrapper.GPT2_encode("seq", pad=False)[0])

  0%|          | 0/3 [00:00<?, ?it/s]
(1/3) predicting ['CXR3247_IM-1538-1001.png']...
 33%|███▎      | 1/3 [00:04<00:08,  4.14s/it]predicted sentences: 
Length: 30
"no acute pulmonary disease.
the lungs are clear. there is no pleural effusion or pneumothorax. the heart and mediastinum are normal. the skeletal structures and soft tissues are normal." end

Time taken for this batch: 3.929s, (3.929s/image)

(2/3) predicting ['CXR3483_IM-1692-1001.png']...
 67%|██████▋   | 2/3 [00:09<00:04,  4.71s/it]predicted sentences: 
Length: 36
"no acute pulmonary disease.
the lungs are clear. there is no pleural effusion. the heart and mediastinum are normal. there are atherosclerotic changes of the thoracic aorta. arthritic changes of the skeletal structures are noted." end

Time taken for this batch: 5.100s, (5.100s/image)

(3/3) predicting ['CXR1353_IM-0230-2001.png']...
100%|██████████| 3/3 [00:13<00:00,  4.58s/it]predicted sentences: 
Length: 35
"right middle lobe and lower lobe pneumonia.
right middle lobe and lower lobe consolidation and bilateral costophrenic xxxx blunting is present. heart size normal. pulmonary vascularity is normal. there is a large hiatal hernia." end

Time taken for this batch: 4.461s, (4.461s/image)
Time taken for evaluation 13.733989238739014 sec

The predictions now took much less time to generate (around 13 seconds for all three, or roughly 4 to 5 seconds per image), and they seem to end properly.

Summary of this section

Neither the default GPT2 end-of-sentence token nor the manually encoded string <|endoftext|> seemed to work as a proper eos token. Instead, the manually encoded string seq works, for reasons unknown to me. If possible, I would like to know why.
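A possible reading, which is an assumption and not something confirmed by the repository, is that the model was trained to close each report with a text marker that pairs with "startseq" rather than with <|endoftext|>; the trailing "end" in the outputs above would be consistent with such a marker being split into several tokens. Whatever the cause, the effect of a working eos id can be sketched independently of the model as cutting the generated ids at the first match. The function name and the ids below are made up for illustration.

```python
def truncate_at_eos(token_ids, eos_token_ids):
    """Return the tokens before the first end-of-sequence id (all of them if none occurs)."""
    eos = set(eos_token_ids)
    for position, token in enumerate(token_ids):
        if token in eos:
            return list(token_ids[:position])
    return list(token_ids)


# Made-up ids: 7 stands in for whichever id actually terminates a report.
generated = [11, 12, 13, 7, 14, 15, 7, 16]
stopped = truncate_at_eos(generated, [7])
unstopped = truncate_at_eos(generated, [99])
print(stopped)
print(len(unstopped))
```

If the chosen id never appears in the output, as in the second call, nothing is cut and generation would run until max_length, which matches the long repeated outputs shown earlier.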

Not being able to reproduce the exact predictions and large gap in clinical accuracy

VisualCheXbert_CDGPT2.ipynb

This Python notebook contains the code that evaluates the clinical accuracy of the predictions generated by CDGPT-2.

The issue

I cannot reproduce the exact predictions attached in your provided model checkpoint folder.

Although I haven't yet evaluated my predictions with the metrics from the paper to check whether the results are close enough, I have evaluated their clinical accuracy using VisualCheXbert.

Briefly, VisualCheXbert is a model that takes a chest X-ray report as input and outputs labels for the conditions found in the text, in the following categories: Fracture, Consolidation, Enlarged Cardiomediastinum, No Finding, Pleural Other, Cardiomegaly, Pneumothorax, Atelectasis, Support Devices, Edema, Pleural Effusion, Lung Lesion, and Lung Opacity.

I tried to reproduce the results with the same test set you used (testing_set.csv) and changed the configuration to match the config.json file you provided as closely as possible. The settings I changed are:

tokenizer_vocab_size from default of 1001 to 2000
tags_threshold from default of -1 to 0.1

However, the model predicts completely different sentences. I've also attached a side-by-side comparison between your predictions and my reproduction attempt: the original column holds the predictions from your checkpoint folder, and the reproduced column holds mine.

I evaluated the clinical accuracy of both your predictions.csv and my prediction_reproduced.csv by comparing the VisualCheXbert labels of the ground-truth reports with the labels of the predicted sentences.

The clinical accuracy measured with VisualCheXbert shows a large gap in performance between your predictions.csv and my prediction_reproduced.csv, despite my effort to match all settings to the config.json file. The largest gap is in precision, while the recall of your predictions and of the reproduced predictions is much closer. This results in a significant difference in F1 score. I've attached bar charts in the next section for further detail (the same charts appear in VisualCheXbert_CDGPT2.ipynb).

Even without changing any settings from the default configs in this code repository, the clinical accuracy is still not close to your original results (unfortunately I didn't keep those results, but if needed I can try evaluating with different configs).
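For reference, the per-label precision, recall, and F1 comparison described above can be sketched as follows. This is a minimal illustration with made-up binary labels for a single category, not the code from the attached notebook; the function name is hypothetical.

```python
def precision_recall_f1(true_labels, predicted_labels):
    """Compute precision, recall and F1 for one binary label across a set of reports."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # label present in both
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted but not in ground truth
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # in ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for one category (1 = condition mentioned in the report).
truth = [1, 0, 1, 1, 0]
preds = [1, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(truth, preds)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case extra false positives pull precision down while recall stays higher, which is the same pattern reported for the reproduced predictions.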

Precision

original: Evaluation results of your provided predictions.csv
reproduced: Evaluation results of my attempt to reproduce prediction_reproduced.csv

Recall

original: Evaluation results of your provided predictions.csv
reproduced: Evaluation results of my attempt to reproduce prediction_reproduced.csv

F1 Score

original: Evaluation results of your provided predictions.csv
reproduced: Evaluation results of my attempt to reproduce prediction_reproduced.csv

Error : AttributeError: 'generator' object has no attribute 'steps'

Hello dear Omar,
When I run this code, I get the following error after a few epochs:

Epoch 4 Batch 1726 Loss 0.0887
Epoch 4 Batch 1727 Loss 0.0891
Epoch 4 Batch 1728 Loss 0.0794
Epoch 4 Batch 1729 Loss 0.0219
Epoch 4 Batch 1730 Loss 0.1107
Epoch 4 Batch 1731 Loss 0.0695
Epoch 4 Batch 1732 Loss 0.0707
Epoch 4 Loss 0.057019
Time taken for 1 epoch 3001.0791635513306 sec

Batches that took long: 0
Evaluating on test set..

AttributeError Traceback (most recent call last)
in ()
214 print("Evaluating on test set..")
215 train_enqueuer.stop()
--> 216 current_scores = evaluate_enqueuer(test_enqueuer, FLAGS, encoder, decoder, tokenizer_wrapper)
217 time_csv['epoch'].append(epoch + 1)
218 time_csv['time_taken'].append(pure_training_time)

/content/drive/My Drive/Colab Notebooks/GPT2-Chest-X-Ray-Report-Generation-master/test.py in evaluate_enqueuer(enqueuer, FLAGS, encoder, decoder, tokenizer_wrapper, name, verbose, write_json, write_images, test_mode)
112 csv_dict = {"image_path": [], "real": [], "prediction": []}
113 generator = enqueuer.get()
--> 114 for batch in tqdm(list(range(generator.steps))):
115 images, target, img_path = next(generator)
116

AttributeError: 'generator' object has no attribute 'steps'

Now I am running the test.py file, and the same error occurs:

AttributeError Traceback (most recent call last)
in ()
175 ckpt.restore(ckpt_manager.latest_checkpoint)
176 print("Restored from checkpoint: {}".format(ckpt_manager.latest_checkpoint))
--> 177 evaluate_enqueuer(test_enqueuer, FLAGS, encoder, decoder, tokenizer_wrapper, write_images=True, test_mode=True)

in evaluate_enqueuer(enqueuer, FLAGS, encoder, decoder, tokenizer_wrapper, name, verbose, write_json, write_images, test_mode)
112 csv_dict = {"image_path": [], "real": [], "prediction": []}
113 generator = enqueuer.get()
--> 114 for batch in tqdm(list(range(generator.steps))):
115 images, target, img_path = next(generator)
116

AttributeError: 'generator' object has no attribute 'steps'
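The traceback shows that enqueuer.get() returned a plain Python generator, which carries no steps attribute. One workaround, used by the batch-evaluation code quoted in an earlier issue on this page, is to pass the number of steps explicitly. The sketch below uses a stand-in generator and a hypothetical helper name; where the step count comes from in the real code (for example the length of the test data sequence) is an assumption.

```python
from itertools import islice


def take_batches(generator, steps):
    """Yield at most `steps` batches from a plain Python generator."""
    yield from islice(generator, steps)


def endless_batches():
    """Stand-in for enqueuer.get(): yields batches forever, like a looping enqueuer."""
    index = 0
    while True:
        yield f"batch-{index}"
        index += 1


batches = list(take_batches(endless_batches(), 3))
print(batches)
```

With this shape, the evaluation loop iterates over an explicit count and does not rely on an attribute of the generator object.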

Missing License

Hi @omar-mohamed,
Thanks for making such a beautiful code repository; I'd like to learn from it.

You forgot to include a license, and I would like to modify the code in the future.
Thanks,
aviezab

Dataset did not match with paper code.

Hi,
I am implementing the code for "Automated radiology report generation using conditioned transformers", but I get a path error for "/IU-XRay/images/CXR1657_IM-0432-3003.png". There are no .png images available in the IU-XRay dataset; only .csv files are available with this image name.
If you have the dataset used in this paper, please share a download link with me so that I can proceed with my implementation.
Thanks.
Qaiser Shahzad
Comsats University Islamabad, Pakistan

ModuleNotFoundError: No module named 'transformers.configuration_gpt2'

Hi, your research is very interesting. I'm trying to replicate it for leukemia, but when I run the pretrained_model.py file it throws the following error:
ModuleNotFoundError: No module named 'transformers.configuration_gpt2'

I hope you can help me.

Cheers
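This error typically appears when a newer transformers release is installed than the one the code was written for: as far as I know, from version 4.0 onward the GPT-2 configuration module lives at transformers.models.gpt2.configuration_gpt2, while older releases expose transformers.configuration_gpt2. Installing the version pinned in the repository's requirements is the safer route; an import fallback along the lines below is a possible alternative. The helper name is hypothetical, and the demonstration uses standard-library module names so that it runs anywhere.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in the list that can be found."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            continue
    raise ModuleNotFoundError(f"none of {module_names} could be imported")


# Demonstration with a missing name followed by a standard-library module.
module = import_first_available(["not_a_real_module_xyz", "json"])
print(module.__name__)
```

For the case in this issue, the list would hold the old and the new GPT-2 configuration module paths, in that order. Other parts of the code may still depend on the older API, so this alone may not be enough.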

Why is there an optimizer in the test.py file?

Hey, I found the following lines in test.py:

optimizer = tf.keras.optimizers.Adam()

ckpt = tf.train.Checkpoint(encoder=encoder,
                           decoder=decoder,
                           optimizer=optimizer)

ckpt_manager = tf.train.CheckpointManager(ckpt, FLAGS.ckpt_path, max_to_keep=1)

I am also not sure where the model weights are loaded.
Why are the above lines needed at inference time?

ModuleNotFoundError: No module named 'gpt2.gpt2_model'

Hi, please help.
I get this error when I run python3.7 test.py:
ModuleNotFoundError: No module named 'gpt2.gpt2_model'

Type Error how to solve

New Text Document.txt
Throwing TypeError
