
Paragraph-level Simplification of Medical Texts

Code and data for our NAACL 2021 paper "Paragraph-level Simplification of Medical Texts," which can be found here. If you have any questions about the code or the paper, feel free to email me at [email protected]. If you find our data and/or code useful in your work, please include the following citation:

@inproceedings{devaraj-etal-2021-paragraph,
    title = "Paragraph-level Simplification of Medical Texts",
    author = "Devaraj, Ashwin and Marshall, Iain and Wallace, Byron and Li, Junyi Jessy",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics",
    month = jun,
    year = "2021",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-main.395",
    pages = "4972--4984",
}

Dependencies

pytorch
pytorch-lightning==0.9.0
transformers==3.3.1
rouge_score
nltk
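
For example, after installing PyTorch for your CUDA setup, the remaining pinned dependencies can be installed with pip:

pip install pytorch-lightning==0.9.0 transformers==3.3.1 rouge_score nltk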

Data

The Cochrane dataset that we used can be found in the data directory. To scrape your own dataset from the Cochrane website, run the following command, which creates a directory called scraped_data and populates it with the raw dataset data.json.

python prepare_data/scrape.py

Then run the following command to process the raw dataset generated by the above script. It creates a cleaned-up, length-filtered version of the original dataset at scraped_data/data_final_1024.json.

python prepare_data/process.py

Finally, create a train-val-test split of this dataset by running the following command, which creates the directory scraped_data/data-1024 containing the split data.

python prepare_data/split_dataset.py

To use this dataset for training or text generation, copy the directory into the data folder.
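
For example, assuming the default output paths produced by the scripts above:

cp -r scraped_data/data-1024 data/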

Train Model

There are four training settings explored in the paper: no unlikelihood training, and unlikelihood training with three different sets of weights used in the loss function (Cochrane, Newsela, and both). To train a model under one of these settings, run one of the following scripts: scripts/train/bart-no-ul.sh or scripts/train/bart-ul_{cochrane, newsela, both}.sh.
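
For example, to train the baseline model without unlikelihood training:

sh scripts/train/bart-no-ul.sh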

Pretrained Models

The pretrained models corresponding to the four training settings can be found here. To use these models, unzip the archives and place the resulting directories in the trained_models directory.
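
For example, assuming one of the models was downloaded as bart-no-ul.zip (the archive name here is illustrative):

mkdir -p trained_models
unzip bart-no-ul.zip -d trained_models/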

Generate Text

To generate text using trained models, run one of the following scripts: scripts/gen/bart_gen_{no-ul, cochrane, newsela, both}.sh. Decoding hyperparameters can be controlled by modifying the command-line arguments listed in the scripts. Both text and JSON versions of the generated output will be written to the directory containing the model (e.g., trained_models/bart-ul_cochrane).
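
A minimal sketch for inspecting the JSON output in Python (the filename varies with the decoding settings; the one below is the nucleus-sampling output reported in the issues further down, and each entry is assumed to hold the generated paragraph under the 'gen' key):

import json

# load the generations produced by the bart-no-ul model
with open('trained_models/bart-no-ul/gen_nucleus_test_1_0-none.json') as fin:
    generations = json.load(fin)

# print the first generated simplification
print(generations[0]['gen'])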


Issues

Tools to calculate FK score, ARI score, and SciBERT score

Hi,

May I ask which tools you used to calculate the readability scores and SciBERT scores? I tested FK and ARI scores with different tools, but they reported quite different results.

text: "We identified five RCTs (1330 participants) that met the inclusion criteria. None of the included trials examined regimens of less than six months duration. Fluoroquinolones added to standard regimens A single trial (174 participants) added levofloxacin to the standard first-line regimen. Relapse and treatment failure were not reported. For death, sputum conversion, and adverse events we are uncertain if there is an effect (one trial, 174 participants, very low quality evidence for all three outcomes). Fluoroquinolones substituted for ethambutol in standard regimens Three trials (723 participants) substituted ethambutol with moxifloxacin, gatifloxacin, and ofloxacin into the standard first-line regimen. For relapse, we are uncertain if there is an effect (one trial, 170 participants, very low quality evidence). No trials reported on treatment failure. For death, sputum culture conversion at eight weeks, or serious adverse events we do not know if there was an effect (three trials, 723 participants, very low quality evidence for all three outcomes). Fluoroquinolones substituted for isoniazid in standard regimens A single trial (433 participants) substituted moxifloxacin for isoniazid. Treatment failure and relapse were not reported. For death, sputum culture conversion, or serious adverse events the substitution may have little or no difference (one trial, 433 participants, low quality evidence for all three outcomes). Fluoroquinolines in four month regimens Six trials are currently in progress testing shorter regimens with fluoroquinolones. Ofloxacin, levofloxacin, moxifloxacin, and gatifloxacin have been tested in RCTs of standard first-line regimens based on rifampicin and pyrazinamide for treating drug-sensitive TB. There is insufficient evidence to be clear whether addition or substitution of fluoroquinolones for ethambutol or isoniazid in the first-line regimen reduces death or relapse, or increases culture conversion at eight weeks. Much larger trials with fluoroquinolones in short course regimens of four months are currently in progress."

FK score:
textstat (https://pypi.org/project/textstat/): 12.8;
readability (https://pypi.org/project/py-readability-metrics/): 14.2
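
A minimal sketch of such a comparison, assuming textstat and py-readability-metrics are installed; the two libraries differ in syllable counting and sentence splitting, which can account for part of the gap:

import textstat
from readability import Readability  # py-readability-metrics

# paste the full abstract quoted above; py-readability-metrics needs at least 100 words
text = "We identified five RCTs (1330 participants) that met the inclusion criteria. ..."

# textstat's Flesch-Kincaid grade and Automated Readability Index
print('textstat FK: ', textstat.flesch_kincaid_grade(text))
print('textstat ARI:', textstat.automated_readability_index(text))

# py-readability-metrics versions of the same two scores
r = Readability(text)
print('py-readability FK: ', r.flesch_kincaid().score)
print('py-readability ARI:', r.ari().score)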

How are ROUGE/BLEU/SARI calculated?

Hi @AshOlogn,

I'd like to replicate the evaluation of the pre-trained models. How exactly were ROUGE/BLEU/SARI (Table 6 in the paper) computed? Could you provide your evaluation script? I made an attempt with a custom evaluation script and got results that are quite different from those in the paper (see below).

Thanks!

Attachment

This is what I tried.

  1. Create environment (see below)
  2. Download pre-trained model bart-no-ul as per README
  3. Run generation for the test set with sh scripts/generate/bart_gen_no-ul.sh (using --generate_end_index=None)
  4. Evaluate with a custom script (see below)
R-1 = 46.94      // paper: 40.0
R-2 = 19.22      // paper: 15.0
R-L = 43.77      // paper: 37.0
BLEU = 15.73     // paper: 44.0
SARI = 35.44     // paper: 38.0 

environment

conda create -n parasimp \
  python=3.7 \
  pytorch=1.7.1 \
  cudatoolkit=11.0 \
  -c pytorch -c defaults

conda activate parasimp
pip install pytorch-lightning==0.9.0 transformers==3.3.1 rouge_score nltk gdown
pip install -U "protobuf<=3.21" 

git clone https://github.com/feralvam/easse.git
cd easse
pip install -e .

evaluate.py

import json

from easse.sari import corpus_sari
from easse.bleu import corpus_bleu
from utils import calculate_rouge

with open('trained_models/bart-no-ul/gen_nucleus_test_1_0-none.json') as fin:
    sys_sents = json.load(fin)
    sys_sents = [x['gen'] for x in sys_sents]
with open('data/data-1024/test.source') as fin:
    orig_sents = [l.strip() for l in fin.readlines()]
with open('data/data-1024/test.target') as fin:
    refs_sents = [l.strip() for l in fin.readlines()]

scores = calculate_rouge(sys_sents, refs_sents)
print('R-1 = {:.2f}'.format(scores['rouge1']))
print('R-2 = {:.2f}'.format(scores['rouge2']))
print('R-L = {:.2f}'.format(scores['rougeLsum']))

bleu = corpus_bleu(
    sys_sents=sys_sents,
    refs_sents=[[t for t in refs_sents]],
    lowercase=False
)
print(f'BLEU = {bleu:.2f}')

sari = corpus_sari(
    orig_sents=orig_sents,  
    sys_sents=sys_sents, 
    refs_sents=[[t for t in refs_sents]]
)
print(f'SARI = {sari:.2f}')

How to generate the bart_freq_normalized_idx.txt

Hello,

  • I wanted to ask how you generated the files in logr_weights. Could you also provide the code used to generate these weights?
  • Also, where is the code for the MLM-based metric (SciBERT) proposed in the paper?

Cannot create data_final_1024.json from data.json

The script process.py does not create data_final_1024.json from data.json. The file generated by process.py has a different format from the provided data_final_1024.json.

Is it the right script to generate it?
