ashologn / paragraph-level-simplification-of-medical-texts

License: Creative Commons Attribution 4.0 International

Python 94.31% Shell 5.69%

paragraph-level-simplification-of-medical-texts's Issues

How to generate the bart_freq_normalized_idx.txt

Hello,

How did you generate the files in logr_weights? Could you provide the code that generates these weights?
Also, where is the code for the MLM-based metric (SciBERT) proposed in the paper?

How are ROUGE/BLEU/SARI calculated?

Hi @AshOlogn,

I'd like to replicate the evaluation of the pre-trained models. How exactly were ROUGE/BLEU/SARI (Table 6 in the paper) computed? Could you provide your evaluation script? I tried a custom evaluation script and got results that are quite different from the paper's (see below).

Thanks!

Attachment

This is what I tried.

1. Create the environment (see below).
2. Download the pre-trained model bart-no-ul as per the README.
3. Run generation for the test set: sh scripts/generate/bart_gen_no-ul.sh (with --generate_end_index=None).
4. Evaluate with a custom script (see below).

R-1 = 46.94      // paper: 40.0
R-2 = 19.22      // paper: 15.0
R-L = 43.77      // paper: 37.0
BLEU = 15.73     // paper: 44.0
SARI = 35.44     // paper: 38.0

environment

conda create -n parasimp \
  python=3.7 \
  pytorch=1.7.1 \
  cudatoolkit=11.0 \
  -c pytorch -c defaults

conda activate parasimp
pip install pytorch-lightning==0.9.0 transformers==3.3.1 rouge_score nltk gdown
pip install -U "protobuf<=3.21" 

git clone https://github.com/feralvam/easse.git
cd easse
pip install -e .

evaluate.py

import json

from easse.sari import corpus_sari
from easse.bleu import corpus_bleu
from utils import calculate_rouge

with open('trained_models/bart-no-ul/gen_nucleus_test_1_0-none.json') as fin:
    sys_sents = json.load(fin)
    sys_sents = [x['gen'] for x in sys_sents]
with open('data/data-1024/test.source') as fin:
    orig_sents = [l.strip() for l in fin.readlines()]
with open('data/data-1024/test.target') as fin:
    refs_sents = [l.strip() for l in fin.readlines()]

scores = calculate_rouge(sys_sents, refs_sents)
print('R-1 = {:.2f}'.format(scores['rouge1']))
print('R-2 = {:.2f}'.format(scores['rouge2']))
print('R-L = {:.2f}'.format(scores['rougeLsum']))

bleu = corpus_bleu(
    sys_sents=sys_sents,
    refs_sents=[[t for t in refs_sents]],
    lowercase=False
)
print(f'BLEU = {bleu:.2f}')

sari = corpus_sari(
    orig_sents=orig_sents,  
    sys_sents=sys_sents, 
    refs_sents=[[t for t in refs_sents]]
)
print(f'SARI = {sari:.2f}')

How is the "Masked language models" for readability evaluation calculated?

I noticed that you propose using modern masked language models as another way of scoring the ‘technicality’ of text.
However, I couldn't find the code for this part in the repo.
Could you point me to it or share it?

Tools to calculate fk score, ari score and sciBert score

Hi,

May I ask which tools you used to calculate the readability scores and the SciBERT scores? I computed FK and ARI scores with different tools, but they reported quite different results.

text: "We identified five RCTs (1330 participants) that met the inclusion criteria. None of the included trials examined regimens of less than six months duration. Fluoroquinolones added to standard regimens A single trial (174 participants) added levofloxacin to the standard first-line regimen. Relapse and treatment failure were not reported. For death, sputum conversion, and adverse events we are uncertain if there is an effect (one trial, 174 participants, very low quality evidence for all three outcomes). Fluoroquinolones substituted for ethambutol in standard regimens Three trials (723 participants) substituted ethambutol with moxifloxacin, gatifloxacin, and ofloxacin into the standard first-line regimen. For relapse, we are uncertain if there is an effect (one trial, 170 participants, very low quality evidence). No trials reported on treatment failure. For death, sputum culture conversion at eight weeks, or serious adverse events we do not know if there was an effect (three trials, 723 participants, very low quality evidence for all three outcomes). Fluoroquinolones substituted for isoniazid in standard regimens A single trial (433 participants) substituted moxifloxacin for isoniazid. Treatment failure and relapse were not reported. For death, sputum culture conversion, or serious adverse events the substitution may have little or no difference (one trial, 433 participants, low quality evidence for all three outcomes). Fluoroquinolines in four month regimens Six trials are currently in progress testing shorter regimens with fluoroquinolones. Ofloxacin, levofloxacin, moxifloxacin, and gatifloxacin have been tested in RCTs of standard first-line regimens based on rifampicin and pyrazinamide for treating drug-sensitive TB. There is insufficient evidence to be clear whether addition or substitution of fluoroquinolones for ethambutol or isoniazid in the first-line regimen reduces death or relapse, or increases culture conversion at eight weeks. 
Much larger trials with fluoroquinolones in short course regimens of four months are currently in progress."

FK score:
textstat (https://pypi.org/project/textstat/): 12.8;
readability (https://pypi.org/project/py-readability-metrics/): 14.2
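
For context, both FK (Flesch-Kincaid grade level) and ARI are fixed formulas over sentence, word, and syllable or character counts, so differences between tools most likely come from how each one splits sentences, tokenizes words, and counts syllables. Below is a minimal sketch of the two formulas, assuming a naive regex tokenizer and a vowel-group syllable heuristic. The helper names are hypothetical, and this is not how textstat or py-readability-metrics count internally, nor necessarily what the paper used.

```python
import re


def count_syllables(word: str) -> int:
    """Naive heuristic (an assumption): count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str):
    """Return (sentences, words, syllables, characters) using simple regex rules."""
    # Split after ., ! or ? followed by whitespace; real tools use other rules.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z0-9']+", text)
    syllables = sum(count_syllables(w) for w in words)
    characters = sum(len(w) for w in words)
    return len(sentences), len(words), syllables, characters


def flesch_kincaid_grade(text: str) -> float:
    n_sent, n_words, n_syll, _ = text_counts(text)
    return 0.39 * (n_words / n_sent) + 11.8 * (n_syll / n_words) - 15.59


def automated_readability_index(text: str) -> float:
    n_sent, n_words, _, n_chars = text_counts(text)
    return 4.71 * (n_chars / n_words) + 0.5 * (n_words / n_sent) - 21.43


if __name__ == "__main__":
    sample = "The cat sat. The dog ran."
    print(f"FK  = {flesch_kincaid_grade(sample):.2f}")
    print(f"ARI = {automated_readability_index(sample):.2f}")
```

Printing the raw sentence, word, and syllable counts that each tool produces for the same text is a quick way to see where two implementations diverge.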

Cannot create data_final_1024.json from data.json

Running process.py on data.json does not reproduce data_final_1024.json: the file generated by process.py has a different format from the provided data_final_1024.json.

Is this the right script to generate it?
