paragraph-level-simplification-of-medical-texts's Issues

How to generate the bart_freq_normalized_idx.txt

Hello,

  • How did you generate the files in logr_weights? Could you also provide the code that generates these weights?
  • Where is the code for the MLM-based metric (SciBERT) proposed in the paper?

How are ROUGE/BLEU/SARI calculated?

Hi @AshOlogn,

I'd like to replicate the evaluation of the pre-trained models. How exactly were ROUGE/BLEU/SARI (Table 6 in the paper) computed? Could you provide your evaluation script? I tried a custom evaluation script and got results that are quite different from the paper (see below).

Thanks!

Attachment

This is what I tried.

  1. Create the environment (see below)
  2. Download the pre-trained model bart-no-ul as per the README
  3. Run generation on the test set: sh scripts/generate/bart_gen_no-ul.sh (with --generate_end_index=None)
  4. Evaluate with a custom script (see below)

R-1 = 46.94      // paper: 40.0
R-2 = 19.22      // paper: 15.0
R-L = 43.77      // paper: 37.0
BLEU = 15.73     // paper: 44.0
SARI = 35.44     // paper: 38.0

environment

conda create -n parasimp \
  python=3.7 \
  pytorch=1.7.1 \
  cudatoolkit=11.0 \
  -c pytorch -c defaults

conda activate parasimp
pip install pytorch-lightning==0.9.0 transformers==3.3.1 rouge_score nltk gdown
pip install -U "protobuf<=3.21" 

git clone https://github.com/feralvam/easse.git
cd easse
pip install -e .

evaluate.py

import json

from easse.sari import corpus_sari
from easse.bleu import corpus_bleu
from utils import calculate_rouge

# System outputs: one generated text per test example.
with open('trained_models/bart-no-ul/gen_nucleus_test_1_0-none.json') as fin:
    sys_sents = [x['gen'] for x in json.load(fin)]
# Source texts, one per line.
with open('data/data-1024/test.source') as fin:
    orig_sents = [line.strip() for line in fin]
# Reference simplifications, one per line.
with open('data/data-1024/test.target') as fin:
    refs_sents = [line.strip() for line in fin]

scores = calculate_rouge(sys_sents, refs_sents)
print('R-1 = {:.2f}'.format(scores['rouge1']))
print('R-2 = {:.2f}'.format(scores['rouge2']))
print('R-L = {:.2f}'.format(scores['rougeLsum']))

# EASSE takes a list of reference sets; there is one reference per example here.
bleu = corpus_bleu(
    sys_sents=sys_sents,
    refs_sents=[refs_sents],
    lowercase=False
)
print(f'BLEU = {bleu:.2f}')

sari = corpus_sari(
    orig_sents=orig_sents,
    sys_sents=sys_sents,
    refs_sents=[refs_sents]
)
print(f'SARI = {sari:.2f}')
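
One thing that may be worth ruling out before comparing against the paper is a misalignment between the three inputs, since corpus-level metrics pair the lists by position. The sketch below is only a sanity check, not part of the repository: check_alignment is a hypothetical helper name, and it assumes the same three lists as the script above.

```python
from typing import List, Sequence


def check_alignment(
    orig_sents: Sequence[str],
    sys_sents: Sequence[str],
    refs_sents: Sequence[str],
) -> List[int]:
    """Raise if the lists differ in length; return indices of empty system outputs."""
    lengths = {
        "source": len(orig_sents),
        "system": len(sys_sents),
        "reference": len(refs_sents),
    }
    # All three lists must pair up one-to-one by position.
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Misaligned inputs: {lengths}")
    # Empty generations are worth inspecting before scoring.
    return [i for i, s in enumerate(sys_sents) if not s.strip()]
```

Calling check_alignment(orig_sents, sys_sents, refs_sents) before the metric calls would surface a truncated generation file or a stray blank line in test.source or test.target. It does not explain the BLEU gap by itself.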

Tools to calculate fk score, ari score and sciBert score

Hi,

May I ask which tools you used to calculate the readability and SciBERT scores? I computed FK and ARI scores with different tools, and they reported quite different results.

text: "We identified five RCTs (1330 participants) that met the inclusion criteria. None of the included trials examined regimens of less than six months duration. Fluoroquinolones added to standard regimens A single trial (174 participants) added levofloxacin to the standard first-line regimen. Relapse and treatment failure were not reported. For death, sputum conversion, and adverse events we are uncertain if there is an effect (one trial, 174 participants, very low quality evidence for all three outcomes). Fluoroquinolones substituted for ethambutol in standard regimens Three trials (723 participants) substituted ethambutol with moxifloxacin, gatifloxacin, and ofloxacin into the standard first-line regimen. For relapse, we are uncertain if there is an effect (one trial, 170 participants, very low quality evidence). No trials reported on treatment failure. For death, sputum culture conversion at eight weeks, or serious adverse events we do not know if there was an effect (three trials, 723 participants, very low quality evidence for all three outcomes). Fluoroquinolones substituted for isoniazid in standard regimens A single trial (433 participants) substituted moxifloxacin for isoniazid. Treatment failure and relapse were not reported. For death, sputum culture conversion, or serious adverse events the substitution may have little or no difference (one trial, 433 participants, low quality evidence for all three outcomes). Fluoroquinolines in four month regimens Six trials are currently in progress testing shorter regimens with fluoroquinolones. Ofloxacin, levofloxacin, moxifloxacin, and gatifloxacin have been tested in RCTs of standard first-line regimens based on rifampicin and pyrazinamide for treating drug-sensitive TB. There is insufficient evidence to be clear whether addition or substitution of fluoroquinolones for ethambutol or isoniazid in the first-line regimen reduces death or relapse, or increases culture conversion at eight weeks. 
Much larger trials with fluoroquinolones in short course regimens of four months are currently in progress."

FK score:
textstat (https://pypi.org/project/textstat/): 12.8;
readability (https://pypi.org/project/py-readability-metrics/): 14.2
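
For context on why two tools can disagree: both scores are simple functions of word, sentence, syllable, and character counts, so the result depends on how each tool tokenizes and counts. The sketch below uses the standard Flesch-Kincaid grade and ARI formulas. The naive_counts helper is a hypothetical, deliberately crude counter written for illustration; it is not how textstat, py-readability-metrics, or the paper's authors count.

```python
import re
from typing import Tuple


def fk_grade(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * n_words / n_sentences + 11.8 * n_syllables / n_words - 15.59


def ari(n_chars: int, n_words: int, n_sentences: int) -> float:
    """Automated Readability Index from raw counts."""
    return 4.71 * n_chars / n_words + 0.5 * n_words / n_sentences - 21.43


def naive_counts(text: str) -> Tuple[int, int, int, int]:
    """Return (words, sentences, syllables, chars) using crude heuristics (assumption)."""
    # Sentences: split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: alphanumeric runs, optionally joined by hyphens or apostrophes.
    words = re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)
    # Syllables: count vowel groups, at least one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    # Characters: letters and digits only.
    chars = sum(ch.isalnum() for w in words for ch in w)
    return len(words), len(sentences), syllables, chars


if __name__ == "__main__":
    n_words, n_sents, n_syll, n_chars = naive_counts("The cat sat. It ran!")
    print(fk_grade(n_words, n_sents, n_syll), ari(n_chars, n_words, n_sents))
```

Swapping in a different sentence splitter or syllable counter changes the inputs to the same formula, which could plausibly account for a gap like the one between the two FK values above.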

Cannot create data_final_1024.json from data.json

The script process.py does not reproduce data_final_1024.json from data.json: the file it generates has a different format from the provided data_final_1024.json.

Is this the right script to generate it?