neuraltextsimplification's Introduction

Exploring Neural Text Simplification

Abstract

We present the first attempt at using sequence-to-sequence neural networks to model text simplification (TS). Unlike the previously proposed automated methods, our neural text simplification (NTS) systems are able to simultaneously perform lexical simplification and content reduction. An extensive human evaluation of the output has shown that NTS systems achieve good grammaticality and meaning preservation of output sentences and a higher level of simplification than the state-of-the-art automated TS systems. We train our models on the Wikipedia corpus containing good and good partial alignments.

	@InProceedings{neural-text-simplification,
	  author    = {Sergiu Nisioi and Sanja Štajner and Simone Paolo Ponzetto and Liviu P. Dinu},
	  title     = {Exploring Neural Text Simplification Models},
	  booktitle = {{ACL} {(2)}},
	  publisher = {The Association for Computational Linguistics},
	  year      = {2017}
	}

Simplify Text | Generate Predictions (no GPUs needed)

  1. OpenNMT dependencies
    1. Install Torch
    2. Install additional packages:
       luarocks install tds
  2. Checkout this repository including the submodules:
     git clone --recursive https://github.com/senisioi/NeuralTextSimplification.git
  3. Download the pre-trained released models NTS and NTS-w2v (NOTE: when using the released pre-trained models, due to recent changes in third party software, the output of our systems might not be identical to the one reported in the paper.)
     python src/download_models.py ./models
  4. Run translate.sh from the scripts dir:
     cd src/scripts
     ./translate.sh
  5. Check the predictions in the results directory:
     cd ../../results_NTS
  6. Run automatic evaluation metrics
    1. Install the python requirements (only nltk is needed)
       pip install -r src/requirements.txt
    2. Run the evaluate script
       python src/evaluate.py ./data/test.en ./data/references/references.tsv ./predictions/
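
The download, translate, and evaluate commands above can also be chained from one place. The following is a minimal sketch, not part of the repository: the helper names `build_steps` and `run_steps` are hypothetical, and it assumes that Torch and tds are already installed, that the repository has been cloned with its submodules, and that every command is issued relative to the repository root (translate.sh from src/scripts). By default it only lists the commands; pass `dry_run=False` to actually run them.

```python
import subprocess
from pathlib import Path


def build_steps(repo_root):
    """Return (working directory, argv) pairs for the commands listed above.

    Assumption: repo_root is a local clone of the repository with submodules.
    """
    root = Path(repo_root)
    return [
        (root, ["python", "src/download_models.py", "./models"]),
        (root / "src" / "scripts", ["./translate.sh"]),
        (root, ["pip", "install", "-r", "src/requirements.txt"]),
        (root, ["python", "src/evaluate.py", "./data/test.en",
                "./data/references/references.tsv", "./predictions/"]),
    ]


def run_steps(steps, dry_run=True):
    """Describe each step and, unless dry_run is set, execute it in order."""
    log = []
    for cwd, argv in steps:
        log.append(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, cwd=cwd, check=True)
    return log


if __name__ == "__main__":
    for line in run_steps(build_steps(".")):
        print(line)
```

Stopping at the first failure is a deliberate choice here: a failed model download would otherwise only surface later as an error inside translate.sh.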

The Content of this Repository

./src

  • download_models.py - a script to download the pre-trained models. The models are released to be usable on machines with or without GPUs. They can't be used to continue the training session. In case the download script fails, you may use the direct links for NTS and NTS-w2v.
  • train_word2vec.py - a script that creates a word2vec model from a local corpus, using gensim
  • SARI.py - a copy of the SARI implementation
  • evaluate.py - evaluates BLEU and SARI scores given a source file, a directory of predictions and a reference file in tsv format
  • ./scripts - contains some of our scripts that we used to preprocess the data, output translations, and create the concatenated embeddings
  • ./patch - the patch with some changes that need to be applied, in case you may want to use the latest checkout of OpenNMT. Alternatively, you may use our forked code which comes directly as a submodule.
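
Before running evaluate.py it can help to confirm that its three inputs line up. The sketch below is not taken from the repository and rests on assumptions about the file layout: the source file is taken to hold one sentence per line, each row of the tsv file is taken to hold the tab-separated references for one source sentence, and the predictions directory is taken to hold one plain-text file per system. The function names are hypothetical.

```python
import csv
from pathlib import Path


def load_lines(path):
    """Read a plain-text file with one sentence per line."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def load_references(tsv_path):
    """Read the reference file; each row is assumed to belong to one sentence."""
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences untouched.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [[cell for cell in row if cell] for row in reader]


def misaligned_systems(source_path, references_path, predictions_dir):
    """Return the prediction files whose line count differs from the source."""
    sources = load_lines(source_path)
    references = load_references(references_path)
    if len(references) != len(sources):
        raise ValueError("source and reference files have different lengths")
    bad = []
    for path in sorted(Path(predictions_dir).iterdir()):
        if path.is_file() and len(load_lines(path)) != len(sources):
            bad.append(path.name)
    return bad
```

An empty list from `misaligned_systems("./data/test.en", "./data/references/references.tsv", "./predictions/")` would mean every prediction file has one output line per source sentence under these assumptions.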

./configs

Contains the OpenNMT config file. To train, please update the config file with the appropriate data on your local system and run

	th train -config $PATH_TO_THIS_DIR/configs/NTS.cfg

./predictions

Contains predictions from previous systems (Wubben et al., 2012), (Glavas and Stajner, 2015), and (Xu et al., 2016), and the generated predictions of the NTS models reported in the paper:

  • NTS_default_b5_h1 - the default model, beam size 5, hypothesis 1

  • NTS_BLEU_b12_h1 - the BLEU best ranked model, beam size 12, hypothesis 1

  • NTS_SARI_b5_h2 - the SARI best ranked model, beam size 5, hypothesis 2

  • NTS-w2v_default_b5_h1 - the default model, beam size 5, hypothesis 1

  • NTS-w2v_BLEU_b12_h1 - the BLEU best ranked model, beam size 12, hypothesis 1

  • NTS-w2v_SARI_b12_h2 - the SARI best ranked model, beam size 12, hypothesis 2
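
The names above appear to follow the pattern model, ranking criterion, `b` plus beam size, `h` plus hypothesis index. As a minimal sketch of that reading (the pattern is inferred from the six names listed here, and `parse_prediction_name` is a hypothetical helper, not part of the repository), a name can be split into its parts like this:

```python
import re

# Pattern inferred from the listed names, e.g. NTS-w2v_SARI_b12_h2.
PREDICTION_NAME = re.compile(
    r"^(?P<model>NTS(?:-w2v)?)"
    r"_(?P<ranking>default|BLEU|SARI)"
    r"_b(?P<beam>\d+)"
    r"_h(?P<hypothesis>\d+)$"
)


def parse_prediction_name(name):
    """Split a prediction name into model, ranking, beam size and hypothesis."""
    match = PREDICTION_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised prediction name: {name!r}")
    return {
        "model": match.group("model"),
        "ranking": match.group("ranking"),
        "beam_size": int(match.group("beam")),
        "hypothesis": int(match.group("hypothesis")),
    }


if __name__ == "__main__":
    print(parse_prediction_name("NTS-w2v_SARI_b12_h2"))
```

Names that do not fit the pattern, such as the outputs of the earlier systems stored in the same directory, raise a `ValueError` rather than being guessed at.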

./data

Contains the training, testing, and reference sentences used to train and evaluate our models.

neuraltextsimplification's People

Contributors

baragona, senisioi

neuraltextsimplification's Issues

Running Problem for the third command

Can anyone tell me why the process of the third command stopped at this stage?
(py3) Apples-iMac:NeuralTextSimplification apple$ python src/download_models.py ./models
[LINE:53]# INFO [2019-01-22 15:06:32,095] Saving files to: ./models
[LINE:10]# INFO [2019-01-22 15:06:32,096] Downloading 0B_pjS_ZjPfT9dEtrbV85UXhSelU to ./models/NTS_epoch11_10.19.t7
[LINE:11]# INFO [2019-01-22 15:06:32,096] Please be patient, it may take a while...
[LINE:824]# DEBUG [2019-01-22 15:06:32,222] Starting new HTTPS connection (1): docs.google.com
[LINE:396]# DEBUG [2019-01-22 15:06:32,599] https://docs.google.com:443 "GET /uc?export=download&id=0B_pjS_ZjPfT9dEtrbV85UXhSelU HTTP/1.1" 200 None
[LINE:16]# INFO [2019-01-22 15:06:32,600] ...
[LINE:824]# DEBUG [2019-01-22 15:06:32,602] Starting new HTTPS connection (2): docs.google.com
[LINE:396]# DEBUG [2019-01-22 15:06:32,891] https://docs.google.com:443 "GET /uc?export=download&confirm=5MMl&id=0B_pjS_ZjPfT9dEtrbV85UXhSelU HTTP/1.1" 302 None
[LINE:824]# DEBUG [2019-01-22 15:06:32,894] Starting new HTTPS connection (1): doc-0o-1g-docs.googleusercontent.com
[LINE:396]# DEBUG [2019-01-22 15:06:35,749] https://doc-0o-1g-docs.googleusercontent.com:443 "GET /docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/q8872a9n7poj4sqt0mouh1jdqbo4ou8h/1548136800000/06297222380386734599/*/0B_pjS_ZjPfT9dEtrbV85UXhSelU?e=download HTTP/1.1" 200 None

License?

Hi Sergiu, would you mind indicating what license this code is under? E.g. is it open source (like MIT, Apache...)

Translator.lua

user@zhomart:~/torch/src/scripts$ ./translate.sh
/home/user/torch/install/bin/luajit: ./onmt/translate/Translator.lua:94: attempt to index field 'opt' (a nil value)
stack traceback:
./onmt/translate/Translator.lua:94: in function '__init'
/home/user/torch/install/share/lua/5.1/torch/init.lua:91: in function 'new'
translate.lua:53: in function 'main'
translate.lua:206: in main chunk
[C]: in function 'dofile'
...user/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
Check results in /home/user/torch/results_NTS/result_NTS_epoch11_10.19.t7_5

Unable to get the output

I followed the exact instructions given in the Readme section but I do not get the correct output. The output generated is the same as the input. Can you please help me out? @senisioi

I am using Lua 5.3.4 and torch 7.
The model I am testing is NTS_epoch11_10.19_release.t7.

Error: Inconsistent tensor size

While running translate.sh, I am getting the following error:
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/nn/JoinTable.lua:39: inconsistent tensor size at /home/ubuntu/torch/pkg/torch/lib/TH/generic/THTensorCopy.c:17

Attached is a screenshot for your reference. Any idea why this is happening?

What is considered the best simplification result?

Which of these is the best general purpose model to use for text simplification?
Is there any way to improve the grammatical quality of the results?
Thank you!

Inputs:

A jet aircraft is an aircraft propelled by jet engines.
A jet engine is a reaction engine discharging a fast-moving jet that generates thrust by jet propulsion.
An engine or motor is a machine designed to convert one form of energy into mechanical energy.
An aircraft is a machine that is able to fly by gaining support from the air.

Outputs:

/root/NeuralTextSimplification/results_NTS/result_NTS_epoch11_10.19.t7_5
A jet aircraft is an aircraft propelled by jet engines. .
A jet engine is a reaction engine .
An engine or motor is a machine designed to convert one form of energy into mechanical energy. .
An aircraft is a machine that is able to fly by gaining support from the air. .

/root/NeuralTextSimplification/results_NTS/result_NTS_epoch11_10.19.t7_5_h1
A jet aircraft is an aircraft propelled by jet engines. .
A jet engine is a reaction engine .
An engine or motor is a machine designed to convert one form of energy into mechanical energy. .
An aircraft is a machine that is able to fly by gaining support from the air. .

/root/NeuralTextSimplification/results_NTS/result_NTS_epoch11_10.19.t7_5_h2
A jet aircraft is an aircraft by jet aircraft .
A jet engine is a reaction engine that generates thrust by jet engines .
An engine or motor is a machine designed to convert one form of energy into mechanical energy. into mechanical energy. .
An aircraft is a machine that is able to get support from the air. .

/root/NeuralTextSimplification/results_NTS/result_NTS_epoch11_10.19.t7_5_h3
A jet aircraft is an aircraft propelled by jet engines. by jet engines. .
A jet engine is a jet engine that generates thrust by jet engines .
An engine or motor is a machine designed to convert one form of energy into mechanical energy. ( mechanical energy. ) into mechanical energy. .
An aircraft is a machine that is able to fly by gaining support from the air. air. .

/root/NeuralTextSimplification/results_NTS/result_NTS_epoch11_10.19.t7_5_h4
A jet aircraft is an aircraft that is propelled by jet engines. .
A jet engine is a small reaction engine that generates thrust by jet engines .
An engine or motor is a machine designed to convert one form of energy into mechanical energy. ( mechanical energy. ) .
An aircraft is a machine that is able to run by gaining support from the air. .

Predictions on Validation Set

Hi,
I was able to find the predictions of the models on the test set, but I was wondering if you could make available the predictions of the models (or even just the best one) on the validation set.

Thanks!
