This is the source code for the paper *Towards End-to-End Open Conversational Machine Reading*.
(The code is still being cleaned and updated.)
Please refer to MUDERN and OSCAR for preparing the OR-CMR raw datasets under the folder `./data`, then follow the processing steps below.
For convenience, we build a discourse-segmented version of our rule-text knowledge base beforehand.
- Pytorch==0.4.1
- NLTK==3.4.5
- numpy==1.18.1
- pycparser==2.20
- six==1.14.0
- tqdm==4.44.1
- Run `cd segedu`
- Run `pip install -r requirements.txt`
- Run `python open_sharc_discourse_segmentation.py`
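The segmentation script above relies on SegEDU, a trained neural discourse segmenter. Purely as an illustration of what a discourse-segmented rule text looks like, the toy sketch below splits a rule text into clause-like units at a few common discourse markers; the marker list and `naive_segment` helper are illustrative stand-ins, not the actual SegEDU model:

```python
import re

# Illustrative discourse markers; the real SegEDU segmenter is a trained neural model.
MARKERS = r"\b(if|unless|when|while|and|but|or)\b"

def naive_segment(rule_text):
    """Split a rule text into rough clause-like units (EDUs) at discourse markers."""
    # Insert a boundary marker before each discourse marker, then split and clean up.
    marked = re.sub(MARKERS, r"|\1", rule_text)
    return [seg.strip() for seg in marked.split("|") if seg.strip()]

segments = naive_segment(
    "You can claim the benefit if you live in the UK and you are over 18."
)
```

Each resulting unit roughly corresponds to one condition the model can later ask a follow-up question about.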
For convenience, we precompute the retrieved rule texts for every rule text beforehand.
- numpy
- scikit-learn
- regex
- tqdm
- Scipy
- NLTK
- elasticsearch
- pexpect==4.2.1
- Run `pip install -r requirements.txt`
- Build the SQLite DB (here `base_dir=./data` and `db_path=./data/sharc_raw/json/sharc_open_id2snippet.json`) via:

  ```
  mkdir -p ${base_dir}/tfidf
  python3 build_db.py ${db_path} ${base_dir}/tfidf/db.db --num-workers 60
  ```
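Conceptually, `build_db.py` stores every rule-text snippet in a SQLite table keyed by its id, in the style of a DrQA document database. A minimal sketch, assuming `sharc_open_id2snippet.json` maps snippet ids to rule texts (the table schema here is an assumption, not the repository's exact one):

```python
import sqlite3

def build_db(id2snippet, db_path):
    """Store (id, text) pairs in a SQLite table, mimicking a DrQA-style document DB."""
    # In the real pipeline, id2snippet is loaded from
    # ./data/sharc_raw/json/sharc_open_id2snippet.json
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT);")
    conn.executemany("INSERT INTO documents VALUES (?, ?);", id2snippet.items())
    conn.commit()
    return conn

# Toy mapping standing in for the real id2snippet file.
id2snippet = {"snippet-1": "You can claim the benefit if you live in the UK."}
conn = build_db(id2snippet, ":memory:")
rows = conn.execute(
    "SELECT text FROM documents WHERE id = ?", ("snippet-1",)
).fetchall()
```

The retriever then only needs the id of a top-ranked snippet to fetch its full text.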
- Run the following command to build the TF-IDF index:

  ```
  python3 build_tfidf.py ${base_dir}/tfidf/db.db ${base_dir}/tfidf
  ```

  It will save the TF-IDF index in `${base_dir}/tfidf`.
- Run the inference code to save the retrieval results:

  ```
  bash inference_tfidf.sh
  ```
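At retrieval time, each rule text is scored against the query by its TF-IDF term weights. The pure-Python sketch below shows the idea on whitespace-tokenized text; the actual `build_tfidf.py` index (hashed n-gram features, sparse matrices) is more elaborate:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weights for a list of whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def retrieve(query, docs, k=1):
    """Return the top-k documents by summed TF-IDF weight of the query terms."""
    doc_vecs = tfidf_vectors(docs)
    q_terms = query.lower().split()
    scores = [sum(vec.get(t, 0.0) for t in q_terms) for vec in doc_vecs]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]

docs = [
    "you can claim the benefit if you live in the uk",
    "how to apply for a visa from outside the country",
    "rules for claiming a state pension early",
]
top = retrieve("claim benefit", docs, k=1)
```

Terms that occur in many rule texts get a low inverse-document-frequency weight, so the ranking is driven by the query's distinctive words.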
Tokenize the user information and construct the dialogue tree.
- Python 3.6
- Pytorch (1.6.0)
- NLTK (3.4.5)
- spacy (2.0.16)
- transformers (4.3.2)
- Run `cd ./UniCMR`
- Run `pip install -r requirements.txt`
- Run `bash preprocess.sh`
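The preprocessing step tokenizes the user information and builds a dialogue tree over the rule text. The exact format is defined by the scripts behind `preprocess.sh`; the dataclass below is a purely illustrative sketch of such a tree, where each node holds one question or condition and its follow-up subtree:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DialogueNode:
    """Illustrative tree node (not the repository's actual data format)."""
    text: str
    children: List["DialogueNode"] = field(default_factory=list)

    def depth(self):
        # Depth of the subtree rooted at this node.
        return 1 + max((c.depth() for c in self.children), default=0)

# Toy tree: an initial question with two follow-up conditions, one of them nested.
root = DialogueNode("Can I claim the benefit?", [
    DialogueNode("Do you live in the UK?"),
    DialogueNode("Are you over 18?", [DialogueNode("Are you a full-time student?")]),
])
```

A tree like this lets the model decide, at each turn, whether to answer directly or to ask the next unresolved condition.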
Training and inference of our UniCMR.
- Python 3.6
- Pytorch (1.6.0)
- NLTK (3.4.5)
- spacy (2.0.16)
- transformers (4.3.2)
- Run `cd ./UniCMR`
- Run `pip install -r requirements.txt`
- Run `bash run.sh`
Part of our code is borrowed from Open-Retrieval Conversational Machine Reading; many thanks.