empatheticdialogues's Introduction

EmpatheticDialogues

PyTorch original implementation of Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset (https://arxiv.org/abs/1811.00207).

We provide a novel dataset of 25k conversations grounded in emotional situations. The code in this repo demonstrates that automated metrics (P@1,100 and BLEU) are improved both when using candidates from our dataset and when fine-tuning on it.

This repo contains code for:

  • Transformer-based retrieval (pretraining, fine-tuning)
  • BERT-based retrieval (pretraining, fine-tuning)
  • Prepending classifier labels (e.g. EmoPrepend-1)

Dataset

To download the EmpatheticDialogues dataset:

wget https://dl.fbaipublicfiles.com/parlai/empatheticdialogues/empatheticdialogues.tar.gz

Models

To reproduce paper numbers, see the evaluation commands in the Commands section, and use the following trained models:

wget https://dl.fbaipublicfiles.com/parlai/empatheticdialogues/models/normal_transformer_pretrained.mdl  # Normal Transformer, pretrained
wget https://dl.fbaipublicfiles.com/parlai/empatheticdialogues/models/normal_transformer_finetuned.mdl  # Normal Transformer, fine-tuned
wget https://dl.fbaipublicfiles.com/parlai/empatheticdialogues/models/bert_pretrained.mdl  # BERT, pretrained
wget https://dl.fbaipublicfiles.com/parlai/empatheticdialogues/models/bert_finetuned.mdl  # BERT, fine-tuned
wget https://dl.fbaipublicfiles.com/parlai/empatheticdialogues/models/bert_finetuned_emoprepend1.mdl  # BERT, fine-tuned (EmoPrepend-1)
wget https://dl.fbaipublicfiles.com/parlai/empatheticdialogues/models/fasttext_empathetic_dialogues.mdl  # fastText classifier used for EmoPrepend-1

Dependencies

Versions given are what the code has been tested on.

Required

Optional

Commands

Transformer-based retrieval

Pretraining

python retrieval_train.py \
--batch-size 512 \
--cuda \
--dataset-name reddit \
--dict-max-words 250000 \
--display-iter 250 \
--embeddings ${REDDIT_EMBEDDINGS_PATH} \
--empchat-folder ${EMPATHETIC_DIALOGUES_DATA_FOLDER} \
--learn-embeddings \
--learning-rate 8e-4 \
--model transformer \
--model-dir ${TRAIN_SAVE_FOLDER} \
--model-name model \
--n-layers 4 \
--num-epochs 10000 \
--optimizer adamax \
--reddit-folder ${REDDIT_DATA_FOLDER} \
--transformer-dim 300 \
--transformer-n-heads 6

Fine-tuning

python retrieval_train.py \
--batch-size 512 \
--cuda \
--dataset-name empchat \
--dict-max-words 250000 \
--display-iter 250 \
--empchat-folder ${EMPATHETIC_DIALOGUES_DATA_FOLDER} \
--learn-embeddings \
--learning-rate 8e-4 \
--load-checkpoint ${PRETRAINED_MODEL_PATH} \
--max-hist-len 4 \
--model transformer \
--model-dir ${TRAIN_SAVE_FOLDER} \
--model-name model \
--n-layers 4 \
--num-epochs 10 \
--optimizer adamax \
--reddit-folder ${REDDIT_DATA_FOLDER} \
--transformer-dim 300 \
--transformer-n-heads 6

Evaluation

# P@1,100
python retrieval_train.py \
--batch-size 512 \
--cuda \
--dataset-name empchat \
--dict-max-words 250000 \
--display-iter 250 \
--empchat-folder ${EMPATHETIC_DIALOGUES_DATA_FOLDER} \
--max-hist-len 4 \
--model transformer \
--model-dir ${EVAL_SAVE_FOLDER} \
--model-name model \
--n-layers 4 \
--optimizer adamax \
--pretrained ${TRAIN_SAVE_FOLDER}/model.mdl \
--reactonly \
--transformer-dim 300 \
--transformer-n-heads 6

# BLEU (EmpatheticDialogues context/candidates)
python retrieval_eval_bleu.py \
--empchat-cands \
--empchat-folder ${EMPATHETIC_DIALOGUES_DATA_FOLDER} \
--max-hist-len 4 \
--model ${TRAIN_SAVE_FOLDER}/model.mdl \
--name model \
--output-folder ${EVAL_SAVE_FOLDER} \
--reactonly \
--task empchat

BERT-based retrieval

Pretraining

python retrieval_train.py \
--batch-size 256 \
--bert-dim 300 \
--cuda \
--dataset-name reddit \
--dict-max-words 250000 \
--display-iter 100 \
--embeddings None \
--empchat-folder ${EMPATHETIC_DIALOGUES_DATA_FOLDER} \
--learning-rate 6e-5 \
--model bert \
--model-dir ${TRAIN_SAVE_FOLDER} \
--model-name model \
--num-epochs 10000 \
--optimizer adamax \
--reddit-folder ${BERT_TOKENIZED_REDDIT_DATA_FOLDER}

Fine-tuning

python retrieval_train.py \
--batch-size 256 \
--bert-dim 300 \
--cuda \
--dataset-name empchat \
--dict-max-words 250000 \
--display-iter 100 \
--embeddings None \
--empchat-folder ${EMPATHETIC_DIALOGUES_DATA_FOLDER} \
--learning-rate 1e-5 \
--load-checkpoint ${PRETRAINED_MODEL_PATH} \
--max-hist-len 4 \
--model bert \
--model-dir ${TRAIN_SAVE_FOLDER} \
--model-name model \
--num-epochs 100 \
--optimizer adamax \
--stop-crit-num-epochs 10

Evaluation

# P@1,100
python retrieval_train.py \
--batch-size 256 \
--bert-dim 300 \
--cuda \
--dataset-name empchat \
--dict-max-words 250000 \
--display-iter 100 \
--embeddings None \
--empchat-folder ${EMPATHETIC_DIALOGUES_DATA_FOLDER} \
--max-hist-len 4 \
--model bert \
--model-dir ${EVAL_SAVE_FOLDER} \
--model-name model \
--optimizer adamax \
--pretrained ${TRAIN_SAVE_FOLDER}/model.mdl \
--reactonly

# BLEU (EmpatheticDialogues context/candidates)
python retrieval_eval_bleu.py \
--bleu-dict ${PATH_TO_MODEL_WITH_TRANSFORMER_DICT} \
--empchat-cands \
--empchat-folder ${EMPATHETIC_DIALOGUES_DATA_FOLDER} \
--max-hist-len 4 \
--model ${TRAIN_SAVE_FOLDER}/model.mdl \
--name model \
--output-folder ${EVAL_SAVE_FOLDER} \
--reactonly \
--task empchat

Note: we pass in a separate dictionary (--bleu-dict) in order to use the same tokenization when calculating the BLEU of both Transformer and BERT models. For this, you can use the pretrained normal Transformer model listed in the Models section above.

EmoPrepend-1

Add the following flags when calling retrieval_train.py or retrieval_eval_bleu.py:

--fasttext 1 \
--fasttext-path ${PATH_TO_TRAINED_FASTTEXT_MODEL} \
--fasttext-type emo

For ${PATH_TO_TRAINED_FASTTEXT_MODEL}, you can pass in the fastText classifier in the Models section above.
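
As a rough illustration of what "prepending classifier labels" can mean, the sketch below places the top-k predicted labels in front of the context string. It is only a sketch under assumptions: `prepend_top_label` and `toy_classifier` are hypothetical names, the keyword-based classifier is a stand-in for a trained fastText model, and the exact input format used by this repo is not confirmed here.

```python
from typing import Callable, List


def prepend_top_label(context: str,
                      classify: Callable[[str], List[str]],
                      k: int = 1) -> str:
    """Prepend the top-k predicted labels to the context (k=1 mirrors the '-1' in the name)."""
    labels = classify(context)[:k]
    return " ".join(labels + [context])


def toy_classifier(text: str) -> List[str]:
    """Hypothetical stand-in for a trained classifier: ranks labels by keyword hits."""
    keywords = {"afraid": ["scared", "dark"], "proud": ["won", "achieved"]}
    lowered = text.lower()
    scores = {label: sum(word in lowered for word in words)
              for label, words in keywords.items()}
    # Labels sorted from highest to lowest score
    return sorted(scores, key=scores.get, reverse=True)


example = prepend_top_label("I was scared of the dark as a kid.", toy_classifier)
print(example)
```

In practice the `classify` callable would wrap whichever trained classifier is passed via `--fasttext-path`; how the repo tokenizes the prepended label is not shown here.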

References

Please cite [1] if you found the resources in this repository useful.

[1] H. Rashkin, E. M. Smith, M. Li, Y.-L. Boureau. Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset. ACL, 2019.

@inproceedings{rashkin2019towards,
  title = {Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset},
  author = {Hannah Rashkin and Eric Michael Smith and Margaret Li and Y-Lan Boureau},
  booktitle = {ACL},
  year = {2019},
}

License

See the LICENSE file in the root repo folder for more details.

empatheticdialogues's People

Contributors

ericmichaelsmith

empatheticdialogues's Issues

Query about additional column in valid and test

Hello, I was going through the dataset and I noticed that the valid and test sets have an additional column containing candidate responses. I don't see this additional column being referenced in the dataloader.

The original paper mentions that the candidates were sampled from three sources - the training set, dailydialogs and reddit conversations and the validation code seems to do exactly that.

I'm not sure if I'm misreading the code or if this repo's evaluation code does not represent the current state of how training and evaluation is done for this dataset. Can I get some clarity on this? Thanks in advance for your help!

Logical bug?

This code looks illogical.

If new_opt doesn't have a fasttext attribute, it sets new_opt.fasttext on saved_opt, which would fail because the attribute being read does not exist.

if not (hasattr(new_opt, "fasttext")):
    setattr(saved_opt, "fasttext", new_opt.fasttext)
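
One plausible reading of the intended behavior, offered only as a guess, is that the check should be on `saved_opt`: if the saved options predate the `fasttext` flag, copy the value from the new options. The sketch below shows that reading with `argparse.Namespace` objects; `merge_fasttext_option` is a hypothetical helper name, not a function from this repo.

```python
from argparse import Namespace


def merge_fasttext_option(saved_opt: Namespace, new_opt: Namespace) -> Namespace:
    """Copy `fasttext` from new_opt only when saved_opt lacks it (assumed intent)."""
    if not hasattr(saved_opt, "fasttext") and hasattr(new_opt, "fasttext"):
        setattr(saved_opt, "fasttext", new_opt.fasttext)
    return saved_opt


# Saved options without the flag pick it up from the new options
saved = merge_fasttext_option(Namespace(model="bert"), Namespace(fasttext=1))
print(saved.fasttext)
```

Checking both objects also avoids an `AttributeError` when neither set of options defines the flag.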


Emotion Classifiers

Hi,
I would like to ask which dataset the classifier was pretrained on and what its architecture is.

Generative model comparison on Full-Transformer and GPT2 both fine-tuned by ED

Hi,
I have a question about the generative model. In the paper, you used the full Transformer model for pretraining and fine-tuning. I am curious about another generative model, GPT2. Since you haven't released the fine-tuned full-Transformer generative model and I don't have enough resources to replicate your results for comparison, I would like to ask:

From your perspective, if I fine-tune GPT2 on ED, how would it perform (on both automated metrics and human ratings)? Would there be a sacrifice compared with the full-Transformer model, given that GPT2 has no encoder, only multiple decoder layers? Thanks

how do you evaluate this case?

Thanks for reading this post.
I wonder how you designed your human-evaluation standard.
Here is a case where the responder replies in a very empathetic way but completely misses the point.
A: Oh, shit! I fell down just now and hurt myself.
B: It must be horrible for you to stand under the sun on such a hot day.
Would this dialogue get a high empathy score in human evaluation?

About the Generative Prepend Models

Hi,

I was wondering if the generative prepend models (EmoPrepend and TopicPrepend) involve any pre-trained BERT weights? From my understanding, it seems that you first trained the prepend models on Reddit and then fine-tuned them on ED, right? And for prepend models, you only experimented with 4-layer transformers but not 5-layer (denoted as "Large" in the paper)?

Another related question would be, when you trained the prepend models on Reddit, you still predicted the labels based on the input context and prepended them in the front, am I correct?

Thanks for your time!
Yubo

The Emotion Classifier

Hi there,

I have a few questions regarding the emotion classifier in the EmoPrepend model:

  1. Is the pre-trained classifier available?
  2. What data did you use to train the emotion classifier? Is it the context-prompt pairs in the ED dataset?
  3. It is mentioned in the paper that history utterances are concatenated into one and sent to the encoder, so in the case of EmoPrepend, is it correct that the input has the following format: [emotion_label_1] utterance_1 [emotion_label_2] utterance_2 ...?

Thanks for your response!

Generation results ?

Since no generation code has been released yet, it would be nice to have some generation results.
Could you provide some generation results from the baseline models?

P100@1 is too low in retrieval model

Hi, I constructed train, valid, and test splits for the multi-turn response selection task as follows.
For a session a, b, c, d, e, f, I build these examples (with reactonly, randomly sampling 99 negative responses for each positive sample):
a, b
a, b, c, d
a, b, c, d, e, f
I ran an experiment with an interaction-based BERT model (easy to set up). Concretely, I concatenate context and response, and I find this task too easy: P100@1 and MAP are around 0.9. That is far from your result of around 0.5. I don't think there should be such a large difference between bi-encoder and concatenation-based results, but I can't explain it. Could you analyze the reason? Thanks.
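
The example construction described in this issue can be sketched as follows. This follows the issue author's description, not the repo's data loader: `build_examples` is a hypothetical helper, and treating odd 0-based positions as listener turns and truncating the history to the last `max_hist_len` utterances are both assumptions.

```python
from typing import List, Tuple


def build_examples(session: List[str],
                   max_hist_len: int = 4,
                   react_only: bool = True) -> List[Tuple[List[str], str]]:
    """Turn one session into (history, response) pairs."""
    examples = []
    for i in range(1, len(session)):
        # With react_only, keep only listener turns (assumed to sit at odd positions)
        if react_only and i % 2 == 0:
            continue
        # Keep at most the last max_hist_len utterances as context
        history = session[max(0, i - max_hist_len):i]
        examples.append((history, session[i]))
    return examples


pairs = build_examples(list("abcdef"), max_hist_len=6)
for history, response in pairs:
    print(history, "->", response)
```

With `max_hist_len=6` this reproduces the three examples listed above; with a shorter limit, the last example would see only the most recent utterances.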

Files in the Reddit_data_folder

Hi, may I know what files should be present in the REDDIT_DATA_FOLDER and what their formats are? That would make it easier for me to convert the raw Reddit dataset into the files required for pretraining the model.

How do you decide these 32 emotion labels?

I am kind of confused about where the 32 emotion labels come from. Because some emotions are really close to one another, I tried to find resources that explain the differences between them. I read all of the references mentioned in the image below, but I could not find all 32 emotions in those papers.
Could you provide some information about this?

image

The Reddit Dataset

Hi! Is there a way to obtain the 1.7B Reddit dataset? Thanks!

Best,
Yubo

The retrieval-based model

Hi, I have several questions regarding the retrieval-based model:

1. How do you get the 100 candidates at inference time when calculating P@1,100?
2. At training time, you use all of the utterances from the batch as candidates to minimize the negative log-likelihood of selecting the correct candidate. Why not sample negative examples at a fixed ratio, for example 9 negative examples for each positive example? Did you compare these two methods?

Looking forward to your reply.
Best wishes
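
The two ideas mentioned in this question, in-batch candidates for the training loss and precision at 1 over a fixed candidate set, can be sketched in plain Python. This is a minimal illustration under assumptions: the score matrix is taken as given (how the repo scores context/candidate pairs and how it selects the 100 evaluation candidates are not described here), and both function names are hypothetical.

```python
import math
from typing import List


def in_batch_nll(scores: List[List[float]]) -> float:
    """Mean negative log-likelihood where scores[i][j] is the score of context i
    against candidate j, and the correct candidate for context i is j == i."""
    total = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]
    return total / len(scores)


def precision_at_1(scores: List[List[float]], gold: List[int]) -> float:
    """Fraction of rows whose top-scoring candidate is the gold one.
    With 100 candidates per row this corresponds to P@1,100."""
    hits = 0
    for row, g in zip(scores, gold):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == g)
    return hits / len(scores)


batch_scores = [[5.0, 1.0, 0.0],
                [0.5, 4.0, 1.0],
                [0.0, 2.0, 3.0]]
print(in_batch_nll(batch_scores))
print(precision_at_1(batch_scores, gold=[0, 1, 2]))
```

Using the other items in the batch as negatives means a batch of size B yields B - 1 negatives per example at no extra encoding cost, which is one common motivation for this setup over sampling a fixed number of negatives.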

The interpretation of selfeval in the dataset

What's the interpretation of the selfeval field in the dataset?

For example, what's the meaning of 4|3|4_3|5|5 in

hit:1_conv:2,1,afraid, i used to scare for darkness,2, it feels like hitting to blank wall when i see the darkness,4|3|4_3|5|5,

Thanks
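
Whatever the ratings mean, the structure of the field is regular: two groups separated by an underscore, each holding pipe-separated integers. The sketch below only splits the string into that structure; `parse_selfeval` is a hypothetical helper, and the meaning of each group and each score is not documented here.

```python
from typing import List


def parse_selfeval(field: str) -> List[List[int]]:
    """Split 'a|b|c_d|e|f' into [[a, b, c], [d, e, f]]."""
    return [[int(x) for x in group.split("|")] for group in field.split("_")]


print(parse_selfeval("4|3|4_3|5|5"))  # [[4, 3, 4], [3, 5, 5]]
```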

Problem with empatheticdialogues/valid.csv

Hello, I fine-tuned your model and now I would like to evaluate it, to have a feeling of how it works. When I give the command

python retrieval_train.py \
--batch-size 32 \
--bert-dim 300 \
--cuda \
--dataset-name empchat \
--dict-max-words 250000 \
--display-iter 100 \
--embeddings None \
--empchat-folder ./empatheticdialogues/ \
--max-hist-len 4 \
--model bert \
--model-dir ./test1 \
--model-name model \
--optimizer adamax \
--pretrained ./models/bert_finetuned_emoprepend1.mdl \
--reactonly

I get the following error:


Loading dictionary from None
Traceback (most recent call last):
  File "retrieval_train.py", line 297, in <module>
    main(opt)
  File "retrieval_train.py", line 262, in main
    valid_data = env.build_valid_dataloader(False)
  File "/home/ubuntu/Documents/EmpatheticDialogues/empchat/datasets/loader.py", line 187, in build_valid_dataloader
    fasttext_path=self.opt.fasttext_path,
  File "/home/ubuntu/Documents/EmpatheticDialogues/empchat/datasets/empchat.py", line 84, in __init__
    df = open(os.path.join(data_folder, f"{splitname}.csv")).readlines()
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/ems/empathetic_dialogue/standalone/s2019_06_12__reproducing_results/bert_finetuning__emoprepend1/empatheticdialogues/valid.csv'

How do you suggest I fix this? Thanks!

How do you decide the uids of a comment, and when will a uid be 6?

The original name and parent_id of a comment are both strings like "t1_XXX", so when would a uid be 6, marking a deleted comment?

if "bert_tokenizer" in dict_:
    self.using_bert = True
    assert BERT_ID == "bert-base-cased"
    deleted_uid = -1
else:
    self.using_bert = False
    deleted_uid = 6

Issues in the dataset

When I tried to load the train.csv, I observed these errors:

  1. Rows 58466, 2355, 37523, 67237 have the same text in the 'prompt' and 'utterance' columns. The real utterance text is in the wrong column.
  2. Because of these misaligned fields, pandas always throws an error, as these rows have a different number of columns from the rest.

I haven't checked valid.csv and test.csv. Please fix these in the files.

Thank you
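
Until the files are fixed, one workaround is to read the CSV with the standard `csv` module and set aside rows whose field count differs from the header, rather than letting the parser fail. The sketch below is an assumption-laden illustration: `split_rows` is a hypothetical helper, the second data row in `sample` is invented to show a malformed line, and the header matches the column names reported in another issue on this page.

```python
import csv
import io
from typing import List, Tuple


def split_rows(text: str) -> Tuple[List[List[str]], List[Tuple[int, List[str]]]]:
    """Return (well-formed rows, [(line number, malformed row), ...])."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if len(row) == len(header):
            good.append(row)
        else:
            # reader.line_num is the number of source lines read so far
            bad.append((reader.line_num, row))
    return good, bad


sample = (
    "conv_id,utterance_idx,context,prompt,speaker_idx,utterance,selfeval,tags\n"
    "hit:1_conv:2,1,afraid,i used to scare for darkness,2,"
    "it feels like hitting to blank wall,4|3|4_3|5|5,\n"
    # Invented malformed row with one extra field
    "hit:9_conv:9,1,proud,some prompt,1,some prompt,extra text,5|5|5_5|5|5,\n"
)
good, bad = split_rows(sample)
print(len(good), "well-formed;", "malformed at lines", [n for n, _ in bad])
```

The malformed rows can then be inspected or repaired by hand before loading the rest into pandas.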

ED architecture?

As I understand it, ED:

  • uses the BERT tokenizer to embed the input,
  • uses the embedding output as the BERT encoder input,
  • trains the BERT encoder to minimize the negative log-likelihood of y* and y^, where y^ is the ground-truth response for each input and y* is the response predicted by the BERT encoder model. Is that right?

The other phase is the generative one, which I think of as a BERT decoder: because BERT doesn't have a decoder, we train a Transformer as a decoder to get a sentence from the BERT encoder output?

As I mentioned, there are many Transformer architectures available now (e.g. on Hugging Face), so this method can be confusing to newcomers.
I hope you can answer these questions.

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

In Google Colab, this command raises an exception.

!python retrieval_train.py --batch-size 32 --bert-dim 300 --cuda --dataset-name empchat --dict-max-words 250000 --display-iter 100 --embeddings None --empchat-folder ./empatheticdialogues/ --max-hist-len 4 --model bert --model-dir ./test1 --model-name model --optimizer adamax --pretrained bert_finetuned_emoprepend1.mdl --reactonly

Traceback (most recent call last):
  File "retrieval_train.py", line 297, in <module>
    main(opt)
  File "retrieval_train.py", line 254, in main
    net, dictionary = load_model(opt_.pretrained, opt_)
  File "/content/EmpatheticDialogues/empchat/models.py", line 74, in load
    net = create(saved_opt, word_dict["words"])
  File "/content/EmpatheticDialogues/empchat/models.py", line 81, in create
    return BertAdapter(opt, dict_words)
  File "/content/EmpatheticDialogues/empchat/bert_local.py", line 61, in __init__
    embeddings.weight[token_idx] = rand_embedding
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.

Found fix at https://stackoverflow.com/questions/49161652/how-to-get-around-in-place-operation-error-if-index-leaf-variable-for-gradient-u

Notebook is here gist

Excuse me, there is something I feel confused about

I don't really understand the relationship between hit id and conv id. Can you explain the case in the dataset below:
image
I think in hit 1: conv 3, speaker and listener should talk about "I showed a guy how to run a good bead in welding class and he caught on quick" with "proud" emotion

Why is the train file smaller than the test file?

I have read the ED paper; the train/valid/test split is 19533 / 2770 / 2547 conversations. But the train file is 16.9M, while the valid and test files are about 36M. I want to know what explains the mismatch between the conversation counts and the file sizes. Any response would help me a lot.

Issue with BertTokenizer

Hi, after running the retrieval command I am getting this error:
File "/Users/ikram/opt/anaconda3/lib/python3.7/site-packages/pytorch_pretrained_bert/tokenization.py", line 109, in tokenize
if self.do_basic_tokenize:
AttributeError: 'BertTokenizer' object has no attribute 'do_basic_tokenize'

However, I can't locate the file to make the requisite changes. What would you advise? Thanks!

chunk.pth in Reddit dataset

Hi, may I know how to convert the raw data in the Reddit dataset into the chunk.pth files loaded in reddit.py? I have downloaded the Reddit dataset, but I have no idea how to process the raw data so that it works with the RedditDataset class in reddit.py.

I have checked the related issue, but I still cannot understand the format of the required files.

Reddit Data

Hello, since you can't provide the dataset, in what format should we prepare the Reddit data to match the program's requirements?

data file

Hello, I recently read your paper. Your work is very meaningful for dialogue systems, so I want to follow it. However, due to my limited technical level, I have some questions about the code and hope to get your advice.
When using your code, how should the files in the data folders mentioned in the parameters be obtained? For example, how is the "word dictionary" in "reddit dir" obtained?

Download pretrained models?

Can we download pretrained models to use them for inference?
(Also, what is the time-to-response for this?)
As I see in the paper, it takes 2-3 days to train them myself.

Query regarding data preparation.

Hi @EricMichaelSmith, I was going through the data preparation for the ED dataset in the empchat.py file and found that the speaker utterance is also taken as a label, with the previous utterances of the conversation as context. I am a little confused as to why you would take the speaker utterance as a label when you want a response in the listener role only.

Is data preparation different for generation and retrieval tasks?

Please clarify this.
Thank you.

about details of dataset

Hi, I'm confused about the dataset since there is no readme file. I found that the column names are the following:

conv_id | utterance_idx | context | prompt | speaker_idx | utterance | selfeval | tags

Does 'context' mean the sentiment of each utterance? And what do 'prompt', 'selfeval' and 'tags' mean?

thanks,
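
For orientation, the columns listed above are enough to regroup the rows into conversations. The sketch below does that with `csv.DictReader`; it is an illustration only, `group_by_conversation` is a hypothetical helper, and the two rows in `demo` are invented. It does not settle what the columns mean, although in the sample row quoted in another issue on this page the `context` field holds an emotion word ("afraid").

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

HEADER = "conv_id,utterance_idx,context,prompt,speaker_idx,utterance,selfeval,tags\n"


def group_by_conversation(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each conv_id to its (speaker_idx, utterance) turns, ordered by utterance_idx."""
    turns = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        turns[row["conv_id"]].append(
            (int(row["utterance_idx"]), row["speaker_idx"], row["utterance"])
        )
    return {cid: [(spk, utt) for _, spk, utt in sorted(rows)]
            for cid, rows in turns.items()}


# Invented rows, deliberately out of order
demo = (
    HEADER
    + "c1,2,afraid,some prompt,2,second turn,5|5|5_5|5|5,\n"
    + "c1,1,afraid,some prompt,1,first turn,5|5|5_5|5|5,\n"
)
print(group_by_conversation(demo))
```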

Code for generative transformer

Do you have any plans to release the code for the generative model using the Transformer? I trained the full Transformer on the Reddit dataset but got random responses. I got low cross-entropy loss on my validation set, so I don't know why that is the case.

Thanks,
Peixiang
