
fusedchat's Introduction

Overview

FusedChat is an inter-mode dialogue dataset. It contains dialogue sessions fusing task-oriented dialogues (TOD) and open-domain dialogues (ODD). Based on MultiWOZ, FusedChat appends or prepends an ODD to every existing TOD. See more details in the paper.

Updates

09/19/2021 Dataset released.

02/23/2022 Dataset was further augmented and reorganized.

04/10/2022 Added author-trained checkpoints, baseline code and evaluation code.

Code

Context classification models

run_prepare_classification_data.py Prepare context classification data.

Set --context_type to last_turn or multi_turn to generate last-turn or multi-turn data, respectively.

run_train_context_classification_model.py Train the cross-encoder-based classifier.

run_test_context_classification_model.py Test the cross-encoder-based classifier.
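To illustrate the difference between the two --context_type settings, here is a minimal sketch. The helper name build_context, the " [SEP] " joiner, and the made-up third turn are hypothetical and not taken from the repository; the actual data preparation script may format contexts differently.

```python
from typing import List


def build_context(history: List[str], context_type: str, sep: str = " [SEP] ") -> str:
    """Return the classifier input for a dialogue history (hypothetical helper)."""
    if not history:
        raise ValueError("history must contain at least one turn")
    if context_type == "last_turn":
        # Only the most recent turn is shown to the classifier.
        return history[-1]
    if context_type == "multi_turn":
        # All turns are concatenated, oldest first.
        return sep.join(history)
    raise ValueError(f"unknown context_type: {context_type!r}")


history = [
    "I need some time in the sun, can you help me find a park to visit?",
    "Yes I have several parks in the city. What area are you looking for?",
    "The east, please.",  # made-up turn, for illustration only
]
print(build_context(history, "last_turn"))
print(build_context(history, "multi_turn"))
```

The multi-turn variant gives the classifier more evidence about the current mode at the cost of longer inputs.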

Response generation models

Before evaluation, generate the data with the 3 training scripts below. The training scripts double as data generators, and each mode has its own data format.

run_train_tod_single.py Train the TOD (single mode) model. This model is trained on FusedChat data where the response is in the TOD mode. Setting only_generating_data to 'yes' will only generate the data (tokenized dataset and tensor cache).

run_train_chitchat_single.py Train the chitchat (or ODD, single mode) model. This model is trained on FusedChat data where the response is in the ODD mode. Setting only_generating_data to 'yes' will only generate the data (tokenized dataset and tensor cache).

run_train_fused.py Train the fused model. This model is trained on all FusedChat data. Setting only_generating_data to 'yes' will only generate the data (tokenized dataset and tensor cache).
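The data generation step for the three scripts above could be scripted roughly as follows. This is a sketch under one assumption: that only_generating_data is passed as a command-line flag in the form `--only_generating_data yes`. The README does not show the exact spelling, so check each script's argument parser before relying on it.

```python
import sys
from typing import List

# Script names come from the list above; the flag spelling is an assumption.
DATA_SCRIPTS = [
    "run_train_tod_single.py",
    "run_train_chitchat_single.py",
    "run_train_fused.py",
]


def data_generation_commands(python: str = sys.executable) -> List[List[str]]:
    """Build one argv list per script with data-only mode switched on."""
    return [[python, script, "--only_generating_data", "yes"] for script in DATA_SCRIPTS]


for argv in data_generation_commands("python"):
    print(" ".join(argv))
```

Each argv list could then be handed to `subprocess.run(argv, check=True)` from the repository root, so that a failure in one script stops the sequence.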

run_evaluate_classification_based.py Evaluate the classification-based response generation models.

run_evaluate_fused.py Evaluate the two-in-one response generation models.

run_evaluate_ppl_classification_based.py Evaluate perplexity in a mode-aware manner. Specifically, the negative log-likelihood of each token is modified by the probability of determining the correct mode of the response (according to the classifier).

run_evaluate_ppl_fused.py Evaluate perplexity in a mode-aware manner. Specifically, the negative log-likelihood of each token is modified by the probability of determining the correct mode of the response (according to token generation).
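The mode-aware perplexity described above can be sketched as follows. The README does not spell out how the negative log-likelihood is "modified", so this example assumes one plausible reading: each token probability is multiplied by the probability of choosing the correct mode, which adds -log(mode_prob) to every token's negative log-likelihood. The function name and inputs are hypothetical, and the evaluation scripts may compute this differently.

```python
import math
from typing import Sequence


def mode_aware_ppl(token_probs: Sequence[float], mode_prob: float) -> float:
    """Perplexity where each token's NLL is penalised by -log(mode_prob).

    Assumption: "modified by" means the token probability is multiplied by
    the probability of determining the correct mode.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    if not 0.0 < mode_prob <= 1.0:
        raise ValueError("mode_prob must be in (0, 1]")
    nll = [-math.log(p) - math.log(mode_prob) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


probs = [0.5, 0.25, 0.5]
print(mode_aware_ppl(probs, mode_prob=1.0))  # plain perplexity
print(mode_aware_ppl(probs, mode_prob=0.9))  # higher: mode uncertainty is penalised
```

Under this reading, the mode-aware perplexity equals the plain perplexity divided by mode_prob, so a model that is unsure about the mode is penalised even when its token predictions are good.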

Author-trained checkpoints

Download the following checkpoint files here.

(1) fused.zip checkpoint file for the fused (two-in-one) model. Put under runs/.

(2) tod_single.zip checkpoint file for the TOD (single mode) model. Put under runs/.

(3) chitchat_single.zip checkpoint file for the ODD (chitchat, single mode) model. Put under runs/.

(4) last_turn.mdl checkpoint file for the last-turn context classification model. Put under cls_models/.

(5) multi_turn.mdl checkpoint file for the multi-turn context classification model. Put under cls_models/.

References

@article{young2021fusing,
  title={Fusing task-oriented and open-domain dialogues in conversational agents},
  author={Young, Tom and Xing, Frank and Pandelea, Vlad and Ni, Jinjie and Cambria, Erik},
  journal={arXiv preprint arXiv:2109.04137},
  year={2021}
}

Baseline Performance

Mode classification accuracy

| Context option | Accuracy |
| --- | --- |
| Single-turn | 0.993 |
| Multi-turn | 0.995 |

Inter-mode dialogue evaluation (on full FusedChat testset)


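| Metric group | Metric | Two-in-one model | Classification-based model |
| --- | --- | --- | --- |
| TOD | Slot Accuracy | 0.972 | 0.973 |
| TOD | Joint SA | 0.592 | 0.600 |
| TOD | Inform | 70.4 | 75.1 |
| TOD | Inform_mct | 90.1 | 90.8 |
| TOD | Success | 57.0 | 60.9 |
| TOD | Success_mct | 72.7 | 74.4 |
| TOD | BLEU | 12.05 | 12.17 |
| ODD | PPL | 10.49 | 10.50 |
| ODD | Sensibleness | 0.52 | 0.58 |
| ODD | Specificity | 0.47 | 0.51 |
| ODD | SSA | 0.50 | 0.55 |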

We additionally report inform_mct and success_mct, where MCT stands for "multi-choice tolerant". MCT evaluation drops the requirement that the response contain an entity name. We think this may better measure the model's accuracy, because the model sometimes asks for additional constraints instead of directly providing a recommendation, as shown in the example below.

user: I need some time in the sun, can you help me find a park to visit?

system (groundtruth): Cherry Hinton Water Play is in the east and is free admission.

system (model): Yes I have several parks in the city. What area are you looking for?

Under traditional evaluation, the model's response counts as a failure because the entity's name is never mentioned, even though the model recognized the dialogue state correctly and the dialogue flow is normal. Under MCT evaluation, the response counts as a success because explicitly mentioning an entity name is no longer required.
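The contrast between the two evaluation modes can be shown with a toy check. This is a deliberately simplified sketch, not the logic of MultiWOZ_Evaluation: the function inform_ok and its arguments are hypothetical, and the real evaluator tracks goals and database matches across the whole dialogue.

```python
from typing import Iterable


def inform_ok(response: str, entity_names: Iterable[str], state_correct: bool,
              multi_choice_tolerant: bool) -> bool:
    """Toy inform check (hypothetical; not the MultiWOZ_Evaluation logic)."""
    if not state_correct:
        return False
    if multi_choice_tolerant:
        # MCT: a correct dialogue state suffices; no entity name is required.
        return True
    # Traditional: at least one matching entity name must be mentioned.
    text = response.lower()
    return any(name.lower() in text for name in entity_names)


model_response = "Yes I have several parks in the city. What area are you looking for?"
parks = ["Cherry Hinton Water Play"]
print(inform_ok(model_response, parks, state_correct=True, multi_choice_tolerant=False))
print(inform_ok(model_response, parks, state_correct=True, multi_choice_tolerant=True))
```

For the park example above, the traditional check fails and the MCT check passes, matching the description in the text.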

Credits

We would like to thank the numerous creators who contributed to FusedChat. We thank Lu Cheng for creating the data collection interface and Low Shi Min and Arya Shashwat for quality control. We thank Peter Young for communication with the creators and dialogue assignment. The baseline code is based on NeuralPipeline, and the evaluation code comes from MultiWOZ_Evaluation.


fusedchat's Issues

some questions

Excuse me, Mr. Tom. I have several more questions for you.

  1. What do 'cs' and 'dp' mean in your code? Do they mean 'dialog state' and 'dialog act'? And what does 'dc' mean, as in "food:"?

     chitchat_double: [[bos]] + history + [[sor] + [chitchat] + reply + ([eos] if with_eos else [])]

     tod_double: [[bos]] + history + [[sor] + [cstok] + cs + dp + reply + ([eos] if with_eos else [])]

  2. In the language sequence, why is '' needed when there is already ''?

  3. How does the GPT2 model learn the language sequence? Is it enough to feed the processed sequence ('input_ids', 'token_type_ids', ...) into the model, so that it learns by itself thanks to the features of GPT2?
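For readers following the question, the two sequence templates quoted above can be written out as plain list concatenation. This is only a sketch: the token strings are placeholders (the real special tokens are defined in the repository), and cs and dp are treated as opaque token lists because their meaning is exactly what the question asks about.

```python
from typing import List

# Placeholder token strings; the real special tokens live in the repository.
BOS, EOS, SOR, CHITCHAT, CSTOK = "BOS", "EOS", "SOR", "CHITCHAT", "CSTOK"

Segments = List[List[str]]


def chitchat_double(history: Segments, reply: List[str], with_eos: bool = True) -> Segments:
    """ODD-mode sequence: mode token followed directly by the reply."""
    return [[BOS]] + history + [[SOR] + [CHITCHAT] + reply + ([EOS] if with_eos else [])]


def tod_double(history: Segments, cs: List[str], dp: List[str], reply: List[str],
               with_eos: bool = True) -> Segments:
    """TOD-mode sequence: mode token, then cs, dp and the reply."""
    return [[BOS]] + history + [[SOR] + [CSTOK] + cs + dp + reply + ([EOS] if with_eos else [])]


history = [["hi"], ["hello"]]
print(chitchat_double(history, ["how", "are", "you"]))
print(tod_double(history, ["cs_tok"], ["dp_tok"], ["ok"], with_eos=False))
```

Each inner list is one segment; in both templates the final segment starts with the same token and is followed by a mode-specific token, which is what separates ODD from TOD replies.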

An error in PMUL3416

The entry PMUL3416 in "prepended.json" has the wrong total number of turns. To align with MultiWOZ2.4, the correct data should be:
"system: Booking was successful, the total fee is 140.8 GBP payable at the station .\n Reference number is : DK26RL4O. Can I help you with anything else?\n"

Solved: RuntimeError: Could not infer dtype of NoneType

INFO:pytorch_transformers.modeling_utils:loading weights file .\gpt2\pytorch_model.bin

67 332
INFO:D:\python\FusedChat-main\util.py:Load tokenized dataset from cache at ./data_cache/fused_cache
100%|██████████| 8439/8439 [00:08<00:00, 1032.53it/s]
100%|██████████| 999/999 [00:01<00:00, 948.01it/s]
100%|██████████| 1000/1000 [00:01<00:00, 938.89it/s]
Traceback (most recent call last):
  File "D:\python\FusedChat-main\train.py", line 409, in <module>
    train()
  File "D:\python\FusedChat-main\train.py", line 280, in train
    train_loader, val_loader, train_sampler, valid_sampler = get_data_loaders(args, tokenizer)
  File "D:\python\FusedChat-main\train.py", line 164, in get_data_loaders
    tensor = torch.tensor(dataset[input_name])
RuntimeError: Could not infer dtype of NoneType
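The thread does not record what the fix was. One common cause of this error is a None value somewhere in the nested lists passed to torch.tensor, and the sketch below assumes that cause. The helper find_none is hypothetical and uses no PyTorch; it only reports where None values sit so they can be inspected before tensor conversion.

```python
from typing import Any, List, Tuple


def find_none(value: Any, path: Tuple = ()) -> List[Tuple]:
    """Return the index paths of every None inside nested lists/tuples/dicts."""
    if value is None:
        return [path]
    hits: List[Tuple] = []
    if isinstance(value, dict):
        for key, item in value.items():
            hits.extend(find_none(item, path + (key,)))
    elif isinstance(value, (list, tuple)):
        for i, item in enumerate(value):
            hits.extend(find_none(item, path + (i,)))
    return hits


# Made-up data with one missing token id, for illustration only.
dataset = {"input_ids": [[1, 2, 3], [4, None, 6]], "token_type_ids": [[0, 0, 0], [0, 0, 0]]}
print(find_none(dataset))  # [('input_ids', 1, 1)]
```

Running such a check on `dataset[input_name]` just before the failing line would show whether a token was never mapped to an id, which is one way None can enter the data.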
