gmftbygmftby / whentospeak

The code for our paper "When to Talk: Chatbot Controls the Timing of Talking during Multi-turn Open-domain Dialogue Generation"

Home Page: https://arxiv.org/abs/1912.09879

License: MIT License

hred seq2seq timing multi-turn open-domain dialogue generative gnn gcn gat


WhenToTalk

A model that decides when to speak during a conversation, which can make the interaction more engaging.

Model architecture:

  1. GCN for predicting the timing of speaking
    • Dialogue-sequence: Sequence of the dialogue history
    • User-sequence: User utterance sequence
    • PMI: Context relationship
  2. (Seq2Seq/HRED) for language generation
  3. Multi-head attention for the dialogue context (uses the GCN hidden states)
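The architecture above (GCN over utterance nodes, multi-head attention over the GCN hidden states, and a speak/silent decision head) can be sketched in plain PyTorch. This is a minimal illustration, not the repo's actual module: all class/parameter names and sizes are invented, and the GCN layer is a simple mean-aggregation stand-in for the PyG convolutions the project uses.

```python
import torch
import torch.nn as nn

class GraphContextEncoder(nn.Module):
    """Illustrative sketch: GCN-style context encoding + multi-head
    attention + a binary speak/keep-silent head. Names are hypothetical."""

    def __init__(self, hidden=64, heads=4):
        super().__init__()
        self.w1 = nn.Linear(hidden, hidden)
        self.w2 = nn.Linear(hidden, hidden)
        self.attn = nn.MultiheadAttention(hidden, heads)
        self.decide = nn.Linear(hidden, 1)  # speak(>0) / silent(<0) logit

    def gcn_layer(self, w, x, adj):
        # mean-aggregate neighbour features: a simplified GCN propagation
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(w(adj @ x / deg))

    def forward(self, x, adj):
        # x: [n_utts, hidden] utterance embeddings
        # adj: [n_utts, n_utts] adjacency built from the dialogue-sequence,
        #      user-sequence, and PMI edges described above (not shown)
        h = self.gcn_layer(self.w1, x, adj)
        h = self.gcn_layer(self.w2, h, adj)
        seq = h.unsqueeze(1)                    # [n_utts, batch=1, hidden]
        ctx, _ = self.attn(seq, seq, seq)       # attend over GCN states
        timing_logit = self.decide(ctx[-1, 0])  # decision for latest turn
        return ctx.squeeze(1), timing_logit
```

The context vectors `ctx` would then feed the Seq2Seq/HRED decoder, while `timing_logit` drives the talk-timing classifier.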

Requirements

  1. PyTorch 1.2
  2. PyG (PyTorch Geometric)
  3. numpy
  4. tqdm
  5. nltk (word and sentence tokenization)
  6. BERTScore 0.2.1

Dataset

Format:

  1. The corpus folder contains sub-folders, each named after the turn length of its conversations.
  2. Each sub-folder contains many files, one conversation per file.
  3. Each conversation file is in TSV format; each line has four fields:
    • time
    • poster
    • reader
    • utterance
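A loader for this layout can be sketched as follows. The folder walk and field names follow the format described above; the function name and return structure are illustrative, not the repo's API.

```python
import csv
from pathlib import Path

def load_corpus(root):
    """Walk root/<turn_length>/<conversation_file> and parse each file
    as a TSV of (time, poster, reader, utterance) rows.
    Returns {turn_length: [conversation, ...]} (illustrative structure)."""
    conversations = {}
    for sub in sorted(Path(root).iterdir()):
        if not sub.is_dir():
            continue
        turn_len = int(sub.name)  # sub-folder is named by turn length
        for f in sub.glob("*"):
            with f.open(newline="", encoding="utf-8") as fh:
                rows = [tuple(r) for r in csv.reader(fh, delimiter="\t")]
            conversations.setdefault(turn_len, []).append(rows)
    return conversations
```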

Create the dataset

# dataset: ubuntu / cornell; mode: cf / ncf
# creates the ubuntu-corpus folder, which contains one sub-folder per mode (cf / ncf)
./data/run.sh ubuntu cf

Metric

  1. Language modeling: BLEU4, PPL, Distinct-1, Distinct-2
  2. Talk timing: F1, Acc
  3. Human evaluation: engagingness
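Of the automatic metrics above, Distinct-n is simple enough to show inline: the ratio of unique n-grams to total n-grams over all generated responses. This is a plain-Python sketch (the repo may compute it differently); BLEU4 and PPL would come from nltk and the language model respectively.

```python
def distinct_n(responses, n):
    """Distinct-n over tokenized responses: unique n-grams / total n-grams.
    Higher values indicate more diverse generations."""
    ngrams, total = set(), 0
    for tokens in responses:
        for i in range(len(tokens) - n + 1):
            ngrams.add(tuple(tokens[i:i + n]))
            total += 1
    return len(ngrams) / total if total else 0.0
```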

Baselines

1. Traditional methods

  1. Seq2Seq
  2. HRED / HRED + CF

2. Graph ablation learning

  1. w/o BERT Embedding cosine similarity
  2. w/o User-sequence
  3. w/o Dialogue-sequence

How to use

Generate the graph of the context

# generate the graph information of the train/test/dev dataset
./run.sh graph cornell when2talk 0

Analyze the graph context coverage information

# The average context coverage in the graph: 0.7935/0.7949/0.7794 (train/test/dev) dataset
./run.sh stat cornell 0 0

Generate the vocab of the dataset

./run.sh vocab ubuntu 0 0

Train the model (seq2seq / seq2seq-cf / hred / hred-cf):

# train the hred model on the 4th GPU
./run.sh train ubuntu hred 4

Translate the test dataset by applying the model

# translate the test dataset with the hred model on the 4th GPU
./run.sh translate ubuntu hred 4

Evaluate the result of the translated utterances

# evaluate the translated results of the model on the 4th GPU (BERTScore needs it)
./run.sh eval ubuntu hred 4

Generate performance curve

./run.sh curve dailydialog hred-cf 0

Chat with the model

./run.sh chat dailydialog GatedGCN 0

Experiment Result

To do:
1. add GatedGCN to all the graph-based methods
2. add BiGRU to all the graph-based methods
3. follow DialogueGCN to construct the graph
    * complete graph within a window of size **p**
    * add one long-range edge outside the window to capture distant context sentences
    * user embeddings as nodes
4. analyse the number of GatedGCN layers in this repo and in multi-turn modeling
  1. Methods

    • Seq2Seq: seq2seq with attention
    • HRED: hierarchical context modeling
    • HRED-CF: HRED with a classifier for talk timing
    • When2Talk: GCN context modeling first, RNN context modeling after
    • W2T_RNN_First: BiRNN context modeling first, GCN context modeling after
    • GCNRNN: combines the gated GCN context and the RNN context (?)
    • GatedGCN: combines the gated GCN context and the RNN context
      1. BiRNN for background modeling
      2. Gated GCN for context modeling
      3. Combines the GCN embedding and the BiRNN embedding into the final embedding
      4. Low-turn examples are trained without the GCNConv (only the BiRNN is used)
      5. Separating the decision module from the generation module works better
    • W2T_GCNRNN: RNN + GCN combined with an RNN (W2T_RNN_First + GCNRNN)
  2. Automatic evaluation

    • Compare the PPL, BLEU4, Distinct-1, and Distinct-2 scores of all the models.

      The proposed classification-based methods need to be cascaded with generation to calculate BLEU4 and BERTScore (the same output format as the traditional models' results).

      | Model | Dailydialog BLEU | Dist-1 | Dist-2 | PPL | Cornell BLEU | Dist-1 | Dist-2 | PPL |
      | --- | --- | --- | --- | --- | --- | --- | --- | --- |
      | Seq2Seq | 0.1038 | 0.0178 | 0.072 | 29.0640 | 0.0843 | 0.0052 | 0.0164 | 45.1504 |
      | HRED | 0.1175 | 0.0176 | 0.0571 | 29.7402 | 0.0823 | 0.0227 | 0.0524 | 39.9009 |
      | HRED-CF | 0.1268 | 0.0435 | 0.1567 | 29.0111 | 0.1132 | 0.0221 | 0.0691 | 38.5633 |
      | When2Talk | 0.1226 | 0.0211 | 0.0608 | 24.0131 | 0.0996 | 0.0036 | 0.0073 | 32.9503 |
      | W2T_RNN_First | 0.1244 | 0.0268 | 0.0787 | 24.5056 | 0.1118 | 0.0065 | 0.0147 | 33.754 |
      | GCNRNN | 0.1250 | 0.0214 | 0.0624 | 25.8213 | 0.1072 | 0.0077 | 0.0188 | 33.9572 |
      | W2T_GCNRNN | 0.1246 | 0.0152 | 0.0400 | 23.4434 | 0.1107 | 0.0063 | 0.0142 | 34.4256 |
      | GatedGCN | 0.1231 | 0.0423 | 0.1609 | 27.1615 | 0.1157 | 0.0261 | 0.0873 | 34.4256 |
    • F1 metric for measuring the accuracy of the speaking-timing prediction, only for the classification-based methods (hred-cf, ...). The label statistics show that negative labels are about half as frequent as positive labels, so F1 together with Acc may be more suitable than F1 alone. In this setting we care more about the precision component of F1.

      | Model | Dailydialog Acc | Dailydialog F1 | Cornell Acc | Cornell F1 |
      | --- | --- | --- | --- | --- |
      | HRED-CF | 0.8272 | 0.8666 | 0.7708 | 0.8427 |
      | When2Talk | 0.7992 | 0.8507 | 0.7616 | 0.8388 |
      | W2T_RNN_First | 0.8144 | 0.8584 | 0.7481 | 0.8312 |
      | GCNRNN | 0.8176 | 0.8635 | 0.7598 | 0.8445 |
      | W2T_GCNRNN | 0.7565 | 0.8434 | 0.7853 | 0.8466 |
      | GatedGCN | 0.8226 | 0.8663 | 0.738 | 0.8181 |
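    The Acc and F1 numbers above can be computed from the binary speak(1)/silent(0) decisions with a few lines of plain Python. This is a generic sketch of the standard definitions, not the repo's evaluation script; the function name is illustrative.

    ```python
    def timing_scores(pred, gold):
        """Accuracy and F1 for binary talk-timing labels (1 = speak)."""
        tp = sum(p == 1 and g == 1 for p, g in zip(pred, gold))
        fp = sum(p == 1 and g == 0 for p, g in zip(pred, gold))
        fn = sum(p == 0 and g == 1 for p, g in zip(pred, gold))
        acc = sum(p == g for p, g in zip(pred, gold)) / len(gold)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return acc, f1
    ```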
  3. Human judgments (engaging, ...)

    Invite volunteers to chat with these models (seq2seq, hred, seq2seq-cf, hred-cf) and score each model's performance on engagingness, fluency, ...

    • Dailydialog dataset

      | When2Talk vs. | win (%) | loss (%) | tie (%) | kappa |
      | --- | --- | --- | --- | --- |
      | Seq2Seq | | | | |
      | HRED | | | | |
      | HRED-CF | | | | |
    • Cornell dataset

      | When2Talk vs. | win (%) | loss (%) | tie (%) | kappa |
      | --- | --- | --- | --- | --- |
      | Seq2Seq | | | | |
      | HRED | | | | |
      | HRED-CF | | | | |
  4. Graph ablation learning

    • F1 for predicting the speaking timing (hred-cf, ...)
    • BLEU4, BERTScore, Distinct-1, Distinct-2

