
License: MIT License


Explicit State Tracking with Semi-supervision for Neural Dialogue Generation

Code for CIKM'18 long paper: Explicit state tracking with semi-supervision for neural dialogue generation.

Paper on arXiv

Requirements

The project was developed on PyTorch 0.3.0 and has since been tested on PyTorch 0.4.0 with Python 3.6.

  • Experiments on task-oriented dialogues run well on a CPU.
  • Experiments on non-task-oriented dialogues require a GPU.

Dataset

Task-oriented datasets:

Non-task-oriented datasets:

Running Experiments

To run the model:

python [semi_sup_model.py|unsup_model.py] -mode [train|adjust|test] -model [camrest|kvret|ubuntu|jd] -c spv_proportion=XXX OTHER_CONFIGS_IN_CONFIG_PY=VALUE
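For example, a concrete invocation might look like the following (the value 50 for spv_proportion is illustrative only; see config.py for the available options and their defaults):

```shell
# Illustrative example: train the semi-supervised model on CamRest676
# with 50% state-label supervision (spv_proportion value is an example)
python semi_sup_model.py -mode train -model camrest -c spv_proportion=50
```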

To compute the metrics:

python metric.py -f RESULT_FILE -t [camrest|kvret]

For embedding-based metrics, please follow here

Directory

├───data
│   ├───CamRest676
│   ├───glove
│   ├───fasttext
│   ├───ubuntu
│   ├───jd
│   └───kvret
├───log
├───models
├───results
├───sheets
└───vocab

Use Case

  • Task-oriented dialogue systems, such as restaurant reservation systems, where tracking the user's intentions and knowledge-base interactions is necessary, but you only have a limited amount (though still some) of labeled data for this.

  • Non-task-oriented dialogue systems, such as technical question-answering conversations, where all the domain knowledge is embedded in the corpus and no knowledge-base interaction is required. User intentions are not annotated at all, but the model can still extract context-aware "state spans" that help response generation and provide a human-readable visualization. For example:

| Turn | User input | State span | Response |
|------|------------|------------|----------|
| 0 | I just realised nothing can access my mysql database anymore , it says cannot connect , why would this be ? I did set it to be remote | remote would anymore mysql access | you can use the mysql client to connect to the server |
| 1 | is it running ? | running I says mysql access | yes, I am running it now |

For Reference

Please consider citing:

@inproceedings{jin2018explicit,
  title={Explicit State Tracking with Semi-supervision for Neural Dialogue Generation},
  author={Jin, Xisen and Lei, Wenqiang and Ren, Zhaochun and Chen, Hongshen and Liang, Shangsong and Zhao, Yihong and Yin, Dawei},
  year={2018},
  booktitle={CIKM}
}

% additionally for experiments on task-oriented datasets

@inproceedings{lei2018sequicity,
  title={Sequicity: Simplifying task-oriented dialogue systems with single sequence-to-sequence architectures},
  author={Lei, Wenqiang and Jin, Xisen and Kan, Min-Yen and Ren, Zhaochun and He, Xiangnan and Yin, Dawei},
  booktitle={ACL},
  year={2018}
}

Easter Egg: The last sentence of Section 6 in our ACL 2018 paper Sequicity

Misc

  • Part of the code is built upon Sequicity.


sedst's Issues

Minor mistake with Evaluator.

In semi_sup_model.py, Line 153:

```python
ev = CamRestEvaluator(cfg.result_path)
```

I think it should be:

```python
if self.dataset == 'camrest':
    ev = CamRestEvaluator(cfg.result_path)
elif self.dataset == 'kvret':
    ev = KvretEvaluator(cfg.result_path)
```

Am I right?

question about the role of "shift"

Hi,

I'm confused about the role of the function "shift" in your code. It seems to move the generative probabilities of the z decoder back by one time step, and the shifted probabilities are later multiplied with the z decoder output to compute the implicit copy score. My question is: why should the probabilities be shifted back one step? Since the z decoder output at time t is used to compute the probability of z_t rather than z_(t-1), the shift operation is hard to understand.

However, when I tried removing "shift", performance dropped noticeably and training became much less stable. Could you explain the principle behind this operation? Thanks!
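For readers following this thread: the sketch below is not the repository's code, just an illustration of the shift operation the question describes, assuming the probabilities are represented as a list of per-timestep vectors. The output at step t is the input at step t-1, and step 0 is padded with zeros:

```python
def shift(probs):
    """Shift a sequence of per-timestep probability vectors back by one step.

    probs: list of T vectors (lists of floats).
    Returns a list of T vectors where output[t] == probs[t-1]
    and output[0] is all zeros.
    """
    if not probs:
        return []
    zero = [0.0] * len(probs[0])  # zero-pad the first timestep
    return [zero] + probs[:-1]    # drop the last timestep

seq = [[0.2, 0.8], [0.6, 0.4], [0.5, 0.5]]
print(shift(seq))  # [[0.0, 0.0], [0.2, 0.8], [0.6, 0.4]]
```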

Loss

Hi,

I tried to train the model on the Ubuntu dataset, but the loss stays very high (around 6-7) after 8,000 iterations. I kept the same settings except for a smaller batch size, because I only have one GPU with limited memory. Can you please advise? Thank you :)
