sekharvth / simple-chatbot-keras

Design and build a chatbot using data from the Cornell Movie Dialogues corpus, using Keras

License: MIT License

Python 100.00%
keras chatbot python cornell-corpus-dataset glove-vectors lstm-neural-networks lstm encoder-decoder-model language-generation chatbot-keras word-level-lstm

simple-chatbot-keras's Introduction

Chatbot using Keras

Design and build a simple chatbot using data from the Cornell Movie Dialogues corpus, using Keras

Most of the ideas used in this model come from the original seq2seq model made by the Keras team. Their post also serves as a brilliant tutorial on how the architecture works and how it is developed: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html

In short, the input sequence (the question asked to the chatbot) is passed into the encoder LSTM, which outputs its final states. These final states are passed into the decoder LSTM, along with the output sequence (the reply to the question, taken from the training data). The target output of the decoder LSTM is the same reply, but shifted one time step to the left. That is, if the reply (i.e., the input to the decoder LSTM) is 'I am fine', the input for the first time step is 'I' and its output is 'am', the input for the second time step is 'am' and its output is 'fine', and so on.
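The one-step shift between decoder input and decoder target can be sketched in a few lines of plain Python. This is only an illustration, not the repository's preprocessing code: the helper name `make_decoder_pair` and the toy token list are assumptions, and the 'BOS'/'EOS' tags are added here so that the last word also has a target.

```python
# Hypothetical toy reply; 'BOS' and 'EOS' mark the sentence boundaries.
reply = ["BOS", "I", "am", "fine", "EOS"]


def make_decoder_pair(tokens):
    """Return (decoder_input, decoder_target) for teacher forcing.

    The target is the input sequence shifted one time step to the left.
    """
    decoder_input = tokens[:-1]   # everything except the last token
    decoder_target = tokens[1:]   # everything except the first token
    return decoder_input, decoder_target


decoder_input, decoder_target = make_decoder_pair(reply)

# At each time step the decoder sees one token and should predict the next.
for step, (x, y) in enumerate(zip(decoder_input, decoder_target)):
    print(f"step {step}: input={x!r} -> target={y!r}")
```

With the toy reply above, the input 'I' is paired with the target 'am', and 'am' with 'fine', matching the description in the text.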

In inference mode, the 'BOS' (beginning of sentence) tag is the initial input to the decoder LSTM, along with the final states of the encoder LSTM (obtained by passing the new query through the encoder). The output of each time step is used as the input for the next time step, along with the decoder states from the current time step. This process repeats until the 'EOS' (end of sentence) tag is generated.
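The inference loop can be sketched as below. To keep the example runnable without a trained model, the encoder and decoder are replaced by toy stand-ins (`toy_encode`, `toy_decode_step` and the `TRANSITIONS` table are all hypothetical); in the real model these would be calls to the trained encoder and decoder LSTMs. The `max_len` cap is also an assumption, added so the loop stops even if 'EOS' is never produced.

```python
def greedy_decode(encode, decode_step, query, max_len=20):
    """Generate a reply token by token until 'EOS' or max_len is reached."""
    state = encode(query)   # final encoder states seed the decoder
    token = "BOS"           # the first decoder input is always 'BOS'
    reply = []
    for _ in range(max_len):
        # One decoder step: predicted token plus updated decoder states.
        token, state = decode_step(token, state)
        if token == "EOS":
            break
        reply.append(token)
    return reply


# Toy stand-ins for the trained networks (assumptions for illustration).
TRANSITIONS = {"BOS": "i", "i": "am", "am": "fine", "fine": "EOS"}


def toy_encode(query):
    return 0  # placeholder; a real encoder would return LSTM h and c states


def toy_decode_step(token, state):
    return TRANSITIONS[token], state + 1


print(greedy_decode(toy_encode, toy_decode_step, "how are you"))
```

Each predicted token is fed back in as the next input, which is the feedback loop described above.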

But the model on the page above is a character-level model, which at first puzzled me, especially since most of the literature on the subject overwhelmingly adopts word-level models. However, when I started on the word-level model, I quickly found out why the Keras team opted for the character-level one.

When using word-level models, the vocabulary (number of unique words) of the entire data set (the Cornell Movie Dialogues corpus in this case) would be more than 50,000, and the number of training examples amounted to ~300k lines (150,000 pairs). When defining the outputs of the decoder LSTM in the decoder model, the shape would be (num_examples, max_length_of_sentences, vocab_size). In effect, this would mean (150000, 20, 50000), i.e. 1.5 × 10^11 values, which raises memory errors. At the character level, the vocab_size drops from 50,000 to something in the range of 70-80 (26 lowercase letters, 26 uppercase letters, 10 digits, and symbols like '!', '?', etc.), which has a much better chance of fitting in memory. The downside is that a character-level model takes an insane number of epochs to converge, and can only be trained on a powerful GPU, which is beyond my current capabilities.
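A rough back-of-the-envelope estimate shows the scale of the problem. The sketch below assumes 4 bytes per value (float32); the repository may use a different dtype, in which case the numbers scale accordingly. The function name `one_hot_bytes` and the batch size of 64 are illustrative assumptions, and the character-level line keeps the length of 20 only to isolate the effect of the vocabulary size (character sequences would in practice be longer than word sequences).

```python
def one_hot_bytes(num_examples, max_len, vocab_size, bytes_per_value=4):
    """Size in bytes of a dense one-hot target tensor of the given shape."""
    return num_examples * max_len * vocab_size * bytes_per_value


GB = 1e9  # decimal gigabytes

# Word level: the full target tensor from the text, (150000, 20, 50000).
word_level = one_hot_bytes(150_000, 20, 50_000)
print(f"word level, all examples: {word_level / GB:.1f} GB")

# Character level: same example count and length, vocabulary of ~80 symbols.
char_level = one_hot_bytes(150_000, 20, 80)
print(f"char level, all examples: {char_level / GB:.2f} GB")

# Word level again, but only one batch of 64 examples held in memory at a time.
one_batch = one_hot_bytes(64, 20, 50_000)
print(f"word level, one batch of 64: {one_batch / GB:.3f} GB")
```

Under these assumptions the full word-level tensor comes to about 600 GB, while a single batch is a small fraction of a gigabyte. This suggests one possible workaround, not used in the original model as far as the text indicates: build the one-hot targets batch by batch instead of all at once.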

The model shown here is about as simple as it gets, and it definitely needs further improvement: more tweaking has to be done (increasing the number of LSTM layers, introducing Dropout, experimenting with the optimizers, etc.).

simple-chatbot-keras's People

Contributors

sekharvth

simple-chatbot-keras's Issues

The Glove file is missing

Traceback (most recent call last):
  File "chatbot_training.py", line 56, in <module>
    words, word_to_vec_map = read_glove_vecs('.../data/glove.6B.50d.txt')
  File "chatbot_training.py", line 44, in read_glove_vecs
    with open(file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '.../data/glove.6B.50d.txt'

ValueError

Traceback (most recent call last):
  File "chatbot_training.py", line 119, in <module>
    output = dense(vocab_size, activation = 'softmax')(decoder_lstm)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 982, in __call__
    self._maybe_build(inputs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 2643, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\keras\layers\wrappers.py", line 173, in build
    'with at least 3 dimensions, received: ' + str(input_shape))
ValueError: `TimeDistributed` Layer should be passed an `input_shape ` with at least 3 dimensions, received: []

MemoryError

When I try to run chatbot_training.py, it shows me a MemoryError on line 71. How can I fix it, please?

Memory Error

What should I change in the code to overcome the memory error?
