bme-chatbots / dialogue-generation

Generating responses with pretrained XLNet and GPT-2 in PyTorch.

License: MIT License

Python 84.90% Cython 15.10%
chatbot dialogue-generation xlnet gpt2 apex topical-chat

dialogue-generation's Introduction

Dialogue generation

Implementation of a neural dialogue generator model with the pretrained XLNet (Yang et al., 2019) and GPT-2 (Radford et al., 2019) architectures, currently on three datasets: DailyDialog (Li et al., 2017), PersonaChat (Zhang et al., 2018) and the new TopicalChat (Gopalakrishnan et al., 2019) from the Alexa Prize Socialbot Grand Challenge 3. Top-k sampling (Fan et al., 2018) and nucleus decoding (Holtzman et al., 2019) are available as decoding techniques. The training objective is autoregressive language modeling on the utterances and dialogue histories.
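For intuition, here is a minimal sketch of the two decoding filters applied to a single vector of next-token logits (the function name and surrounding usage are illustrative only, not the project's own implementation):

import torch
import torch.nn.functional as F

def top_k_top_p_filter(logits, top_k=100, top_p=0.9):
    """Filters next-token logits with top-k and nucleus (top-p) filtering.
    `logits` is a 1D tensor of size vocab_size."""
    if top_k > 0:
        # keep only the k highest-scoring tokens
        kth_value = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_value] = float('-inf')
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        # drop tokens outside the smallest set whose cumulative probability exceeds top_p
        remove = cum_probs > top_p
        remove[1:] = remove[:-1].clone()
        remove[0] = False
        logits[sorted_idx[remove]] = float('-inf')
    return logits

# usage: sample one token from the filtered distribution
logits = torch.randn(50257)                       # fake next-token logits
filtered = top_k_top_p_filter(logits, top_k=100, top_p=0.9)
next_token = torch.multinomial(F.softmax(filtered, dim=-1), num_samples=1)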

Installation

The model can leverage mixed precision training from nvidia/apex. Note that apex is not required and is only used if it is available. For an installation guide see the official instructions. Mixed precision is only useful on certain GPUs (Volta and Turing), so check beforehand whether your instance supports it.
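A quick way to check this (a small sketch; Volta and Turing correspond to CUDA compute capability 7.0 and 7.5, which Tensor Core mixed precision requires):

import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    # Tensor Cores, and therefore useful mixed precision, need compute capability >= 7.0
    supported = major >= 7
    print(f"compute capability {major}.{minor}: "
          f"{'mixed precision should help' if supported else 'mixed precision unlikely to help'}")
else:
    print("no CUDA device available")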

To train the model, clone this repository and install the dependencies. The project uses Cython to assemble batches for a faster input pipeline. It is also preferable to use a Python virtualenv.

git clone https://github.com/bme-chatbots/dialogue-generation.git

cd dialogue-generation

pip install -r requirements.txt

python setup.py build_ext --inplace

Training

The following command will start training on a single GPU/CPU with the gpt2-medium model on PersonaChat. --name is the name of the subdirectory in the model folder where logs and checkpoints are saved.

python -m src.train --model gpt2-medium --data personachat --name my_test_run

For distributed multi-GPU training, the train script should be called like this:

python -m torch.distributed.launch --nproc_per_node=NUM_GPUS src/train.py --model gpt2

You can also use predefined configs by passing the path of a config JSON file as the --config argument. These are available in the src/configs folder, and their training results can be seen in the Results section below.

python -m src.train --config src/configs/xlnet-dailydialog.json
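The exact schema of these files is defined by the flags that src/train.py accepts (see -h). As a purely hypothetical example, a custom config mirroring the argument names that appear in the training logs quoted further below could be written like this:

import json

# hypothetical config -- the key names mirror the argument dump printed by the
# training script; check `python -m src.train -h` for the authoritative list
config = {
    'model_name': 'gpt2-medium',
    'data_name': 'personachat',
    'batch_size': 64,
    'grad_accum_steps': 2,
    'max_hist': 4,
    'max_len': 100,
    'learning_rate': 1e-4,
    'method': 'nucleus',
    'top_p': 0.9,
}

with open('src/configs/my-config.json', 'w') as fh:
    json.dump(config, fh, indent=2)

# python -m src.train --config src/configs/my-config.json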

Training the model is fast and easy on Google Colaboratory or in a Kaggle kernel. It is important to set the runtime type to GPU with a Tesla P100 or Tesla T4 unit, as these can fully leverage mixed-precision training and are much faster than the older Tesla K80. You can check the current type by running !nvidia-smi in a cell of your colab.

As a shortcut, here is a complete example gist, which you can simply import into your Google Drive as a colaboratory file.

Copy and run the following code in a cell of your colab (or Kaggle kernel) file to install the model. If you use a Kaggle kernel you also have to enable internet access.

!git clone https://github.com/bme-chatbots/dialogue-generation.git
!python -m pip install --upgrade pip

# installing apex is optional and is only useful if Colab's Tesla P100 or T4 is used
# !git clone https://github.com/NVIDIA/apex
# !cd apex; pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

# building the cython code and installing the required packages
!cd dialogue-generation; pip install -r requirements.txt; python setup.py build_ext --inplace

The training and validation metrics are logged to TensorBoard, which can also be tracked in the colab file if the code below is run before the training cell.

%load_ext tensorboard
%tensorboard --logdir "dialogue-generation/model"

The model can then be trained by simply running the train script with the default flags. You can see all flags accepted by the train.py script by providing the -h flag.

!cd dialogue-generation; python -m src.train

After training, the model can be downloaded by setting the download link in the following snippet to the one logged by the script after evaluation (e.g. Saving model to dialogue-generation/src/../model/gpt2/19.11.03-12:59:47/model.pt).

from IPython.display import FileLink

# note that in case of kaggle kernel you have to give path
# relative to your working directory
FileLink(r'dialogue-generation/src/../model/gpt2/19.11.03-12:59:47/model.pt')

Interaction

An interactive evaluation mode is available for the trained model by running the interact script and providing the path of the trained model with --model_file. You can also provide the --config file, or simply give the same --model and --name arguments that were used during training.

python -m src.interact --model gpt2-medium --name my_test_run
python -m src.interact --config src/configs/xlnet-dailydialog.json
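For intuition only, a generic interactive sampling loop built on an off-the-shelf Hugging Face GPT-2 (not this project's checkpoint format or CLI) looks roughly like this:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2').eval()

history = []        # past utterances, truncated like --max_hist
max_hist = 3

while True:
    history.append(input('User: '))
    context = tokenizer.eos_token.join(history[-max_hist:]) + tokenizer.eos_token
    input_ids = tokenizer.encode(context, return_tensors='pt')
    with torch.no_grad():
        output = model.generate(
            input_ids, do_sample=True, top_k=100, top_p=0.9,
            max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
    reply = tokenizer.decode(output[0, input_ids.size(-1):],
                             skip_special_tokens=True)
    history.append(reply)
    print('Bot:', reply)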

Customization

To train any model on your own dataset you simply have to subclass DialogDataset and implement data generation from the raw files. Place a train.txt, valid.txt and test.txt in data/<name of your data>, where each turn in a dialog is on a new line and separate dialogs are divided by an extra empty line.

├── data
|   ├── dailydialog
|   └── custom_dataset   # name of your data ( custom_dataset by default )
|       ├── train.txt    # the correctly formated train, valid and test files
|       ├── valid.txt
|       └── test.txt
├── src

An example of the expected file format:

Hello how are you?
Hi I'm fine thanks. And you?
Me too thanks for asking.

Hi my name is Peter.
Nice to meet you I am Eric.

An example custom dataset class named CustomDataset is implemented in data.py that reads a dataset with these properties.
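For instance, dumping an in-memory list of dialogs into this layout could be done as follows (a small sketch; custom_dataset is the default directory name mentioned above):

import os

# toy data: each inner list is a single dialog
dialogs = [
    ['Hello how are you?', "Hi I'm fine thanks. And you?", 'Me too thanks for asking.'],
    ['Hi my name is Peter.', 'Nice to meet you I am Eric.'],
]

os.makedirs('data/custom_dataset', exist_ok=True)
with open('data/custom_dataset/train.txt', 'w') as fh:
    for dialog in dialogs:
        # one utterance per line, dialogs separated by an empty line
        fh.write('\n'.join(dialog) + '\n\n')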

Results

The dialog below is sampled from gpt2 trained with default parameters for 7 epochs (149051 steps) and --max_hist 3 on the PersonaChat dataset.

User: hi

Bot: greetings, i'm gerald.

User: how are you ?

Bot: good, thanks. getting ready for another hike, i love hiking!

User: oh realy? where are you going?

Bot: pikes peak. i am curently exploring famous mountains when i have time.

User: that is cool are you going alone?

Bot: my brother is in the us, we go hiking with him every month.

User: how old are you by the way?

Bot: a little under 46 years old

User: are you in a relationship?

Bot: yes i am married, 3 boys, i am pregnant again. i am too busy to focus

The dialog below is sampled from gpt2 trained with default parameters for 8 epochs (48824 steps) and --max_hist 3 on the DailyDialog dataset.

User: hi how are you ?

Bot: fine. i got a promotion at work earlier this week.

User: where do you work ?

Bot: I work at the polling place.

User: where is that ?

Bot: I live in the dorms across the street.

User: do you like it ?

Bot: it's great. I heard it is far from here.

User: are you a student ?

Bot: yes. I am a student too.

dialogue-generation's People

Contributors

mrpatekful

dialogue-generation's Issues

Sampling quality is bad.

I trained my model on high-quality data for 280,000 steps, and during evaluation the model had a really bad sense of the context and the generated replies were pretty poor. I'm going to give https://github.com/nshepperd/gpt-2 a shot to see whether it's a problem in your code or something that I did. I thought it would be good to share my experience.

Gradient overflow on distributed training. Training stuck.

Machine: AWS instance with 8 Nvidia V100 GPUs.
Environment: https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel_19.02.html

Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
keep_batchnorm_fp32    : True
loss_scale             : dynamic
patch_torch_functions  : False
cast_model_type        : torch.float16
enabled                : True
master_weights         : True
opt_level              : O2
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
keep_batchnorm_fp32    : True
loss_scale             : dynamic
patch_torch_functions  : False
cast_model_type        : torch.float16
enabled                : True
master_weights         : True
opt_level              : O2
2019-08-29 11:43:18,309 - INFO - {'max_epochs': 15, 'model_dir': '/Downloads/dialogue-generation/src/../model', 'top_k': 100, 'batch_size': 64, 'patience': 5, 'download_dir': '/Downloads/dialogue-generation/src/../data', 'max_hist': 4, 'max_len': 100, 'eval_every_step': 3000, 'grad_accum_steps': 2, 'model_name': 'gpt2-medium', 'mode': 'train', 'cuda': True, 'data_name': 'dailydialog', 'master_addr': '127.0.0.1', 'distributed': True, 'method': 'nucleus', 'master_port': '29500', 'num_devices': 8, 'top_p': 0.9, 'file_size': 100000, 'mixed': True, 'data_dir': '/Downloads/dialogue-generation/src/../data', 'learning_rate': 0.0001}
Train 0:   0%|                                                                                              | 0/468 [00:00<?, ?it/s]/usr/local/lib/python3.5/dist-packages/torch/distributed/distributed_c10d.py:100: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
  warnings.warn("torch.distributed.reduce_op is deprecated, please use "
[the same UserWarning is repeated by each of the 8 worker processes]
Train 0:   0%|▏                                                           | 1/468 [00:06<52:48,  6.79s/it, acc=0, loss=39.7, skip=0]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Train 0:   1%|▍                                                           | 3/468 [00:07<28:02,  3.62s/it, acc=0, loss=40.4, skip=0]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Train 0:   1%|▋                                                           | 5/468 [00:09<15:54,  2.06s/it, acc=0, loss=40.1, skip=0]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Train 0:   1%|▉                                                           | 7/468 [00:10<09:58,  1.30s/it, acc=0, loss=39.8, skip=0]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Train 0:   2%|█▏                                                          | 9/468 [00:11<07:05,  1.08it/s, acc=0, loss=40.5, skip=0]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Train 0:   3%|█▋                                                         | 13/468 [00:13<05:02,  1.50it/s, acc=0, loss=26.2, skip=0]Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Train 0:   3%|█▉                                                         | 15/468 [00:14<04:38,  1.63it/s, acc=0, loss=24.6, skip=0]

fp16 doesn't work with xlnet

When an xlnet model is used and fp16 is set to true in the config, fp16 is not used. With GPT2 models fp16 does work. Why is that?

Problem relating to max_hist or max_len

Hello, I'm getting this error, which seems to be related either to max_hist or max_len. I can't tell which, but changing them to lower values makes training go alright, and higher values (max_hist=10, max_len=1000, for example) raise the error.

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = c10::Half, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [108,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = c10::Half, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [108,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = c10::Half, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [108,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
....
....

Training seems to continue regularly but I'm not sure if everything's sound.

Error related to latest changes in repo

I didn't get this error before some recent changes to the repo. Trying to run the model, I get this:

Traceback (most recent call last):
  File "./src/train.py", line 519, in <module>
    main()
  File "./src/train.py", line 287, in main
    model, optimizer = amp.initialize(
NameError: name 'amp' is not defined

I couldn't figure out how to solve this.

Training is slow because apex is not recognized

Mixed precision is set to False because apex is not recognized as installed, which makes training slow, I think. I followed the commands in the README.

try:
    from apex import amp
    APEX_INSTALLED = True
    print("Yes")
except ImportError:
    APEX_INSTALLED = False
    print("No")

But 'No' still gets printed here. Can you help me debug this?

Error adding special tokens to GPT2 tokenizer

Using the GPT2 model and running your script with default settings gets me the error below, which is probably related to the issue discussed here: huggingface/transformers#799

Can you help solve this, please?

Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.
Traceback (most recent call last):
  File "./src/train.py", line 515, in <module>
    main()
  File "./src/train.py", line 257, in main
    args=args, device=device)
  File "/Downloads/GPT2/src/data.py", line 525, in create_dataset
    tokenizer = create_tokenizer(args)
  File "/Downloads/GPT2/src/data.py", line 498, in create_tokenizer
    'additional_special_tokens': special_tokens})
  File "/usr/local/lib/python3.6/dist-packages/pytorch_transformers/tokenization_utils.py", line 335, in add_special_tokens
    added_special_tokens = self.add_tokens(special_tokens_dict.values())
  File "/usr/local/lib/python3.6/dist-packages/pytorch_transformers/tokenization_utils.py", line 311, in add_tokens
    if self.convert_tokens_to_ids(token) == self.convert_tokens_to_ids(self.unk_token):
  File "/usr/local/lib/python3.6/dist-packages/pytorch_transformers/tokenization_utils.py", line 381, in convert_tokens_to_ids
    for token in tokens:
TypeError: 'NoneType' object is not iterable

Distributed training doesn't work.

At least when using the xlnet model. When using a high max_len it doesn't print any error, it just crashes. Training with 1 GPU works well. When setting a low max_len I get the error below. I'm using 4 Nvidia V100s.

Traceback (most recent call last):
  File "src/train.py", line 830, in <module>
    main()
  File "src/train.py", line 690, in main
    train_step(dummy_batch)
  File "src/train.py", line 566, in train_step
    loss, acc, ppl = forward_step(batch)
  File "src/train.py", line 556, in forward_step
    acc = reduce_tensor(acc)
  File "src/train.py", line 530, in reduce_tensor
    reduced = tensor.clone()
AttributeError: 'float' object has no attribute 'clone'
[the same traceback is printed by each of the other GPU processes, partly interleaved]

Custom data fix

A fix to the custom data class:

from os.path import join  # needed for the download method below


class CustomDataset(DialogDataset):
    """
    Example for defining a custom dataset.
    """

    name = 'custom_dataset'

    @classmethod
    def download(cls, args):
        # this method would normally download the
        # dataset but it is assumed that custom data
        # is already present
        return [
            (join(args.data_dir, split) + '.txt', split)
            for split in ['train', 'valid', 'test']
        ]

    @classmethod
    def read_file(cls, data_path):
        """
        Reads the contents of a raw file.
        """
        with open(data_path, 'r') as fh:
            for line in fh:
                yield line.strip()

    @classmethod
    def generate_splits(cls, extracted_files):
        """
        Creates splits from the extracted_files.
        """
        def generate_uttrs(split):
            """
            Generates dialogs from the text file, where
            dialogs are separated by empty lines.
            """
            split_read = cls.read_file(split)
            dialog = []
            for utterance in split_read:
                if utterance == '':
                    yield dialog
                    dialog = []
                else:
                    dialog.append(utterance)
            # emit the trailing dialog if the file does
            # not end with an empty line
            if dialog:
                yield dialog

        return [
            (generate_uttrs(file_path), name)
            for file_path, name in extracted_files
        ]

Issue with latest merge [torch.distributed]

When running the run.py script with torch==1.2.0 I get the following error:

Traceback (most recent call last):
  File "run.py", line 16, in <module>
    from src.train import (
  File "C:\Users\altoz\Documents\Projects\dialogue-generation\src\train.py", line 50, in <module>
    from torch.distributed import all_reduce, reduce_op
ImportError: cannot import name 'all_reduce'

Errors regarding the latest merge

First error:

Traceback (most recent call last):
  File "run.py", line 88, in <module>
    main()
  File "run.py", line 68, in main
    create_dataset(args=args)
  File "/Downloads/GPT2/src/data.py", line 540, in create_dataset
    files = data_cls.download(args=args)
  File "/Downloads/GPT2/src/data.py", line 299, in download
    if args.local_rank in [-1, 0]:
AttributeError: 'Namespace' object has no attribute 'local_rank'

I just commented out the if-check.

Then, another error:

Traceback (most recent call last):
  File "run.py", line 16, in <module>
    from src.train import (
  File "/Downloads/GPT2/src/train.py", line 26, in <module>
    from src.data import (
  File "/Downloads/GPT2/src/data.py", line 38, in <module>
    from collate import COLLATE
ImportError: cannot import name 'COLLATE'

gpt2-large seems to be supported but doesn't work in practice.

Is it just not ready yet or is this a bug?

Model name 'gpt2-large' was not found in model name list (gpt2-medium, gpt2). We assumed 'gpt2-large' was a path or url but couldn't find tokenizer files at this path or url.
Traceback (most recent call last):
  File "run.py", line 92, in <module>
    main()
  File "run.py", line 69, in main
    _, tokenizer = create_dataset(args=args)
  File "/Downloads/dialogue-generation/src/data.py", line 533, in create_dataset
    tokenizer = create_tokenizer(args)
  File "/Downloads/dialogue-generation/src/data.py", line 505, in create_tokenizer
    instance.add_special_tokens({
AttributeError: 'NoneType' object has no attribute 'add_special_tokens'

Not enough values to unpack (expected 3, got 0)

I have run the commands below successfully:

git clone https://github.com/bme-chatbots/dialogue-generation.git
cd dialogue-generation
pip install -r requirements.txt
python setup.py build_ext --inplace

After that I used my custom dataset, whose folder structure is data/custom_dataset/train.txt, valid.txt, test.txt.
Do I need to change anything in data.py to read custom_dataset?
Then I executed

python -m src.train --model gpt2-medium --data custom_dataset --name mychatbot

and finally I got this error. How can I solve it?

epoch train_loss train_acc train_ppl valid_loss valid_acc valid_ppl
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Md. Nadim Kaysar\Documents\Gpt2 V2\dialogue-generation\src\train.py", line 830, in <module>
    main()
  File "C:\Users\Md. Nadim Kaysar\Documents\Gpt2 V2\dialogue-generation\src\train.py", line 758, in main
    valid_loss, valid_acc, valid_ppl = [
ValueError: not enough values to unpack (expected 3, got 0)

Loading metadata stuck on distributed machine

I'm trying to run your model on a machine with 8 Nvidia V100 GPUs.
I used this command:

torch.distributed.init_process_group( backend='nccl', init_method='env://', rank=args.local_rank, world_size=8)

Where local_rank is set to 1. But it gets stuck on:

Loading metadata from ./data/dailydialog/xlnet/metadata.json

How can I resolve this, please?

training error in python 3.7

epoch    train_loss    train_acc    train_ppl    valid_loss    valid_acc    valid_ppl
-------  ------------  -----------  -----------  ------------  -----------  -----------
Traceback (most recent call last):                                                                                                                         
  File "/home/uname/.pyenv/versions/3.7.6/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/uname/.pyenv/versions/3.7.6/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/uname/proj/github/dialogue-generation/src/train.py", line 830, in <module>
    main()
  File "/home/uname/proj/github/dialogue-generation/src/train.py", line 760, in main
    for values in valid_metrics
ValueError: not enough values to unpack (expected 3, got 0)

Recovering from distributed training OOM error

Currently during distributed training the loop will enter a deadlock upon any kind of exception (typically an OOM error). There are two solutions:

  • Write a sync recovery function that computes a dummy backward pass upon exception so other processes can resume training (a minimal sketch of this option is shown after this list).

  • Compute the largest batch size in the training set before training and raise an error if an OOM occurs, so the batch size or max sequence length can be lowered.
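A minimal sketch of the first option (assuming DistributedDataParallel, where every rank has to take part in the gradient all-reduce; loss_fn and the surrounding training loop are hypothetical):

import torch

def train_step_with_recovery(model, optimizer, batch, loss_fn):
    """On failure (typically a CUDA OOM) this rank still runs a zero-valued
    backward pass, so the all-reduce on the healthy ranks does not block."""
    try:
        loss = loss_fn(model, batch)
    except RuntimeError:
        torch.cuda.empty_cache()
        # dummy loss touching every parameter -> zero gradients, but the
        # backward call still participates in DDP's synchronized all-reduce
        loss = sum(p.sum() for p in model.parameters()) * 0.0
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()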
