
devjwsong / gpt2-dialogue-generation-pytorch

173 stars · 3 watchers · 24 forks · 58 KB

The PyTorch implementation of fine-tuning GPT-2 (Generative Pre-trained Transformer 2) for dialogue generation.

Home Page: https://songstudio.info/tech/tech-35

License: MIT License

Languages: Python 97.93%, Shell 2.07%
Topics: pytorch, gpt-2, natural-language-processing, natural-language-generation, nlp, nlg, multiturn

gpt2-dialogue-generation-pytorch's People

Contributors

devjwsong, jim-dev


gpt2-dialogue-generation-pytorch's Issues

TypeError: expected str, bytes or os.PathLike object, not NoneType

Hi! When I run sh exec_data_load.sh, the following error occurs. Could you please tell me how to solve it? Looking forward to your reply; thank you very much!

Traceback (most recent call last):
File "src/data_load.py", line 78, in
tokenizer = GPT2Tokenizer.from_pretrained(args.model_type)
File "/home/njtech/conda/envs/pytorch2/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1425, in from_pretrained
return cls._from_pretrained(*inputs, **kwargs)
File "/home/njtech/conda/envs/pytorch2/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1572, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/njtech/conda/envs/pytorch2/lib/python3.7/site-packages/transformers/tokenization_gpt2.py", line 167, in init
with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
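
This error indicates that from_pretrained received None as the model name, so the tokenizer's vocab file could not be resolved; the usual cause is that the model_type argument was never set. A minimal sketch of the likely fix, assuming the script reads the name from args.model_type as the traceback suggests (the "gpt2" fallback is an assumption, not necessarily the repo's default):

from argparse import Namespace
from transformers import GPT2Tokenizer

# Reproduces the failure mode: model_type was never set, so it is None.
args = Namespace(model_type=None)

# Fall back to a valid GPT-2 identifier before loading (assumed fix).
model_type = args.model_type or "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_type)
print(len(tokenizer))  # 50257 tokens in the standard GPT-2 vocabulary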

Pretrained model

Hi, do you have a pretrained model? I would like to test it. Thanks.

Cannot find the specified checkpoint.

I have a checkpoint named test.cpkt, but when I run sh exec_infer.sh --ckpt_name="test" (with or without quotation marks, and with or without /gpt2), the script can't find the checkpoint. I'm guessing this is a really dumb syntax error.

Running on Colab takes 200+ days on GPU

Hi,

First, I thank you for providing a working repo.

I tried your code on Google Colab, but it seems it would take almost a year to train the model on the standard GPU (NVIDIA T4 Tensor Core). Is there a way to test this at a small scale? Perhaps even on just a CPU?

Thank you.
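
A common way to sanity-check the pipeline without committing to a full run is to train on a tiny slice of the data for a single epoch. A minimal sketch, assuming the preprocessed data is stored as newline-delimited text files; the paths below are illustrative, not the repo's actual layout:

from pathlib import Path

def make_tiny_split(src: str, dst: str, n: int = 200) -> None:
    """Keep only the first n samples so one epoch finishes in minutes."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()[:n]
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    Path(dst).write_text("\n".join(lines), encoding="utf-8")

# Illustrative paths; point these at whatever files exec_data_load.sh produces.
make_tiny_split("data/train_utters.txt", "data/tiny/train_utters.txt")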

Possible bug in PadCollate

class PadCollate():
    def __init__(self, eos_id):
        self.eos_id = eos_id

    def pad_collate(self, batch):
        input_ids, token_type_ids, labels = [], [], []
        for idx, seqs in enumerate(batch):
            input_ids.append(torch.LongTensor(seqs[0]))
            token_type_ids.append(torch.LongTensor(seqs[0]))
            labels.append(torch.LongTensor(seqs[2]))

        input_ids = torch.nn.utils.rnn.pad_sequence(input_ids, batch_first=True, padding_value=self.eos_id)
        token_type_ids = torch.nn.utils.rnn.pad_sequence(token_type_ids, batch_first=True, padding_value=self.eos_id)
        labels = torch.nn.utils.rnn.pad_sequence(labels, batch_first=True, padding_value=-100)

        return input_ids, token_type_ids, labels

The line token_type_ids.append(torch.LongTensor(seqs[0])) should use seqs[1] instead of seqs[0].
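
For reference, a self-contained sketch with the proposed fix applied; the batch below is made up purely to show the shapes, and eos_id=50256 is GPT-2's end-of-text id:

import torch
from torch.nn.utils.rnn import pad_sequence

class PadCollate():
    def __init__(self, eos_id):
        self.eos_id = eos_id

    def pad_collate(self, batch):
        input_ids, token_type_ids, labels = [], [], []
        for seqs in batch:
            input_ids.append(torch.LongTensor(seqs[0]))
            token_type_ids.append(torch.LongTensor(seqs[1]))  # fixed: seqs[1], not seqs[0]
            labels.append(torch.LongTensor(seqs[2]))

        input_ids = pad_sequence(input_ids, batch_first=True, padding_value=self.eos_id)
        token_type_ids = pad_sequence(token_type_ids, batch_first=True, padding_value=self.eos_id)
        labels = pad_sequence(labels, batch_first=True, padding_value=-100)
        return input_ids, token_type_ids, labels

# Tiny smoke test with made-up token ids.
collate = PadCollate(eos_id=50256)
batch = [([1, 2, 3], [7, 7, 8], [1, 2, 3]), ([4, 5], [7, 8], [4, 5])]
input_ids, token_type_ids, labels = collate.pad_collate(batch)
print(input_ids.shape, token_type_ids.shape, labels.shape)  # all torch.Size([2, 3])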

CUDA out of memory

I tried to run the code on my own laptop and got this error:

Training starts.
#################### Epoch: 1 ####################
0%| | 0/9555 [00:00<?, ?it/s]
Traceback (most recent call last):
File "src/main.py", line 283, in
manager.train()
File "src/main.py", line 106, in train
outputs = self.model(
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 941, in forward
transformer_outputs = self.transformer(
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 789, in forward
outputs = block(
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 317, in forward
attn_outputs = self.attn(
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 258, in forward
attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 194, in _attn
attn_weights = self.attn_dropout(attn_weights)
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/dropout.py", line 58, in forward
return F.dropout(input, self.p, self.training, self.inplace)
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1076, in dropout
return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 3.95 GiB total capacity; 2.92 GiB already allocated; 95.25 MiB free; 2.96 GiB reserved in total by PyTorch)
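
The GPU in the message has roughly 4 GiB of memory, which is tight for GPT-2 fine-tuning. A common workaround, not specific to this repo, is to shrink the per-step batch and accumulate gradients over several micro-batches. A minimal sketch with made-up data; the shapes and hyperparameters are illustrative:

import torch
from transformers import GPT2LMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Made-up micro-batches standing in for the repo's DataLoader.
dummy = torch.randint(0, 50257, (2, 32))
loader = [(dummy, torch.zeros_like(dummy), dummy) for _ in range(8)]

accum_steps = 4  # effective batch = micro-batch size * accum_steps
optimizer.zero_grad()
for step, (input_ids, token_type_ids, labels) in enumerate(loader):
    outputs = model(input_ids=input_ids.to(device),
                    token_type_ids=token_type_ids.to(device),
                    labels=labels.to(device))
    (outputs.loss / accum_steps).backward()  # scale so accumulated grads average
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()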

Result after training

Do you also get a run folder after training, containing meta, index, and data files etc., so that I can use it with interactive_conditional_samples.py from the GPT-2 repository?

Some problem in main.py, line 254

token_type_ids = [start_sp_id] + list(chain.from_iterable(input_hists)) + [self.args.sp2_id]

Sorry, but I cannot understand the argument to chain.from_iterable(); I think it should be token_type_ids rather than input_hists. Did I misunderstand your code here?
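
For context, chain.from_iterable flattens a list of lists into a single sequence, which is presumably why input_hists (a list of per-utterance token lists) is the argument here. A small illustration with made-up token ids:

from itertools import chain

# Made-up history: one sub-list of token ids per previous utterance.
input_hists = [[101, 102], [103], [104, 105, 106]]
flat = list(chain.from_iterable(input_hists))
print(flat)  # [101, 102, 103, 104, 105, 106]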
