
devjwsong / gpt2-dialogue-generation-pytorch

173 stars · 3 watchers · 24 forks · 58 KB

The PyTorch implementation of fine-tuning GPT-2 (Generative Pre-trained Transformer 2) for dialogue generation.

Home Page: https://songstudio.info/tech/tech-35

License: MIT License

Languages: Python 97.93%, Shell 2.07%
Topics: pytorch, gpt-2, natural-language-processing, natural-language-generation, nlp, nlg, multiturn

gpt2-dialogue-generation-pytorch's People

Contributors

devjwsong, jim-dev


gpt2-dialogue-generation-pytorch's Issues

TypeError: expected str, bytes or os.PathLike object, not NoneType

Hi! When I run sh exec_data_load.sh, the following error occurs. Could you please tell me how to solve it? Looking forward to your reply; thank you very much!

Traceback (most recent call last):
File "src/data_load.py", line 78, in
tokenizer = GPT2Tokenizer.from_pretrained(args.model_type)
File "/home/njtech/conda/envs/pytorch2/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1425, in from_pretrained
return cls._from_pretrained(*inputs, **kwargs)
File "/home/njtech/conda/envs/pytorch2/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1572, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/njtech/conda/envs/pytorch2/lib/python3.7/site-packages/transformers/tokenization_gpt2.py", line 167, in init
with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
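
This error indicates that from_pretrained received None as the model name, so the tokenizer's vocab file could not be resolved; the usual cause is that the model_type argument was never set. A minimal sketch of the likely fix, assuming the script reads the name from args.model_type as the traceback suggests (the "gpt2" fallback is an assumption, not necessarily the repo's default):

from argparse import Namespace
from transformers import GPT2Tokenizer

# Reproduces the failure mode: model_type was never set, so it is None.
args = Namespace(model_type=None)

# Fall back to a valid GPT-2 identifier before loading (assumed fix).
model_type = args.model_type or "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_type)
print(len(tokenizer))  # 50257 tokens in the standard GPT-2 vocabulary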

Pretrained model

Hi, do you have a pretrained model? I would like to test it. Thanks.

Cannot find the specified checkpoint.

I have a checkpoint named test.cpkt, but when I run sh exec_infer.sh --ckpt_name="test" (with or without quotation marks, and with or without /gpt2), the script can't find the checkpoint. I'm guessing this is a really dumb syntax error.

Running on Colab takes 200+ days on GPU

Hi,

First, I thank you for providing a working repo.

I tried your code on Google Colab, but it seems it would take almost a year to train the model on the standard GPU (NVIDIA T4 Tensor Core). Is there a way to test this at a small scale? Perhaps even on just a CPU?

Thank you.
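
A common way to sanity-check the pipeline without committing to a full run is to train on a tiny slice of the data for a single epoch. A minimal sketch, assuming the preprocessed data is stored as newline-delimited text files; the paths below are illustrative, not the repo's actual layout:

from pathlib import Path

def make_tiny_split(src: str, dst: str, n: int = 200) -> None:
    """Keep only the first n samples so one epoch finishes in minutes."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()[:n]
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    Path(dst).write_text("\n".join(lines), encoding="utf-8")

# Illustrative paths; point these at whatever files exec_data_load.sh produces.
make_tiny_split("data/train_utters.txt", "data/tiny/train_utters.txt")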

Possible bug in PadCollate

class PadCollate():
    def __init__(self, eos_id):
        self.eos_id = eos_id

    def pad_collate(self, batch):
        input_ids, token_type_ids, labels = [], [], []
        for idx, seqs in enumerate(batch):
            input_ids.append(torch.LongTensor(seqs[0]))
            token_type_ids.append(torch.LongTensor(seqs[0]))
            labels.append(torch.LongTensor(seqs[2]))

        input_ids = torch.nn.utils.rnn.pad_sequence(input_ids, batch_first=True, padding_value=self.eos_id)
        token_type_ids = torch.nn.utils.rnn.pad_sequence(token_type_ids, batch_first=True, padding_value=self.eos_id)
        labels = torch.nn.utils.rnn.pad_sequence(labels, batch_first=True, padding_value=-100)

        return input_ids, token_type_ids, labels

The line token_type_ids.append(torch.LongTensor(seqs[0])) should use seqs[1] instead of seqs[0].
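
For reference, a self-contained sketch with the proposed fix applied; the batch below is made up purely to show the shapes, and eos_id=50256 is GPT-2's end-of-text id:

import torch
from torch.nn.utils.rnn import pad_sequence

class PadCollate():
    def __init__(self, eos_id):
        self.eos_id = eos_id

    def pad_collate(self, batch):
        input_ids, token_type_ids, labels = [], [], []
        for seqs in batch:
            input_ids.append(torch.LongTensor(seqs[0]))
            token_type_ids.append(torch.LongTensor(seqs[1]))  # fixed: seqs[1], not seqs[0]
            labels.append(torch.LongTensor(seqs[2]))

        input_ids = pad_sequence(input_ids, batch_first=True, padding_value=self.eos_id)
        token_type_ids = pad_sequence(token_type_ids, batch_first=True, padding_value=self.eos_id)
        labels = pad_sequence(labels, batch_first=True, padding_value=-100)
        return input_ids, token_type_ids, labels

# Tiny smoke test with made-up token ids.
collate = PadCollate(eos_id=50256)
batch = [([1, 2, 3], [7, 7, 8], [1, 2, 3]), ([4, 5], [7, 8], [4, 5])]
input_ids, token_type_ids, labels = collate.pad_collate(batch)
print(input_ids.shape, token_type_ids.shape, labels.shape)  # all torch.Size([2, 3])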

CUDA out of memory

I tried to run the code on my own laptop and got this error:

Training starts.
#################### Epoch: 1 ####################
0%| | 0/9555 [00:00<?, ?it/s]
Traceback (most recent call last):
File "src/main.py", line 283, in
manager.train()
File "src/main.py", line 106, in train
outputs = self.model(
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 941, in forward
transformer_outputs = self.transformer(
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 789, in forward
outputs = block(
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 317, in forward
attn_outputs = self.attn(
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 258, in forward
attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
File "/home/reda/anaconda3/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 194, in _attn
attn_weights = self.attn_dropout(attn_weights)
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/modules/dropout.py", line 58, in forward
return F.dropout(input, self.p, self.training, self.inplace)
File "/home/reda/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1076, in dropout
return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 3.95 GiB total capacity; 2.92 GiB already allocated; 95.25 MiB free; 2.96 GiB reserved in total by PyTorch)
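
The GPU in the message has roughly 4 GiB of memory, which is tight for GPT-2 fine-tuning. A common workaround, not specific to this repo, is to shrink the per-step batch and accumulate gradients over several micro-batches. A minimal sketch with made-up data; the shapes and hyperparameters are illustrative:

import torch
from transformers import GPT2LMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Made-up micro-batches standing in for the repo's DataLoader.
dummy = torch.randint(0, 50257, (2, 32))
loader = [(dummy, torch.zeros_like(dummy), dummy) for _ in range(8)]

accum_steps = 4  # effective batch = micro-batch size * accum_steps
optimizer.zero_grad()
for step, (input_ids, token_type_ids, labels) in enumerate(loader):
    outputs = model(input_ids=input_ids.to(device),
                    token_type_ids=token_type_ids.to(device),
                    labels=labels.to(device))
    (outputs.loss / accum_steps).backward()  # scale so accumulated grads average
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()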

Result after training

Do you also get a run folder after training, containing meta, index, and data files etc., so that I can use it with interactive_conditional_samples.py from the GPT-2 repository?

Some problem in main.py, line 254

token_type_ids = [start_sp_id] + list(chain.from_iterable(input_hists)) + [self.args.sp2_id]

Sorry, but I cannot understand the argument to chain.from_iterable(); I think it should be token_type_ids rather than input_hists. Did I misunderstand your code here?
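
For context, chain.from_iterable flattens a list of lists into a single sequence, which is presumably why input_hists (a list of per-utterance token lists) is the argument here. A small illustration with made-up token ids:

from itertools import chain

# Made-up history: one sub-list of token ids per previous utterance.
input_hists = [[101, 102], [103], [104, 105, 106]]
flat = list(chain.from_iterable(input_hists))
print(flat)  # [101, 102, 103, 104, 105, 106]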
