
wanyao1992 / code_summarization_public

74 stars · 6 open issues · 30 forks · 102.28 MB

Source code for "Improving Automatic Source Code Summarization via Deep Reinforcement Learning"

Languages: Python 60.09% · ANTLR 2.05% · Java 37.39% · Perl 0.48%
Topics: code summarization · reinforcement · deep-reinforcement-learning · ast · tree-structure · pytorch · comment-generation

code_summarization_public's People

Contributors: wanyao1992

code_summarization_public's Issues

Division-by-zero error while running the code

When I run the code, I get the following error:

Traceback (most recent call last):
  File "a2c-train.py", line 348, in <module>
    main()
  File "a2c-train.py", line 330, in main
    xent_trainer.train(opt.start_epoch, opt.start_reinforce - 1, start_time)
  File "F:\code_summarization_public-master\lib\train\Trainer.py", line 30, in train
    train_loss = self.train_epoch(epoch)
  File "F:\code_summarization_public-master\lib\train\Trainer.py", line 103, in train_epoch
    return total_loss / total_words
ZeroDivisionError: division by zero
failed.

Did you run into a similar problem in your experiments? Any help would be much appreciated. Thank you very much!
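The failing line is `return total_loss / total_words` in `Trainer.train_epoch`, so the epoch finished with zero counted target words, which usually means the training data did not load (a wrong `-data` path or an empty preprocessed dataset) and no batches were processed. A minimal, hypothetical guard for that final division (this helper is not in the repository; it only illustrates the check):

```python
def safe_epoch_average(total_loss, total_words):
    # Hypothetical guard for the last line of Trainer.train_epoch: zero
    # counted target words almost always means the dataset is empty or the
    # preprocessing step produced no batches, so fail with a clear message
    # instead of a bare ZeroDivisionError.
    if total_words == 0:
        raise RuntimeError(
            "train_epoch processed 0 target words; check the -data argument "
            "and that preprocessing produced non-empty training batches"
        )
    return total_loss / total_words
```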

Error while training the Hybrid Model: Function CatBackward returned an invalid gradient at index 1 - got [85, 1, 512] but expected shape compatible with [57, 1, 512]

### Runtime log:
python a2c-train.py -data dataset/train/processed_all.train.pt -save_dir dataset//result/ -embedding_w2v dataset/train/ -start_reinforce 10 -end_epoch 30 -critic_pretrain_epochs 10 -data_type hybrid -has_attn 1 -gpus 0
Start...

* vocabulary size. source = 50004; target = 31415
* number of XENT training sentences. 54426
* number of PG training sentences. 54426
* maximum batch size. 32
Building model...
use_critic: True
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py:50: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.3 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
model: Hybrid2SeqModel(
  (code_encoder): TreeEncoder(
    (word_lut): Embedding(50004, 512, padding_idx=0)
    (leaf_module): BinaryTreeLeafModule(
      (cx): Linear(in_features=512, out_features=512, bias=True)
      (ox): Linear(in_features=512, out_features=512, bias=True)
    )
    (composer): BinaryTreeComposer(
      (ilh): Linear(in_features=512, out_features=512, bias=True)
      (irh): Linear(in_features=512, out_features=512, bias=True)
      (lflh): Linear(in_features=512, out_features=512, bias=True)
      (lfrh): Linear(in_features=512, out_features=512, bias=True)
      (rflh): Linear(in_features=512, out_features=512, bias=True)
      (rfrh): Linear(in_features=512, out_features=512, bias=True)
      (ulh): Linear(in_features=512, out_features=512, bias=True)
      (urh): Linear(in_features=512, out_features=512, bias=True)
    )
  )
  (text_encoder): Encoder(
    (word_lut): Embedding(50004, 512, padding_idx=0)
    (rnn): LSTM(512, 512, dropout=0.3)
  )
  (decoder): HybridDecoder(
    (word_lut): Embedding(31415, 512, padding_idx=0)
    (rnn): StackedLSTM(
      (dropout): Dropout(p=0.3, inplace=False)
      (layers): ModuleList(
        (0): LSTMCell(1024, 512)
      )
    )
    (attn): HybridAttention(
      (linear_in): Linear(in_features=512, out_features=512, bias=False)
      (sm): Softmax(dim=None)
      (linear_out): Linear(in_features=2048, out_features=512, bias=False)
      (tanh): Tanh()
    )
    (dropout): Dropout(p=0.3, inplace=False)
  )
  (generator): BaseGenerator(
    (generator): Linear(in_features=512, out_features=31415, bias=True)
  )
)
optim: <lib.train.Optim.Optim object at 0x7f34d70f0c50>
opt.start_reinforce: 10
* number of parameters: 92592823
opt.eval: False
opt.eval_sample: False
supervised_data.src: 54426
supervised_data.tgt: 54426
supervised_data.trees: 54426
supervised_data.leafs: 54426
supervised training..
start_epoch: 1
* XENT epoch *
Model optim lr: 0.001
<class 'lib.data.Dataset.Dataset'> 54426
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1351: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1340: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
/content/drive/My Drive/notebooks/Python_method_name_prediction/code_summarization_public/lib/model/HybridAttention.py:34: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  attn_tree = self.sm(attn_tree)
/content/drive/My Drive/notebooks/Python_method_name_prediction/code_summarization_public/lib/model/HybridAttention.py:36: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  attn_txt = self.sm(attn_txt)
outputs: torch.Size([26, 32, 512])
/content/drive/My Drive/notebooks/Python_method_name_prediction/code_summarization_public/lib/metric/Loss.py:8: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  log_dist = F.log_softmax(logits)
loss value: 3042.23095703125
---else---
torch.Size([26, 32, 512])
torch.Size([26, 32, 512])
Traceback (most recent call last):
  File "a2c-train.py", line 339, in <module>
    main()
  File "a2c-train.py", line 321, in main
    xent_trainer.train(opt.start_epoch, opt.start_reinforce - 1, start_time)
  File "/content/drive/My Drive/notebooks/Python_method_name_prediction/code_summarization_public/lib/train/Trainer.py", line 30, in train
    train_loss = self.train_epoch(epoch)
  File "/content/drive/My Drive/notebooks/Python_method_name_prediction/code_summarization_public/lib/train/Trainer.py", line 85, in train_epoch
    loss = self.model.backward(outputs, targets, weights, num_words, self.loss_func)
  File "/content/drive/My Drive/notebooks/Python_method_name_prediction/code_summarization_public/lib/model/EncoderDecoder.py", line 547, in backward
    outputs.backward(grad_output)
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: Function CatBackward returned an invalid gradient at index 1 - got [85, 1, 512] but expected shape compatible with [57, 1, 512]
failed.
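For context on this class of failure: the traceback ends in `EncoderDecoder.backward` calling `outputs.backward(grad_output)`, and the gradient passed there must match, slice for slice, the shapes that went into `torch.cat` during the forward pass; the `[85, 1, 512]` vs `[57, 1, 512]` mismatch suggests the code-branch and text-branch sequence lengths seen at backward time differ from those at forward time. A toy sketch of the invariant, with shapes taken from the log above (illustrative only, not the repository's code):

```python
import torch

# Two encoder branches with different sequence lengths, as in the hybrid
# model: an AST branch and a token branch (shapes from the error log).
tree_out = torch.randn(57, 1, 512, requires_grad=True)
text_out = torch.randn(85, 1, 512, requires_grad=True)

combined = torch.cat([tree_out, text_out], dim=0)  # shape [142, 1, 512]

# Backward works only when the supplied gradient matches the forward output
# exactly; CatBackward then routes the first 57 rows to tree_out and the
# remaining 85 rows to text_out.
combined.backward(torch.randn_like(combined))
print(tree_out.grad.shape, text_out.grad.shape)  # [57, 1, 512] [85, 1, 512]
```

If a graph is reused across batches (for example via `retain_graph=True`), or the two branches are truncated or padded inconsistently between the forward pass and the manually built `grad_output`, CatBackward ends up routing a slice of the wrong length into one of its inputs, which can produce the RuntimeError shown above.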

About the reduced number of parameters

Hello. In the paper you write: "We can see that the actor and critic networks share the modules (a)-(b)-(c), reducing the number of learning parameters a lot." Does this mean the two networks share parameters? If so, I don't seem to find that in the code; if not, could you explain what the sentence means?
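For what such sharing would look like mechanically: in PyTorch, two networks share parameters whenever they hold references to the same module instance; no explicit "sharing" API is involved. A minimal sketch with hypothetical names (this is not the repository's code):

```python
import torch.nn as nn

# One encoder instance handed to both networks means one set of weights,
# updated by both the actor's and the critic's losses.
shared_encoder = nn.LSTM(512, 512)

class Head(nn.Module):
    def __init__(self, encoder, out_size):
        super().__init__()
        self.encoder = encoder                # shared reference, not a copy
        self.proj = nn.Linear(512, out_size)  # head-specific parameters

    def forward(self, x):
        h, _ = self.encoder(x)
        return self.proj(h)

actor = Head(shared_encoder, out_size=31415)  # vocabulary-sized distribution
critic = Head(shared_encoder, out_size=1)     # scalar value estimate

# Same underlying tensors, so the parameters really are shared:
assert actor.encoder.weight_ih_l0 is critic.encoder.weight_ih_l0
```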

Unpickling error

I get this error when running the testing part (tried it on two separate systems, with the same error both times):

Start...

* vocabulary size. source = 50004; target = 31280
* number of XENT training sentences. 1000
* number of PG training sentences. 1000
* maximum batch size. 64
Building model...
('use_critic: ', False)
Loading from checkpoint at /media/BACKUP/ghproj_d/code_summarization/github-python/result/model_rf_hybrid_1_29_reinforce.pt
Traceback (most recent call last):
  File "a2c-train.py", line 349, in <module>
    main()
  File "a2c-train.py", line 254, in main
    checkpoint = torch.load(opt.load_from, map_location=lambda storage, loc: storage)
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 231, in load
    return _load(f, map_location, pickle_module)
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 369, in _load
    magic_number = pickle_module.load(f)
cPickle.UnpicklingError: could not find MARK

Any help would be appreciated.
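`cPickle.UnpicklingError: could not find MARK` at the very first `pickle_module.load(f)` typically means the checkpoint file itself is not a valid torch serialization: a truncated download, a corrupted copy, or a Git LFS pointer stub in place of the real binary. A quick, generic integrity check (the path is simply the one from the log above):

```python
# Inspect the first bytes of the checkpoint before handing it to torch.load.
path = ("/media/BACKUP/ghproj_d/code_summarization/github-python/result/"
        "model_rf_hybrid_1_29_reinforce.pt")

with open(path, "rb") as f:
    head = f.read(64)

if head.startswith(b"version https://git-lfs.github.com"):
    # Only the Git LFS pointer was downloaded; `git lfs pull` fetches the
    # actual checkpoint file.
    print("Git LFS pointer stub, not a real checkpoint")
else:
    print("first bytes:", head[:16])
```

Comparing the file's size on disk against the size the repository reports is another fast sanity check.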

Question about where the MLP is

Hello, I think this is a very good paper and I am studying the code. Since you are also a Chinese speaker, I originally asked this question in Chinese.
The paper mentions an MLP, but in create_critic in the code I don't see a corresponding operation; the only difference from the actor seems to be gen_out_size=1.
I am fairly new to PyTorch, so perhaps I missed something. Could you point out which file and function implement the MLP? Thank you very much for your help.
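On what `gen_out_size=1` amounts to: if the critic is built exactly like the actor except that its generator projects to a single unit, then its value head is one linear layer rather than a literal multi-layer perceptron. A sketch of the two readings (hypothetical layer sizes; these are not the repository's classes):

```python
import torch.nn as nn

# Reading 1: with gen_out_size=1 the critic's generator collapses to a
# single linear layer mapping the decoder state to a scalar value.
single_layer_value_head = nn.Linear(512, 1)

# Reading 2: a literal MLP value head, as the term in the paper suggests,
# would insert at least one hidden layer and a nonlinearity.
mlp_value_head = nn.Sequential(
    nn.Linear(512, 512),
    nn.Tanh(),
    nn.Linear(512, 1),
)
```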
