
cognac's Introduction

This is the online repository of the ESEC/FSE2021 paper titled "Lightweight Global and Local Contexts Guided Method Name Recommendation with Prior Knowledge".

Datasets

All datasets used in our study are open-sourced. We provide the links to each of them below.

  • Empirical dataset (here)
  • MNR task datasets: Java-small, Java-med, Java-large (here)
  • MNR task dataset: MNire's (here)
  • MCC task dataset (here)

Source Code

Requirements

Cognac is implemented on top of the PyTorch version of the pointer generator network. It is built on PyTorch 1.5 and TensorFlow 1.12. We use FastText to embed each token and the Python package javalang to perform program analysis. The installation link for this package is here.

Reproduction Steps

To reproduce our study:

  1. Run dataextractor.py to extract Cognac's inputs.
  2. Run train_fasttext.py to train the FastText model on the data extracted in Step 1.
  3. Train, validate, and test the model by running start_train.sh, start_eval.sh, and start_decode.sh, respectively.
  4. To reproduce the MCC task, run decode_mcc.py and then cal_sim.py.
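The steps above can be chained with a small driver script. This is a sketch, not part of the repository, and it rests on several assumptions. Each script is run from the repository root with no arguments, because the README documents no flags. The .sh files are run with bash. The steps run strictly one after another: if start_eval.sh is meant to run alongside training, as in some pointer-generator setups, launch it separately. The names build_commands and run_pipeline are hypothetical.

```python
import subprocess
import sys

# Assumption: scripts take no command-line arguments and are run from the repo root.
PIPELINE = [
    [sys.executable, "dataextractor.py"],   # Step 1: extract Cognac's inputs
    [sys.executable, "train_fasttext.py"],  # Step 2: train FastText on the extracted data
    ["bash", "start_train.sh"],             # Step 3: train
    ["bash", "start_eval.sh"],              # Step 3: validate
    ["bash", "start_decode.sh"],            # Step 3: test
]
MCC_STEPS = [
    [sys.executable, "decode_mcc.py"],      # Step 4: MCC task
    [sys.executable, "cal_sim.py"],         # Step 4: MCC task
]


def build_commands(include_mcc=False):
    """Return the commands in the order the README lists them."""
    return PIPELINE + (MCC_STEPS if include_mcc else [])


def run_pipeline(include_mcc=False, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(include_mcc):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling run_pipeline() only prints the plan. Calling run_pipeline(include_mcc=True, dry_run=False) executes all steps and aborts as soon as one exits with a non-zero status.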

Performance Analysis

We cannot guarantee that other reproduction studies will obtain exactly the same results as ours. Deviations may come from:

  1. The hyperparameters in config.py may need to be fine-tuned.
  2. In dataextractor.py, we set a time threshold for parsing each Java file. Servers with different hardware configurations may therefore successfully parse different numbers of methods.
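The README does not say how the per-file threshold is implemented. The sketch below shows one common way to cap parsing time per file on Unix, using SIGALRM via signal.setitimer. It also shows why the cap makes the number of parsed methods hardware-dependent. As a hypothetical example, a file that parses in 0.8 s on one server and 1.2 s on another is kept on the first and dropped on the second under a 1 s limit. ParseTimeout, parse_with_timeout, and count_parsed are hypothetical names, not functions from dataextractor.py.

```python
import signal


class ParseTimeout(Exception):
    """Raised when parsing a single file exceeds its time budget."""


def _on_alarm(signum, frame):
    raise ParseTimeout()


def parse_with_timeout(parse_fn, source, seconds):
    """Return parse_fn(source), or None if it runs longer than `seconds`.

    Unix only, and it must be called from the main thread (a SIGALRM restriction).
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return parse_fn(source)
    except ParseTimeout:
        return None  # too slow on this machine: the file's methods are skipped
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the previous handler


def count_parsed(parse_fn, sources, seconds):
    """Count the sources parsed within the budget; this depends on machine speed."""
    return sum(parse_with_timeout(parse_fn, s, seconds) is not None for s in sources)
```

With javalang, parse_fn could be javalang.parse.parse. Running count_parsed over the same corpus on two machines illustrates why the extracted datasets can differ in size.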

cognac's People

Contributors

shangwenwang


cognac's Issues

Regarding the data generation

Hello, thank you very much for sharing your great work!

I'm sorry to disturb you, but I'm facing some problems when running the dataset generation.
I hope you can help me solve them. Thank you in advance :)

If I understand correctly, the pipeline is:
====Step 1====
dataextractor.py
input:
java-small/training
java-small/validation
java-small/test

output:
training.json
validation.json
test.json

====Step 2====
train_fasttext.py
input:
training.json
validation.json
test.json

output:
fasttext_vectors/vocab.pkl
fasttext_vectors/weight.pkl

However, vocab.pkl is only 58 bytes and weight.pkl only 2,206 bytes.

I checked the code in train_fasttext.py and found that the json2corpus method calls split(', '). However, the JSON files generated by Step 1 (dataextractor.py) do not contain that separator.

Could you please advise me on how to fix this problem?
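The reporter's diagnosis is easy to check. If a string written by Step 1 does not contain ', ', then split(', ') returns the whole string as a single "token". FastText would then see very few distinct tokens, which is consistent with a 58-byte vocab.pkl. The helper below is a hypothetical diagnostic, not part of train_fasttext.py. Run it on a few strings taken from the Step 1 output.

```python
def split_report(text):
    """Compare token counts under split(', ') and under whitespace splitting.

    If the first count is 1 while the second is larger, the data are not
    comma-separated, and split(', ') collapses each string into one token.
    """
    by_comma = [tok for tok in text.split(', ') if tok]
    by_whitespace = text.split()
    return len(by_comma), len(by_whitespace)
```

For example, split_report("get value from map") gives (1, 4), which indicates that the separator expected by json2corpus does not match the extracted data.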

validation_shuffled

Hello @ShangwenWang,

Could you tell me about the file named "validation_shuffled.json"?

This file is used only in cal_sim.py, so I wonder how I can generate it.

Thank you for your help!

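Code error

Hello, when I run dataextractor.py, line 309 (method.tokens = AST.tokens) keeps raising the error 'MethodDeclaration' object has no attribute 'tokens'. Is this caused by the javalang version I installed? Did earlier versions of MethodDeclaration have a tokens attribute?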

for method in AST.types[0].body:
    # TODO: a question is how to add constructors
    if not isinstance(method, self.consideredType) or method.name == 'main':
        continue
    curMethod = curClass + '.' + method.name
    method.tokens = AST.tokens
    self.methodMapping[curMethod] = method
    invocations = self.findInvocation(str(method))
    invocations_completed = self.completeInvocation(localImports, invocations, curClass, superclass, curClassMethods)
    self.callee[curMethod] = set(invocations_completed)
    for x in invocations_completed:
        self.caller[x].add(curMethod)

Deal with data issues

Hi, I'm having trouble getting the token sequence in the first step. After parsing the code with javalang, I can't get the token sequence or the type of each token: an error says the parsed AST has no token sequence. Can you help me?

method.tokens = AST.tokens
The AST does not have a tokens attribute.

Some problems

Hello, I'm sorry to disturb you. I ran into some problems when running your model, and I hope you can help me solve them. Thank you in advance.
1. I have finished Steps 1 and 2 following your instructions, but the generated vocab.pkl is very small, only 1 KB. Is this normal?
2. I ran into difficulty in Step 3: I don't know what the following paths are supposed to be. Could you please describe them in detail?
3. I know you are using the Pointer Generator Network. Do I need its code to complete your model, or do I only need the code you provide?

train_data_path = os.path.join(root_dir, "path2train")
eval_data_path = os.path.join(root_dir, "path2validation")
decode_data_path = os.path.join(root_dir, "path2test")
vocab_path = os.path.join(root_dir, "path2vocab")
log_root = os.path.join(root_dir, "path2log")
excluded_type = {}
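One plausible reading, not confirmed by the authors, is this. The three data placeholders point to the training, validation, and test files written by Step 1. path2vocab points to the vocabulary written by Step 2 (fasttext_vectors/vocab.pkl). path2log is any writable directory. The layout under root_dir and the helper missing_inputs are hypothetical. The sketch also checks that the inputs exist before training starts.

```python
import os

# Hypothetical layout: adjust to wherever Steps 1 and 2 wrote their outputs.
root_dir = os.path.expanduser("~/cognac")

train_data_path = os.path.join(root_dir, "data", "training.json")    # Step 1 output (assumed)
eval_data_path = os.path.join(root_dir, "data", "validation.json")   # Step 1 output (assumed)
decode_data_path = os.path.join(root_dir, "data", "test.json")       # Step 1 output (assumed)
vocab_path = os.path.join(root_dir, "fasttext_vectors", "vocab.pkl")  # Step 2 output (assumed)
log_root = os.path.join(root_dir, "log")                             # created by you


def missing_inputs(paths):
    """Return the configured input paths that do not exist yet."""
    return [p for p in paths if not os.path.exists(p)]
```

For example, missing_inputs([train_data_path, eval_data_path, decode_data_path, vocab_path]) lists any file that still needs to be generated or moved into place.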

Questions about gaps in outcomes

Hello,
I'm sorry to bother you, but I'm having some problems.

I used the datasets you provided (Java-small, Java-med, and Java-large) to run experiments with your model, but the final results differ considerably from those reported in your paper.

I wonder whether I have set some parameters incorrectly. Are the parameters in your config file the same as those used in your experiments? Or is there any code that needs to be uncommented?

I hope you can give me some suggestions so that my reproduction can be as consistent as possible with yours.

Thank you very much for your help.

Code implementation error

In the file decode.py:

  1. Line 116 ignores the return value 'enc_stmts' (7 values are unpacked, but 8 are returned).
  2. Lines 119 and 164 call self.model.encoder / self.model.decoder with one argument missing.
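The first report describes a standard Python unpacking mismatch. If a helper returns 8 values, assigning them to 7 names raises ValueError ("too many values to unpack"). The fix is to accept the extra value even if it is unused. The sketch below uses a hypothetical stand-in function. It places enc_stmts last purely for illustration, because the real position of enc_stmts in decode.py is not stated.

```python
def fake_batch_inputs():
    """Hypothetical stand-in for a decode.py helper that returns 8 values."""
    return tuple(f"value_{i}" for i in range(8))


def unpack_seven():
    # Mirrors the reported bug: only 7 names on the left-hand side.
    a, b, c, d, e, f, g = fake_batch_inputs()
    return a


def unpack_eight():
    # Fix: also bind the extra value (here called enc_stmts for illustration).
    a, b, c, d, e, f, g, enc_stmts = fake_batch_inputs()
    return a, enc_stmts
```

Calling unpack_seven() raises ValueError. unpack_eight() returns ("value_0", "value_7").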
