shangwenwang / cognac

The official webpage of an FSE2021 paper

License: MIT License

Python 99.75% Shell 0.25%

cognac's Introduction

This is the online repository of the ESEC/FSE2021 paper titled "Lightweight Global and Local Contexts Guided Method Name Recommendation with Prior Knowledge".

Datasets

All datasets used in our study are open-sourced. We provide the links to each of them below.

Empirical dataset (here)
MNR task datasets: Java-small, Java-med, Java-large (here)
MNR task dataset: MNire's (here)
MCC task dataset (here)

Source Code

Requirements

Cognac is implemented on top of the PyTorch version of the pointer-generator network. It is built on PyTorch 1.5 and TensorFlow 1.12. We use FastText to embed each token and the Python package javalang to perform program analysis. The installation instructions for this package are linked here.

Reproduction Steps

To reproduce our study, you need to:

1. Execute dataextractor.py to extract the inputs of Cognac.
2. Execute train_fasttext.py to train the FastText model on the data extracted in the previous step.
3. Train, validate, and test the model by executing start_train.sh, start_eval.sh, and start_decode.sh, respectively.
4. To reproduce the MCC task, execute decode_mcc.py and then cal_sim.py.
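
The steps above can be sketched as a small driver script. This is only an illustration: the interpreter names ("python", "bash") and the absence of command-line arguments are assumptions, and the scripts may need paths configured before they run.

```python
import shlex
import subprocess

# Script names come from the reproduction steps; the interpreters and the
# lack of CLI arguments are assumptions made for this sketch.
STEPS = [
    ["python", "dataextractor.py"],
    ["python", "train_fasttext.py"],
    ["bash", "start_train.sh"],
    ["bash", "start_eval.sh"],
    ["bash", "start_decode.sh"],
]
MCC_STEPS = [
    ["python", "decode_mcc.py"],
    ["python", "cal_sim.py"],
]


def plan(include_mcc=False):
    """Return the commands, in order, as printable strings."""
    steps = STEPS + (MCC_STEPS if include_mcc else [])
    return [shlex.join(cmd) for cmd in steps]


def run(include_mcc=False):
    """Run each step in order and stop at the first failure."""
    steps = STEPS + (MCC_STEPS if include_mcc else [])
    for cmd in steps:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the plan only; call run() explicitly to execute the steps.
    for line in plan(include_mcc=True):
        print(line)
```

Stopping at the first failure matters here because each step consumes the output of the previous one.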

Performance Analysis

We cannot guarantee that other reproduction studies will achieve exactly the same results as ours. Deviations can arise for the following reasons:

The hyperparameters in the config.py file may need to be fine-tuned.
In dataextractor.py, we set a threshold to limit the time spent parsing each Java file. Hence, servers with different hardware configurations may parse different numbers of methods.
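
To illustrate why a per-file time threshold makes the extracted data hardware-dependent, here is a minimal sketch. It is not the mechanism used in the repository: parse_with_budget and fake_parse are hypothetical names, and a thread-based timeout is just one possible way to enforce a budget.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def parse_with_budget(parse_fn, source, budget_seconds):
    """Return parse_fn(source), or None if it exceeds the time budget."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(parse_fn, source)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        # The file is skipped, so its methods never reach the dataset.
        return None
    finally:
        # Do not block on a slow worker; the thread cannot be killed.
        executor.shutdown(wait=False)


def fake_parse(source):
    """Stand-in for a real parser: cost grows with input length."""
    time.sleep(0.001 * len(source))
    return source.split()


if __name__ == "__main__":
    print(parse_with_budget(fake_parse, "int x ;", 1.0))
    print(parse_with_budget(fake_parse, "x" * 300, 0.05))
```

A file that finishes just inside the budget on a fast server can exceed it on a slower one, which changes how many methods end up in the extracted data.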

cognac's Issues

Questions about the MCC task

Thanks for your great work!
I have a problem in step 4.

Regarding the data generation

Hello, thank you very much for sharing your great work!

I am so sorry to disturb you! I'm facing some problems when running the dataset generation.
I hope you can help me to solve it! Thank you in advance :)

If I'm not wrong,
====Step 1====
dataextractor.py
input:
java-small/training
java-small/validation
java-small/test

output:
training.json
validation.json
test.json

====Step 2====
train_fasttext.py
input:
training.json
validation.json
test.json

output:
fasttext_vectors/vocab.pkl
fasttext_vectors/weight.pkl

vocab.pkl is only 58 bytes and weight.pkl is 2206 bytes

I checked the code (train_fasttext.py) and found that the json2corpus method calls split(', '). However, the JSON files generated by Step 1 (dataextractor.py) do not contain that separator.

Could you please kindly advise me on how I can fix this problem?
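
A separator mismatch of this kind could explain a very small vocabulary. The sketch below uses made-up data and hypothetical helper names (to_tokens, vocab_size); it does not reproduce the actual JSON layout or the json2corpus code.

```python
def to_tokens(field, sep=", "):
    """Split one serialized field; yields the whole string if sep is absent."""
    return field.split(sep)


def vocab_size(fields, sep=", "):
    """Count the distinct tokens obtained by splitting every field on sep."""
    return len({tok for field in fields for tok in to_tokens(field, sep)})


# Hypothetical examples: the same tokens serialized with two separators.
comma_fields = ["get, user, name", "set, user, name"]
space_fields = ["get user name", "set user name"]

if __name__ == "__main__":
    print(vocab_size(comma_fields))           # separator matches
    print(vocab_size(space_fields))           # separator missing from the data
    print(vocab_size(space_fields, sep=" "))  # separator adjusted to the data
```

When the separator never occurs, each field stays a single "token", so the vocabulary shrinks to roughly one entry per distinct field.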

validation_shuffled

Hello @ShangwenWang,

Could you tell me about the file named "validation_shuffled.json"?

This file is only used in cal_sim.py, so I wonder how I can generate it.

Thank you for your help!

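Code error

Hello, when I run dataextractor.py, line 309 (method.tokens = AST.tokens) keeps failing with the error: 'MethodDeclaration' object has no attribute 'tokens'. Is this a problem with the javalang version I installed? Did earlier versions of MethodDeclaration have a tokens attribute?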

for method in AST.types[0].body:
    #TODO: a question is how to add constructor
    if not isinstance(method, self.consideredType) or method.name == 'main':
        continue
    curMethod = curClass + '.' + method.name
    method.tokens = AST.tokens
    self.methodMapping[curMethod] = method
    invocations = self.findInvocation(str(method))
    invocations_completed = self.completeInvocation(localImports, invocations, curClass, superclass, curClassMethods)
    self.callee[curMethod] = set(invocations_completed)
    for x in invocations_completed:
        self.caller[x].add(curMethod)

Deal with data issues

Hi, I'm having trouble getting the token sequence in the first step. After parsing the code with javalang, I can't get the token sequence or the type of each token. The error says that the parsed AST does not have a token sequence. Can you help me?

method.tokens = AST.tokens
The AST does not have a token attribute.

Some problems

Hello, I'm sorry to disturb you. I ran into some problems when running your model. I hope you can help me solve them. Thank you in advance.
1. I have finished Step 1 and Step 2 following your instructions, but the generated vocab.pkl is very small, only 1 KB. Is this normal?
2. I ran into difficulty on the third step: I don't know what those paths are supposed to be. Could you describe them in detail, please?
3. I know you are using the Pointer Generator Network. Do I need its code to complete your model, or do I only need the code you provide?

train_data_path = os.path.join(root_dir, "path2train")
eval_data_path = os.path.join(root_dir, "path2validation")
decode_data_path = os.path.join(root_dir, "path2test")
vocab_path = os.path.join(root_dir, "path2vocab")
log_root = os.path.join(root_dir, "path2log")
excluded_type = {}
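
Before starting the third step, it can help to check that every configured path actually exists. The sketch below assumes nothing about the real file layout: resolve_paths and the relative paths in the example are hypothetical, and "path2train" and the other placeholders still have to be replaced with your own locations.

```python
import os


def resolve_paths(root_dir, relative):
    """Join each relative path onto root_dir and list the names that are missing."""
    resolved = {name: os.path.join(root_dir, rel) for name, rel in relative.items()}
    missing = sorted(name for name, path in resolved.items() if not os.path.exists(path))
    return resolved, missing


if __name__ == "__main__":
    # Hypothetical layout; substitute the locations of your own files.
    example = {
        "train_data_path": "data/training.json",
        "eval_data_path": "data/validation.json",
        "decode_data_path": "data/test.json",
        "vocab_path": "fasttext_vectors/vocab.pkl",
    }
    resolved, missing = resolve_paths(os.getcwd(), example)
    for name in missing:
        print("missing:", name, "->", resolved[name])
```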

Questions about gaps in outcomes

Hello,
I'm sorry to bother you, but I'm having some problems.

I used the data sets you provided: Small, Med and Large to conduct experiments on your model, but the final results were quite different from those in your paper.

I am wondering if I have made some mistakes in parameters. Are the parameters listed in your config file consistent with your experiment at that time? Or is there any code that needs to be uncommented?

I hope you can give me some suggestions so that my reproduction can be as consistent as possible with yours.

Thank you very much for your help.

How to extract context to prepare dataset?

How do you extract the local and global context from Java code? Do you provide source code to extract context?

Code implementation error

In the file decode.py:

Line 116: the return value 'enc_stmts' is ignored (7 values are unpacked, but 8 are returned).
Lines 119 and 164: the calls to self.model.encoder and self.model.decoder are each missing one parameter.
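
The first point corresponds to a general Python behavior that can be shown in isolation. The sketch below does not reproduce decode.py; encode_batch is a hypothetical stand-in for whatever helper returns the eight values.

```python
def encode_batch():
    """Hypothetical stand-in for a helper that returns eight values."""
    return tuple(range(8))


def unpack_seven():
    # Seven targets for eight values: Python raises ValueError here.
    a, b, c, d, e, f, g = encode_batch()
    return g


def unpack_eight():
    # Adding a target for the eighth value makes the unpacking succeed.
    a, b, c, d, e, f, g, enc_stmts = encode_batch()
    return enc_stmts


if __name__ == "__main__":
    try:
        unpack_seven()
    except ValueError as err:
        print("unpacking failed:", err)
    print("eighth value:", unpack_eight())
```

The same reasoning applies to the calls on lines 119 and 164: once the extra value is captured, it presumably has to be passed on as the missing argument.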