guxd / deepapi


Repository for Deep API Learning (DeepAPI)

Home Page: https://guxd.github.io/deepapi

License: MIT License

Python 100.00%

deepapi's Introduction

Deep API Learning

Code for the FSE 2016 paper Deep API Learning.

Two Versions

We release both Theano and PyTorch code of our approach, in the theano and pytorch folders, respectively.

The theano folder contains the code to run the experiments presented in the paper. The code is frozen to what it was when we originally wrote the paper. (NOTE: we modified some deprecated API invocations to work with the latest Python and Theano.)
The pytorch folder is the bleeding-edge repository, where we packaged up the code, improved its quality, and added some features.

If you are interested in using DeepAPI, check out the PyTorch version and feel free to contribute.

For more information, please refer to the README files under the directory of each component.

Tool Demo

An online tool demo was hosted at http://211.249.63.55/ (currently shut down due to a limited budget).

Citation

If you find it useful and would like to cite it, the following would be appropriate:

@inproceedings{gu2016deepapi,
    author = {Gu, Xiaodong and Zhang, Hongyu and Zhang, Dongmei and Kim, Sunghun},
    title = {Deep API Learning},
    booktitle = {Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering},
    series = {FSE 2016},
    year = {2016},
    location = {Seattle, WA, USA},
    pages = {631--642},
    publisher = {ACM},
    address = {New York, NY, USA},
}


deepapi's Issues

Please provide testing API or trained model.

Hi,
I am a researcher at York University and I am trying to reproduce your work. Is there any way to use your trained model for performance evaluation now? The README says the testing API is not available due to budget constraints.
If not, can you please share the trained model? My email is [email protected]
Regards,
Moshi

Missing training hdf5 files

I am unable to run the PyTorch code. It is looking for the training h5 file, train.apiseq.shuf.h5:

raise IOError("``%s`` does not exist" % (filename,))
OSError: ``./data/train.apiseq.shuf.h5`` does not exist

Where can I get the training files?

Code or tool for pre-processing source code

Can you also provide the code or tool for pre-processing source code?
(parsing source code, and extracting api sequences etc.)
Thanks!

lack of ScheduledOptim

“ScheduledOptim” is not defined in modules. Is that content missing?

please provide accurate training data

Hello!

Thanks for your amazing work!
I am a researcher at York University and I am trying to reproduce your work. I found a problem:

I am trying to extract training data from the h5 file using the data loader, but I get nonsensical labels:

I am sure this is not a parsing error, because:

The index matches the dictionary:

dictionary:

I am using the original APIdata class

and I decode the API answer using the original script:

Please confirm this problem and upload the most recent valid dataset.
My email is [email protected]

Best,
Moshi

Question about BLEU metric

Dear authors,

I ran sample.py and the script output three values: recall, precision, and F1.

Which of these values did you report in Table 1 of your paper? Is it recall, precision, or F1?

Thank you for your clarification!

Is the training model trustworthy?

I downloaded the 120,000 model you trained on the dataset and tested it. The following results were obtained from 1000 test questions:
Avg Recall BLEU 34.955057, Avg Precision BLEU 34.955057, F1 34.955057. Is there a problem here?
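
For readers comparing these numbers: if the script computes F1 as the usual harmonic mean of precision and recall (an assumption, since the scoring code is not shown here), then identical precision and recall necessarily give the same F1. The sketch below only illustrates that arithmetic; the function name `f1_score` is hypothetical and is not taken from the repository. It does not explain why recall and precision coincide in the first place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the report above (percent scale).
precision = recall = 34.955057

# When precision == recall, the harmonic mean collapses to that same value
# (up to floating-point rounding).
print(f1_score(precision, recall))

# With different inputs, F1 lies between the two values, nearer the smaller one.
print(f1_score(50.0, 25.0))
```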

How is the training set generated

Dear Professor Gu, I was wondering how the training set is generated. When I opened the 'train.apiseq.h5' file, there were two data sets: indices and phrases. What does each of them represent? The indices data set also contains two attributes, length and pos. How are they generated? Even after reading the paper I could not work this out, which has caused me a lot of trouble. I sincerely look forward to your reply. Thank you!
Here are some of the data sets I could see.
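
One plausible reading of this layout, offered as an assumption rather than something the authors confirm here: `phrases` is a single flat array holding the token ids of every sequence back to back, and each row of `indices` gives the start offset (`pos`) and number of tokens (`length`) of one sequence within it. The sketch below uses small in-memory stand-ins instead of the real HDF5 file; the sample values, the vocabulary, and the helper names `get_sequence` and `decode` are all hypothetical.

```python
# Hypothetical stand-ins for the two HDF5 data sets.
# phrases: token ids of all sequences, concatenated into one flat list.
phrases = [11, 12, 13, 21, 22, 31, 32, 33, 34]

# indices: one record per sequence, pointing into `phrases`.
indices = [
    {"pos": 0, "length": 3},  # first sequence  -> phrases[0:3]
    {"pos": 3, "length": 2},  # second sequence -> phrases[3:5]
    {"pos": 5, "length": 4},  # third sequence  -> phrases[5:9]
]


def get_sequence(phrases, indices, i):
    """Return the token ids of the i-th sequence using its (pos, length) record."""
    start = indices[i]["pos"]
    return phrases[start:start + indices[i]["length"]]


def decode(token_ids, id_to_token, unknown="<unk>"):
    """Map token ids back to strings with an id -> token dictionary."""
    return [id_to_token.get(t, unknown) for t in token_ids]


# Hypothetical vocabulary, only for illustration.
id_to_token = {21: "File.new", 22: "File.exists"}

ids = get_sequence(phrases, indices, 1)
print(ids)
print(decode(ids, id_to_token))
```

Under this reading, a file can be sanity-checked by confirming that `pos + length` never exceeds the length of `phrases` for any record.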

can't import ScheduledOptim in seq2seq.py

Dear Author,
When I tried to run the PyTorch code, I found that the ScheduledOptim class referenced in seq2seq.py could not be imported. I did not find this class in modules.py. How can I solve this problem?
Thank you!!!

Some typo errors in pytorch/sample.py line 35, lacking variable vocab?

I ran the code but encountered an error at line 35 of sample.py:

"name 'vocab' is not defined"

I could not find any variable named vocab. Am I wrong, or is there a typo in the code?

(I am sorry, I also posted the same content in the Pull requests section.)