michaelpradel / deepbugs

DeepBugs is a framework for learning bug detectors from an existing code corpus.

License: MIT License

JavaScript 77.30% Shell 0.30% Python 20.61% Jupyter Notebook 1.79%

deepbugs's Issues

OOV tokens are not replaced with the UNK token during training and validation.

When using BugDetection.py, unknown tokens are not converted into UNK; instead, instances containing them are skipped.
For example, this happens in python/LearningDataSwappedArgs.py, lines 65-69.
In that case, it could be fixed by having python/Util.py replace OOV tokens in the call before yielding it.

I've checked this for the swapped arguments bugs but I guess that the same happens for the rest.
Please let me know if I have misunderstood how the above works.
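
The fix suggested above could be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `UNK` string, the `replace_oov` helper, and the dictionary layout of a call record are all assumptions made for this example.

```python
UNK = "UNK"  # assumed name of the unknown-token placeholder


def replace_oov(tokens, vocabulary, unk=UNK):
    """Return a copy of tokens in which every out-of-vocabulary token is replaced by unk."""
    return [tok if tok in vocabulary else unk for tok in tokens]


# Hypothetical vocabulary and call record, for illustration only.
vocabulary = {"ID:setTimeout", "ID:fn", UNK}
call = {"callee": "ID:setTimeout", "arguments": ["ID:delay", "ID:fn"]}

# Replace unknown argument tokens instead of skipping the whole call.
call["arguments"] = replace_oov(call["arguments"], vocabulary)
print(call["arguments"])
```

With a step like this applied before a call is yielded, an instance with an unknown argument would still reach the learner, with the unknown token mapped to the UNK placeholder.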

Using more dimensions for the embeddings.

Have you performed any experiments with larger embeddings from Word2Vec?
For example, 512 or 1024 dimensions.
Do you know if that results in better or worse performance?

Incorrect filename in the README section Embeddings for Identifiers

Part 3 of Embeddings for Identifiers has the following command:
python3 python/EmbeddingsLearnerWord2Vec.py token_to_number_*.json encoded_tokens_*.json.

But when the command is run with the appropriate filenames for token_to_number_*.json and encoded_tokens_*.json, I get an error:
python3: can't open file 'python/EmbeddingsLearnerWord2Vec.py': [Errno 2] No such file or directory.

Looking in the python directory, I see a file named EmbeddingLearnerWord2Vec.py, not EmbeddingsLearnerWord2Vec.py.

Running the command with EmbeddingLearnerWord2Vec.py works fine.

Only 3 types available?

I found 5 types in BugDetection.py, but only 3 types are listed in README.md. Can the other two types (IncorrectAssignment and MissingArg) be used?

Difficulty running commands on training data corpus

I am having difficulty running the initial step 1 command from README.md:

"node javascript/extractFromJS.js idsLitsWithTokens --parallel 4 data/js/programs_50_training.txt data/js/programs_50"

With all the files correctly in place, it prints the following output:
Total number of files: 0
Left in worklist: 0. Spawning an instance.
Left in worklist: 0. Spawning an instance.
Left in worklist: 0. Spawning an instance.
Left in worklist: 0. Spawning an instance.
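
One way to narrow down an output like "Total number of files: 0" is to check whether the paths named in the list file actually resolve to files. The sketch below assumes that the list file contains one path per line and that those paths are relative to some base directory; neither assumption is confirmed in this thread, and `check_file_list` is a hypothetical helper, not part of DeepBugs.

```python
import os


def check_file_list(list_path, base_dir="."):
    """Return (existing, missing) paths named in list_path, resolved against base_dir."""
    existing, missing = [], []
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            rel = line.strip()
            if not rel:
                continue  # ignore blank lines
            full = os.path.join(base_dir, rel)
            # Sort each entry into the matching bucket.
            (existing if os.path.isfile(full) else missing).append(full)
    return existing, missing
```

For instance, calling `check_file_list("data/js/programs_50_training.txt", "data/js/programs_50")` and finding every entry in `missing` would suggest a path or working-directory mismatch rather than a problem in the extractor itself.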

program transformations applied

I have read your paper and tried to understand how positive and negative training samples are created.

Is the transformation process available in this repo? If so, please let me know the names of the files that traverse the ASTs and extract the function calls, so that the examples can be created.
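
For orientation, the general idea for the swapped-arguments case (existing code is treated as correct, and a copy with the two arguments swapped is treated as buggy) can be sketched as below. The record layout, the labels 0 and 1, and the `make_examples` helper are assumptions made for this illustration; this is not the repository's implementation.

```python
import copy


def make_examples(call):
    """Yield (example, label) pairs: the original call as correct (0), a swapped copy as buggy (1)."""
    args = call["arguments"]
    if len(args) != 2:
        return  # this sketch only handles two-argument calls
    yield call, 0
    swapped = copy.deepcopy(call)
    swapped["arguments"] = [args[1], args[0]]
    yield swapped, 1


# Hypothetical call record extracted from an AST, for illustration only.
call = {"callee": "ID:setTimeout", "arguments": ["ID:fn", "ID:delay"]}
examples = list(make_examples(call))
for example, label in examples:
    print(label, example["arguments"])
```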

setTimeout example for Angular.js project

In the paper, there is a code snippet from the Angular.js project in which the setTimeout function is used incorrectly.

browserSingleton.startPoller(100,
        function (delay, fn) {
            setTimeout(delay, fn);
})

However, I am unable to find such an instance in the Angular.js code repo.

Can you please share the link to the buggy code/commit or pull request?

Some Questions

Hello,

I have two questions about DeepBugs:

Why were the old models labelled as buggy and removed with the latest commit?
Is there a specific reason why the other bug detectors, especially Swapped Binary Operands, were not considered in the OOPSLA 18 paper?

Best regards
Florian

When the 'node javascript/extractFromJS.js calls --parallel 4 .. ' command is run for the bigger corpus, multiple 'calls..' .json files get created

Hi
When I run the 'node javascript/extractFromJS.js calls --parallel 4 .. ' command on the bigger corpus for programs_eval.txt (or programs_training.txt), multiple 'calls_..'.json files get created instead of a single 'calls_..'.json file (corresponding to programs_eval.txt).
Which of these 'calls_..'.json files should I use in the next step (training the classifier) in Python?
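
If all of the generated files are meant to be used together, one possible way to combine them is shown below. This sketch assumes each 'calls_..'.json file holds a JSON array of call records, which is not confirmed in this thread; `load_all_calls` is a hypothetical helper, not part of DeepBugs.

```python
import glob
import json


def load_all_calls(pattern):
    """Load and concatenate the JSON arrays from every file matching pattern."""
    calls = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            calls.extend(json.load(f))
    return calls


# Counts the records across all matching files in the current directory (0 if none match).
print(len(load_all_calls("calls_*.json")))
```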

Error: "python/BugDetection.py", line 134, in <module> ...

I have followed all the instructions in the README file. The first part works fine, and I get the calls_* files for training and evaluation. When I run the second step, the message below is printed, along with a lot of other output:
Traceback (most recent call last):
File "python/BugDetection.py", line 134, in <module>
learning_data.pre_scan(training_data_paths, validation_data_paths)

How to run each bug detector?

Hi,
I would simply like to try out each bug detector, but README.md does not seem to explain how to run a bug detector after the learning and validation process. That is, I expect that, given a JavaScript file as input, a bug detector would report the buggy locations in that file as output.

Can you give me some help?

nodejs version requirement

Hello
Can you share type_to_vector.json and node_type_to_vector.json, or explain how to generate them?

I look forward to your response.

Thanks and regards
Shivam