
michaelpradel / deepbugs


DeepBugs is a framework for learning bug detectors from an existing code corpus.

License: MIT License

JavaScript 77.30% Shell 0.30% Python 20.61% Jupyter Notebook 1.79%

Contributors: michaelpradel, wooza


deepbugs's Issues

OOV tokens are not replaced with the UNK token during training and validation.

When using BugDetection.py, unknown tokens are not converted into UNK; instead, instances containing them are skipped.
For example, this happens in python/LearningDataSwappedArgs.py, lines 65-69.
In that case, this could be fixed by having python/Util.py replace OOV tokens in the call before yielding it.

I've checked this only for the swapped-arguments bugs, but I guess the same happens for the other bug types.
Please let me know if I have misunderstood how the above works.
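
A minimal sketch of the fix suggested above, assuming the vocabulary is available as a set of token strings. The function name `replace_oov`, the `UNK` placeholder string, and the example tokens are hypothetical and are not taken from python/Util.py.

```python
UNK = "UNK"  # assumed placeholder string; the repository's actual token may differ


def replace_oov(tokens, vocabulary, unk=UNK):
    """Return a copy of tokens with out-of-vocabulary entries mapped to unk."""
    return [token if token in vocabulary else unk for token in tokens]


# Illustrative data only: the token format is made up for this example.
vocabulary = {"ID:setTimeout", "ID:fn", "LIT:100"}
call_tokens = ["ID:setTimeout", "ID:delay", "ID:fn"]

# The unknown token "ID:delay" is replaced, so the instance is kept, not skipped.
print(replace_oov(call_tokens, vocabulary))
```

With a replacement step like this applied before the call is yielded, instances with unknown tokens would reach training instead of being dropped.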

Using more dimensions for the embeddings.

Have you performed any experiments with larger embeddings from Word2Vec?
For example, 512 or 1024 dimensions.
Do you know if that results in better or worse performance?

Incorrect filename in the README section Embeddings for Identifiers

Part 3 of Embeddings for Identifiers has the following command:
python3 python/EmbeddingsLearnerWord2Vec.py token_to_number_*.json encoded_tokens_*.json.

But when the command is run with the appropriate filenames for token_to_number_*.json and encoded_tokens_*.json, I get an error:
python3: can't open file 'python/EmbeddingsLearnerWord2Vec.py': [Errno 2] No such file or directory.

Looking in the python directory, I see a file named EmbeddingLearnerWord2Vec.py, not EmbeddingsLearnerWord2Vec.py.

Running the command with EmbeddingLearnerWord2Vec.py works fine.

Only 3 types available?

I found 5 types in BugDetection.py, but only 3 types are listed in README.md. Can the other two types (IncorrectAssignment and MissingArg) be used?

Difficulty running commands on training data corpus

I am having difficulty running the initial step 1 command from the README.md:

"node javascript/extractFromJS.js idsLitsWithTokens --parallel 4 data/js/programs_50_training.txt data/js/programs_50"

With all the files correctly in place, it prints the following output:
Total number of files: 0
Left in worklist: 0. Spawning an instance.
Left in worklist: 0. Spawning an instance.
Left in worklist: 0. Spawning an instance.
Left in worklist: 0. Spawning an instance.
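
One way to narrow this down is to check whether the paths in the file list resolve from the directory where the command is run. The sketch below assumes that the .txt file lists one file path per line and that relative paths are resolved against the current working directory; both are assumptions, and `check_file_list` is a hypothetical helper, not part of DeepBugs.

```python
import os


def check_file_list(list_path):
    """Split the paths listed in list_path into existing and missing files."""
    existing, missing = [], []
    with open(list_path, encoding="utf-8") as handle:
        for line in handle:
            path = line.strip()
            if not path:
                continue  # ignore blank lines
            if os.path.isfile(path):
                existing.append(path)
            else:
                missing.append(path)
    return existing, missing
```

Calling `check_file_list("data/js/programs_50_training.txt")` from the repository root and printing the lengths of both lists would show whether any listed file can be found at all.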

program transformations applied

I have read your paper and tried to understand how positive and negative training samples are created.

Is the transformation process available in this repo? If yes, please let me know the names of the files that traverse the ASTs and extract the function calls so that the examples can be created.
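
For the swapped-arguments case, the idea of deriving a negative example from an existing call can be pictured with the sketch below. It is an illustration only: the record layout (`callee`, `arguments`), the label values, and the function `make_swapped_example` are hypothetical and do not describe the repository's actual data format or file names.

```python
def make_swapped_example(call):
    """Return a copy of a two-argument call record with its arguments swapped."""
    args = call["arguments"]
    if len(args) != 2:
        raise ValueError("expected exactly two arguments")
    swapped = dict(call)  # shallow copy so the original record is untouched
    swapped["arguments"] = [args[1], args[0]]
    return swapped


# Hypothetical record for a call found in the corpus, assumed to be correct.
original = {"callee": "setTimeout", "arguments": ["fn", "delay"]}

# Pair each record with a label: 0 = as found in the corpus, 1 = artificially swapped.
examples = [(original, 0), (make_swapped_example(original), 1)]
print(examples)
```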

setTimeout example for Angular.js project

In the paper, there is a code snippet from the Angular.js project where the setTimeout function is used incorrectly.

browserSingleton.startPoller(100,
    function (delay, fn) {
        setTimeout(delay, fn);
    })

However, I am unable to find such an instance in the Angular.js code repository.

Can you please share the link to the buggy code/commit or pull request?

Some Questions

Hello,

I have two questions about DeepBugs:

  1. Why were the old models labelled as buggy and removed with the latest commit?
  2. Is there a specific reason why the other bug detectors, especially Swapped Binary Operands, were not considered in the OOPSLA 18 paper?

Best regards
Florian

When the 'node javascript/extractFromJS.js calls --parallel 4 .. ' command is run for the bigger corpus, multiple 'calls..' .json files get created

Hi
When I run the 'node javascript/extractFromJS.js calls --parallel 4 .. ' command on the bigger corpus for programs_eval.txt (or programs_training.txt), multiple 'calls_..'.json files are created instead of a single 'calls_..'.json file corresponding to programs_eval.txt.
Which of these 'calls_..'.json files should I use in the next step (training the classifier) in Python?
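
If the intent is to use all of the generated files together, one option is to load them as a single collection. The sketch below assumes each file contains a JSON array of records, which is an assumption about the output format; `load_call_files` is a hypothetical helper and not part of DeepBugs.

```python
import glob
import json


def load_call_files(pattern):
    """Load every JSON file matching pattern and concatenate their records."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        if not isinstance(data, list):
            raise ValueError(f"{path} does not contain a JSON array")
        records.extend(data)
    return records
```

The pattern passed in would be a glob that matches the generated files, for example one ending in `*.json`.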

Error: "python/BugDetection.py", line 134, in <module> ...

I have followed all the instructions in the README file. The first part works fine, and I get call_* files for training and eval. When I run the second step, the message below is printed along with a lot of other output:
Traceback (most recent call last):
  File "python/BugDetection.py", line 134, in <module>
    learning_data.pre_scan(training_data_paths, validation_data_paths)


How to run each bug detector?

Hi,
I would simply like to try out each bug detector, but the README.md does not seem to explain how to run a bug detector after the learning and validation process. That is, I expect that a bug detector, given a JavaScript file as input, reports the buggy locations in that file as output.

Can you give me some help?

nodejs version requirement

Hello
Can you share type_to_vector.json and node_type_to_vector.json, or share an idea of how to generate them?

Would be waiting for a response

Thanks and regards
Shivam
