ACE - Automated Code Element Extractor

ACE extracts code elements (classes, methods, fields, etc.) found in freeform ASCII text documents. It does not require any a priori knowledge of the code elements, but does require a large corpus of documents. The island parser is Java-based. ACE is based on heuristics and is NOT intended to be a re-implementation of the Java Spec. Because the tool is Java-dependent, you must ensure that the documents you pass it don't contain other languages that look like Java (e.g., C#).


There are two stages:

  1. Find code elements that are unambiguous and build an index of valid code elements in the document corpus.
  2. Find ambiguous code elements, and use the index to resolve them to their declaring type.

More details are in Rigby and Robillard, ICSE 2013.
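The two stages can be illustrated with a toy Python sketch. This is not ACE's Perl island parser: the regular expressions, function names, and sample corpus below are assumptions made up for illustration, and the real heuristics are considerably richer.

```python
import re
from collections import defaultdict

# Hypothetical patterns, far simpler than ACE's island parser.
QUALIFIED = re.compile(r"\b([A-Z]\w*)\.([a-z]\w*)\(")  # Type.member(
BARE_CALL = re.compile(r"(?<![\w.])([a-z]\w*)\(")      # member( with no qualifier


def build_index(docs):
    """Stage 1: index members that appear with an explicit declaring type."""
    index = defaultdict(set)
    for text in docs:
        for type_name, member in QUALIFIED.findall(text):
            index[member].add(type_name)
    return index


def resolve(text, index):
    """Stage 2: resolve unqualified calls that the index maps to exactly one type."""
    resolved = []
    for member in BARE_CALL.findall(text):
        types = index.get(member, set())
        if len(types) == 1:
            resolved.append(f"{next(iter(types))}.{member}")
    return resolved


# Invented two-document corpus.
corpus = [
    "Call HttpClient.execute(request) to send it.",
    "Then execute(request) fails with a timeout.",
]
index = build_index(corpus)
print(resolve(corpus[1], index))
```

In this sketch a member that the index maps to several types, or to none, is simply left unresolved, which loosely mirrors why stage 2 depends on a large corpus.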

Required Libraries

  • sudo apt-get install postgresql libconfig-general-perl libregexp-common-perl libdbd-pg-perl libfile-slurp-perl libfile-find-rule-perl libio-all-perl
  • Postgres ships with very low default memory settings. Make sure to tune it for a data warehouse (DW) workload using http://pgtune.leopard.in.ua/

Run

  • See the example config file
    • src/example.conf
  • There are a number of main files named main_'type'.pl, for example, main_stackoverflow.pl. Each main file handles a different document type. They live in the main directory to prevent clutter, but must be moved to and run from the /src directory to work
    • src/main/main_example.conf
  • resolve.sh will run all the appropriate perl and sql scripts
    •  ./resolve.sh db_name config_file main_file 2> project_name/project.error | tee project_name/project.out 
    •  ./resolve.sh so_2017_march android/android.conf main/main_stackoverflow.pl 2> android/android.error | tee android/android.out 
  • Certain two-word camel-case terms are not valid classes and are project specific (e.g., HttpClient is both a class and a project)
    • To fix these terms, add them to src/project_specific/<project_name>.sql
  • If you're going to run more than one project/tag through ACE in the same db,
    • you'll need to modify: src/rename_tables.sql

The Output

  • Just dump the clt table (Code Like Term) to a file
    • select tid, du, pqn, simple, kind from clt where trust = 0 and kind <> 'variable';
    • You can increase the reliability of the tool by joining with the index table
  • tid = the thread id or question id
  • du = the document unit or post id
  • pqn = the partially qualified name (we don't resolve all the way back to packages)
  • simple = the name of the code element (variables are not code elements and are only used as intermediaries)
  • kind = the kind of code element: type, method, field, package, annotation, etc.
  • trust = keep only rows with trust = 0; anything higher has been rejected
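As a rough illustration of what the query above returns, the following Python sketch runs it against an in-memory SQLite table. SQLite is only a stand-in for ACE's PostgreSQL database; the column types and the sample rows are invented for this example and are not taken from ACE's schema.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL database; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table clt "
    "(tid integer, du integer, pqn text, simple text, kind text, trust integer)"
)

# Invented rows for illustration only.
conn.executemany(
    "insert into clt values (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, "HttpClient.execute", "execute", "method", 0),
        (1, 10, "client", "client", "variable", 0),
        (2, 20, "Activity.onCreate", "onCreate", "method", 2),
    ],
)

# The query from the list above: accepted elements only, variables excluded.
rows = conn.execute(
    "select tid, du, pqn, simple, kind from clt "
    "where trust = 0 and kind <> 'variable'"
).fetchall()

for row in rows:
    print(row)
```

Only the first row survives: the second is a variable and the third has a non-zero trust value.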

NOTES

  • ACE is designed for parsing code snippets and freeform text not entire source code files
    • Scope rules are expanded to include documentation units, etc
  • ACE includes elements from stacktraces

Special Cases

  • Ignore: System.* (e.g., System.out.println)
  • Ignore: Default annotation (e.g., @Override)
  • more details in src/limits.mediawiki
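A minimal sketch of the two ignore rules above, in Python. The function and set names are hypothetical. The set contains only @Override because that is the only default annotation the list names; the full rule set is described in src/limits.mediawiki.

```python
# Only @Override is named above; other default annotations are not assumed here.
DEFAULT_ANNOTATIONS = {"@Override"}


def is_ignored(term):
    """Return True for terms covered by the two special cases listed above."""
    return term.startswith("System.") or term in DEFAULT_ANNOTATIONS


for term in ["System.out.println", "@Override", "HttpClient.execute"]:
    print(term, is_ignored(term))
```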

Manual Validation

Here are some scripts to do manual validation:

  • See src/benchmark/README.mediawiki
    • benchmark.sql -> dumps current bench_good.csv and creates a new bench.csv for analysis, also fills out the benchmark table
  • results_ace_tables.sql -> ace result tables including precision and recall
  • Note: If a CE is private (e.g., a method created by the poster) and it is not fully defined in that post, we ignore it because there's no way of knowing its declaring type

Further NLP parsing (experimental)

  • try: ./main/main_nlp_stackoverflow.pl android/android.conf
  • Note: Terms that remain ambiguous will not be highlighted, even if the term is identified later on in the document. This avoids highlighting very common terms. For example, if url.get() is identified in a document, we could go back and highlight every instance of the word 'get', which would be silly.
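The note above amounts to highlighting resolved occurrences by position rather than by word. A small Python sketch of that idea follows; the highlight function and the [[...]] marker are assumptions for illustration and are not taken from main_nlp_stackoverflow.pl.

```python
def highlight(text, resolved_spans):
    """Wrap only the given (start, end) spans in [[...]]; other text stays plain."""
    out, last = [], 0
    for start, end in sorted(resolved_spans):
        out.append(text[last:start])
        out.append("[[" + text[start:end] + "]]")
        last = end
    out.append(text[last:])
    return "".join(out)


text = "Call url.get() to get the page."
start = text.index("url.get")
print(highlight(text, [(start, start + len("url.get"))]))
# prints: Call [[url.get]]() to get the page.
```

The second, unqualified 'get' is left alone because no resolved span covers it.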

What do the trust values mean?

If you only include code elements with trust 0, you'll be fine. If you want to know where things came from, look at the trust_original values:

0 -> naturally good: qualified or defined or a new Constructor()
1 -> variable and member, but variable is unresolved
2 -> method undefined
3 -> type undefined
4 -> chain of methods, can have undefined type
5 -> field undefined
6 -> declaration of constructor, method, class
7 -> second pass from dictionary
8 -> stacktrace
9 -> ambiguous package name
10 -> annotation or annotation_package
11 -> non-compound second round (may not be processed)
12 -> project specific badness
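When analysing a dump, it can help to have these codes as a lookup table. The Python sketch below copies the descriptions from the list above; the dictionary name, the helper, and the sample values are hypothetical.

```python
from collections import Counter

# Descriptions copied from the list above; names here are hypothetical.
TRUST_ORIGINAL = {
    0: "naturally good: qualified or defined or a new Constructor()",
    1: "variable and member, but variable is unresolved",
    2: "method undefined",
    3: "type undefined",
    4: "chain of methods, can have undefined type",
    5: "field undefined",
    6: "declaration of constructor, method, class",
    7: "second pass from dictionary",
    8: "stacktrace",
    9: "ambiguous package name",
    10: "annotation or annotation_package",
    11: "non-compound second round (may not be processed)",
    12: "project specific badness",
}


def describe(trust_original):
    """Map a trust_original code to its description."""
    return TRUST_ORIGINAL.get(trust_original, "unknown trust value")


# Invented sample of trust_original values, tallied by description.
sample = [0, 0, 8, 2, 0]
for label, count in Counter(describe(v) for v in sample).most_common():
    print(count, label)
```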

Android project

shams/Android_API_Change_Bug_History/src/combine_with_so

Contributors

rigbypc, riskproject, daveguy, latifa-guerrouj, mrsumitbd
