AutoPruner

This repository contains the source code of the research paper "AutoPruner: Transformer-based Call Graph Pruning", published at ESEC/FSE 2022.

@inproceedings{le2022autopruner,
  title={AutoPruner: transformer-based call graph pruning},
  author={Le-Cong, Thanh and Kang, Hong Jin and Nguyen, Truong Giang and Haryono, Stefanus Agus and Lo, David and Le, Xuan-Bach D and Huynh, Quyet Thang},
  booktitle={Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  pages={520--532},
  year={2022}
}

Note: read the Requirements section before the Structure section.

Structure

The structure of our source code's repository is as follows:

  • config: contains our experimental configurations;
  • script: contains scripts for running our experiments;
  • src: contains our source code.
    • finetune: contains source code for fine-tuning phase
    • training: contains source code for training phase
    • utils: contains source code for utility functions, e.g., logger, visualization, ...
    • gnn: contains source code for the GNN benchmark
    • Note that, in each sub-folder of src, main.py, dataset.py, and model.py contain the source code for training/testing, dataset processing, and deep learning models, respectively;
  • environment.yml: contains the configuration for AutoPruner's environment.

The structure of our data's repository is as follows:

  • dl_dataset: contains our processed dataset for AutoPruner;
  • gnn_dataset: contains our processed dataset for GNN benchmark;
  • gnn_model: contains our trained models for GNN benchmarks;
  • info_data: contains the lists of training and testing programs;
  • model: contains our trained models for AutoPruner;
  • npe_result: contains the results of manual evaluation for Null-pointer analysis;
  • processed_data: contains extracted source code for methods in programs in cgPruner's dataset
  • raw_data: contains the static call graphs generated by static analysis tools from cgPruner
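
One way to confirm that a downloaded data folder matches the layout above is to check for the listed sub-folders. The sketch below is a hypothetical helper, not part of the repository: the name `missing_data_dirs` and the assumption that all eight folders sit directly under a single root are ours.

```python
from pathlib import Path

# Sub-folders listed in this README for the data repository.
EXPECTED_DATA_DIRS = [
    "dl_dataset", "gnn_dataset", "gnn_model", "info_data",
    "model", "npe_result", "processed_data", "raw_data",
]


def missing_data_dirs(data_root):
    """Return the expected sub-folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DATA_DIRS if not (root / name).is_dir()]
```

An empty list means every folder named above was found; any returned name points to an incomplete download or a different root directory.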

Requirements

Hardware

  • More than 200GB disk space
  • 2 NVIDIA GPUs that support CUDA 11.3 and have at least 8GB of memory.
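
The disk-space requirement can be checked before downloading anything. This is a minimal sketch using only the Python standard library; the helper names are illustrative, and it treats 1 GB as 10**9 bytes, which is an assumption since the requirement above does not say which unit is meant. It does not check the GPUs.

```python
import shutil

REQUIRED_GB = 200  # disk requirement stated in the hardware list


def free_disk_gb(path="."):
    """Free space on the filesystem holding `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 10**9


def has_enough_disk(path=".", required_gb=REQUIRED_GB):
    """True if strictly more than `required_gb` GB are free at `path`."""
    return free_disk_gb(path) > required_gb
```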

Software

  • Ubuntu 18.04 or newer
  • Docker/Conda

Environment Configuration

Conda

conda env create -n autopruner --file environment.yml

Docker

For ease of use, we also provide an installation package as a Docker image. Users can set up AutoPruner's Docker environment step by step as follows:

  • Pull AutoPruner's docker image:
docker pull thanhlecong/autopruner:v2
  • Run a docker container:
docker run --name autopruner -it --shm-size 16G --gpus all thanhlecong/autopruner:v2
  • Activate conda:
source /opt/conda/bin/activate
  • Activate AutoPruner's conda environment:
conda activate autopruner

Note that the source code of AutoPruner is stored at /workspace/ in the Docker image, so please move to this folder before running experiments.

Experiments

To use our tool, run the following command, replacing each bracketed placeholder with a value:

python3 -m src.training.main --config_path [config path] \
                             --mode [mode: test or train] \
                             --feature [type of features: 0: structure, 1: semantic, 2: combine] \
                             --model_path [path to saved model (for saving in train mode and loading in test mode)]
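
To make the meaning of the four options concrete, here is a sketch of an argument parser that accepts the same flags. It is an illustration only and is not taken from `src/training/main.py`: marking every flag as required and restricting `--mode` and `--feature` to the documented values are our assumptions.

```python
import argparse

# Feature codes as documented for the --feature option.
FEATURE_NAMES = {0: "structure", 1: "semantic", 2: "combine"}


def build_parser():
    """Build a parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(prog="src.training.main")
    parser.add_argument("--config_path", required=True,
                        help="path to an experiment configuration file")
    parser.add_argument("--mode", required=True, choices=["train", "test"],
                        help="train a model or test a saved one")
    parser.add_argument("--feature", required=True, type=int,
                        choices=sorted(FEATURE_NAMES),
                        help="0: structure, 1: semantic, 2: combine")
    parser.add_argument("--model_path", required=True,
                        help="where to save (train) or load (test) the model")
    return parser
```

With such a parser, `--feature 2` selects the combined structural and semantic features, and `--model_path` is written in train mode and read in test mode.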

The downloaded dataset is named data.tar.gz. Extract it with tar -xvf data.tar.gz in the directory that contains the AutoPruner folder. After extraction, the layout is:

├── AutoPruner
├── replication_package
└── data.tar.gz
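
The extraction step can also be done from Python, which makes it easy to compare the result with the tree above. This is a minimal sketch using the standard `tarfile` module; the function name `extract_archive` is hypothetical, and the archive is assumed to come from a trusted source.

```python
import tarfile
from pathlib import Path, PurePosixPath


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return its top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    top_level = set()
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            if parts:  # skip a bare "." entry
                top_level.add(parts[0])
        # Only extract archives you trust.
        tar.extractall(dest)
    return sorted(top_level)
```

Calling `extract_archive("data.tar.gz", ".")` from the directory that holds the AutoPruner folder would mirror the `tar -xvf` command; the returned names can then be checked against the expected layout.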

To replicate the results of AutoPruner, please download the data from this link, put it in the same folder as this repository, and then follow the instructions below. Note that results may differ slightly across devices. However, these differences do not affect the findings in the paper.

RQ1

RQ1 evaluates the effectiveness of AutoPruner: is AutoPruner effective in pruning false positives from static call graphs? How, then, should the effectiveness of a method be verified?

TODO:

Detailed experiments

To replicate the result of AutoPruner in call graph pruning on Wala (RQ1), please use

bash script/rq1_wala.sh

To replicate the result of AutoPruner in call graph pruning on Doop (RQ1), please use

bash script/rq1_doop.sh

To replicate the result of AutoPruner in call graph pruning on Petablox (RQ1), please use

bash script/rq1_peta.sh
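
Pruning effectiveness in RQ1 is commonly summarized with precision, recall, and F1 over call-graph edges. The sketch below shows how these metrics could be computed; it is not the repository's evaluation code. Representing an edge as a `(caller, callee)` pair and the function name `prune_metrics` are assumptions made for illustration.

```python
def prune_metrics(kept_edges, true_edges):
    """Precision, recall, and F1 of the edges kept after pruning.

    kept_edges: edges the pruner keeps, as (caller, callee) pairs.
    true_edges: ground-truth edges, in the same representation.
    """
    kept, truth = set(kept_edges), set(true_edges)
    true_positives = len(kept & truth)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A pruner that removes more false-positive edges raises precision, while removing true edges by mistake lowers recall; F1 balances the two.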

RQ2

Null-pointer Analysis

In this analysis, we follow the experimental settings of cgPruner, including their code for Null-pointer Analysis (NPA). Please refer to cgPruner's replication package for further instructions. You can also find our manual evaluation in the npe_result folder at this link.

Monomorphic Call-site Detection

To replicate the result of AutoPruner in monomorphic call-site detection on Wala's call graph (RQ2), please use

bash script/rq2_wala.sh

To replicate the result of AutoPruner in monomorphic call-site detection on Doop's call graph (RQ2), please use

bash script/rq2_doop.sh

To replicate the result of AutoPruner in monomorphic call-site detection on Petablox's call graph (RQ2), please use

bash script/rq2_peta.sh
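
For readers unfamiliar with this client analysis: a call site is usually called monomorphic when the call graph gives it exactly one target. The sketch below illustrates that idea under our own assumptions. It is not the code run by the scripts above, and the `(call_site, callee)` edge representation and the function name are hypothetical.

```python
from collections import defaultdict


def monomorphic_call_sites(edges):
    """Return the call sites that have exactly one distinct callee.

    edges: iterable of (call_site, callee) pairs from a call graph.
    """
    targets = defaultdict(set)
    for call_site, callee in edges:
        targets[call_site].add(callee)
    return {site for site, callees in targets.items() if len(callees) == 1}
```

Removing a false-positive edge can turn a call site with two reported targets into a monomorphic one, which is why call graph pruning affects this analysis.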

RQ3

To replicate the ablation study of AutoPruner with structural features, please use

bash script/rq3_structure.sh

To replicate the ablation study of AutoPruner with semantic features, please use

bash script/rq3_semantic.sh

To replicate the ablation study of AutoPruner with caller function, please use

bash script/rq3_caller.sh

To replicate the ablation study of AutoPruner with callee function, please use

bash script/rq3_callee.sh
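
The four ablation runs above can be driven from one small Python wrapper. This is a sketch under the assumption that it is started from the repository root, where the `script` folder lives; the dictionary keys and function names are ours, and only the script paths come from this README.

```python
import subprocess

# Ablation name -> script path, as listed in this README.
RQ3_SCRIPTS = {
    "structure": "script/rq3_structure.sh",
    "semantic": "script/rq3_semantic.sh",
    "caller": "script/rq3_caller.sh",
    "callee": "script/rq3_callee.sh",
}


def ablation_command(name):
    """Return the command line for one ablation run."""
    return ["bash", RQ3_SCRIPTS[name]]


def run_ablation(name, repo_root="."):
    """Run one ablation script; raises CalledProcessError on failure."""
    return subprocess.run(ablation_command(name), cwd=repo_root, check=True)
```

For example, `for name in RQ3_SCRIPTS: run_ablation(name)` would run all four studies in sequence and stop at the first script that exits with an error.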
