tdunlap607 / hermes

This project forked from soarsmu/hermes


Source code for accepted SANER 2022 Research Paper "HERMES: Using Commit-Issue Linking to Detect Vulnerability-Fixing Commits"

License: MIT License



HERMES

Automatic Vulnerability-Fixing Commit Classification

The dataset is available in JSON format at: MSR/2019/experiment/full_data_set_with_all_features.txt

Watch our presentation at SANER 2022: https://www.youtube.com/watch?v=S4a3wpHbVTw

Slides for the presentation are in the file HERMES-SANER-2022_slides.pptx

A PDF version is also available: HERMES-SANER-2022_slides.pdf

The data object classes are described in entitites.py

To replicate HERMES on the full dataset, use this command:

python3 experiment.py --min_df 5 --use_linked_commits_only False --use_issue_classifier True --use_stacking_ensemble True --use-patch-context-lines False --tf-idf-threshold 0.005 --dataset sub_enhanced_dataset_th_100.txt

To replicate HERMES on the subset of explicitly linked commits, use this command:

python3 experiment.py --min_df 5 --use_linked_commits_only True --use_issue_classifier True --use_stacking_ensemble True --use-patch-context-lines False --tf-idf-threshold 0.005 --dataset full_dataset_with_all_features.txt

To replicate HERMES on the explicitly linked data and on commits with recovered links:

Extract the zip files in MSR2019/experiment. Each archive contains the dataset for a different threshold; the threshold is given as a suffix of the file name.

Then, to run HERMES at a given threshold, use this command template:

python3 experiment.py --min_df 5 --use_linked_commits_only True --use_issue_classifier True --use_stacking_ensemble True --use-patch-context-lines False --tf-idf-threshold 0.005 --dataset file_name

where file_name is the name of one of the files just extracted.
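The template above can be expanded for each extracted dataset with a short script. This is only a minimal sketch: `build_command` is a hypothetical helper that is not part of the repository, and the only file name used is the one that appears elsewhere in this README. The flags are copied verbatim from the template.

```python
import shlex

# Flags copied verbatim from the command template in this README.
BASE_ARGS = [
    "python3", "experiment.py",
    "--min_df", "5",
    "--use_linked_commits_only", "True",
    "--use_issue_classifier", "True",
    "--use_stacking_ensemble", "True",
    "--use-patch-context-lines", "False",
    "--tf-idf-threshold", "0.005",
]


def build_command(file_name):
    """Return the argument list for one run (hypothetical helper)."""
    return BASE_ARGS + ["--dataset", file_name]


# Replace this list with the files you actually extracted.
dataset_files = ["sub_enhanced_dataset_th_100.txt"]

for name in dataset_files:
    # Print a copy-pasteable shell command for each dataset file.
    print(shlex.join(build_command(name)))
```

The returned list can also be passed directly to `subprocess.run` if you prefer to launch the runs from Python.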

Link to our issue corpus: https://zenodo.org/record/5602211#.YXjQg9ZBxO8

Descriptions of the command parameters:

  • min_df [real]: Minimum document frequency; terms that occur less often are filtered out
  • use_linked_commits_only [boolean]: Whether to use all commits in the dataset for training and testing, or only commits that have at least one linked issue in a GitHub or Jira issue tracker
  • use_issue_classifier [boolean]: Whether to use the issue classifier in HERMES. If false, HERMES contains only the message classifier and the patch classifier
  • use_stacking_ensemble [boolean]: Whether to combine the base classifiers (i.e., message classifier, patch classifier, issue classifier) with a stacking ensemble or with simple voting. If true, a stacking ensemble is used; otherwise, simple voting is used
  • tf-idf-threshold [real]: Threshold used in the issue classifier to filter out noise
  • dataset [string]: Name of the dataset selected for the experiment
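To illustrate what a minimum document frequency does, here is a minimal sketch in plain Python. It assumes min_df is an absolute document count (as the value 5 in the commands suggests). The function name `filter_by_min_df` and the toy documents are made up for illustration and are not taken from the HERMES code.

```python
from collections import Counter


def filter_by_min_df(documents, min_df):
    """Keep only terms that occur in at least `min_df` documents."""
    doc_freq = Counter()
    for tokens in documents:
        # A set counts each term at most once per document.
        doc_freq.update(set(tokens))
    return {term for term, df in doc_freq.items() if df >= min_df}


# Toy tokenized commit messages (illustrative only).
docs = [
    ["fix", "buffer", "overflow"],
    ["fix", "typo"],
    ["fix", "overflow", "check"],
]

vocabulary = filter_by_min_df(docs, min_df=2)
print(sorted(vocabulary))  # ['fix', 'overflow']
```

With min_df=2, "buffer", "typo" and "check" are dropped because each appears in only one document.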

Steps on how to replicate:

Create a venv:

python3 -m venv venv
source ./venv/bin/activate

Rename the data_loader folder to loader: mv ./data_loader/ ./loader

Install the necessary dependencies: pip3 install numpy pygithub scikit-learn nltk pandas

Download the NLTK stopwords and punkt data (this places the data in /home/USERNAME/nltk_data/):

python3 -m nltk.downloader stopwords
python3 -m nltk.downloader punkt

Unzip the enhanced dataset that you want to use:

unzip ./MSR2019/experiments/sub_enhanced_dataset_th_100.txt.zip

Create a directory to hold the classifier output: mkdir ./classifier_output
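If you prefer to script the unzip and mkdir steps, they can be sketched in Python with the standard library. `prepare_workspace` is a hypothetical helper, not part of the repository. Extracting into the current directory mirrors the plain `unzip` call above, but where experiment.py expects the dataset file is an assumption you should check.

```python
import zipfile
from pathlib import Path


def prepare_workspace(zip_path, extract_dir=".", output_dir="./classifier_output"):
    """Extract a dataset archive and create the classifier output directory.

    Returns the list of file names contained in the archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
        extracted = archive.namelist()
    # exist_ok=True makes the call safe to repeat.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return extracted


# Example call, using the archive path from the unzip step above:
# prepare_workspace("./MSR2019/experiments/sub_enhanced_dataset_th_100.txt.zip")
```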

Run from the command line:

python3 experiment.py --min_df 5 --use_linked_commits_only False --use_issue_classifier True --use_stacking_ensemble True --use-patch-context-lines False --tf-idf-threshold 0.005 --dataset sub_enhanced_dataset_th_100.txt

Temp results on DAA:

  • Message F1: 0.8
  • Issue F1: 1.0
  • Patch F1: 0.8

Training result for positive weight: 0.5, negative weight: 0.5

  • Log message mean precision: 1.0
  • Log message mean recall: 0.6666666666666666
  • Log message mean f1: 0.8
  • Issue mean precision: 1.0
  • Issue mean recall: 1.0
  • Issue mean f1: 1.0
  • Patch mean precision: 1.0
  • Patch mean recall: 0.6666666666666666
  • Patch mean f1: 0.8
  • Joint-model mean precision: 1.0
  • Joint-model mean recall: 0.6666666666666666
  • Joint-model mean f1: 0.8
  • Joint-model mean AUC-ROC: 1.0
  • Joint-model mean AUC-PR: 1.0
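As a sanity check, the reported F1 values are consistent with the standard definition of F1 as the harmonic mean of precision and recall. The sketch below applies that formula to the numbers above. Note that a mean F1 over several folds does not in general equal the F1 of the mean precision and mean recall; here the two happen to agree. `f1_from_pr` is an illustrative helper, not part of the repository.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Log message, patch and joint-model rows: precision 1.0, recall 2/3.
print(round(f1_from_pr(1.0, 0.6666666666666666), 4))  # 0.8

# Issue row: precision 1.0, recall 1.0.
print(f1_from_pr(1.0, 1.0))  # 1.0
```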

Contributors: ntgiang71096, tdunlap607
