joaohenggeler / uc-masters-software-vulnerabilities

This repository contains the code and documents developed for the master's thesis "Building and Evaluating Software Vulnerability Datasets" (2020/2021).

License: Apache License 2.0

Python 99.45% C++ 0.55%

Building and Evaluating Software Vulnerability Datasets

Background

Software vulnerabilities can have serious consequences when exploited, such as unauthorized authentication, data loss, and financial loss. Although techniques exist for detecting these vulnerabilities by analyzing the source code or executing the software, they suffer from both false positives (misidentified vulnerabilities) and false negatives (undetected vulnerabilities). Another way of identifying vulnerabilities is to combine certain source code properties (software metrics) with machine learning techniques. A previous study has shown this to be feasible, although the data it collected is now out of date. Similarly, security alerts (i.e. potential vulnerabilities) can be found directly using Static Analysis Tools (SATs), though these also produce a high number of false positives.

Contributions

  • Implemented an automated process capable of collecting vulnerability metadata from the CVE Details website, retrieving any affected code units (files, functions, classes) from a project's version control system, generating software metrics and security alerts for each one, storing the collected information in a MySQL database, and building robust datasets capable of being fed to machine learning algorithms.

  • Built datasets of vulnerable code units for five large open-source C/C++ projects: Mozilla, Linux Kernel, Xen Hypervisor, Apache HTTP Server, and GNU C Library (Glibc).

  • Validated the function samples by exploring various machine learning configurations and investigating whether it is possible to detect vulnerable function code in current versions using static data from previous commits.

Publications

Authors

  • João Henggeler Antunes - Student
  • José Alexandre D'Abruzzo Pereira - Supervisor
  • Marco Vieira - Supervisor

uc-masters-software-vulnerabilities's Issues

Investigate the MySQL storage engines used in the original dataset import scripts

The tables that were imported from the original dataset use MyISAM as their storage engine. In MySQL 5.5 and later, the default engine is InnoDB. Since the Python scripts don't specify an engine when creating new tables, those tables use InnoDB, while the tables from the original dataset remain MyISAM.

The InnoDB engine provides foreign key referential integrity constraints, while MyISAM does not. If you try to define a foreign key when creating a new table (which will use InnoDB by default), you won't be able to relate it to a column in a table that uses MyISAM. This happens, for example, when creating a relationship between the table 'alert' (InnoDB) and 'patches' or 'files_*' (MyISAM).
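
One way to see which tables are affected is to list each table's engine and check whether a given pair could take part in an enforced foreign key. The sketch below is illustrative only and is not taken from the repository's scripts: the helper names and the sample rows are assumptions (including the concrete name 'files_1', which stands in for the 'files_*' tables). Only the query against information_schema.TABLES is standard MySQL. It assumes the rows come from running that query through whatever database connector the project uses.

```python
from collections import defaultdict

# Standard MySQL query listing the storage engine of every table in a schema.
# The schema name is supplied as a query parameter by the caller.
ENGINE_QUERY = (
    "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s"
)


def group_tables_by_engine(rows):
    """Group (table_name, engine) rows by storage engine (hypothetical helper)."""
    groups = defaultdict(list)
    for table_name, engine in rows:
        groups[engine].append(table_name)
    return {engine: sorted(names) for engine, names in groups.items()}


def can_enforce_foreign_key(child, parent, engines):
    """Return True only if both the child and the parent table use InnoDB.

    Simplification: this ignores other engines and only models the
    InnoDB/MyISAM split described in this issue.
    """
    return engines.get(child) == "InnoDB" and engines.get(parent) == "InnoDB"


# Hypothetical rows, shaped like the result of ENGINE_QUERY.
example_rows = [
    ("alert", "InnoDB"),
    ("patches", "MyISAM"),
    ("files_1", "MyISAM"),  # assumed stand-in for one of the 'files_*' tables
]

if __name__ == "__main__":
    engines = dict(example_rows)
    print(group_tables_by_engine(example_rows))
    print(can_enforce_foreign_key("alert", "patches", engines))
```

With the sample rows, the check for 'alert' and 'patches' fails because 'patches' is MyISAM, which matches the situation described above.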

For more information on MySQL storage engines, refer to this page.

How important is this issue? Should only one type of engine be used (use MyISAM for new tables, or convert any old ones to InnoDB)?
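
If converting the old tables to InnoDB turns out to be the preferred option, the conversion could be scripted roughly as follows. This is a minimal sketch under assumptions, not the project's actual code: the function name and the sample rows are hypothetical, and it only builds the standard `ALTER TABLE ... ENGINE = ...` statements without executing them. Changing the engine rebuilds the table, so it would be wise to back up the database before running the generated statements.

```python
def build_conversion_statements(rows, target_engine="InnoDB"):
    """Build ALTER TABLE statements for tables not yet on target_engine.

    rows: iterable of (table_name, engine) pairs, e.g. from a query on
    information_schema.TABLES. This helper is hypothetical.
    """
    statements = []
    for table_name, engine in rows:
        if engine != target_engine:
            # Quote the identifier with backticks, doubling any embedded ones.
            quoted = "`" + table_name.replace("`", "``") + "`"
            statements.append(f"ALTER TABLE {quoted} ENGINE = {target_engine};")
    return statements


if __name__ == "__main__":
    # Hypothetical rows; only 'patches' needs converting here.
    sample = [("alert", "InnoDB"), ("patches", "MyISAM")]
    for statement in build_conversion_statements(sample):
        print(statement)
```

The opposite choice (keeping MyISAM everywhere) would use the same statements with `target_engine="MyISAM"`, at the cost of giving up foreign key constraints.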
