sr-lab / glitch

GLITCH is a technology-agnostic framework that enables automated detection of code smells in Infrastructure-as-Code scripts.

License: GNU General Public License v3.0

Python 73.67% TypeScript 0.69% Ruby 0.94% Puppet 1.32% HCL 23.12% Dockerfile 0.27%
ansible chef iac puppet smell-detector linter

glitch's Introduction

GLITCH

GLITCH is a technology-agnostic framework that enables automated detection of IaC smells. GLITCH allows polyglot smell detection by transforming IaC scripts into an intermediate representation, on which different smell detectors can be defined. GLITCH currently supports the detection of nine different security smells [1, 2] and nine design & implementation smells [3] in scripts written in Puppet, Ansible, or Chef.

Paper and Academic Usage

"GLITCH: Automated Polyglot Security Smell Detection in Infrastructure as Code" is the main paper that describes the implementation of security smells in GLITCH. It also presents a large-scale empirical study that analyzes security smells on three large datasets containing 196,755 IaC scripts and 12,281,251 LOC.

If you use GLITCH or any of its datasets, please cite:

@inproceedings{saavedraferreira22glitch,
 title={{GLITCH}: Automated Polyglot Security Smell Detection in Infrastructure as Code},
 author={Saavedra, Nuno and Ferreira, Jo{\~a}o F},
 booktitle={Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering},
 year={2022}
}
@inproceedings{saavedra23glitchdemo,
  author={Saavedra, Nuno and Gonçalves, João and Henriques, Miguel and Ferreira, João F. and Mendes, Alexandra},
  booktitle={2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)}, 
  title={Polyglot Code Smell Detection for Infrastructure as Code with GLITCH}, 
  year={2023},
  pages={2042-2045},
  doi={10.1109/ASE56229.2023.00162}
}

Installation

To install GLITCH, run:

python -m pip install -e .

To use the tool for Chef you also need Ruby and its Ripper package installed.

Poetry

To install GLITCH using Poetry, run:

poetry install

WARNING: For now, the GLITCH VSCode extension does not work if GLITCH is installed via Poetry. Because Poetry installs GLITCH into a virtual environment, it does not place a GLITCH binary on the user's PATH, which the VSCode extension requires.

Usage

To explore all available options, use the command:

glitch --help

To analyze a file or folder and retrieve CSV results, use the following command:

glitch --tech (chef|puppet|ansible|terraform) --csv --config PATH_TO_CONFIG PATH_TO_FILE_OR_FOLDER

If you want to consider the module structure you can add the flag --module.
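
For scripted runs, the command line above can be assembled from Python. This is only a sketch: build_glitch_command is a hypothetical helper, not part of GLITCH, and it uses only the flags shown in this section (--tech, --csv, --config, --module). It assumes the glitch executable is on the PATH.

```python
import subprocess
from typing import List, Optional

# Technologies listed in the usage line above.
TECHS = {"chef", "puppet", "ansible", "terraform"}


def build_glitch_command(tech: str, target: str,
                         config: Optional[str] = None,
                         module: bool = False) -> List[str]:
    """Assemble the documented CLI call (hypothetical helper)."""
    if tech not in TECHS:
        raise ValueError(f"unsupported tech: {tech}")
    cmd = ["glitch", "--tech", tech, "--csv"]
    if config is not None:
        cmd += ["--config", config]
    if module:
        cmd.append("--module")
    cmd.append(target)
    return cmd


def run_glitch(tech: str, target: str, **kwargs) -> str:
    """Run GLITCH and return its standard output (needs glitch on PATH)."""
    cmd = build_glitch_command(tech, target, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping command construction separate from execution makes the argument list easy to check without GLITCH installed.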

Poetry

If GLITCH was installed using Poetry, execute GLITCH commands as follows:

poetry run glitch --help

Alternatively, you can use poetry shell:

poetry shell
glitch --help

Tests

To run the tests for GLITCH, go to the folder glitch and run:

python -m unittest discover tests

Configs

New configs can be created with the same structure as the ones found in the folder configs.

Documentation

More information can be found in GLITCH's documentation.

VSCode extension

GLITCH has a Visual Studio Code extension which is available here.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

GPL-3.0

References

[1] Rahman, A., Parnin, C., & Williams, L. (2019, May). The seven sins: Security smells in infrastructure as code scripts. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) (pp. 164-175). IEEE.

[2] Rahman, A., Rahman, M. R., Parnin, C., & Williams, L. (2021). Security smells in ansible and chef scripts: A replication study. ACM Transactions on Software Engineering and Methodology (TOSEM), 30(1), 1-31.

[3] Schwarz, J., Steffens, A., & Lichter, H. (2018, September). Code smells in infrastructure as code. In 2018 11th International Conference on the Quality of Information and Communications Technology (QUATIC) (pp. 220-228). IEEE.

glitch's People

Contributors

ashrick12, jff, joaotgoncalves, miguelchenriques, nfsaavedra

glitch's Issues

Output seems not correct for a simple yaml file

Here is my yaml file named test.yaml as below:

name: create an app with full permission

file:
  path: /app
  owner: foo
  group: foo
  mode: "777"

I generated the report with the command glitch --tech ansible --csv test.yaml

The output below does not seem correct:

+-------------------------------------+-------------+----------------------------+---------------------------+
| Smell                               | Occurrences | Smell density (Smell/KLoC) | Proportion of scripts (%) |
+-------------------------------------+-------------+----------------------------+---------------------------+
| Admin by default                    |           0 |                        0.0 |                       0.0 |
| Avoid comments                      |           0 |                        0.0 |                       0.0 |
| Duplicate block                     |           0 |                        0.0 |                       0.0 |
| Empty password                      |           0 |                        0.0 |                       0.0 |
| Full permission to the filesystem   |           0 |                        0.0 |                       0.0 |
| Hard-coded password                 |           0 |                        0.0 |                       0.0 |
| Hard-coded secret                   |           0 |                        0.0 |                       0.0 |
| Hard-coded user                     |           0 |                        0.0 |                       0.0 |
| Imperative abstraction              |           0 |                        0.0 |                       0.0 |
| Improper alignment                  |           0 |                        0.0 |                       0.0 |
| Invalid IP address binding          |           0 |                        0.0 |                       0.0 |
| Long Resource                       |           0 |                        0.0 |                       0.0 |
| Long statement                      |           0 |                        0.0 |                       0.0 |
| Misplaced attribute                 |           0 |                        0.0 |                       0.0 |
| Missing default case statement      |           0 |                        0.0 |                       0.0 |
| Multifaceted Abstraction            |           0 |                        0.0 |                       0.0 |
| No integrity check                  |           0 |                        0.0 |                       0.0 |
| Suspicious comment                  |           0 |                        0.0 |                       0.0 |
| Too many variables                  |           0 |                        0.0 |                       0.0 |
| Unguarded variable                  |           0 |                        0.0 |                       0.0 |
| Unnecessary abstraction             |           0 |                        0.0 |                       0.0 |
| Use of HTTP without TLS             |           0 |                        0.0 |                       0.0 |
| Use of obsolete command or function |           0 |                        0.0 |                       0.0 |
| Weak Crypto Algorithm               |           0 |                        0.0 |                       0.0 |
+-------------------------------------+-------------+----------------------------+---------------------------+
| Combined                            |           0 |                        0.0 |                       0.0 |
+-------------------------------------+-------------+----------------------------+---------------------------+
+-----------------+---------------+
| Total IaC files | Lines of Code |
+-----------------+---------------+
|               1 |             7 |
+-----------------+---------------+

I expected Full permission to the filesystem to show 1 occurrence, but every smell shows zero. Did I miss anything? I really appreciate your help.
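
To make the expectation concrete, here is a minimal sketch of the kind of check the reporter had in mind for the mode value. It is not GLITCH's detector: grants_full_permission is a hypothetical helper that handles only numeric modes, not symbolic ones such as a=rwx.

```python
def grants_full_permission(mode) -> bool:
    """Return True when a numeric file mode ends in 777 (hypothetical check)."""
    if isinstance(mode, int):
        # A YAML loader may already have turned 0777 into the integer 511.
        mode = format(mode, "o")
    return str(mode).strip().lstrip("0")[-3:] == "777"


# The task from the report, written as the dictionary a YAML loader would produce.
task = {
    "name": "create an app with full permission",
    "file": {"path": "/app", "owner": "foo", "group": "foo", "mode": "777"},
}
flagged = grants_full_permission(task["file"]["mode"])
```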

Multifaceted abstraction smell (Sharma et al. 2016)

This smell can be detected in two different ways for Puppet:

  1. More than one resource is defined in the declaration of a file, service or package. We are not currently able to support this because of #7.
  2. Calculation of the LCOM. LCOM is related to the intersection of parameters between components. We should be able to support this easily if we implement #6.
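
LCOM has several definitions. The sketch below uses the Chidamber and Kemerer formulation as one possible reading of item 2: count the pairs of components that share no parameter (P) and the pairs that share at least one (Q), then take max(P - Q, 0). Mapping Puppet resources to sets of parameter names is an assumption made for illustration.

```python
from itertools import combinations
from typing import Dict, Set


def lcom(params_by_component: Dict[str, Set[str]]) -> int:
    """Chidamber-Kemerer LCOM over components and the parameters they use."""
    disjoint = 0
    shared = 0
    for left, right in combinations(params_by_component.values(), 2):
        if left & right:
            shared += 1
        else:
            disjoint += 1
    return max(disjoint - shared, 0)
```

A higher value means the components have little in common, which is the signal behind a multifaceted abstraction.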

MD5 triggers weak crypt smell on checksums

Currently, computing checksums with md5 triggers the Weak Crypto Algorithm smell, e.g.:

# Docker
RUN md5sum foo.sh

In this case, md5sum is being used to verify the integrity of the file, yet it triggers the Weak Crypto Algorithm smell. md5sum and other checksum commands (shasum, sha1sum, etc.) should be whitelisted.
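
The whitelist idea could look roughly like the sketch below. The pattern and the command list are hypothetical and do not reflect GLITCH's actual configuration; the function expects the shell command that follows RUN.

```python
import re
import shlex

# Hypothetical values for illustration only.
WEAK_CRYPTO = re.compile(r"md5|sha1", re.IGNORECASE)
CHECKSUM_COMMANDS = {"md5sum", "shasum", "sha1sum"}


def is_weak_crypto(command: str) -> bool:
    """Flag weak algorithms unless the command is a whitelisted checksum tool."""
    tokens = shlex.split(command)
    if tokens and tokens[0] in CHECKSUM_COMMANDS:
        return False
    return bool(WEAK_CRYPTO.search(command))
```

Checking only the first token keeps the exemption narrow: a whitelisted name that appears later as an argument does not suppress the smell.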

Add versions to requirements.txt

This is fine, but it might be a good idea to include versions (perhaps the best is to create a separate issue for that). Otherwise, we might have issues in the near future related to incompatible versions.

Originally posted by @jff in #19 (comment)

have an automated test for the oracles

Describe the solution you'd like
It would be nice to have an automated test that checks if the number of true/false positives and true/false negatives remains the same for the oracle datasets used in GLITCH's studies.
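
A sketch of the counting step such a test would need, assuming each finding can be identified by a (file, line, smell) tuple. The tuple layout and the function name are assumptions. True negatives are left out because they require knowing every candidate location, which this sketch does not model.

```python
from typing import Dict, Iterable, Tuple

Finding = Tuple[str, int, str]  # (file path, line number, smell name)


def confusion_counts(oracle: Iterable[Finding],
                     detected: Iterable[Finding]) -> Dict[str, int]:
    """Compare detected smells against the oracle annotations."""
    oracle_set = set(oracle)
    detected_set = set(detected)
    return {
        "tp": len(oracle_set & detected_set),
        "fp": len(detected_set - oracle_set),
        "fn": len(oracle_set - detected_set),
    }
```

A regression test could store the expected dictionary for each oracle dataset and fail whenever a change to a detector shifts any of the counts.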

support node management

Describe the solution you'd like
It would be interesting to have support in the intermediate representation for the management of nodes, i.e., for instance the inventory in Ansible and the node construct in Puppet.

add black to CI

Describe the solution you'd like
It would be nice to have black in the CI. This would enforce the usage of black.

Parse values in the intermediate representation

We should parse the values in the intermediate representation so that we can distinguish, for instance, types (booleans, strings, numbers...) and expressions (ands, ors...). This parsing would allow other types of smells to be defined more accurately. For instance, consider the smell "Hard-coded secret" and a value such as $test | "hello". Although a variable is present, there is still a chance that the secret is hard-coded.
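
One way to picture parsed values is a small expression tree. The class names below are hypothetical, and reading the | in the example as an "or" expression is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StringLiteral:
    value: str


@dataclass(frozen=True)
class VariableRef:
    name: str


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[StringLiteral, VariableRef, Or]


def contains_literal(expr: Expr) -> bool:
    """Return True if any branch of the expression is a hard-coded string."""
    if isinstance(expr, StringLiteral):
        return True
    if isinstance(expr, Or):
        return contains_literal(expr.left) or contains_literal(expr.right)
    return False


# $test | "hello" from the example above.
example = Or(VariableRef("test"), StringLiteral("hello"))
```

With this structure, a detector can tell that the example still carries a literal even though a variable is present.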

simplify CLI options

Describe the solution you'd like
Right now some CLI options are not very clear. For instance, the --includeall and --dataset options are hard to understand and should be replaced with simpler options or even removed. The --linter and --csv options could also be replaced with a single format option.
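
A sketch of the proposed format option using argparse. The option name --format, its choices, and the "table" default are assumptions made for illustration, not GLITCH's current interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI with one --format option instead of --csv and --linter."""
    parser = argparse.ArgumentParser(prog="glitch")
    parser.add_argument("--tech", required=True,
                        choices=["chef", "puppet", "ansible", "terraform"])
    parser.add_argument("--format", choices=["table", "csv", "linter"],
                        default="table", help="output format")
    parser.add_argument("path", help="file or folder to analyze")
    return parser


args = build_parser().parse_args(["--tech", "ansible", "--format", "csv", "test.yaml"])
```

A single option with fixed choices also rules out contradictory combinations such as passing both flags at once.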

I am unable to install this on my Mac with Python 3 installed

  1. git clone git@github.com:sr-lab/GLITCH.git
  2. cd GLITCH
  3. python -m pip install -e .

Successfully built glitch
Installing collected packages: glitch
Attempting uninstall: glitch
Found existing installation: glitch 1.0.1
Uninstalling glitch-1.0.1:
Successfully uninstalled glitch-1.0.1
Successfully installed glitch-1.0.1

Running the glitch command then fails with the error below:

abdul@Abduls-MacBook-Pro GLITCH % glitch --help
Traceback (most recent call last):
  File "/Users/abdul/Library/Caches/pypoetry/virtualenvs/glitch-5uPBzSuZ-py3.12/bin/glitch", line 5, in <module>
    from glitch.main import main
  File "/Users/abdul/projects/uwf/GLITCH/glitch/main.py", line 15, in <module>
    from glitch.parsers.chef import ChefParser
  File "/Users/abdul/projects/uwf/GLITCH/glitch/parsers/chef.py", line 8, in <module>
    from pkg_resources import resource_filename
ModuleNotFoundError: No module named 'pkg_resources'

Can anybody help?
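
A likely cause: pkg_resources is provided by setuptools, and the path in the traceback shows a Python 3.12 virtual environment, where setuptools is no longer installed by default. Running python -m pip install setuptools inside that environment is a possible workaround. A longer-term option is the standard library's importlib.resources, sketched below. The helper name is hypothetical, and the package and resource names are placeholders rather than GLITCH's real files.

```python
from importlib.resources import as_file, files


def resource_exists(package: str, resource: str) -> bool:
    """Locate a packaged file without pkg_resources (Python 3.9+)."""
    # as_file yields a real filesystem path, extracting the resource to a
    # temporary file first if the package is not stored on disk.
    with as_file(files(package).joinpath(resource)) as path:
        return path.is_file()


# Placeholder example using a standard-library package.
found = resource_exists("email", "__init__.py")
```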

Evaluation status

Puppet

  • Collect dataset (Rahman)
  • Compute and validate dataset statistics
  • Collect Oracle (Our own)
  • Compute and validate oracle statistics
  • Run SLIC: collect results and execution time
    • Oracle
    • Compute precision and accuracy
    • Rahman's dataset (7 smells) 🎯
  • Run GLITCH: collect results and execution time
    • Oracle
    • Compute precision and accuracy
    • Rahman's dataset (7 smells) 🎯
    • Rahman's dataset (all of GLITCH's supported smells) 🎯

Ansible

  • Collect dataset (our own, following same criteria as Rahman and Rahman Openstack Dataset)
  • Compute and validate dataset statistics
  • Collect Oracle (Rahman)
  • Compute and validate oracle statistics
  • Review Oracle and add code line to each smell
  • Run SLAC: collect results and execution time
    • Oracle 🎯
    • Compute precision and accuracy 🎯
    • Rahman's dataset (8 smells)
  • Run GLITCH: collect results and execution time
    • Oracle 🎯
    • Compute precision and accuracy 🎯
    • Rahman's dataset (8 smells)
    • Rahman's dataset (all of GLITCH's supported smells)

Chef

  • Collect dataset (our own, following same criteria as Rahman)
  • Compute dataset statistics
  • Collect Oracle (Our own)
  • Compute and validate oracle statistics
  • Run SLAC: collect results and execution time
    • Oracle
    • Compute precision and accuracy
    • Run on dataset (9 smells) 🎯
  • Run GLITCH: collect results and execution time
    • Oracle
    • Compute precision and accuracy
    • Run on dataset (9 smells) 🎯
    • Run on dataset (all of GLITCH's supported smells; same as above?) 🎯

Long statement detected on 140 characters

Describe the bug
Long statement is being detected when we have 140 characters + '\n'.

To Reproduce
Run GLITCH on a script containing a line with 140 characters + '\n'.

Expected behavior
It shouldn't detect the smell
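
The expected behavior can be sketched as follows, assuming a limit of 140 characters as in the issue title. The constant and the function name are hypothetical. The key point is that the line terminator is stripped before measuring.

```python
MAX_LINE_LENGTH = 140  # limit taken from the issue title


def is_long_statement(line: str) -> bool:
    """Measure the line without its trailing newline characters."""
    return len(line.rstrip("\r\n")) > MAX_LINE_LENGTH
```

With this check, a line of exactly 140 characters followed by '\n' is not reported, while 141 characters still are.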

refactor the Docker parser

Describe the enhancement
Currently, the Docker parser has some problems such as:

  • command parsing
  • multi-line commands

The translation of a Dockerfile to our intermediate representation is also not very intuitive. It would be better to invest some time in identifying a new mapping between Dockerfiles and the IR.

condition statement and conditions should have different representations

Is your feature request related to a problem? Please describe.
Right now the condition statement and its conditions are represented with the same construct ConditionStatement. However, this doesn't allow the distinction between them and sometimes the conditions are used as being the condition statement itself.
For instance:

$php_prefix = $::osfamily ? {
    'debian' => 'php5-',
    'redhat' => 'php-',
}

This snippet produces one ConditionStatement for the first condition and one for the second, but no construct for the switch statement itself.

Describe the solution you'd like
We should create a new construct either for the conditions or the switch/if statements.
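
One possible shape for the split, sketched with dataclasses. The class and field names are hypothetical and not taken from GLITCH's intermediate representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Condition:
    """A single branch; test is None for a default branch."""
    test: Optional[str]
    body: List[str] = field(default_factory=list)


@dataclass
class ConditionalStatement:
    """The switch/if statement that owns its branches."""
    subject: str
    branches: List[Condition] = field(default_factory=list)

    def has_default(self) -> bool:
        return any(branch.test is None for branch in self.branches)


# The Puppet selector from the example above.
selector = ConditionalStatement(
    subject="$::osfamily",
    branches=[
        Condition("'debian'", ["'php5-'"]),
        Condition("'redhat'", ["'php-'"]),
    ],
)
```

Having a node for the whole statement gives checks such as "Missing default case statement" a natural place to attach.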

change attributes of UnitBlock to be more object-oriented

Describe the solution you'd like
Right now, the UnitBlock has an attribute for each type of element. However, this does not scale well, does not adhere to good practices of object-oriented programming, and is not intuitive when generic statements are in the mix. For instance, imagine a conditional statement that has an atomic unit in one of its blocks. Should the atomic unit also be added to the atomic_units attribute? It doesn't make sense.

Describe alternatives you've considered
The UnitBlock should have a single attribute for statements.
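
A sketch of the single-attribute alternative. The classes here are simplified stand-ins with hypothetical fields, not GLITCH's actual definitions. Atomic units are found by walking the statements rather than by keeping a separate list.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Statement:
    """Base class for anything that can appear in a block."""


@dataclass
class AtomicUnit(Statement):
    name: str = ""


@dataclass
class Conditional(Statement):
    body: List[Statement] = field(default_factory=list)


@dataclass
class UnitBlock:
    statements: List[Statement] = field(default_factory=list)

    def atomic_units(self) -> Iterator[AtomicUnit]:
        """Yield atomic units in source order, including nested ones."""
        stack = list(reversed(self.statements))
        while stack:
            stmt = stack.pop()
            if isinstance(stmt, AtomicUnit):
                yield stmt
            elif isinstance(stmt, Conditional):
                stack.extend(reversed(stmt.body))


block = UnitBlock([AtomicUnit("a"), Conditional([AtomicUnit("b")]), AtomicUnit("c")])
names = [unit.name for unit in block.atomic_units()]
```

The nested unit appears exactly once, which answers the question of whether it should also be stored at the top level.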

Improve print representation of hierarchical attributes

The print should be recursive and follow the same patterns as the other components, otherwise it becomes hard to understand what is going on.

Instead of this:

roles[0]->None attributes: [name:'Install dependencies']

We want something like this:

roles[0]->None:
  attributes:
    name->Install dependencies
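
A recursive printer that produces the desired layout could look like the sketch below. The Attribute class is a simplified, hypothetical stand-in for the real component.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    name: str
    value: Optional[str] = None
    attributes: List["Attribute"] = field(default_factory=list)


def render(attr: Attribute, indent: int = 0) -> str:
    """Print an attribute and, recursively, its nested attributes."""
    pad = "  " * indent
    head = f"{pad}{attr.name}->{attr.value}"
    if not attr.attributes:
        return head
    lines = [head + ":", f"{pad}  attributes:"]
    lines += [render(child, indent + 2) for child in attr.attributes]
    return "\n".join(lines)


role = Attribute("roles[0]", None, [Attribute("name", "Install dependencies")])
output = render(role)
```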

Ansible script type

It would be interesting to have a better way to define the type of an Ansible script (vars, tasks or script).

refactor Ansible parser to use the ansible package

Is your feature request related to a problem? Please describe.
Currently GLITCH does not support attributes defined as in the example below (aka Ansible-specific syntax):

- name: Create web root
  file: path="{{ www_root }}"
        owner="{{ web_user }}"
        group="{{ web_group }}"
        mode=0755
        state=directory
  with_dict: sites

This is mentioned in the work by Opdebeeck et al. (2023).

Describe the solution you'd like
We should use the ansible-core package instead of the yaml package.
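
To illustrate what the key=value syntax requires from a parser, here is a stand-in built on the standard library's shlex. It is not ansible-core's parser and covers only simple cases; parse_free_form is a hypothetical name.

```python
import shlex
from typing import Dict


def parse_free_form(args: str) -> Dict[str, str]:
    """Split 'key=value key2="quoted value"' arguments into a dictionary."""
    result: Dict[str, str] = {}
    for token in shlex.split(args):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            result[key] = value
    return result


# The value of the file key from the example above.
raw = '''path="{{ www_root }}"
        owner="{{ web_user }}"
        group="{{ web_group }}"
        mode=0755
        state=directory'''
attributes = parse_free_form(raw)
```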

missing integrity check on values with spaces

Describe the bug
The regex for the missing integrity check isn't triggered on values such as:
https://storage.googleapis.com/cri-containerd-release/cri-containerd-{{ containerd_version }}.linux-amd64.tar.gz
This happens because of the space before and after the variable.

To Reproduce
Run GLITCH on this script:
https://github.com/starlingx/ansible-playbooks/blob/7983841637966089106bb80f28d7b701ec6b6323/playbookconfig/src/playbooks/roles/provision-edgeworker/prepare-edgeworker/kubernetes/tasks/install-ubuntu-packages.yml#L31

Expected behavior
Detecting a Missing integrity check smell.
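
One possible fix is to collapse the whitespace inside {{ ... }} before matching. In the sketch below, the download pattern is hypothetical and is not GLITCH's actual regex. Templates with internal spaces, such as filters, would still need more work.

```python
import re

JINJA_VAR = re.compile(r"\{\{\s*(.*?)\s*\}\}")
# Hypothetical pattern for downloadable artifacts.
DOWNLOAD = re.compile(r"^https?://\S+\.(?:tar\.gz|tgz|zip|deb|rpm)$")


def normalize(value: str) -> str:
    """Remove the padding spaces inside Jinja variable delimiters."""
    return JINJA_VAR.sub(lambda m: "{{" + m.group(1) + "}}", value)


def looks_like_download(value: str) -> bool:
    return bool(DOWNLOAD.match(normalize(value.strip())))


url = ("https://storage.googleapis.com/cri-containerd-release/"
       "cri-containerd-{{ containerd_version }}.linux-amd64.tar.gz")
```

Without normalization, the \S+ part of the pattern stops at the first space, which mirrors the behavior described in the report.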

hierarchy of configuration files

Describe the solution you'd like
Configuration files that do not have a certain key should use the value in the default.ini file. This is useful, for instance, to specify a Terraform-specific configuration with only the necessary keys.
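
With configparser, this fallback can be obtained by reading the default file first and the specific file second, so that later values override earlier ones. The section and key names below are hypothetical; real configs live in the configs folder.

```python
import configparser

# Hypothetical contents standing in for default.ini and a Terraform config.
DEFAULT_INI = """
[security]
suspicious_words = ["bug", "fixme"]
admin = ["root", "admin"]
"""

TERRAFORM_INI = """
[security]
admin = ["administrator"]
"""


def load_config(default_text: str, override_text: str) -> configparser.ConfigParser:
    """Read the defaults first; keys in the override replace them."""
    parser = configparser.ConfigParser()
    parser.read_string(default_text)
    parser.read_string(override_text)
    return parser


config = load_config(DEFAULT_INI, TERRAFORM_INI)
```

With files on disk, the same effect comes from passing both paths to parser.read in that order.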
