sr-lab / glitch

GLITCH is a technology-agnostic framework that enables automated detection of code smells in Infrastructure-as-Code scripts.

License: GNU General Public License v3.0

Python 73.67% TypeScript 0.69% Ruby 0.94% Puppet 1.32% HCL 23.12% Dockerfile 0.27%

ansible chef iac puppet smell-detector linter

glitch's Introduction

GLITCH

GLITCH is a technology-agnostic framework that enables automated detection of IaC smells. GLITCH allows polyglot smell detection by transforming IaC scripts into an intermediate representation, on which different smell detectors can be defined. GLITCH currently supports the detection of nine different security smells [1, 2] and nine design & implementation smells [3] in scripts written in Puppet, Ansible, or Chef.

Paper and Academic Usage

"GLITCH: Automated Polyglot Security Smell Detection in Infrastructure as Code" is the main paper that describes the implementation of security smells in GLITCH. It also presents a large-scale empirical study that analyzes security smells on three large datasets containing 196,755 IaC scripts and 12,281,251 LOC.

If you use GLITCH or any of its datasets, please cite:

Nuno Saavedra and João F. Ferreira. 2022. GLITCH: Automated Polyglot Security Smell Detection in Infrastructure as Code. In 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22), October 10–14, 2022, Rochester, MI, USA. ACM, New York NY, USA, 12 pages. https://doi.org/10.1145/3551349.3556945

@inproceedings{saavedraferreira22glitch,
 title={{GLITCH}: Automated Polyglot Security Smell Detection in Infrastructure as Code},
 author={Saavedra, Nuno and Ferreira, Jo{\~a}o F},
 booktitle={Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering},
 year={2022}
}

Nuno Saavedra, João Gonçalves, Miguel Henriques, João F. Ferreira, Alexandra Mendes. 2023. Polyglot Code Smell Detection for Infrastructure as Code with GLITCH. In 38th IEEE/ACM International Conference on Automated Software Engineering (ASE '23), September 11-15, 2023, Luxembourg. https://doi.org/10.1109/ASE56229.2023.00162

@inproceedings{saavedra23glitchdemo,
  author={Saavedra, Nuno and Gonçalves, João and Henriques, Miguel and Ferreira, João F. and Mendes, Alexandra},
  booktitle={2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)}, 
  title={Polyglot Code Smell Detection for Infrastructure as Code with GLITCH}, 
  year={2023},
  pages={2042-2045},
  doi={10.1109/ASE56229.2023.00162}
}

Installation

To install run:

python -m pip install -e .

To use the tool for Chef you also need Ruby and its Ripper package installed.

Poetry

To install GLITCH using Poetry, run:

poetry install

WARNING: For now, the GLITCH VSCode extension does not function if GLITCH is installed via Poetry. Because Poetry uses virtual environments, it does not place a GLITCH binary on the user's PATH, which the VSCode extension requires.

Usage

To explore all available options, use the command:

glitch --help

To analyze a file or folder and retrieve CSV results, use the following command:

glitch --tech (chef|puppet|ansible|terraform) --csv --config PATH_TO_CONFIG PATH_TO_FILE_OR_FOLDER

If you want to consider the module structure you can add the flag --module.
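When GLITCH is driven from a script, it can help to assemble the command line programmatically. The following is a minimal sketch that only uses the flags shown above (--tech, --csv, --config, --module); the helper name build_glitch_command is hypothetical and not part of GLITCH, and --config is treated as optional here as an assumption.

```python
from typing import List, Optional

# Technologies listed in the usage line above.
TECHS = {"chef", "puppet", "ansible", "terraform"}


def build_glitch_command(
    tech: str,
    target: str,
    config: Optional[str] = None,
    module: bool = False,
) -> List[str]:
    """Build an argument list for the glitch CLI (hypothetical helper)."""
    if tech not in TECHS:
        raise ValueError(f"unsupported technology: {tech!r}")
    cmd = ["glitch", "--tech", tech, "--csv"]
    if config is not None:
        cmd += ["--config", config]
    if module:
        # Ask GLITCH to consider the module structure.
        cmd.append("--module")
    cmd.append(target)
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, capture_output=True, text=True) once glitch is available on the PATH.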

Poetry

If GLITCH was installed using Poetry, execute GLITCH commands as follows:

poetry run glitch --help

Alternatively, you can use poetry shell:

poetry shell
glitch --help

Tests

To run the tests for GLITCH go to the folder glitch and run:

python -m unittest discover tests

Configs

New configs can be created with the same structure as the ones found in the folder configs.

Documentation

More information can be found in GLITCH's documentation.

VSCode extension

GLITCH has a Visual Studio Code extension which is available here.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

GPL-3.0

References

[1] Rahman, A., Parnin, C., & Williams, L. (2019, May). The seven sins: Security smells in infrastructure as code scripts. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE) (pp. 164-175). IEEE.

[2] Rahman, A., Rahman, M. R., Parnin, C., & Williams, L. (2021). Security smells in ansible and chef scripts: A replication study. ACM Transactions on Software Engineering and Methodology (TOSEM), 30(1), 1-31.

[3] Schwarz, J., Steffens, A., & Lichter, H. (2018, September). Code smells in infrastructure as code. In 2018 11th International Conference on the Quality of Information and Communications Technology (QUATIC) (pp. 220-228). IEEE.

glitch's Issues

Output seems not correct for a simple yaml file

Here is my yaml file named test.yaml as below:

name: create an app with full permission

file:
  path: /app
  owner: foo
  group: foo
  mode: "777"

I ran the following command to generate the report: glitch --tech ansible --csv test.yaml

The output below does not seem correct:

+-------------------------------------+-------------+----------------------------+---------------------------+
| Smell                               | Occurrences | Smell density (Smell/KLoC) | Proportion of scripts (%) |
+-------------------------------------+-------------+----------------------------+---------------------------+
| Admin by default                    |           0 |                        0.0 |                       0.0 |
| Avoid comments                      |           0 |                        0.0 |                       0.0 |
| Duplicate block                     |           0 |                        0.0 |                       0.0 |
| Empty password                      |           0 |                        0.0 |                       0.0 |
| Full permission to the filesystem   |           0 |                        0.0 |                       0.0 |
| Hard-coded password                 |           0 |                        0.0 |                       0.0 |
| Hard-coded secret                   |           0 |                        0.0 |                       0.0 |
| Hard-coded user                     |           0 |                        0.0 |                       0.0 |
| Imperative abstraction              |           0 |                        0.0 |                       0.0 |
| Improper alignment                  |           0 |                        0.0 |                       0.0 |
| Invalid IP address binding          |           0 |                        0.0 |                       0.0 |
| Long Resource                       |           0 |                        0.0 |                       0.0 |
| Long statement                      |           0 |                        0.0 |                       0.0 |
| Misplaced attribute                 |           0 |                        0.0 |                       0.0 |
| Missing default case statement      |           0 |                        0.0 |                       0.0 |
| Multifaceted Abstraction            |           0 |                        0.0 |                       0.0 |
| No integrity check                  |           0 |                        0.0 |                       0.0 |
| Suspicious comment                  |           0 |                        0.0 |                       0.0 |
| Too many variables                  |           0 |                        0.0 |                       0.0 |
| Unguarded variable                  |           0 |                        0.0 |                       0.0 |
| Unnecessary abstraction             |           0 |                        0.0 |                       0.0 |
| Use of HTTP without TLS             |           0 |                        0.0 |                       0.0 |
| Use of obsolete command or function |           0 |                        0.0 |                       0.0 |
| Weak Crypto Algorithm               |           0 |                        0.0 |                       0.0 |
+-------------------------------------+-------------+----------------------------+---------------------------+
| Combined                            |           0 |                        0.0 |                       0.0 |
+-------------------------------------+-------------+----------------------------+---------------------------+
+-----------------+---------------+
| Total IaC files | Lines of Code |
+-----------------+---------------+
|               1 |             7 |
+-----------------+---------------+

As you can see, Full permission to the filesystem should report 1 occurrence, but every row shows zero. Did I miss anything? I really appreciate your help.

Multifaceted abstraction smell (Sharma et al. 2016)

This smell can be detected in two different ways for Puppet:

More than one resource is defined in the declaration of a file, service or package. We are not currently able to support this because of #7.
Calculation of the LCOM. LCOM is related to the intersection of parameters between components. We should be able to support this easily if we implement #6.

change the name of ConditionStatement to ConditionalStatement

MD5 triggers weak crypto smell on checksums

Currently, computing checksums with md5 triggers the weak crypto smell, e.g.:

# Docker
RUN md5sum foo.sh

In this case md5sum is being used to verify the integrity of the file, yet it triggers the weak crypto smell. md5sum and other checksum commands (shasum, sha1sum, etc.) should be whitelisted.
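One way to express the proposed whitelist is to tokenize the command and skip known checksum utilities before looking for weak algorithm names. This is only a sketch of the idea, not GLITCH's detector: the function name, the list of weak algorithms, and the tokenization are all assumptions made for illustration.

```python
import re

# Assumed set of algorithm names that count as weak crypto.
WEAK_ALGORITHMS = ("md5", "sha1")
# Checksum commands named in the issue; these verify integrity only.
CHECKSUM_COMMANDS = {"md5sum", "sha1sum", "shasum"}

_TOKEN = re.compile(r"[A-Za-z0-9_\-]+")


def has_weak_crypto(command: str) -> bool:
    """Return True if a token mentions a weak algorithm and is not a whitelisted checksum command."""
    for token in _TOKEN.findall(command.lower()):
        if token in CHECKSUM_COMMANDS:
            continue  # integrity check, not a crypto use
        if any(alg in token for alg in WEAK_ALGORITHMS):
            return True
    return False
```

With this sketch, "md5sum foo.sh" is not flagged, while a command that passes an option such as -md5 to another tool still is.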

Add versions to requirements.txt

This is fine, but it might be a good idea to include versions (perhaps the best is to create a separate issue for that). Otherwise, we might have issues in the near future related to incompatible versions.

Originally posted by @jff in #19 (comment)

have an automated test for the oracles

Describe the solution you'd like
It would be nice to have an automated test that checks if the number of true/false positives and true/false negatives remains the same for the oracle datasets used in GLITCH's studies.
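A regression check of this kind could compare the confusion counts of a run against a stored baseline. The sketch below assumes findings are represented as (path, line, smell) tuples and that the oracle provides the full set of labeled candidates; both the representation and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical finding representation: (file path, line number, smell name).
Finding = Tuple[str, int, str]


def confusion_counts(
    expected: Set[Finding], reported: Set[Finding], universe: Set[Finding]
) -> Dict[str, int]:
    """Count true/false positives and negatives against an oracle."""
    return {
        "tp": len(expected & reported),
        "fp": len(reported - expected),
        "fn": len(expected - reported),
        "tn": len(universe - expected - reported),
    }


def assert_matches_baseline(counts: Dict[str, int], baseline: Dict[str, int]) -> None:
    """Fail loudly when the counts drift from the stored baseline."""
    if counts != baseline:
        raise AssertionError(f"oracle drift: expected {baseline}, got {counts}")
```

In a unittest suite, the baseline dictionary would be loaded from a file checked into the repository and updated deliberately whenever a detector changes.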

Add docstrings and check docstrings in CI

Describe the solution you'd like
Add docstrings to all public methods and add a check in CI for this.

support node management

Describe the solution you'd like
It would be interesting to have support in the intermediate representation for the management of nodes, for instance the inventory in Ansible and the node construct in Puppet.

Fix vscode extension for Ansible

Since autodetect was removed, the extension has to be updated.

add black to CI

Describe the solution you'd like
It would be nice to have black in the CI. This would enforce the usage of black.

Improper alignment (Sharma et al. 2016)

We ignored the point "Right-to-left chaining arrows should not be used", since we are not able to support this Puppet operation yet.

tests are creating a Dockerfile that is not deleted

Describe the bug
The tests are creating a Dockerfile that is not deleted

Expected behavior
The file is deleted.

Parse values in the intermediate representation

We should parse the values in the intermediate representation so that we can distinguish, for instance, types (booleans, strings, numbers, ...) and expressions (ands, ors, ...). This parsing would allow other types of smells to be defined more accurately. For instance, consider the smell "Hard-coded secret". We might have something like: $test | "hello". Although a variable is present, there is still a chance that the secret is hard-coded.

Package click - problem with version 8

AttributeError: 'str' object has no attribute '__name__'

simplify CLI options

Describe the solution you'd like
Right now some CLI options are not very clear. For instance, --includeall and --dataset are not self-explanatory and should be replaced with simpler options or even removed. The --linter and --csv options could also be replaced with a single format option.

I am unable to install this in my Mac having Python3 installed

git clone git@github.com:sr-lab/GLITCH.git
cd GLITCH
python -m pip install -e .

Successfully built glitch
Installing collected packages: glitch
Attempting uninstall: glitch
Found existing installation: glitch 1.0.1
Uninstalling glitch-1.0.1:
Successfully uninstalled glitch-1.0.1
Successfully installed glitch-1.0.1

Running the glitch command then shows the error below:
abdul@Abduls-MacBook-Pro GLITCH % glitch --help
Traceback (most recent call last):
  File "/Users/abdul/Library/Caches/pypoetry/virtualenvs/glitch-5uPBzSuZ-py3.12/bin/glitch", line 5, in <module>
    from glitch.main import main
  File "/Users/abdul/projects/uwf/GLITCH/glitch/main.py", line 15, in <module>
    from glitch.parsers.chef import ChefParser
  File "/Users/abdul/projects/uwf/GLITCH/glitch/parsers/chef.py", line 8, in <module>
    from pkg_resources import resource_filename
ModuleNotFoundError: No module named 'pkg_resources'

Could anybody please help?

add setuptools to requirements

Describe the bug
setuptools is required since we use the module pkg_resources

IR - Handling hashes in attributes and variables of Chef and Puppet

Following the discussion here, we should decide if hashes are handled as hierarchical attributes and variables in Chef and Puppet, or should be handled as regular values.

(#17)

Create generic blocks

(or we should adapt the current ones)
Some technologies have blocks of structures that do not make sense in every technology. For instance, Puppet allows grouping resources inside the same declaration.

Evaluation status

Puppet

Ansible

Chef

Long statement detected on 140 characters

Describe the bug
Long statement is being detected when we have 140 characters + '\n'.

To Reproduce
Run GLITCH on script with a line with 140 characters + '\n'.

Expected behavior
It shouldn't detect the smell
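The behavior described is consistent with the line terminator being counted as part of the line. A minimal sketch of a length check that excludes it is given here; the 140-character limit is taken from the issue and assumed to be inclusive, and the function name is hypothetical rather than GLITCH's own.

```python
from typing import List

# Threshold named in the issue; a line of exactly this length is assumed acceptable.
MAX_LINE_LENGTH = 140


def long_statement_lines(source: str, limit: int = MAX_LINE_LENGTH) -> List[int]:
    """Return 1-based numbers of lines longer than `limit`, ignoring line terminators."""
    # str.splitlines() drops '\n' and '\r\n', so terminators never count.
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]
```

A line of 140 characters followed by '\n' yields no finding, while a line of 141 characters does.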

refactor the Docker parser

Describe the enhancement
Currently, the Docker parser has some problems such as:

command parsing
multi-line commands

The translation of a Dockerfile to our intermediate representation is also not very intuitive. It would be better to invest some time in identifying a new mapping between Dockerfiles and the IR.

Consider urls included in value instead of only the url itself (Integrity check smell)

allow to print the Intermediate Representation from the CLI

Describe the solution you'd like
The CLI should have a command that prints the intermediate representation of a given script.

condition statement and conditions should have different representations

Is your feature request related to a problem? Please describe.
Right now the condition statement and its conditions are represented with the same construct ConditionStatement. However, this doesn't allow the distinction between them and sometimes the conditions are used as being the condition statement itself.
For instance:

$php_prefix = $::osfamily ? {
    'debian' => 'php5-',
    'redhat' => 'php-',
}

This only has a ConditionStatement for the first condition and one for the second condition; it does not have a construct for the actual switch statement.

Describe the solution you'd like
We should create a new construct either for the conditions or the switch/if statements.
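One possible shape for such a split is a container for the whole statement plus one object per branch. The sketch below uses hypothetical class and field names (ConditionalStatement, Branch, subject) and is not the representation GLITCH adopted; it only illustrates how the Puppet selector above could be modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Branch:
    """A single condition and the code it guards."""
    condition: Optional[str]  # None marks a default branch
    body: List[str] = field(default_factory=list)


@dataclass
class ConditionalStatement:
    """The switch/if statement as a whole, holding all of its branches."""
    subject: Optional[str]  # expression being matched, if any
    branches: List[Branch] = field(default_factory=list)

    def has_default(self) -> bool:
        return any(branch.condition is None for branch in self.branches)


# The selector from the example above, expressed with these constructs.
selector = ConditionalStatement(
    subject="$::osfamily",
    branches=[
        Branch(condition="'debian'", body=["'php5-'"]),
        Branch(condition="'redhat'", body=["'php-'"]),
    ],
)
```

Keeping the statement separate from its branches also gives checks such as a missing default case a single object to inspect.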

change attributes of UnitBlock to be more object-oriented

Describe the solution you'd like
Right now, the UnitBlock has an attribute for each type of element. However, this does not scale well, does not adhere to good practices of object-oriented programming, and is not intuitive when generic statements are in the mix. For instance, imagine a conditional statement that has an atomic unit in its blocks. Should the atomic unit also be added to the atomic_units attribute? It doesn't make sense.

Describe alternatives you've considered
The UnitBlock should have a single attribute for statements.
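A sketch of the single-attribute design could look as follows. The class layout is an assumption for illustration (only the names UnitBlock, atomic unit, and statements come from the issue); views such as atomic_units are derived by walking the statement tree instead of being stored separately.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Statement:
    """Base class for everything a block can contain (hypothetical)."""
    children: List["Statement"] = field(default_factory=list)


@dataclass
class AtomicUnit(Statement):
    name: str = ""


@dataclass
class ConditionalStatement(Statement):
    condition: str = ""


@dataclass
class UnitBlock:
    statements: List[Statement] = field(default_factory=list)

    def walk(self) -> Iterator[Statement]:
        """Yield every statement in document order, including nested ones."""
        stack = list(reversed(self.statements))
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(node.children))

    def atomic_units(self) -> List[AtomicUnit]:
        """Derived view: no separate attribute to keep in sync."""
        return [s for s in self.walk() if isinstance(s, AtomicUnit)]
```

With this layout, an atomic unit nested inside a conditional statement lives in exactly one place, and the question of whether to duplicate it at the top level does not arise.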

Analyze hardcoded users, passwords and secrets

Improve print representation of hierarchical attributes

The print should be recursive and follow the same patterns as the other components, otherwise it becomes hard to understand what is going on.

Instead of this:

roles[0]->None attributes: [name:'Install dependencies']

We want something like this:

roles[0]->None:
  attributes:
    name->Install dependencies
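The target format can be produced with a small recursive function. This sketch assumes a hypothetical Attribute class with name, value, and nested attributes fields; it is meant to show the recursion, not GLITCH's actual printing code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    name: str
    value: Optional[str] = None
    attributes: List["Attribute"] = field(default_factory=list)


def render(attr: Attribute, indent: int = 0) -> str:
    """Render an attribute and its nested attributes, two spaces per level."""
    pad = "  " * indent
    if not attr.attributes:
        return f"{pad}{attr.name}->{attr.value}"
    lines = [f"{pad}{attr.name}->{attr.value}:", f"{pad}  attributes:"]
    # Children sit two levels deeper: one for the node, one for "attributes:".
    lines += [render(child, indent + 2) for child in attr.attributes]
    return "\n".join(lines)


role = Attribute("roles[0]", None, [Attribute("name", "Install dependencies")])
print(render(role))
```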

Ansible script type

It would be interesting if there was a better way to define the type of an Ansible script (vars, tasks or script)

LaTeX tables - add combined row

Smell Too Many Variables should ignore variable files

In Ansible, Chef and Puppet, there are files that exist only to define variables. In those files, the smell should not be triggered.

Multithreaded security analysis

refactor Ansible parser to use the ansible package

Is your feature request related to a problem? Please describe.
Currently GLITCH does not support attributes defined as in the example below (aka Ansible-specific syntax):

- name: Create web root
  file: path="{{ www_root }}"
        owner="{{ web_user }}"
        group="{{ web_group }}"
        mode=0755
        state=directory
  with_dict: sites

This is mentioned in the work by Opdebeeck et al. (2023).

Describe the solution you'd like
We should use the ansible-core package instead of the yaml package.
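To illustrate what the key=value form carries, the following sketch splits such a string into a dictionary using the standard library's shlex module. It is a simplified stand-in, not ansible-core's parser and not its API; the function name is hypothetical, and real Ansible argument parsing handles more cases than this.

```python
import shlex
from typing import Dict


def parse_free_form_args(raw: str) -> Dict[str, str]:
    """Split 'key=value key2="quoted value"' into a dictionary (simplified)."""
    args: Dict[str, str] = {}
    for token in shlex.split(raw):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {token!r}")
        args[key] = value
    return args


# The 'file' value from the example above, as YAML folds it onto one line.
raw = (
    'path="{{ www_root }}" owner="{{ web_user }}" '
    'group="{{ web_group }}" mode=0755 state=directory'
)
print(parse_free_form_args(raw))
```

Once the arguments are in dictionary form, they can be mapped to attributes in the same way as the regular YAML mapping syntax.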

missing integrity check on values with spaces

Describe the bug
The regex for the missing integrity check isn't triggered on values such as:
https://storage.googleapis.com/cri-containerd-release/cri-containerd-{{ containerd_version }}.linux-amd64.tar.gz
This happens because of the space before and after the variable.

To Reproduce
Run GLITCH on this script:
https://github.com/starlingx/ansible-playbooks/blob/7983841637966089106bb80f28d7b701ec6b6323/playbookconfig/src/playbooks/roles/provision-edgeworker/prepare-edgeworker/kubernetes/tasks/install-ubuntu-packages.yml#L31

Expected behavior
Detecting a Missing integrity check smell.
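A pattern that treats a template expression as part of the URL would avoid stopping at the spaces inside the braces. The sketch below is an assumption about how such a pattern could look, not the regex GLITCH uses; the extension list and function name are illustrative.

```python
import re
from typing import List

# A URL is a run of non-space characters, where a {{ ... }} template
# expression (which may contain spaces) counts as a single unit.
URL_WITH_TEMPLATES = re.compile(r"https?://(?:\{\{.*?\}\}|[^\s{}])+")

# Assumed set of archive/package extensions worth an integrity check.
DOWNLOAD_EXTENSIONS = (".tar.gz", ".tgz", ".zip", ".deb", ".rpm")


def find_download_urls(value: str) -> List[str]:
    """Return URLs in `value` that point at downloadable artifacts."""
    return [
        match.group(0)
        for match in URL_WITH_TEMPLATES.finditer(value)
        if match.group(0).endswith(DOWNLOAD_EXTENSIONS)
    ]


url = (
    "https://storage.googleapis.com/cri-containerd-release/"
    "cri-containerd-{{ containerd_version }}.linux-amd64.tar.gz"
)
print(find_download_urls(url))
```

By contrast, a naive pattern such as https?://\S+ stops at the first space after the opening braces and never sees the .tar.gz suffix.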
