sr-lab / glitch

GLITCH is a technology-agnostic framework that enables automated detection of code smells in Infrastructure-as-Code scripts.

License: GNU General Public License v3.0

Python 73.67% TypeScript 0.69% Ruby 0.94% Puppet 1.32% HCL 23.12% Dockerfile 0.27%
ansible chef iac puppet smell-detector linter

glitch's Issues

Multifaceted abstraction smell (Sharma et al. 2016)

This smell can be detected in two different ways for Puppet:

  1. More than one resource is defined in the declaration of a file, service or package. We are not currently able to support this because of #7.
  2. Calculation of the LCOM. LCOM is related to the intersection of parameters between components. We should be able to support this easily if we implement #6.

support node management

Describe the solution you'd like
It would be interesting to have support in the intermediate representation for the management of nodes, for instance the inventory in Ansible and the node construct in Puppet.

missing integrity check on values with spaces

Describe the bug
The regex for the missing integrity check is not triggered on values such as:
https://storage.googleapis.com/cri-containerd-release/cri-containerd-{{ containerd_version }}.linux-amd64.tar.gz
This happens because of the spaces before and after the variable.

To Reproduce
Run GLITCH on this script:
https://github.com/starlingx/ansible-playbooks/blob/7983841637966089106bb80f28d7b701ec6b6323/playbookconfig/src/playbooks/roles/provision-edgeworker/prepare-edgeworker/kubernetes/tasks/install-ubuntu-packages.yml#L31

Expected behavior
Detecting a Missing integrity check smell.
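
One possible way to handle this, sketched below, is to let the URL pattern treat a complete {{ ... }} expression as a single token, so the spaces inside it no longer break the match. Both patterns and the list of archive extensions are illustrative assumptions, not GLITCH's actual regex.

```python
import re

# Hypothetical patterns for illustration; GLITCH's real regex may differ.
ARCHIVE_SUFFIX = r"\.(?:tar\.gz|tgz|zip|deb|rpm)"

# Strict version: a URL is a run of non-space characters, so the spaces
# inside "{{ containerd_version }}" prevent a match.
STRICT_URL = re.compile(r"https?://\S+" + ARCHIVE_SUFFIX + r"$")

# Tolerant version: a complete "{{ ... }}" template expression counts as one
# token, wherever it appears in the URL.
TOLERANT_URL = re.compile(r"https?://(?:\{\{.*?\}\}|\S)+" + ARCHIVE_SUFFIX + r"$")

URL = (
    "https://storage.googleapis.com/cri-containerd-release/"
    "cri-containerd-{{ containerd_version }}.linux-amd64.tar.gz"
)


def is_download_url(value: str, pattern: re.Pattern = TOLERANT_URL) -> bool:
    """Return True when the whole value looks like a downloadable archive URL."""
    return pattern.fullmatch(value.strip()) is not None


print(is_download_url(URL, STRICT_URL))  # strict pattern misses the URL
print(is_download_url(URL))              # tolerant pattern accepts it
```

The tolerant alternation is deliberately simple; a production pattern would probably need to be tightened to avoid excessive backtracking on long values.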

Ansible script type

It would be useful to have a better way to define the type of an Ansible script (vars, tasks or script).

Parse values in the intermediate representation

We should parse the values in the intermediate representation so that we can distinguish, for instance, types (booleans, strings, numbers...) and expressions (ands, ors...). This parsing would allow us to define other types of smells more accurately. For instance, consider the smell "Hard-coded secret". We have something like: $test | "hello". Although a variable is present, there is still a chance that the secret is hard-coded.
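
As a rough sketch of what parsed values could look like, the snippet below models the example as a small expression tree and checks whether any string literal survives in it. All class and function names are hypothetical (they are not GLITCH's intermediate representation), and it assumes the | in the example stands for an "or" expression.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StringLiteral:
    value: str


@dataclass(frozen=True)
class VariableReference:
    name: str


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


# Hypothetical value type: a literal, a variable, or an "or" of two values.
Expr = Union[StringLiteral, VariableReference, Or]


def contains_literal(expr: Expr) -> bool:
    """Return True if any branch of the expression is a string literal."""
    if isinstance(expr, StringLiteral):
        return True
    if isinstance(expr, Or):
        return contains_literal(expr.left) or contains_literal(expr.right)
    return False


# $test | "hello": a variable is present, but a literal fallback remains.
secret = Or(VariableReference("test"), StringLiteral("hello"))
print(contains_literal(secret))
```

With a structure like this, a "Hard-coded secret" check could inspect the literal branches instead of skipping the value as soon as it sees a variable.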

Add versions to requirements.txt

This is fine, but it might be a good idea to include versions (perhaps the best is to create a separate issue for that). Otherwise, we might have issues in the near future related to incompatible versions.

Originally posted by @jff in #19 (comment)

Evaluation status

Puppet

  • Collect dataset (Rahman)
  • Compute and validate dataset statistics
  • Collect Oracle (Our own)
  • Compute and validate oracle statistics
  • Run SLIC: collect results and execution time
    • Oracle
    • Compute precision and accuracy
    • Rahman's dataset (7 smells) 🎯
  • Run GLITCH: collect results and execution time
    • Oracle
    • Compute precision and accuracy
    • Rahman's dataset (7 smells) 🎯
    • Rahman's dataset (all of GLITCH's supported smells) 🎯

Ansible

  • Collect dataset (our own, following same criteria as Rahman and Rahman Openstack Dataset)
  • Compute and validate dataset statistics
  • Collect Oracle (Rahman)
  • Compute and validate oracle statistics
  • Review Oracle and add code line to each smell
  • Run SLAC: collect results and execution time
    • Oracle 🎯
    • Compute precision and accuracy 🎯
    • Rahman's dataset (8 smells)
  • Run GLITCH: collect results and execution time
    • Oracle 🎯
    • Compute precision and accuracy 🎯
    • Rahman's dataset (8 smells)
    • Rahman's dataset (all of GLITCH's supported smells)

Chef

  • Collect dataset (our own, following same criteria as Rahman)
  • Compute dataset statistics
  • Collect Oracle (Our own)
  • Compute and validate oracle statistics
  • Run SLAC: collect results and execution time
    • Oracle
    • Compute precision and accuracy
    • Run on dataset (9 smells) 🎯
  • Run GLITCH: collect results and execution time
    • Oracle
    • Compute precision and accuracy
    • Run on dataset (9 smells) 🎯
    • Run on dataset (all of GLITCH's supported smells; same as above?) 🎯

refactor Ansible parser to use the ansible package

Is your feature request related to a problem? Please describe.
Currently GLITCH does not support attributes defined as in the example below (aka Ansible-specific syntax):

- name: Create web root
  file: path="{{ www_root }}"
        owner="{{ web_user }}"
        group="{{ web_group }}"
        mode=0755
        state=directory
  with_dict: sites

This is mentioned in the work by Opdebeeck et al. (2023).

Describe the solution you'd like
We should use the ansible-core package instead of the yaml package.
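
To illustrate what the key=value form requires from a parser, here is a minimal sketch that splits the folded string from the task above into attributes using only the standard library. It is not how ansible-core parses these arguments, and the helper name parse_inline_args is made up for this example.

```python
import shlex
from typing import Dict


def parse_inline_args(raw: str) -> Dict[str, str]:
    """Split 'key=value key2="quoted value"' into a dictionary."""
    attributes: Dict[str, str] = {}
    # shlex keeps quoted values such as "{{ www_root }}" together as one token.
    for token in shlex.split(raw):
        key, separator, value = token.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {token!r}")
        attributes[key] = value
    return attributes


# The multi-line YAML scalar from the task above, folded into one string.
RAW_ARGS = (
    'path="{{ www_root }}" owner="{{ web_user }}" '
    'group="{{ web_group }}" mode=0755 state=directory'
)

for name, value in parse_inline_args(RAW_ARGS).items():
    print(f"{name} -> {value}")
```

Relying on the Ansible package itself would avoid maintaining this kind of logic and its edge cases by hand.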

simplify CLI options

Describe the solution you'd like
Right now some CLI options are not very clear. For instance, the --includeall and --dataset options are hard to understand and should be replaced with simpler options or even removed. The --linter and --csv options could also be replaced with a single format option.
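
A single format option could look roughly like the argparse sketch below. The option name --format, its choices and the default are assumptions made for illustration; they do not describe GLITCH's current or planned CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI with one --format option."""
    parser = argparse.ArgumentParser(prog="glitch")
    parser.add_argument("path", help="file or directory to analyze")
    parser.add_argument(
        "--format",
        choices=["table", "csv", "linter"],  # assumed names, for illustration
        default="table",
        help="how to print the detected smells",
    )
    return parser


args = build_parser().parse_args(["--format", "csv", "test.yaml"])
print(args.format, args.path)
```

Using choices lets the parser reject unknown formats, so one option can replace several boolean flags without extra validation code.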

I am unable to install this in my Mac having Python3 installed

  1. git clone git@github.com:sr-lab/GLITCH.git
  2. cd GLITCH
  3. python -m pip install -e .

Successfully built glitch
Installing collected packages: glitch
Attempting uninstall: glitch
Found existing installation: glitch 1.0.1
Uninstalling glitch-1.0.1:
Successfully uninstalled glitch-1.0.1
Successfully installed glitch-1.0.1

Then running the glitch command shows the error below:
abdul@Abduls-MacBook-Pro GLITCH % glitch --help
Traceback (most recent call last):
  File "/Users/abdul/Library/Caches/pypoetry/virtualenvs/glitch-5uPBzSuZ-py3.12/bin/glitch", line 5, in <module>
    from glitch.main import main
  File "/Users/abdul/projects/uwf/GLITCH/glitch/main.py", line 15, in <module>
    from glitch.parsers.chef import ChefParser
  File "/Users/abdul/projects/uwf/GLITCH/glitch/parsers/chef.py", line 8, in <module>
    from pkg_resources import resource_filename
ModuleNotFoundError: No module named 'pkg_resources'

Could anybody please help?
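
A likely cause: pkg_resources is part of setuptools, and virtual environments created for Python 3.12 no longer include setuptools by default, so installing setuptools into the environment should work around the error. A longer-term option is to locate package data with the standard library's importlib.resources instead. The sketch below shows that approach; the helper name locate_resource is hypothetical, and it uses a standard-library package as the example because the arguments passed to resource_filename in chef.py are not shown in the traceback.

```python
from importlib.resources import files


def locate_resource(package: str, name: str):
    """Return a Traversable pointing at a data file inside a package."""
    return files(package).joinpath(name)


# Example with a standard-library package, since GLITCH's own resource
# names do not appear in the traceback above.
decoder_source = locate_resource("json", "decoder.py")
print(decoder_source.name, decoder_source.is_file())
```

When a real filesystem path is required, for example to hand the file to an external program, importlib.resources.as_file can wrap the result in a context manager.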

Improve print representation of hierarchical attributes

The print should be recursive and follow the same patterns as the other components, otherwise it becomes hard to understand what is going on.

Instead of this:

roles[0]->None attributes: [name:'Install dependencies']

We want something like this:

roles[0]->None:
  attributes:
    name->Install dependencies
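
A recursive printer along these lines could produce the desired layout. This is only a sketch: the Attribute class below is a stand-in with made-up fields, not GLITCH's own class.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    """Stand-in for a hierarchical attribute (hypothetical fields)."""
    name: str
    value: Optional[str] = None
    children: List["Attribute"] = field(default_factory=list)


def render(attribute: Attribute, indent: int = 0) -> str:
    """Print an attribute and, recursively, its nested attributes."""
    pad = "  " * indent
    if not attribute.children:
        return f"{pad}{attribute.name}->{attribute.value}"
    lines = [f"{pad}{attribute.name}->{attribute.value}:", f"{pad}  attributes:"]
    # Children sit two levels deeper: one for the parent, one for "attributes:".
    lines.extend(render(child, indent + 2) for child in attribute.children)
    return "\n".join(lines)


roles = Attribute("roles[0]", None, [Attribute("name", "Install dependencies")])
print(render(roles))
```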

condition statement and conditions should have different representations

Is your feature request related to a problem? Please describe.
Right now the condition statement and its conditions are represented with the same construct ConditionStatement. However, this doesn't allow the distinction between them and sometimes the conditions are used as being the condition statement itself.
For instance:

$php_prefix = $::osfamily ? {
    'debian' => 'php5-',
    'redhat' => 'php-',
}

This only has a ConditionStatement for the first condition and one for the second condition, but it has no construct for the actual switch statement.

Describe the solution you'd like
We should create a new construct either for the conditions or the switch/if statements.
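
One way to separate the two concepts is sketched below: a statement object that owns a list of branches. The names SwitchStatement and Branch are hypothetical and only illustrate the idea; they are not constructs that exist in GLITCH.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Branch:
    """A single condition and the value it selects."""
    condition: str
    value: str
    is_default: bool = False


@dataclass
class SwitchStatement:
    """The statement itself: what is tested, plus all of its branches."""
    subject: str
    branches: List[Branch] = field(default_factory=list)

    def has_default(self) -> bool:
        return any(branch.is_default for branch in self.branches)


# The Puppet selector from the example above.
php_prefix = SwitchStatement(
    subject="$::osfamily",
    branches=[Branch("'debian'", "'php5-'"), Branch("'redhat'", "'php-'")],
)
print(len(php_prefix.branches), php_prefix.has_default())
```

With the statement as its own construct, checks that reason about the whole switch, such as a missing default case, no longer have to guess which condition represents it.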

MD5 triggers weak crypt smell on checksums

Currently, computing checksums with md5 triggers the weak crypt smell, e.g.:

# Docker
RUN md5sum foo.sh

In this case md5sum is being used to verify the integrity of the file, yet it triggers the weak crypt smell. md5sum and other checksum commands (shasum, sha1sum, etc.) should be whitelisted.
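
A whitelist check could be sketched as follows. The weak-crypto pattern and the function name are assumptions for illustration, not GLITCH's actual rule, and the check only looks at the first word of the command.

```python
import re

# Hypothetical weak-crypto pattern; the real rule may be broader.
WEAK_CRYPTO = re.compile(r"md5|sha-?1", re.IGNORECASE)

# Checksum commands named in this issue.
CHECKSUM_COMMANDS = {"md5sum", "shasum", "sha1sum"}


def is_weak_crypto(command: str) -> bool:
    """Flag weak algorithms unless the command is a whitelisted checksum tool."""
    tokens = command.split()
    if tokens and tokens[0] in CHECKSUM_COMMANDS:
        return False
    return WEAK_CRYPTO.search(command) is not None


print(is_weak_crypto("md5sum foo.sh"))
print(is_weak_crypto("openssl dgst -md5 foo.sh"))
```

A fuller version would also need to handle pipelines and chained commands, where the checksum tool is not the first word.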

refactor the Docker parser

Describe the enhancement
Currently, the Docker parser has some problems such as:

  • command parsing
  • multi-line commands

The translation of a Dockerfile to our intermediate representation is also not very intuitive. It would be better to invest some time in identifying a new mapping between Dockerfiles and the IR.
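
For the multi-line commands in particular, a first step could be to join continuation lines before any command parsing, as in the sketch below. It assumes the default backslash continuation character and ignores the escape parser directive and comments inside continued commands; the function name is made up for this example.

```python
from typing import List


def join_continuations(text: str) -> List[str]:
    """Merge lines ending in a backslash into single logical instructions."""
    logical: List[str] = []
    buffer = ""
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Drop the backslash and keep accumulating.
            buffer += stripped[:-1].strip() + " "
        else:
            logical.append(buffer + stripped.strip())
            buffer = ""
    if buffer:
        logical.append(buffer.rstrip())
    return logical


DOCKERFILE = 'RUN apt-get update && \\\n    apt-get install -y curl\nCMD ["bash"]'
for instruction in join_continuations(DOCKERFILE):
    print(instruction)
```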

Long statement detected on 140 characters

Describe the bug
Long statement is being detected when we have 140 characters + '\n'.

To Reproduce
Run GLITCH on a script with a line of 140 characters + '\n'.

Expected behavior
It shouldn't detect the smell.
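
The behavior described here is what one would see if the trailing newline were counted as part of the line. A minimal sketch of a check that excludes it is shown below; the constant and function names are assumptions, and the limit of 140 is taken from this issue.

```python
MAX_LINE_LENGTH = 140  # limit implied by this issue


def is_long_statement(line: str) -> bool:
    """Measure the line without its trailing line terminator."""
    return len(line.rstrip("\r\n")) > MAX_LINE_LENGTH


print(is_long_statement("a" * 140 + "\n"))  # exactly at the limit
print(is_long_statement("a" * 141 + "\n"))  # one character over
```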

change attributes of UnitBlock to be more object-oriented

Describe the solution you'd like
Right now, the UnitBlock has an attribute for each type of element. However, this does not scale well, does not adhere to good object-oriented programming practices, and is not intuitive when generic statements are in the mix. For instance, imagine a conditional statement that has an atomic unit in its blocks. Should the atomic unit also be added to the atomic_units attribute? It doesn't make sense.

Describe alternatives you've considered
The UnitBlock should have a single attribute for statements.
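
The sketch below illustrates the idea with simplified stand-in classes (not GLITCH's real ones): the block keeps one list of statements, and atomic units nested inside conditionals are found by walking that list rather than being stored twice.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Statement:
    """Base class for anything that can appear in a block (stand-in)."""


@dataclass
class AtomicUnit(Statement):
    name: str = ""


@dataclass
class Conditional(Statement):
    condition: str = ""
    body: List[Statement] = field(default_factory=list)


@dataclass
class UnitBlock:
    statements: List[Statement] = field(default_factory=list)


def atomic_units(statements: List[Statement]) -> Iterator[AtomicUnit]:
    """Yield every atomic unit, including those nested in conditionals."""
    for statement in statements:
        if isinstance(statement, AtomicUnit):
            yield statement
        elif isinstance(statement, Conditional):
            yield from atomic_units(statement.body)


block = UnitBlock([AtomicUnit("a"), Conditional("x", [AtomicUnit("b")])])
print([unit.name for unit in atomic_units(block.statements)])
```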

hierarchy of configuration files

Describe the solution you'd like
Configuration files that do not have a certain key should use the value in the default.ini file. This is useful, for instance, to specify a Terraform-specific configuration with only the necessary keys.
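
With Python's configparser, this kind of fallback can be obtained by loading the defaults first and the specific file second, since values read later override earlier ones. The section and key names below are invented for the example and do not reflect the contents of GLITCH's default.ini.

```python
import configparser

# Hypothetical contents of default.ini.
DEFAULT_INI = """
[general]
max_line_length = 140
check_http = true
"""

# Hypothetical Terraform-specific file that only sets the key it changes.
TERRAFORM_INI = """
[general]
check_http = false
"""


def load_config(*sources: str) -> configparser.ConfigParser:
    """Read the sources in order; later sources override earlier ones."""
    parser = configparser.ConfigParser()
    for source in sources:
        parser.read_string(source)
    return parser


config = load_config(DEFAULT_INI, TERRAFORM_INI)
print(config.getint("general", "max_line_length"))
print(config.getboolean("general", "check_http"))
```

For files on disk, ConfigParser.read accepts a list of paths and applies the same ordering rule.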

Output seems not correct for a simple yaml file

Here is my yaml file named test.yaml as below:

name: create an app with full permission

file:
  path: /app
  owner: foo
  group: foo
  mode: "777"

I ran this command to generate the report: glitch --tech ansible --csv test.yaml

The output below seems not correct:

+-------------------------------------+-------------+----------------------------+---------------------------+
| Smell                               | Occurrences | Smell density (Smell/KLoC) | Proportion of scripts (%) |
+-------------------------------------+-------------+----------------------------+---------------------------+
| Admin by default                    |           0 |                        0.0 |                       0.0 |
| Avoid comments                      |           0 |                        0.0 |                       0.0 |
| Duplicate block                     |           0 |                        0.0 |                       0.0 |
| Empty password                      |           0 |                        0.0 |                       0.0 |
| Full permission to the filesystem   |           0 |                        0.0 |                       0.0 |
| Hard-coded password                 |           0 |                        0.0 |                       0.0 |
| Hard-coded secret                   |           0 |                        0.0 |                       0.0 |
| Hard-coded user                     |           0 |                        0.0 |                       0.0 |
| Imperative abstraction              |           0 |                        0.0 |                       0.0 |
| Improper alignment                  |           0 |                        0.0 |                       0.0 |
| Invalid IP address binding          |           0 |                        0.0 |                       0.0 |
| Long Resource                       |           0 |                        0.0 |                       0.0 |
| Long statement                      |           0 |                        0.0 |                       0.0 |
| Misplaced attribute                 |           0 |                        0.0 |                       0.0 |
| Missing default case statement      |           0 |                        0.0 |                       0.0 |
| Multifaceted Abstraction            |           0 |                        0.0 |                       0.0 |
| No integrity check                  |           0 |                        0.0 |                       0.0 |
| Suspicious comment                  |           0 |                        0.0 |                       0.0 |
| Too many variables                  |           0 |                        0.0 |                       0.0 |
| Unguarded variable                  |           0 |                        0.0 |                       0.0 |
| Unnecessary abstraction             |           0 |                        0.0 |                       0.0 |
| Use of HTTP without TLS             |           0 |                        0.0 |                       0.0 |
| Use of obsolete command or function |           0 |                        0.0 |                       0.0 |
| Weak Crypto Algorithm               |           0 |                        0.0 |                       0.0 |
+-------------------------------------+-------------+----------------------------+---------------------------+
| Combined                            |           0 |                        0.0 |                       0.0 |
+-------------------------------------+-------------+----------------------------+---------------------------+
+-----------------+---------------+
| Total IaC files | Lines of Code |
+-----------------+---------------+
|               1 |             7 |
+-----------------+---------------+

As you can see, Full permission to the filesystem should show 1 occurrence, but everything shows zero. Did I miss anything? I really appreciate your help.

add black to CI

Describe the solution you'd like
It would be nice to have black in the CI. This would enforce the usage of black.

have an automated test for the oracles

Describe the solution you'd like
It would be nice to have an automated test that checks if the number of true/false positives and true/false negatives remains the same for the oracle datasets used in GLITCH's studies.
