firewatch

Thank you for your interest in the firewatch project! Please find some information about the project below.

Features

  • Automatically creates Jira issues for failed OpenShift CI jobs.
    • The issues created are provided with any logs or junit files found in a failed step.
    • The issues can be created in any Jira instance or project (provided the proper credentials are supplied).
    • If multiple issues are created for a run, the issues will all be related to each other to help with issue tracking.
    • The issues created have a "classification" section that is defined by the user in their firewatch config.
      • This can be used as a "best-guess" for what may have gone wrong during the job's run.
    • Firewatch will search for any past issues created by a specific step that has failed in the same way (pod failure or test failure) and list the 10 most recent ones.
      • This is meant to help the engineer working on the bug to find a solution.
      • If verbose test failure reporting is enabled, this search is refined further to only search for issues with the same test failure.
    • Firewatch searches for duplicate issues and makes a comment on the issues rather than filing a second issue.
      • This is done using the labels on issues created by firewatch. These labels should consist of the failure type, failed step, and the job name.
      • If a new failure matches the failure type, failed step, and job name of an existing issue, firewatch will add a comment notifying the user that this failure has likely happened again.
      • If verbose test failure reporting is enabled, this search is refined further to only search for issues with the same test failure.
    • If no failures are found, firewatch will search for any open issues on the Jira server provided and add a comment to the issue mentioning that the job has passed since that issue was filed.
      • This functionality also uses the labels on issues created by firewatch.
      • Note: If you get a notification on an issue, but would like to continue working on the issue without getting notifications, add the ignore-passing-notification label to the issue.
    • Firewatch can optionally be configured to report successful runs by defining the "success_rules" section in the config.
      • For each rule in this section, a Jira story will be created (with status closed) reporting the job success.
    • Firewatch can optionally be configured to report test failures in a more verbose way (verbose test failure reporting).
      • When configured to do this, firewatch will report on EVERY test failure in all JUnit files. The issues created from this will specify the failed test name in the title and description.
      • This functionality has the potential to create A LOT of tickets if cascading failures occur. Because of this, firewatch is configured by default to only report up to 10 test failures per run. This value can be overridden, but do so with caution.
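The label-based duplicate search described in the list above can be pictured as a subset check on labels. The following is a minimal sketch, not firewatch's actual implementation: the helper names `build_labels` and `is_duplicate` and the sample label values are assumptions made for illustration.

```python
def build_labels(failure_type: str, failed_step: str, job_name: str) -> set[str]:
    """Return the labels that identify one failure (hypothetical helper)."""
    return {failure_type, failed_step, job_name}


def is_duplicate(existing_issue_labels: set[str], new_failure_labels: set[str]) -> bool:
    """Treat an issue as a duplicate when it carries every identifying label."""
    return new_failure_labels <= existing_issue_labels


# Sample values are made up for this example.
new_failure = build_labels("pod_failure", "some-step", "some-periodic-job")
open_issue = {"pod_failure", "some-step", "some-periodic-job", "extra-label"}
other_issue = {"test_failure", "some-step", "some-periodic-job"}

print(is_duplicate(open_issue, new_failure))   # True
print(is_duplicate(other_issue, new_failure))  # False
```

Extra labels on the existing issue do not prevent a match in this sketch; only the three identifying labels have to be present.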

Getting Started

Jira User Permissions

Firewatch can be used with any user in a Jira instance, but that user will need to have proper permissions in the project they are reporting to. The user should be able to:

  • Create issues
  • Add comments to issues
  • Add attachments to issues
  • Edit issues
  • Transition issues (this only happens when a "success" issue is created, then immediately closed)

If you are using firewatch in the Red Hat Jira instance, the default user is firewatch-tool.

If you are encountering permissions issues, please add the user to the project you are reporting to under the role you would like to choose. Typically, if you add the user in the Developer role, the tool will work as expected.

Usage in OpenShift CI

Reporting issues using this tool in OpenShift CI is very simple. When you are using the firewatch-report-issues ref, some variables need to be defined in your configuration file:

  • FIREWATCH_CONFIG [REQUIRED]

    • This value should be a list of rules you have defined for firewatch to report on.

    IMPORTANT: For more information on how to define these rules, please see the configuration guide.

    • Example (the "success_rules" section is optional):

      FIREWATCH_CONFIG: |
        {
          "failure_rules":
            [
                {"step": "exact-step-name", "failure_type": "pod_failure", "classification": "Infrastructure", "jira_project": "!default", "jira_component": ["some-component"], "jira_assignee": "[email protected]", "jira_security_level": "Restricted"},
                {"step": "*partial-name*", "failure_type": "all", "classification":  "Misc.", "jira_project": "OTHER", "jira_component": ["component-1", "component-2", "!default"], "jira_priority": "major", "group": {"name": "some-group", "priority": 1}},
                {"step": "*ends-with-this", "failure_type": "test_failure", "classification": "Test failures", "jira_epic": "!default", "jira_additional_labels": ["test-label-1", "test-label-2", "!default"], "group": {"name": "some-group", "priority": 2}},
                {"step": "*ignore*", "failure_type": "test_failure", "classification": "NONE", "jira_project": "NONE", "ignore": "true"},
                {"step": "affects-version", "failure_type": "all", "classification": "Affects Version", "jira_project": "TEST", "jira_epic": "!default", "jira_affects_version": "4.14", "jira_assignee": "!default"}
            ],
          "success_rules":
            [
              {"jira_project": "PROJECT", "jira_epic": "PROJECT-123", "jira_component": ["some-component"], "jira_affects_version": "!default", "jira_assignee": "[email protected]", "jira_priority": "major", "jira_security_level": "Restricted", "jira_additional_labels": ["test-label-1", "test-label-2", "!default"]},
              {"jira_project": "!default"}
            ]
        }
  • FIREWATCH_DEFAULT_JIRA_PROJECT [REQUIRED]

    • The default Jira project to report issues to.
  • FIREWATCH_JIRA_SERVER

    • The Jira server to report to.
    • DEFAULT: https://issues.stage.redhat.com
  • FIREWATCH_JIRA_API_TOKEN_PATH

    • The path to the file holding the Jira API token.
    • DEFAULT: /tmp/secrets/jira/access_token
  • FIREWATCH_FAIL_WITH_TEST_FAILURES

    • A variable that will determine if the firewatch-report-issues step will fail with a non-zero exit code if a test failure is found in a JUnit file.
    • DEFAULT: "false"
    • BEHAVIOR:
      • "false": The firewatch-report-issues step will not fail with a non-zero exit code when test failures are found.
      • "true": The firewatch-report-issues step will fail with a non-zero exit code when test failures are found.
  • FIREWATCH_VERBOSE_TEST_FAILURE_REPORTING

    • A variable that will determine if firewatch will report on every test failure in all JUnit files (up to the limit defined in $FIREWATCH_VERBOSE_TEST_FAILURE_REPORTING_LIMIT).
    • DEFAULT: "false"
    • BEHAVIOR:
      • "false": Firewatch will only report on the first test failure found in a JUnit file.
      • "true": Firewatch will report on every test failure found in a JUnit file.
  • FIREWATCH_VERBOSE_TEST_FAILURE_REPORTING_LIMIT

    • The limit of test failures to report on when verbose test failure reporting is enabled.
    • DEFAULT: 10
    • BEHAVIOR:
      • If verbose test failure reporting is enabled, firewatch will only report on the first $FIREWATCH_VERBOSE_TEST_FAILURE_REPORTING_LIMIT test failures found in a JUnit file.
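The step patterns in the FIREWATCH_CONFIG example ("exact-step-name", "*partial-name*", "*ends-with-this") read like shell-style wildcards. The sketch below shows one way such rules could be matched against a failed step. It assumes shell-style matching and that a "failure_type" of "all" matches any failure; the function name `matching_rules` is hypothetical and this is not necessarily how firewatch itself matches rules.

```python
import json
from fnmatch import fnmatchcase

# A trimmed-down config using step patterns from the example above.
config = json.loads("""
{
  "failure_rules": [
    {"step": "exact-step-name", "failure_type": "pod_failure", "classification": "Infrastructure"},
    {"step": "*partial-name*", "failure_type": "all", "classification": "Misc."},
    {"step": "*ends-with-this", "failure_type": "test_failure", "classification": "Test failures"}
  ]
}
""")


def matching_rules(step: str, failure_type: str, rules: list[dict]) -> list[dict]:
    """Return rules whose step pattern and failure type both match (assumed semantics)."""
    return [
        rule
        for rule in rules
        if fnmatchcase(step, rule["step"])
        and rule["failure_type"] in ("all", failure_type)
    ]


matches = matching_rules("my-partial-name-step", "test_failure", config["failure_rules"])
print([rule["classification"] for rule in matches])  # ['Misc.']
```

Parsing the value with `json.loads` first is also a quick way to catch syntax problems, such as a missing or trailing comma, before the job runs.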

OPTIONAL DEFAULT VARIABLES:

  • FIREWATCH_DEFAULT_JIRA_EPIC [OPTIONAL]
    • The default Jira epic to report issues to where the "jira_epic" value is set to "!default".
  • FIREWATCH_DEFAULT_JIRA_COMPONENT [OPTIONAL]
    • The list of default Jira components that issues will be reported under where the "jira_component" list contains "!default".
    • For example:
      • IF $FIREWATCH_DEFAULT_JIRA_COMPONENT = ["default-1", "default-2"]
      • AND "jira_component": ["component-1", "!default"]
      • THEN when an issue is created under the rule containing the "jira_component" rule above, the components will be set to ["component-1", "default-1", "default-2"].
  • FIREWATCH_DEFAULT_JIRA_AFFECTS_VERSION [OPTIONAL]
    • The default value for "Affects Version" in Jira issues where the "jira_affects_version" value is set to "!default".
  • FIREWATCH_DEFAULT_JIRA_ADDITIONAL_LABELS [OPTIONAL]
    • The list of default Jira labels to be applied to issues where the "jira_additional_labels" list contains "!default".
    • For example:
      • IF $FIREWATCH_DEFAULT_JIRA_ADDITIONAL_LABELS = ["default-1", "default-2"]
      • AND "jira_additional_labels": ["label-1", "!default"]
      • THEN when an issue is created under the rule containing the "jira_additional_labels" rule above, the labels will be set to ["label-1", "default-1", "default-2"].
  • FIREWATCH_DEFAULT_JIRA_ASSIGNEE [OPTIONAL]
    • The default value for the assignee of issues where the "jira_assignee" value is set to "!default".
  • FIREWATCH_DEFAULT_JIRA_PRIORITY [OPTIONAL]
    • The default value for the priority of issues where the "jira_priority" value is set to "!default".
  • FIREWATCH_DEFAULT_JIRA_SECURITY_LEVEL [OPTIONAL]
    • The default value for the security level of issues where the "jira_security_level" value is set to "!default".
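The "!default" expansion described for list values such as "jira_component" and "jira_additional_labels" can be sketched as follows. This is an illustration of the documented behavior, not firewatch's code: the function name `expand_defaults` is hypothetical, and replacing "!default" in place (rather than always appending the defaults at the end) is an assumption.

```python
def expand_defaults(values: list[str], defaults: list[str]) -> list[str]:
    """Replace each "!default" entry with the default values, keeping order."""
    expanded: list[str] = []
    for value in values:
        if value == "!default":
            expanded.extend(defaults)
        else:
            expanded.append(value)
    return expanded


# Mirrors the FIREWATCH_DEFAULT_JIRA_COMPONENT example above.
print(expand_defaults(["component-1", "!default"], ["default-1", "default-2"]))
# ['component-1', 'default-1', 'default-2']
```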

Local Usage

If you'd like to run firewatch locally, use the following instructions:

Docker (recommended)

  1. Ensure you have Docker installed on your system.
  2. Clone the repository: git clone https://github.com/CSPI-QE/firewatch.git
  3. Navigate to the project root in your terminal: cd firewatch
  4. Run the following to build and run a Docker container with firewatch installed: make build-run
  5. Use the firewatch command to execute the tool. See the CLI usage guide for instructions on using the tool.

Local Machine (using venv)

  1. Clone the repository: git clone https://github.com/CSPI-QE/firewatch.git
  2. Navigate to the project root: cd firewatch
  3. Install the necessary dependencies: make dev-environment
  4. Use the firewatch command to execute the tool. See the CLI usage guide for instructions on using the tool.

Contributing

We welcome contributions to firewatch! If you'd like to contribute, please review our Contribution Guidelines for more information on how to get started.

License

firewatch is released under the GNU General Public License.

Contact

If you have any questions, suggestions, or feedback, feel free to reach out to us.

We appreciate your interest in firewatch!

firewatch's People

Contributors

amp-rh, ascerra, calebevans, dependabot[bot], dfrazzette, madunn, myakove, oharan2, pre-commit-ci[bot], redhat-qe-bot, rnetser, rujutashinde, smatula

firewatch's Issues

Failed job with multiple firewatch tickets does not create the tickets in chronological order.

When we have a job that fails in multiple steps and leads to the creation of multiple firewatch tickets, the tickets do not get created in the order that the failed steps were run.

For example, consider these 3 tickets created from a single failed ACM run:
LPINTEROP-3777
LPINTEROP-3778
LPINTEROP-3779

Ordered by when each ticket's failed step actually ran, the tickets look like this:
LPINTEROP-3779
LPINTEROP-3777
LPINTEROP-3778

This means that 3779's failure actually occurred before those of 3777 and 3778. Because the ticket numbers aren't in that order, a watcher debugging the run might make an incorrect assumption about the order in which the steps were run.

If firewatch created the tickets in the order that the steps failed, watchers could understand these situations more easily.

Side note:
This particular scenario is being improved here to prevent cascading failures, but we could see this type of example in other scenarios as well.
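One possible shape for this request, sketched under assumptions: the `Failure` class, the `order_by_execution` function, and the idea that the job's step order is available as a list are all hypothetical and are not taken from the firewatch code base.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    step: str
    failure_type: str


def order_by_execution(failures: list[Failure], step_order: list[str]) -> list[Failure]:
    """Sort failures by the position of their step in the job's execution order.

    Steps missing from step_order are placed last, keeping their relative order.
    """
    position = {step: index for index, step in enumerate(step_order)}
    return sorted(failures, key=lambda failure: position.get(failure.step, len(position)))


# Made-up step names for illustration.
step_order = ["install", "configure", "test"]
failures = [
    Failure("configure", "pod_failure"),
    Failure("test", "test_failure"),
    Failure("install", "pod_failure"),
]

ordered = order_by_execution(failures, step_order)
print([failure.step for failure in ordered])  # ['install', 'configure', 'test']
```

Creating the tickets while iterating over `ordered` would make the ticket numbers follow the order in which the steps ran.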

[Feature Request] Allow Security Level to be set on firewatch created tickets

Description: When a ticket is created in a Jira project that has different Security levels we should be able to configure firewatch to set the security level on ticket creation.

Reason: This feature would be in preparation for filing security issues against OpenShift CI runs. If we ever start scraping the OpenShift CI logs for security vulnerabilities, we can use firewatch to create a ticket to document the problem. However, we can't document security risks publicly, so this feature would need to exist first.

Priority: LOW (if we ever create a security tool for OpenShift CI logs this will become more important.)

[Maintenance] Make the release v2 changes available in OpenShift CI

What needs to be done:

  • Make sure the image will be available
  • Update the firewatch report issues ref to use the new arguments (replaced underscores with hyphens)
  • Add a new boolean environment variable for the --report-success command
  • Add new environment variables for the "default" values
  • Fix any config syntax issues that will be introduced with these changes

[Feature Request] Close or update validation tracker task

This is potentially a one-off use-case, but I thought I'd share.

In the OpenShift 4.Next working group, we have a Jira tracker for collecting feedback on an Engineering Candidate (EC) build. It seems like there is a possibility of triggering tests based on an event, like the publishing of a new Engineering Candidate build.

We currently keep a tracker for the different feedback we'd like to collect, e.g. https://issues.redhat.com/browse/FDN-403

The idea here is that firewatch tasks could look up the active tracker somehow, close out the relevant subtask if the tests pass, and leave a comment if the tests fail. That way, it's easy to look at the feedback tracker, in which some subtasks are automated and some are manual, and see an overall picture of the feedback from the EC.

For example, there might be a Managed Service feedback and an InterOp feedback and a Perf&Scale subtask. Maybe the first two are automated via tests and Firewatch, and the Perf&Scale still requires a manual update.

As a person contributing to and managing OpenShift Engineering Candidate builds
I want to automatically close feedback subtasks when associated tests pass
So that I can understand the feedback for the EC with less overhead

[Feature Request] Add optional automatic re-tries

Israel Pinto has requested that the tool have an optional field for each rule that defines whether a failure matching the rule should be re-triggered. This field should also come with a limiting value to avoid unlimited retries within 24 hours.

[Feature Request] Move common values out of rules

I see there are a lot of common values across all steps, such as jira_project, jira_assignee, jira_priority, and labels, which have the same values. It would be nice to have all shared values defined outside of the rules, for example in some global variables under FIREWATCH_CONFIG.
This would avoid repetition and make rules shorter and easier to manage.

[Change Request] Force the `jira_component` Optional Config Item to be a List of String Rather than a List or a String

We would prefer that the jira_component optional config item be forced to be a list of strings (e.g. "jira_component": ["comp1", "comp2"]). Currently we accept either a list of strings or a single string, which is confusing. This change should not go into version 1 of this tool, as it will break any jobs currently using the option with just a string.

Originated from this comment.
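A minimal sketch of the stricter check this request describes. The function name `validate_jira_component` and the error message are hypothetical; this is not the validation code used by firewatch.

```python
def validate_jira_component(value: object) -> list[str]:
    """Accept only a list of strings; reject a bare string or anything else."""
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    raise ValueError(
        f'"jira_component" must be a list of strings, got: {value!r}'
    )


print(validate_jira_component(["comp1", "comp2"]))  # ['comp1', 'comp2']

try:
    validate_jira_component("comp1")
except ValueError as error:
    print(error)  # "jira_component" must be a list of strings, got: 'comp1'
```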

[Feature Request] Introduce new label to add to Jira tickets when a job run passes for a previously failing job

  • Stale tickets that can be marked as "passing" are difficult to track.
  • Firewatch continues to add "passing" comments to these stale tickets, but manual approval is required to change the status of the tickets.
  • We could introduce a new Jira label to be used for previously failing jobs which now have a passing job run. This would make it simpler to identify and filter for these tickets so that they can move into the next stage of the workflow.

[Enhancement] Make logging and documentation around Jira permission errors more user-friendly

We noticed in this execution of firewatch that the error displayed for permission errors isn't very explanatory. I would like to add a try/except block around any interaction with Jira that looks for that specific error, so we can give more context in the logs and in documentation.

I would also like to add some documentation for the permissions required for the bot account used (firewatch-tool by default) so that external teams can add those permissions when required.

job directory should be deleted at the end of the flow

The job directory is not deleted, so when the report is run twice, the second run fails:

2023-10-24T13:12:26.477078 cli.objects.jira_base INFO Jira authentication successful...
2023-10-24T13:12:26.749765 cli.objects.job INFO Downloading log files...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/rnetser/.virtualenvs/firewatch/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnetser/.virtualenvs/firewatch/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/rnetser/.virtualenvs/firewatch/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnetser/.virtualenvs/firewatch/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnetser/.virtualenvs/firewatch/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnetser/.virtualenvs/firewatch/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnetser/git/firewatch/cli/report/__init__.py", line 90, in report
    job = Job(
          ^^^^
  File "/home/rnetser/git/firewatch/cli/objects/job.py", line 74, in __init__
    self.logs_dir = self._download_logs(
                    ^^^^^^^^^^^^^^^^^^^^
  File "/home/rnetser/git/firewatch/cli/objects/job.py", line 232, in _download_logs
    with open(file, "xb") as target:
         ^^^^^^^^^^^^^^^^
FileExistsError: [Errno 17] File exists: '/tmp/1652281746718199808/logs/addon-install/build-log.txt'
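The traceback comes from opening the target file in exclusive-create mode ("xb"), which raises FileExistsError when the file is already there. The sketch below reproduces that behavior in isolation and shows one way to avoid leftovers, namely a temporary directory that is removed when the work is done. It is an illustration only; the file name is reused from the log above and nothing here is taken from firewatch's own download code.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as job_dir:
    target_path = Path(job_dir) / "build-log.txt"

    # First "download": exclusive-create mode succeeds because the file is new.
    with open(target_path, "xb") as target:
        target.write(b"first run\n")

    # Second "download" into the same directory: "xb" refuses to overwrite.
    try:
        with open(target_path, "xb") as target:
            target.write(b"second run\n")
        second_run_failed = False
    except FileExistsError:
        second_run_failed = True

    print(second_run_failed)  # True

# Leaving the "with" block deletes the directory, so a later run starts clean.
print(Path(job_dir).exists())  # False
```

Opening the file with "wb" instead would overwrite stale logs rather than fail, at the cost of silently reusing a directory from an earlier run.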

[Change Request] Update Python Version to 3.12

Python 3.12 was released at the beginning of October 2023. I would like to test the code with 3.12 to verify it still works as expected. If it does, I would like to update the Python version to 3.12 in the Dockerfile.

[Feature Request] Add a list of past tickets for a given step/failure type in the description of new bugs

As a way to provide more information for an engineer to solve a bug, I would like to provide a list of past bugs for step/failure types to the description of a bug. The idea behind this being that there are only so many ways a step can fail, considering a step is typically just a bash script. The one exception to this could be test_failure types, but even then this functionality can help an engineer solve an issue or know who to contact to help solve an issue.

Here is an example:

Say step some-important-step encountered a pod_failure in the job you are executing (any step can be used in any job, technically). Something like the following would be put at the bottom of the description:


Here are the 10 most recent bugs associated with this step:

Bug          Date Created  Assignee
SOMEBUG-110  10/01/2023    Assignee Name
SOMEBUG-109  09/14/2023    Assignee Name
SOMEBUG-108  07/04/2023    Assignee Name
SOMEBUG-107  06/19/2023    Assignee Name
SOMEBUG-106  05/01/2023    Assignee Name
SOMEBUG-105  01/03/2023    Assignee Name
SOMEBUG-104  12/24/2022    Assignee Name
SOMEBUG-103  09/01/2022    Assignee Name
SOMEBUG-102  03/17/2022    Assignee Name
SOMEBUG-101  02/06/2022    Assignee Name

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

pep621
pyproject.toml
  • poetry-core >=1.0.0
pip_requirements
tests/requirements.txt
poetry
pyproject.toml
  • python ^3.12
  • click ^8.1.7
  • google-cloud-storage ^2.10.0
  • jinja2 ^3.1.2
  • jira ^3.5.2
  • junitparser ^3.1.0
  • python-simple-logger ^1.0.6
  • google-cloud-storage ^2.10.0
  • pytest ^7.4.0
  • pytest-cov ^4.1.0
  • pytest-mock ^3.11.1
