green-coding-solutions / green-metrics-tool

Measure energy and carbon consumption of software

Home Page: https://metrics.green-coding.io

License: GNU Affero General Public License v3.0

Python 56.12% CSS 0.50% JavaScript 17.91% HTML 11.24% C 11.89% Makefile 0.15% Shell 2.11% Dockerfile 0.07%
carbon-footprint climate-change co2-emissions green-computing green-software metrics power-consumption sustainability sustainable-software green-coding

green-metrics-tool's Introduction

Tests Status - Main

Energy Used (this is the energy cost of running our CI pipelines on GitHub. Find out more about Eco-CI)

Introduction

The Green Metrics Tool is a developer tool intended for measuring the energy and CO2 consumption of software through a software life cycle analysis (SLCA).

Key features are:

It is designed to re-use existing infrastructure and testing files as much as possible, so it can be integrated easily into any software repository and create transparency around software energy consumption.

It can orchestrate Docker containers according to a given specification in a usage_scenario.yml file.

These containers will be set up on the host system, and the testing specification in the usage_scenario.yml will be run by sending the commands to the containers accordingly.

This repository contains the command line tools to schedule and run the measurement report as well as a web interface to view the measured metrics in some nice charts.

Frontend

To see the frontend in action and get an idea of what kind of metrics the tool can collect and display, go to our Green Metrics Frontend

Documentation

To see the documentation and how to install and use the tool, please go to the Green Metrics Tool Documentation

Screenshots of Single Run View

Screenshots of Comparison View

Energy-ID Scorecards


Details: Energy-ID project page

green-metrics-tool's People

Contributors

alexandreoliv, alexzurbonsen, arnetr, dan-mm, davidkopp, dependabot[bot], djesic, ribalba, saiteja13427


green-metrics-tool's Issues

Refactor code to be clearer on container_id usage in providers

Providers have the functionality to report data for multiple containers. So a metrics provider can output data as such:

time value container_id

This allows a provider to output data for multiple containers.

We are also using this to output data for various other types though. For example, in lm_sensors we are misusing this functionality to output the feature values. While this works fine, we should refactor the code to not use the container wording, as this is quite confusing for people who don't know the code and our little "hack".

Harden our Docker Containers

--cap-drop=all, to remove all privileged capabilities. You can then manually whitelist some capabilities, but we won’t need any.
--security-opt no-new-privileges to remove problems with setuid
=> Are all our containers working with that setup?

Gunicorn container: All processes started must run as www-data. Currently at least the starting process is run as root
Postgres container is fine. Running as postgres
NGINX starts master as root, which seems to be needed. But workers are all www-data
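A minimal sketch of how these flags could be passed where the containers are started, assuming the runner launches them via subprocess (the function and names are illustrative, not the actual GMT code):

import subprocess

def start_hardened_container(image, name):
    # Drop all privileged capabilities and forbid setuid privilege escalation.
    # Individual capabilities could be re-added with --cap-add if ever needed.
    subprocess.run([
        'docker', 'run', '-d',
        '--name', name,
        '--cap-drop=all',
        '--security-opt', 'no-new-privileges',
        image,
    ], check=True)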

https://blog.trailofbits.com/2019/07/19/understanding-docker-container-escapes/

https://docs.docker.com/engine/security/userns-remap/

https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html

Change calculation algorithm for AVG CPU Load

In our dashboards (e.g. https://metrics.green-coding.org/stats.html?id=ad9becb9-2a32-4247-b5e8-60e0264442aa) we use the metric "AVG CPU Load".

Currently this is done by summing all datapoints returned from the API and then dividing by the length of the array.

However, the points are not equidistant in time, so the average is not accurate: the period for which a given CPU load applies might be shorter / longer than the other periods.

The non-equal time distance of the datapoints has to be factored in to make the algorithm accurate.
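A minimal sketch of a time-weighted average using the trapezoidal rule, assuming timestamps and values arrive as parallel lists (names are illustrative):

def time_weighted_avg(timestamps, values):
    # Weight each sample by the duration of the interval it covers
    # instead of treating all datapoints as equidistant.
    area = 0.0
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        area += (values[i] + values[i - 1]) / 2 * dt
    return area / (timestamps[-1] - timestamps[0])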

Should it be possible to use the compose.yml file?

Currently we are duplicating code from the compose.yml in the usage_scenario.yml. The following ideas were discussed with @ArneTR:

  1. Allow an import in the usage_scenario.yml. This will increase code complexity in the runner.py and could potentially make the yml file very hard to read if we want to use keywords like setup-commands which we would have to interweave.
  2. Separate the flow into a new file and allow the usage of compose.yml for setup.

install.sh script should show failure more noticeably

Currently, if the install.sh script fails it is not very noticeable for an inexperienced user.

Best case is that an error will be shown in red ... I believe this is not possible in bash ... or?

  • Maybe it makes sense to rewrite it in Python to have proper error handling?
  • Show a green message on success!

Please do a bit of research on the questions and then bring it up for discussion.

Workflow Idea:

  • Research possibilities in bash. Please timebox to 30 minutes
  • If a good solution is found, please implement it, if possible within 2 hours
  • If no bash solution is found, please research your mentioned Python option. Timebox 0.5-1 hour and then bring it to the team meeting.
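For reference, red output is in fact possible in plain bash via ANSI escape codes. A hedged sketch of what the mentioned Python option could look like (the step name and command are illustrative):

import subprocess
import sys

RED, GREEN, RESET = '\033[0;31m', '\033[0;32m', '\033[0m'

def run_step(description, cmd):
    # Abort with a red, clearly visible message on the first failing step.
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as exc:
        print(f'{RED}FAILED: {description} (exit code {exc.returncode}){RESET}', file=sys.stderr)
        sys.exit(1)

run_step('Install Python dependencies', ['python3', '-m', 'pip', 'install', '-r', 'requirements.txt'])
print(f'{GREEN}Installation successful!{RESET}')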

Guest and Guest_Nice should not be accumulated with other CPU utilization timings

In the metric reporters the CPU utilization is read from /proc/stat.

At the time of writing we were assuming that we could just add up all fields in /proc/stat to get the total time that the system was utilized in any way.

However in https://unix.stackexchange.com/questions/178045/proc-stat-is-guest-counted-into-user-time it is stated that guest and guest_nice are already included in user and nice.

This is not stated in the official kernel documentation, which is why we did not assume it.

The Stack Exchange post should be double-checked and then a patch applied that excludes these two fields from the calculation.

The impact should be minimal in our case, however.
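A hedged sketch of the corrected summation, assuming the standard /proc/stat field order (user nice system idle iowait irq softirq steal guest guest_nice):

def total_cpu_time():
    # Read the aggregate "cpu" row and sum all fields except guest and
    # guest_nice, which are already contained in user and nice.
    with open('/proc/stat') as f:
        values = [int(v) for v in f.readline().split()[1:]]
    return sum(values[:8])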

Code formatting and linting

As discussed in the call today:

  • Please integrate pylint or similar
  • Please integrate black or similar
  • Please integrate your C linting / formatting solution

No need to lint all the branches. If you write a proper git pre-commit hook for us and a quick how-to for using these tools, the branch maintainers should do that themselves.

Only linting of main / dev would be nice to have from your side.

how does it work?

hi there.

I'm trying to understand how this metrics tool works, but I'm struggling to understand it from looking at just the code - would you please point me to the documentation for understanding what's happening under the hood?

I've read this:
https://www.green-coding.org/blog/green-metrics-measurement-tool/

And I've followed the link to the project demo:
https://github.com/green-coding-berlin/green-metric-demo-software

But it's still not obvious to me - are you measuring things like the increase in CPU usage reported by the docker containers?

I've contributed some docs to Scaphandre, a similar project for measuring the energy usage of compute, and you can see how they work out the energy you can attribute to a given process below:

https://hubblo-org.github.io/scaphandre-documentation/explanations/how-scaph-computes-per-process-power-consumption.html

I'm happy to add some more documentation to this project once I understand it πŸ‘

Boot providers one by one with notes displaying what was booted to see energy consumption overhead of reporters individually

Boot RAPL before anything else and then see how much impact the metric providers have?! It is the DC provider! Clearly seen, because it had no step before! Since our providers are static we can even deduce their energy cost then? "Estimated energy cost of providers?"
Start them one by one with a 2 second delay? But also have a quick-start option for us

  • disable this feature by default, but allow for a debug flag to turn it on (--verbose-provider-boot)
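A minimal sketch of the staggered boot behind such a debug flag (the provider interface and note handling are illustrative assumptions, not the actual GMT code):

import time

def boot_providers(providers, verbose_provider_boot=False):
    for provider in providers:
        provider.start()  # assumed provider interface
        if verbose_provider_boot:
            # Leave a note of what was just booted and wait, so each
            # reporter's energy overhead shows as a distinct step.
            print(f'Booted metric provider: {type(provider).__name__}')
            time.sleep(2)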

Disable nmi watchdog

As the NMI watchdog kernel functionality can lead to high, unforeseeable energy usage, we need to disable it via the kernel.nmi_watchdog sysctl.

Update setup-test-env / stop containers script

  • Testing code on GMT should clear the PostgreSQL and SMTP data for test runs, but keep the metric providers set up in the config.yml
    • also, no password should be asked
  • The live docker containers are currently being stopped as well. The services must be renamed in the test-compose.yml to a different name so that only they are stopped

Discuss if we really need a dev branch

Currently the dev and main branch are quite far apart. I would like to discuss why we need a dev branch in the first place. What are the advantages of working with a dev branch and, if we need one, how can we ensure that features propagate to main in a timely manner?

Add CPU configuration / logging settings to runner

Add flags to enable / disable:

  • Hyper-threading
  • P_States
  • Turboboost (green-coding/tools)
  • C_States
  • CPU_Freq

Log:

  • Virtualization
  • Hardware prefetcher : spec-power-model
  • BIOS_Speed
  • C_States
  • Temperature

Move psycopg2 to psycopg3

Currently the Green Metrics Tool uses the psycopg2 library for its PostgreSQL connection.

It should be moved to psycopg3 due to performance and future compatibility reasons.
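For orientation, a minimal psycopg3 usage sketch; psycopg3 ships as the psycopg package, and the connection parameters here are illustrative:

import psycopg  # psycopg3 is imported as "psycopg", not "psycopg2"

with psycopg.connect('dbname=green-coding user=postgres') as conn:
    with conn.cursor() as cur:
        cur.execute('SELECT 1')
        print(cur.fetchone())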

Research article: Compare Python 3.10 3.11 and 3.12 regarding their energy consumption

It would also be interesting to see how Python compares in this internal benchmark:

https://github.com/mCodingLLC/VideosSampleCode/blob/master/videos/031_the_fastest_way_to_loop_in_python_an_unfortunate_truth/fastest_way_to_loop.py

The other benchmark would be to run a Python-only compute workload -> Here we need a good example ... maybe the one used here: https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html

And a C-bound computation code: https://github.com/green-coding-berlin/example-applications/tree/main/ml-model

XGBoost estimation should be accumulated to mWh

Currently the XGBoost estimation of the energy consumption of the system is not accumulated.

Like any other energy reporter, a total estimated energy budget should be given. It suffers from the same issue as the AVG CPU reporter at the moment: it must also account for the non-equidistant time points.
This must be considered when accumulating.

Or to put it differently: It must be properly integrated and not just be added up and divided by the total amount of time.
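A hedged sketch of the proper accumulation: integrate the estimated power over the non-equidistant timestamps to joules, then convert to mWh (names are illustrative):

def accumulate_mwh(timestamps_s, watts):
    # Trapezoidal integration over non-equidistant samples.
    joules = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        joules += (watts[i] + watts[i - 1]) / 2 * dt
    return joules / 3.6  # 1 mWh = 3.6 J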

Browser Storage & UX

The current Green Metrics Tool Dashboard needs a new config site where the user can select some features.

Here the user should be able to have some checkboxes / toggles where they can select different options for the UI / UX.
=> Please use a design component from our CSS framework already bundled: https://fomantic-ui.com/

Features are:

  • Display charts in a way that they cut off certain parts of the measurement.
    • Currently the charts display the full amount of datapoints including the "pre-idle" and "post-idle" phases. When the user activates this option though, the API requests must be changed so that the "remove_idle=True" parameter is attached. This will instruct the API to only deliver the pure measurement without these datapoints in the charts
  • Data has to be stored for the user: Please store the settings of the checkboxes / toggles as a state in the browser local storage

Add linting

We should have a test that lints all our source code.

We can also run this on pre_commit to make sure that only correctly formatted code is added.

UI: Charts Zoom

Our current metrics dashboard lives under https://metrics.green-coding.org/

A sample measurement can be found here: https://metrics.green-coding.org/stats.html?id=ad9becb9-2a32-4247-b5e8-60e0264442aa

Currently many charts display in the bottom part of the page when scrolled down.

We need a functionality to reorder the charts so it is for instance possible to see DC energy and RAPL energy side by side, or underneath each other.

Drag and Drop is not needed, just an additional button to:

  • Make chart 100% width
  • Hide chart
  • Move chart to left / right in order

Please have a look at https://css-tricks.com/snippets/css/a-guide-to-flexbox/ to understand how flexbox generally works in terms of laying out the containers.

The functionality should be JavaScript only, no frameworks.

However it might be helpful to use some stylistic functionalities that are provided by the CSS framework we use: https://fomantic-ui.com/

Improve sed replacements in testing code

The testing code for setting up the testing environment currently relies on the mechanism of taking the current config.yml / compose.yml file as input and then making a working test environment from it.

The downside of using sed for this approach is that it relies on specific naming of the shared volumes, services etc.

If the user changes these namings (or maybe we do as well, but forget to update the test code), the GMT still runs fine, but the tests may run on shared volumes that are still active.
This is unintended, and we should check that the expected name is really present in the file.

Even better: The file should be parsed in its intended format (yml) and the complete block of shared volumes should be cleared and repopulated to guarantee that we do not use any shared volumes from the original live system.

I propose moving the replacement code to a proper YML parser. Didi proposed yq, but also Python3 should be a valid option as it allows for easier traversing of the dict.
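A minimal sketch of the Python option using PyYAML (the file names and the test volume name are illustrative):

import yaml  # PyYAML

with open('compose.yml') as f:
    compose = yaml.safe_load(f)

# Clear and repopulate the shared-volume definitions so the test
# environment can never reuse volumes from the original live system.
compose['volumes'] = {'test_green_coding_postgres_data': None}
for service in compose.get('services', {}).values():
    service.pop('volumes', None)

with open('test-compose.yml', 'w') as f:
    yaml.safe_dump(compose, f)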

Up for discussion on Wednesday

Sanity checks before starting measurement / guard clauses

For our "minimal" system we need to turn off any possible features that might interfere with the measurement and increase Std.Dev of the energy results.

This issue is collection of current ideas:

  • kernel.nmi_watchdog
  • kernel.soft_watchdog
  • kernel.watchdog
  • SGX active?
  • RAPL energy filtering active?
  • cronjobs
  • X-Server / Wayland (at least a warning should be stated)
  • More than one SSH connection open (at least a warning should be stated)
  • More than X processes active on the system (threshold to be determined)
  • Not enough free memory available
  • Disk full
  • network traffic currently happening
  • CPU load on system more than 1% total load
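A minimal sketch of a few of these guard clauses using only the standard library (the thresholds are illustrative):

import os
import shutil
import sys

def pre_measurement_checks():
    warnings = []
    with open('/proc/sys/kernel/nmi_watchdog') as f:
        if f.read().strip() != '0':
            warnings.append('kernel.nmi_watchdog is still enabled')
    if os.getloadavg()[0] / os.cpu_count() > 0.01:  # more than 1% total load
        warnings.append('CPU load on the system is above 1%')
    if shutil.disk_usage('/').free < 1024 ** 3:  # less than 1 GiB free
        warnings.append('disk is (almost) full')
    for warning in warnings:
        print(f'Warning: {warning}', file=sys.stderr)
    return not warnings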

Refactor the runner.py to load metric providers with parameters

Currently, if we want to monitor two fans in a system we need to create two separate metric providers. This could be streamlined by allowing metric providers to take parameters. I suggest the following refactor to the config.yml and the systems that consume it.

measurement:
  metric-providers:
    lm_sensors.temperature.cpu.provider.LmSensorsCpuTempProvider: 100
    lm_sensors.fan.1.provider.FanOneProvider: 100
    lm_sensors.fan.2.provider.FanTwoProvider: 100

The above is how it looks currently, which is very verbose and has loads of code duplication on the metric side of things. The proposed parameterized format:

measurement:
  metric-providers:
    (lm_sensors.provider.LmSensorsProvider, 100, 'CPU', 'cpu_temp', 100, 'C', 'show_graph')
    (lm_sensors.provider.LmSensorsProvider, 100, 'fan1', 'fan1_temp', 1, 'RPM', 'show_graph')
    (lm_sensors.provider.LmSensorsProvider, 100, 'fan2', 'fan2_temp', 1, 'RPM', 'show_avg')

This enables us to also render the metrics in the frontend dynamically, as all information is in the config. Also we could change the underlying metric-provider binary to consume multiple configurations, so we only need to run one instance that outputs multiple values.

This can also make it quite easy to measure, for example, the temperature of each core, which currently would need as many providers running as there are cores.
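A hedged sketch of how runner.py could consume such parameter tuples (the tuple layout follows the proposal above; everything else is illustrative):

import importlib

def load_providers(provider_specs):
    providers = []
    for module_path, resolution, sensor, metric, factor, unit, display in provider_specs:
        # Resolve the provider class from its dotted path and instantiate
        # it once per parameter tuple.
        module_name, class_name = module_path.rsplit('.', 1)
        cls = getattr(importlib.import_module(module_name), class_name)
        providers.append(cls(resolution, sensor, metric, factor, unit, display))
    return providers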

Timeout for jobs executed

Currently a timeout exists only on a per-command level.

Meaning that the total execution time of the whole usage_scenario.yml can be infinite.

Looking forward to a deployment as a service in the cloud, this could block free resources for too long.

A new configuration parameter shall be introduced that allows setting a total runtime limit.
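A minimal sketch of such a total runtime limit, assuming the scenario run can be wrapped in a separate process (the parameter name total_runtime_limit is illustrative):

import multiprocessing

def run_with_total_timeout(run_scenario, total_runtime_limit=3600):
    process = multiprocessing.Process(target=run_scenario)
    process.start()
    process.join(timeout=total_runtime_limit)
    if process.is_alive():
        # The whole usage_scenario.yml run exceeded the limit: kill it.
        process.terminate()
        raise RuntimeError(f'Run exceeded total runtime limit of {total_runtime_limit}s')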

Improve Daily Tests Pipeline

  • Sometimes runs show as an error although only a part failed.
  • Also we see succeeding runs, but actually they failed.

Enhancement: Wrong stdout note format leads to SQL error

Currently the read-notes-stdout: True option in the usage_scenario.yml expects a certain format: https://docs.green-coding.org/docs/measuring/usage-scenario/#read-notes-stdout-format-specification

However, if the stdout is not in the expected format, a very undescriptive error is shown ... something like this:

   Error: Base exception occured in jobs.py: invalid input syntax for type bigint: "Dimensions:"
LINE 6: ', 'Dimensions:', NOW())
          ^

The issue is that the first value is not a valid timestamp.

We should introduce a regex check that verifies for every line whether the string has the expected format (^\d{16} .+$) and, if not, emits a better error.
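A minimal sketch of such a check, using the regex from this issue (the function name and error wording are illustrative):

import re

NOTE_LINE = re.compile(r'^\d{16} .+$')

def validate_stdout_notes(stdout):
    for lineno, line in enumerate(stdout.splitlines(), start=1):
        if not NOTE_LINE.match(line):
            # Fail early with a descriptive message instead of letting the
            # malformed value reach the SQL INSERT.
            raise ValueError(
                f'stdout line {lineno} does not match the expected note format '
                f'"<16-digit timestamp> <note>": {line!r}'
            )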

Remove warnings from install.sh and include a final success message

We ran into the issue that it was not visible to us whether the install.sh script completed successfully.

We want to include a new green colored banner at the end of the install.sh file so it is obvious.

Also we need to get rid of the warnings, which are non-critical, but apparently very confusing even to us.

Adding options for Boot, Idle and Burn-In Phase

The tool should be configurable to include / exclude the following phases:

  • Pre-Idling (already done. Can be set to zero)
  • Post-idling (already done. Can be set to zero)
  • Boot phase (Tool should measure the system level metrics when the docker containers boot, to estimate the cost of the application start)
  • Burn-In (To get more reproducible measurements the CPU should be pre-heated before the measurement. A 60 seconds run with all cores and memory stressed should suffice.)

Since the Burn-In and the Boot phase occur before the actual start of the measurement they are not included in the energy budget calculations.

A new API/URL parameter should be introduced which allows including / separating the boot-phase measurement, to either display this measurement alone or add it to the main measurement total.

In the frontend these phases should be distinguishable. We want to show the average of the idle and the boot phases separately, with the wattage / joules in these phases.

This might then be a good point in time to refactor the currently confusing "avg" marker in the graphs.

Research article: How much is the energy impact of a pure HTML site vs. the energy impact of a fully javascript generated site?

  • Create docker setup containing an NGINX server to deliver the webpage and a chrome browser to access it
  • Version 1: Plain HTML site with CSS. Should be quite long (2 viewports) and contain lorem ipsum content
  • Version 2: Generate same version with Javascript only starting from an empty HTML site and then injecting the containers into the DOM with node creation
  • Version 3: Do the same thing as version 2 but with a JS library. For example jQuery

All three versions should be benchmarked with the GMT. 10 repetitions. 30 seconds pause between each at least to cool down.

Please inline all CSS and JS code to reduce variance regarding network requests.
