
os-config's Introduction

os-config

os-config's People

Contributors

alexgg, balena-ci, dfunckt, klutchell, majorz, zubairlk

os-config's Issues

`join` without argument panics

root@balena:~# os-config join
thread 'main' panicked at 'internal error: entered unreachable code', src/args.rs:119:9

Allow to provide default values in os-config.json

It would be useful to be able to provide default values in os-config.json, in case the remote os-config server is down, does not exist yet, or the device is not directly connected to the internet.
For example this would allow us to put a default ssh public key in /home/root/.ssh/authorized_keys_remote even if no remote os-config server is running.

Would you accept such a feature request? I would like to provide a pull request if you think it can be accepted, but I will need to learn Rust first ;-)

Unreachable code on `apiEndpoint` which does not start with https/http

A misconfigured config.json revealed a statement we need to refactor:

unreachable!();

The code marks this branch as unreachable, but a misconfigured config.json can reach it and crash the binary. So far we have only a report of the error, not the exact circumstances or the config.json involved. I have asked for more details so that we can try to reproduce it.

The original report is: "when i try to flash a configured image (downloaded from the API of the image maker), the os-config-devicekey service fails on-device with thread 'main' panicked at 'internal error: entered unreachable code', src/config_json.rs:177:9"
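One way to avoid the panic is to return an error for an unexpected scheme instead of calling unreachable!(). The following is a minimal sketch under that assumption; `strip_api_endpoint` and `InvalidEndpoint` are hypothetical names for illustration, not the actual code at src/config_json.rs:177.

```rust
use std::fmt;

/// Error returned when `apiEndpoint` has no recognised scheme.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
pub struct InvalidEndpoint(pub String);

impl fmt::Display for InvalidEndpoint {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "`apiEndpoint` must start with http:// or https://, got `{}`",
            self.0
        )
    }
}

impl std::error::Error for InvalidEndpoint {}

/// Strips the scheme from an endpoint. A value without a scheme yields an
/// error the caller can report, rather than a panic.
pub fn strip_api_endpoint(api_endpoint: &str) -> Result<&str, InvalidEndpoint> {
    api_endpoint
        .strip_prefix("https://")
        .or_else(|| api_endpoint.strip_prefix("http://"))
        .ok_or_else(|| InvalidEndpoint(api_endpoint.to_string()))
}
```

With this shape, a misconfigured config.json surfaces as a readable error message in the service log rather than as "entered unreachable code".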

os-config join should display help text

root@8b46e19:~# os-config join
thread 'main' panicked at 'internal error: entered unreachable code', src/args.rs:122:9
root@8b46e19:~#

Instead of panicking, it should preferably display the help text.

Also, the help text should point out that the config.json contents must be passed inside single quotes.
Currently it reads:

root@8b46e19:~# os-config join -h
os-config-join 
Configure/reconfigure a device

USAGE:
    os-config join [JSON_CONFIG]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <JSON_CONFIG>    Provisioning JSON configuration
root@8b46e19:~# 

Passing the config.json contents inside square brackets, as the usage line [JSON_CONFIG] might suggest, doesn't work.
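The behaviour requested above could look roughly like the sketch below. It uses plain slices from the standard library rather than the project's argument parser, and both `join_config` and the wording of `JOIN_USAGE` are hypothetical, chosen only to illustrate returning help text instead of panicking.

```rust
/// Help text shown when `join` is called without a usable configuration.
/// The wording is illustrative, not the tool's actual output.
pub const JOIN_USAGE: &str = "USAGE:\n    os-config join '<JSON_CONFIG>'\n\n\
The JSON configuration must be passed as one argument, wrapped in single quotes.";

/// Returns the JSON argument for `join`, or the usage text as an error
/// when the argument is missing, empty, or split into several arguments.
pub fn join_config(args: &[&str]) -> Result<String, String> {
    match args {
        [json] if !json.trim().is_empty() => Ok(json.to_string()),
        _ => Err(JOIN_USAGE.to_string()),
    }
}
```

The caller can print the `Err` value and exit with a non-zero status, so a bare `os-config join` ends with usage instructions rather than a panic.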

config.json is a single point of failure and gets corrupted often

A corrupt config.json leaves the device completely bricked. The only option after that is to reflash the image, which is not ideal for field-deployed devices. The device sometimes gets stuck in the configuring state because of this, and the boot never completes anyway because the os-config service fails.

There are several possible ways to make it more robust:

  1. read_config_json() seems to simply bail out on invalid JSON. When the file is corrupt or empty (an empty file is also invalid JSON), we should attempt to write either a default valid JSON or the last good copy of the file.
  2. These corruptions are not limited to config.json; if the logo is also corrupt, the boot suffers. A more robust solution might be for a successful boot to take a snapshot of all critical boot files into a backup folder whenever any of them has changed. This backup service should be the last to run in the systemd order. At the earliest opportunity during boot, check for 0-byte files in the boot folder and restore them from the backup. 0-byte files are the most common type of corruption we have come across; there might be others, and the check might have to be reworked once there is general consensus.
  3. It might also help to keep the boot folder read-only most of the time, remounting it read-write only for write operations and making it read-only again once the operation is done.

Given FAT16 is not the most reliable filesystem, the solution will never be perfect but right now we are having problems way too often.
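The first suggestion (fall back to the last good copy) might be sketched as follows. This is an illustration under stated assumptions, not the project's read_config_json(): the function names and the backup path are hypothetical, and the brace check is a deliberately crude stand-in for a real JSON parse.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads `config`, falling back to `backup` when `config` is missing,
/// empty (the 0-byte case), or does not look like a JSON object.
/// On fallback, the backup is also written back over `config`.
pub fn read_config_with_fallback(config: &Path, backup: &Path) -> io::Result<String> {
    if let Ok(contents) = fs::read_to_string(config) {
        if looks_like_json_object(&contents) {
            return Ok(contents);
        }
    }
    let contents = fs::read_to_string(backup)?;
    if !looks_like_json_object(&contents) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "backup copy is also corrupt",
        ));
    }
    // Restore the last known-good copy over the corrupt file.
    fs::write(config, &contents)?;
    Ok(contents)
}

/// Crude validity check; a real implementation would parse the JSON.
fn looks_like_json_object(contents: &str) -> bool {
    let trimmed = contents.trim();
    trimmed.starts_with('{') && trimmed.ends_with('}')
}
```

Restoring on read keeps the recovery in one place, but it only helps if some earlier successful boot actually saved the backup, as the second suggestion describes.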

Mock test server based on `hyper`

The configuration endpoint will be mocked by a test JSON server, so that we do not rely on the availability of an external service during initial development. It will also make integration tests with full coverage easier to write.
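To show the idea without pulling in any dependency, here is a minimal sketch of a one-shot mock JSON server built on std::net instead of hyper. `serve_once` is a hypothetical helper, and the response body is whatever the test passes in; nothing here reflects the real endpoint's payload.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener};
use std::thread;

/// Starts a server on an ephemeral local port that answers a single
/// request with `body` as JSON and then stops. Returns the bound address
/// and the handle of the serving thread.
pub fn serve_once(body: &'static str) -> std::io::Result<(SocketAddr, thread::JoinHandle<()>)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let handle = thread::spawn(move || {
        if let Ok((mut stream, _)) = listener.accept() {
            // Read and ignore the request; one read suffices for a small GET.
            let mut buf = [0u8; 1024];
            let _ = stream.read(&mut buf);
            let response = format!(
                "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
                body.len(),
                body
            );
            let _ = stream.write_all(response.as_bytes());
        }
    });
    Ok((addr, handle))
}
```

A test can point the client at the returned address instead of the real API. A hyper-based server would add proper request parsing and routing on top of the same pattern.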

Investigate incompatible device types

A team member ran os-config join with a JSON configuration for an RPi 3 board, although his original config is for a Balena Fin device. This did not produce an error. I need to investigate this: we compare device types, so it should not have been possible.

Needs to catch termination signals and exit gracefully

Currently os-config does not catch SIGTERM and other termination signals, so it exits abruptly and can leave the supervisor not started.

Instead we need to catch those standard signals, finish the current action, and then exit gracefully.
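The "finish the current action, then exit" part can be sketched with a shared flag that is checked between actions. The sketch below assumes the flag is set by a signal handler registered elsewhere (for example through a crate such as signal-hook); that wiring is left out, and `run_until_shutdown` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Runs `actions` in order, checking `shutdown` only between actions, so
/// an action already in progress always completes. Returns how many
/// actions were run.
pub fn run_until_shutdown<F>(shutdown: &AtomicBool, actions: &mut [F]) -> usize
where
    F: FnMut(),
{
    let mut completed = 0;
    for action in actions.iter_mut() {
        // A termination request stops the loop before the next action.
        if shutdown.load(Ordering::SeqCst) {
            break;
        }
        action();
        completed += 1;
    }
    completed
}
```

After the loop returns, the caller can run any cleanup (such as starting the supervisor again) before exiting with a normal status.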

Read/write race condition with `config.json`

During an update there is a delay caused by network unavailability and/or the time needed to fetch the config from the API endpoint, which creates a race condition with other services that write to config.json. When merging data back into config.json we need to use a fresh copy of it. This will greatly reduce the chance of such a race occurring. We need to add an integration test for this as well.
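The ordering described above (fetch first, read config.json only afterwards, then merge and write) can be sketched as follows. A `BTreeMap` stands in for the parsed JSON, and `update_config` and its closure parameters are hypothetical; this narrows the race window but does not remove it, since nothing here locks the file.

```rust
use std::collections::BTreeMap;

/// Stand-in for the parsed contents of config.json.
pub type Config = BTreeMap<String, String>;

/// Fetches the remote values first and reads the local file only
/// afterwards, so changes made by other writers during the slow fetch
/// are part of the copy that gets merged and written back.
pub fn update_config<Fetch, Read, Write>(fetch: Fetch, read: Read, write: Write) -> Config
where
    Fetch: FnOnce() -> Config,
    Read: FnOnce() -> Config,
    Write: FnOnce(&Config),
{
    let remote = fetch(); // slow: network round trip
    let mut merged = read(); // fresh copy, taken after the fetch
    merged.extend(remote); // remote keys win; all other keys are kept
    write(&merged);
    merged
}
```

An integration test can simulate the competing writer by modifying the stored config from inside the `fetch` closure and checking that the change survives the merge.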

os-config doesn't run properly on x86 devices using Intel NUC based balenaOS

When using balenaOS (Intel NUC image) on laptops, os-config does not run successfully.

  • On Intel-based x86 laptops, the device can be joined manually using os-config join <config>

  • On AMD-based x86 laptops, os-config throws the following error:

Aug 30 08:56:07 5c3cc97 os-config[2777]: Fetching service configuration from https://api.balena-cloud.com/os/v1/config...
Aug 30 08:56:08 5c3cc97 os-config[2777]: thread 'reqwest-internal-sync-runtime' panicked at 'header name validated by httparse: InvalidHeaderName { _priv: () }', src/libcore/result.rs:997:5
Aug 30 08:56:08 5c3cc97 os-config[2777]: note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Aug 30 08:56:08 5c3cc97 os-config[2777]: thread 'main' panicked at 'event loop thread panicked', /usr/src/debug/os-config/1.1.1-r0/cargo_home/bitbake/reqwest-0.9.17/src/client.rs:675:5

Segmentation fault when running the `update` command

Hello,

I've been diagnosing an issue on multiple devices in our balena fleet that have been stuck in the Online (Heartbeat only) status after flashing them with balenaOS version 2.95.12+rev1. They have been stuck in this state for a day now. The issue didn't appear related to network connectivity as the logs from the devices were being streamed to balenaCloud. I also went through the steps for diagnosing firewall issues but there were no issues there.

After checking multiple services, I narrowed the issue down to the os-config service which was stuck in a crash loop:

root@48998eb:~# journalctl --follow -n 300 -u os-config
Feb 09 10:12:07 48998eb os-config[465032]: Fetching service configuration from https://api.balena-cloud.com/os/v1/config...
Feb 09 10:12:07 48998eb systemd[1]: os-config.service: Main process exited, code=killed, status=11/SEGV
Feb 09 10:12:07 48998eb systemd[1]: os-config.service: Failed with result 'signal'.
Feb 09 10:12:18 48998eb os-config[465150]: Fetching service configuration from https://api.balena-cloud.com/os/v1/config...
Feb 09 10:12:18 48998eb systemd[1]: os-config.service: Main process exited, code=killed, status=11/SEGV
Feb 09 10:12:18 48998eb systemd[1]: os-config.service: Failed with result 'signal'.
Feb 09 10:12:28 48998eb os-config[465270]: Fetching service configuration from https://api.balena-cloud.com/os/v1/config...
Feb 09 10:12:28 48998eb systemd[1]: os-config.service: Main process exited, code=killed, status=11/SEGV
Feb 09 10:12:28 48998eb systemd[1]: os-config.service: Failed with result 'signal'.
Feb 09 10:12:38 48998eb os-config[465314]: Fetching service configuration from https://api.balena-cloud.com/os/v1/config...
Feb 09 10:12:38 48998eb systemd[1]: os-config.service: Main process exited, code=killed, status=11/SEGV
Feb 09 10:12:38 48998eb systemd[1]: os-config.service: Failed with result 'signal'.

When I tried to manually run the os-config update command I got the same error (i.e., segmentation fault):

root@48998eb:~# os-config update
Fetching service configuration from https://api.balena-cloud.com/os/v1/config...
Segmentation fault

Did something change in what's returned by https://api.balena-cloud.com/os/v1/config that is now causing the crash? Any help is appreciated.

System information

root@48998eb:~# os-config --version
os-config 1.2.1

root@48998eb:~# cat /etc/os-release
ID="balena-os"
NAME="balenaOS"
VERSION="2.95.12+rev1"
VERSION_ID="2.95.12+rev1"
PRETTY_NAME="balenaOS 2.95.12+rev1"
MACHINE="genericx86-64-ext"
META_BALENA_VERSION="2.95.12"
BALENA_BOARD_REV="ed67aa9"
META_BALENA_REV="87974875"
SLUG="genericx86-64-ext"

OpenSSL error with custom build

I'm working on a custom board and I get this error from the os-config process

root@localhost:~# os-config update
Fetching service configuration from https://api.balena-cloud.com/os/v1/config...
https://api.balena-cloud.com/os/v1/config: The OpenSSL library reported an error: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed:s3_clnt.c:1264:

Running curl gives me:

root@localhost:~# curl -v https://api.balena-cloud.com/os/v1/config
*   Trying 52.22.171.115...
* Connected to api.balena-cloud.com (52.22.171.115) port 443 (#0)
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification OK
*        server certificate status verification SKIPPED
*        common name: balena.io (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=balena.io
*        start date: Thu, 25 Oct 2018 00:00:00 GMT
*        expire date: Mon, 25 Nov 2019 12:00:00 GMT
*        issuer: C=US,O=Amazon,OU=Server CA 1B,CN=Amazon
*        compression: NULL
* ALPN, server accepted to use http/1.1
> GET /os/v1/config HTTP/1.1
> Host: api.balena-cloud.com
> User-Agent: curl/7.47.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Wed, 19 Dec 2018 02:36:27 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 2100
< Connection: keep-alive
< X-Frame-Options: DENY
< X-Content-Type-Options: nosniff
< ETag: W/"834-VaO8snq0xAwKwntt3EgR1OQ3Jk8"
< Vary: Accept-Encoding
< 
* Connection #0 to host api.balena-cloud.com left intact
