
The Pebble service manager

Home page: https://canonical-pebble.readthedocs-hosted.com/
License: GNU General Public License v3.0

Take control of your internal daemons!

Pebble helps you to orchestrate a set of local service processes as an organized set. It resembles well known tools such as supervisord, runit, or s6, in that it can easily manage non-system processes independently from the system services, but it was designed with unique features that help with more specific use cases.

General model

Pebble is organized as a single binary that works as a daemon and also as a client to itself. When the daemon runs, it loads its own configuration from the $PEBBLE directory, as defined in the environment, and also records its state and unix sockets for communication in that same directory. If that variable is not defined, Pebble falls back to a default system-level setup at /var/lib/pebble/default. Using that directory is encouraged for whole-system setups, such as when Pebble is used to control services in a container.

The $PEBBLE directory must contain a layers/ subdirectory that holds a stack of configuration files with names similar to 001-base-layer.yaml, where the digits define the order of the layer and the following label uniquely identifies it. Each layer in the stack sits above the previous one, and may extend or redefine the service configuration as desired.
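
For example, a $PEBBLE directory might look like this (the layer file names are hypothetical; the state and socket files are created by the daemon):

$PEBBLE/
    layers/
        001-base-layer.yaml
        002-override-layer.yaml
    .pebble.state    # daemon state (see "Changes and tasks" below)
    .pebble.socket   # unix socket for client communication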

Layer configuration examples

Below is an example of the current configuration format. For full details of all fields, see the complete layer specification.

summary: Simple layer

description: |
    A better description for a simple layer.

services:
    srv1:
        override: replace
        summary: Service summary
        command: cmd arg1 "arg2a arg2b"
        startup: enabled
        after:
            - srv2
        before:
            - srv3
        requires:
            - srv2
            - srv3
        environment:
            VAR1: val1
            VAR2: val2
            VAR3: val3

    srv2:
        override: replace
        startup: enabled
        command: cmd
        before:
            - srv3

    srv3:
        override: replace
        command: cmd

The override field (which is required) defines whether this entry overrides the previous service of the same name (if any), or merges with it. See the full layer specification for more details.

Layer override example

Any of the fields can be replaced individually in a merged service configuration. To illustrate, here is a sample override layer that might sit on top of the one above:

summary: Simple override layer

services:
    srv1:
        override: merge
        environment:
            VAR3: val3
        after:
            - srv4
        before:
            - srv5

    srv2:
        override: replace
        summary: Replaced service
        startup: disabled
        command: cmd

    srv4:
        override: replace
        command: cmd
        startup: enabled

    srv5:
        override: replace
        command: cmd

Using Pebble

To install the latest version of Pebble, run the following command (we don't currently ship binaries, so you must first install Go):

go install github.com/canonical/pebble/cmd/pebble@latest

Pebble is invoked using pebble <command>. To get more information:

  • To see a help summary, type pebble -h.
  • To see a short description of all commands, type pebble help --all.
  • To see details for one command, type pebble help <command> or pebble <command> -h.

A few of the commands that need more explanation are detailed below.

Running the daemon (server)

If Pebble is installed and the $PEBBLE directory is set up, running the daemon is easy:

$ pebble run
2022-10-26T01:18:26.904Z [pebble] Started daemon.
2022-10-26T01:18:26.921Z [pebble] POST /v1/services 15.53132ms 202
2022-10-26T01:18:26.921Z [pebble] Started default services with change 50.
2022-10-26T01:18:26.936Z [pebble] Service "srv1" starting: sleep 300

This will start the Pebble daemon itself, as well as start all the services that are marked as startup: enabled (if you don't want that, use --hold). Other Pebble commands may then be used to interact with the running daemon, for example from another terminal window.

To provide additional arguments to a service, use --args <service> <args> .... If the command field in the service's plan has a [ <default-arguments...> ] list, the --args arguments will replace the defaults. If not, they will be appended to the command.

To indicate the end of an --args list, use a ; (semicolon) terminator, which must be backslash-escaped if used in the shell. The terminator may be omitted if there are no other Pebble options that follow.

For example:

# Start the daemon and pass additional arguments to "myservice".
$ pebble run --args myservice --verbose --foo "multi str arg"

# Use args terminator to pass --hold to Pebble at the end of the line.
$ pebble run --args myservice --verbose \; --hold

# Start the daemon and pass arguments to multiple services.
$ pebble run --args myservice1 --arg1 \; --args myservice2 --arg2

To override the default configuration directory, set the PEBBLE environment variable when running:

$ export PEBBLE=~/pebble
$ pebble run
2022-10-26T01:18:26.904Z [pebble] Started daemon.
...

To initialise the $PEBBLE directory with the contents of another directory in a one-time copy, set the PEBBLE_COPY_ONCE environment variable to the source directory. This will only copy the contents if the target directory, $PEBBLE, is empty.
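
For example, to run the daemon against a custom directory seeded from the default system-level setup (the directory paths here are illustrative):

$ export PEBBLE=~/pebble
$ export PEBBLE_COPY_ONCE=/var/lib/pebble/default
$ pebble run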

Viewing, starting, and stopping services

You can view the status of one or more services by using pebble services:

$ pebble services srv1       # show status of a single service
Service  Startup  Current
srv1     enabled  active

$ pebble services            # show status of all services
Service  Startup   Current
srv1     enabled   active
srv2     disabled  inactive

The "Startup" column shows whether this service is automatically started when Pebble starts ("enabled" means auto-start, "disabled" means don't auto-start).

The "Current" column shows the current status of the service, and can be one of the following:

  • active: starting or running
  • inactive: not yet started, being stopped, or stopped
  • backoff: in a backoff-restart loop
  • error: in an error state

To start specific services, type pebble start followed by one or more service names:

$ pebble start srv1 srv2  # start two services (and any dependencies)

When starting a service, Pebble executes the service's command and waits 1 second to ensure the command doesn't exit too quickly. If the command is still running after that time window, the start is considered successful; otherwise pebble start will exit with an error.

Similarly, to stop specific services, use pebble stop followed by one or more service names:

$ pebble stop srv1        # stop one service

When stopping a service, Pebble sends SIGTERM to the service's process group, and waits up to 5 seconds. If the command hasn't exited within that time window, Pebble sends SIGKILL to the service's process group and waits up to 5 more seconds. If the command exits within that 10-second time window, the stop is considered successful, otherwise pebble stop will exit with an error.
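
The 5-second wait before SIGKILL can be adjusted per service with the kill-delay field described in the layer specification below. For example, a minimal override layer giving srv1 more time to shut down gracefully:

services:
    srv1:
        override: merge
        kill-delay: 10s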

Updating and restarting services

When you update service configuration (by adding a layer), the changed services won't be restarted automatically. To restart them and bring the service state in sync with the new configuration, use pebble replan.

The "replan" operation restarts startup: enabled services whose configuration has changed between when they started and now; if the configuration hasn't changed, replan does nothing. Replan also starts startup: enabled services that have not yet been started.

Here is an example, where srv1 is a service that has startup: enabled, and srv2 does not:

$ pebble replan
2023-04-25T15:06:50+02:00 INFO Service "srv1" already started.
$ pebble add lay1 layer.yaml  # update srv1 config
Layer "lay1" added successfully from "layer.yaml"
$ pebble replan
Stop service "srv1"
Start service "srv1"
$ pebble add lay2 layer.yaml  # change srv2 to "startup: enabled"
Layer "lay2" added successfully from "layer.yaml"
$ pebble replan
2023-04-25T15:11:22+02:00 INFO Service "srv1" already started.
Start service "srv2"

If you want to force a service to restart even if its service configuration hasn't changed, use pebble restart <service>.

Service dependencies

Pebble takes service dependencies into account when starting and stopping services. When Pebble starts a service, it also starts the services which that service depends on (configured with requires). Conversely, when stopping a service, Pebble also stops services which depend on that service.

For example, if service nginx requires logger, pebble start nginx will start both nginx and logger (in an undefined order). Running pebble stop logger will stop both nginx and logger; however, running pebble stop nginx will only stop nginx (nginx depends on logger, not the other way around).
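
Expressed as a layer, the nginx/logger example above might look like this (the commands are illustrative):

services:
    nginx:
        override: replace
        command: nginx
        requires:
            - logger

    logger:
        override: replace
        command: logger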

When multiple services need to be started together, they're started in order according to the before and after configuration, waiting 1 second for each to ensure the command doesn't exit too quickly. The before option is a list of services that this service must start before (it may or may not require them). Or if it's easier to specify this ordering the other way around, after is a list of services that this service must start after.

Note that currently, before and after are of limited usefulness, because Pebble only waits 1 second before moving on to start the next service, with no additional checks that the previous service is operating correctly.

If the configuration of requires, before, and after for a service results in a cycle or "loop", an error will be returned when attempting to start or stop the service.
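
For example, a layer like the following (with hypothetical services a and b that each declare the other in after) contains a cycle, so attempting to start either service will return an error:

services:
    a:
        override: replace
        command: cmd
        after:
            - b

    b:
        override: replace
        command: cmd
        after:
            - a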

Service auto-restart

Pebble's service manager automatically restarts services that exit unexpectedly. By default, this is done whether the exit code is zero or non-zero, but you can change this using the on-success and on-failure fields in a configuration layer. The possible values for these fields are:

  • restart: restart the service and enter a restart-backoff loop (the default behaviour).
  • shutdown: shut down and exit the Pebble daemon (with exit code 0 if the service exits successfully, exit code 10 otherwise)
    • success-shutdown: shut down with exit code 0 (valid only for on-failure)
    • failure-shutdown: shut down with exit code 10 (valid only for on-success)
  • ignore: ignore the service exiting and do nothing further

In restart mode, the first time a service exits, Pebble waits the backoff-delay, which defaults to half a second. If the service exits again, Pebble calculates the next backoff delay by multiplying the current delay by backoff-factor, which defaults to 2.0 (doubling). The increasing delay is capped at backoff-limit, which defaults to 30 seconds.

The backoff-limit value is also used as a "backoff reset" time. If the service stays running after a restart for backoff-limit seconds, the backoff process is reset and the delay reverts to backoff-delay.
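
For example, with the default values shown below, a service that keeps exiting is restarted after 0.5s, then 1s, 2s, 4s, 8s, 16s, and then every 30s (the next doubling, 32s, is capped at backoff-limit):

services:
    srv1:
        override: merge
        backoff-delay: 500ms
        backoff-factor: 2.0
        backoff-limit: 30s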

Health checks

Separate from the service manager, Pebble implements custom "health checks" that can be configured to restart services when they fail.

Each check can be one of three types. The types and their success criteria are:

  • http: an HTTP GET request to the URL specified must return an HTTP 2xx status code
  • tcp: opening the given TCP port must be successful
  • exec: executing the specified command must yield a zero exit code

Checks are configured in the layer configuration using the top-level field checks. Full details are given in the layer specification, but below is an example layer showing the three different types of checks:

checks:
    up:
        override: replace
        level: alive
        period: 30s
        threshold: 1  # an aggressive threshold
        exec:
            command: service nginx status

    online:
        override: replace
        level: ready
        tcp:
            port: 8080

    test:
        override: replace
        http:
            url: http://localhost:8080/test

Each check is performed with the specified period (the default is 10 seconds), and is considered an error if a timeout happens before the check responds -- for example, before the HTTP request completes or before the command finishes executing.

A check is considered healthy until it's had threshold errors in a row (the default is 3). At that point, the check is considered "down", and any associated on-check-failure actions will be triggered. When the check succeeds again, the failure count is reset to 0.

To enable Pebble auto-restart behavior based on a check, use the on-check-failure map in the service configuration (this is what ties together services and checks). For example, to restart the "server" service when the "test" check fails, use the following:

services:
    server:
        override: merge
        on-check-failure:
            # can also be "shutdown", "success-shutdown", or "ignore" (the default)
            test: restart

You can view check status using the pebble checks command. This reports the checks along with their status (up or down) and number of failures. For example:

$ pebble checks
Check   Level  Status  Failures
up      alive  up      0/1
online  ready  down    1/3
test    -      down    42/3

The "Failures" column shows the current number of failures since the check started failing, a slash, and the configured threshold.

If the --http option was given when starting pebble run, Pebble exposes a /v1/health HTTP endpoint that allows a user to query the health of configured checks, optionally filtered by check level with the query string ?level=<level>. This endpoint returns an HTTP 200 status if the checks are healthy, and HTTP 502 otherwise.

Each check can specify a level of "alive" or "ready". These have semantic meaning: "alive" means the check or the service it's connected to is up and running; "ready" means it's properly accepting network traffic. These correspond to Kubernetes "liveness" and "readiness" probes.

The tool running the Pebble server can make use of this: for example, under Kubernetes you could configure the container's liveness and readiness probes to hit Pebble's /v1/health endpoint with the ?level=alive and ?level=ready filters, respectively.

Ready implies alive, and not-alive implies not-ready. If you've configured an "alive" check but no "ready" check, and the "alive" check is unhealthy, /v1/health?level=ready will report unhealthy as well, and the Kubernetes readiness probe will act on that.

If there are no checks configured, the /v1/health endpoint returns HTTP 200 so the liveness and readiness probes are successful by default. To use this feature, you must explicitly create checks with level: alive or level: ready in the layer configuration.
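
For example, assuming the daemon was started with pebble run --http=":4000", the health endpoint could be queried as follows (the response body shown is illustrative of the JSON envelope):

$ curl -s 'localhost:4000/v1/health?level=alive'
{"type":"sync","status-code":200,"status":"OK","result":{"healthy":true}}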

Changes and tasks

When Pebble performs a (potentially invasive or long-running) operation such as starting or stopping a service, it records a "change" object with one or more "tasks" in it. The daemon records this state in a JSON file on disk at $PEBBLE/.pebble.state.

To see recent changes, for this or previous server runs, use pebble changes. You might see something like this:

$ pebble changes
ID  Status  Spawn                Ready                Summary
1   Done    today at 14:33 NZDT  today at 14:33 NZDT  Autostart service "srv1"
2   Done    today at 15:26 NZDT  today at 15:26 NZDT  Start service "srv2"
3   Done    today at 15:26 NZDT  today at 15:26 NZDT  Stop service "srv1" and 1 more

To drill down and see the tasks that make up a change, use pebble tasks <change-id>:

$ pebble tasks 3
Status  Spawn                Ready                Summary
Done    today at 15:26 NZDT  today at 15:26 NZDT  Stop service "srv1"
Done    today at 15:26 NZDT  today at 15:26 NZDT  Stop service "srv2"

Logs

The daemon's service manager stores the most recent stdout and stderr from each service, using a 100KB ring buffer per service. Each log line is prefixed with an RFC-3339 timestamp and the [service-name] in square brackets.

Logs are viewable via the logs API or using pebble logs, for example:

$ pebble logs
2022-11-14T01:35:06.979Z [srv1] Log 0 from srv1
2022-11-14T01:35:08.041Z [srv2] Log 0 from srv2
2022-11-14T01:35:09.982Z [srv1] Log 1 from srv1

To view existing logs and follow (tail) new output, use -f (press Ctrl-C to exit):

$ pebble logs -f
2022-11-14T01:37:56.936Z [srv1] Log 0 from srv1
2022-11-14T01:37:57.978Z [srv2] Log 0 from srv2
2022-11-14T01:37:59.939Z [srv1] Log 1 from srv1
^C

You can output logs in JSON Lines format, using --format=json:

$ pebble logs --format=json
{"time":"2022-11-14T01:39:10.886Z","service":"srv1","message":"Log 0 from srv1"}
{"time":"2022-11-14T01:39:11.943Z","service":"srv2","message":"Log 0 from srv2"}
{"time":"2022-11-14T01:39:13.889Z","service":"srv1","message":"Log 1 from srv1"}

If you want to also write service logs to Pebble's own stdout, run the daemon with --verbose:

$ pebble run --verbose
2022-10-26T01:41:32.805Z [pebble] Started daemon.
2022-10-26T01:41:32.835Z [pebble] POST /v1/services 29.743632ms 202
2022-10-26T01:41:32.835Z [pebble] Started default services with change 7.
2022-10-26T01:41:32.849Z [pebble] Service "srv1" starting: python3 -u /path/to/srv1.py
2022-10-26T01:41:32.866Z [srv1] Log 0 from srv1
2022-10-26T01:41:35.870Z [srv1] Log 1 from srv1
2022-10-26T01:41:38.873Z [srv1] Log 2 from srv1
...

Log forwarding

Pebble supports forwarding its services' logs to a remote Loki server. In the log-targets section of the plan, you can specify destinations for log forwarding, for example:

log-targets:
    staging-logs:
        override: merge
        type: loki
        location: http://10.1.77.205:3100/loki/api/v1/push
        services: [all]
    production-logs:
        override: merge
        type: loki
        location: http://my.loki.server.com/loki/api/v1/push
        services: [svc1, svc2]

Specifying services

For each log target, use the services key to specify a list of services to collect logs from. In the above example, the production-logs target will collect logs from svc1 and svc2.

Use the special keyword all to match all services, including services that might be added in future layers. In the above example, staging-logs will collect logs from all services.

To remove a service from a log target when merging, prefix the service name with a minus -. For example, if we have a base layer with

my-target:
    services: [svc1, svc2]

and override layer with

my-target:
    services: [-svc1]
    override: merge

then in the merged layer, the services list will be merged to [svc1, svc2, -svc1], which evaluates left to right as simply [svc2]. So my-target will collect logs from only svc2.

You can also use -all to remove all services from the list. For example, adding an override layer with

my-target:
    services: [-all]
    override: merge

would remove all services from my-target, effectively disabling my-target. Meanwhile, adding an override layer with

my-target:
    services: [-all, svc1]
    override: merge

would remove all services and then add svc1, so my-target would receive logs from only svc1.

Labels

In the labels section, you can specify custom labels to be added to any outgoing logs. These labels may contain $ENVIRONMENT_VARIABLES - these will be interpreted in the environment of the corresponding service. Pebble may also add its own default labels (depending on the protocol). For example, given the following plan:

services:
  svc1:
    environment:
      OWNER: 'alice'
  svc2:
    environment:
      OWNER: 'bob'

log-targets:
  tgt1:
    type: loki
    labels:
      product: 'juju'
      owner: 'user-$OWNER'

the logs from svc1 will be sent with the following labels:

product: juju
owner: user-alice     # env var $OWNER substituted
pebble_service: svc1  # default label for Loki

and for svc2, the labels will be

product: juju
owner: user-bob       # env var $OWNER substituted
pebble_service: svc2  # default label for Loki

Notices

Pebble includes a subsystem called notices, which allows the user to introspect various events that occur in the Pebble server, as well as record custom client events. The server saves notices to disk, so they persist across restarts, and they expire after a per-notice interval (the expire-after field shown in the examples below).

Each notice is either public or has a specific user ID. Public notices may be viewed by any user, while notices that have a user ID may only be viewed by users with that same user ID, or by an admin (root, or the user the Pebble daemon is running as).

Each notice is uniquely identified by the combination of its user ID, type, and key; the notice's occurrence count is incremented every time a notice with that same combination occurs.

Each notice records the time it first occurred, the time it last occurred, and the time it last repeated.

A repeat happens when a notice occurs with the same user ID, type, and key as a prior notice, and either the notice has no "repeat after" duration (the default), or the notice happens after the provided "repeat after" interval (since the prior notice). Thus, specifying "repeat after" prevents a notice from appearing again if it happens more frequently than desired.

In addition, a notice records optional data (string key-value pairs) from the last occurrence.

These notice types are currently available:

  • custom: a custom client notice reported via pebble notify. The key and any data are provided by the user. The key must be in the format mydomain.io/mykey to ensure well-namespaced notice keys.

To record custom notices, use pebble notify -- the notice user ID will be set to the client's user ID:

$ pebble notify example.com/foo
Recorded notice 1
$ pebble notify example.com/foo
Recorded notice 1
$ pebble notify other.com/bar name=value [email protected]  # two data fields
Recorded notice 2
$ pebble notify example.com/foo
Recorded notice 1

The pebble notices command lists notices not yet acknowledged, ordered by the last-repeated time (oldest first). After it runs, the notices that were shown may then be acknowledged by running pebble okay. When a notice repeats (see above), it needs to be acknowledged again.

$ pebble notices
ID   User    Type    Key              First                Repeated             Occurrences
1    1000    custom  example.com/foo  today at 16:16 NZST  today at 16:16 NZST  3
2    public  custom  other.com/bar    today at 16:16 NZST  today at 16:16 NZST  1

To fetch details about a single notice, use pebble notice, which displays the output in YAML format. You can fetch a notice either by ID or by type/key combination.

To fetch the notice with ID "1":

$ pebble notice 1
id: "1"
user-id: 1000
type: custom
key: example.com/foo
first-occurred: 2023-09-15T04:16:09.179395298Z
last-occurred: 2023-09-15T04:16:19.487035209Z
last-repeated: 2023-09-15T04:16:09.179395298Z
occurrences: 3
expire-after: 168h0m0s

To fetch the notice with type "custom" and key "other.com/bar":

$ pebble notice custom other.com/bar
id: "2"
user-id: public
type: custom
key: other.com/bar
first-occurred: 2023-09-15T04:16:17.180049768Z
last-occurred: 2023-09-15T04:16:17.180049768Z
last-repeated: 2023-09-15T04:16:17.180049768Z
occurrences: 1
last-data:
    name: value
    email: [email protected]
expire-after: 168h0m0s

Container usage

Pebble works well as a local service manager, but if running Pebble in a separate container, you can use the exec and file management APIs to coordinate with the remote system over the shared unix socket.

Exec (one-shot commands)

Pebble's "exec" feature allows you to run arbitrary commands on the server. This is intended for short-running programs; the processes started with exec don't use the service manager.

For example, you could use exec to run pg_dump and create a PostgreSQL database backup:

$ pebble exec pg_dump mydb
--
-- PostgreSQL database dump
--
...

The exec feature uses WebSockets under the hood, and allows you to stream stdin to the process, as well as stream stdout and stderr back. When running pebble exec, you can specify the working directory to run in (-w), environment variables to set (--env), and the user and group to run as (--uid/--user and --gid/--group).
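
For example, the following runs a shell command in a different working directory with an extra environment variable set (the output is illustrative):

$ pebble exec -w /tmp --env MYVAR=hello -- /bin/sh -c 'pwd; echo $MYVAR'
/tmp
hello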

You can also apply a timeout with --timeout, for example:

$ pebble exec --timeout 1s -- sleep 3
error: cannot perform the following tasks:
- exec command "sleep" (timed out after 1s: context deadline exceeded)

File management

Pebble provides various API calls and commands to manage files and directories on the server. The simplest way to use these is with the commands below, several of which should be familiar:

$ pebble ls <path>              # list file information (like "ls")
$ pebble mkdir <path>           # create a directory (like "mkdir")
$ pebble rm <path>              # remove a file or directory (like "rm")
$ pebble push <local> <remote>  # copy file to server (like "cp")
$ pebble pull <remote> <local>  # copy file from server (like "cp")
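
For example, a round trip to copy a configuration file to the server, inspect it, and clean up might look like this (paths and output are illustrative):

$ pebble mkdir /tmp/app
$ pebble push config.yaml /tmp/app/config.yaml
$ pebble ls /tmp/app
config.yaml
$ pebble pull /tmp/app/config.yaml backup.yaml
$ pebble rm /tmp/app/config.yaml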

Layer specification

Below is the full specification for a Pebble configuration layer. Layers are added statically using a file in $PEBBLE/layers, or dynamically via the layers API or pebble add.

# (Optional) A short one line summary of the layer
summary: <summary>

# (Optional) A full description of the configuration layer
description: |
    <description>

# (Optional) A list of services managed by this configuration layer
services:

    <service name>:

        # (Required) Control how this service definition is combined with any
        # other pre-existing definition with the same name in the Pebble plan.
        #
        # The value 'merge' will ensure that values in this layer specification
        # are merged over existing definitions, whereas 'replace' will entirely
        # override the existing service spec in the plan with the same name.
        override: merge | replace

        # (Required in combined layer) The command to run the service. It is executed
        # directly, not interpreted by a shell, and may optionally be suffixed by default
        # arguments within "[" and "]" which may be overridden via --args.
        # Example: /usr/bin/somedaemon --db=/db/path [ --port 8080 ]
        command: <command>

        # (Optional) A short summary of the service.
        summary: <summary>

        # (Optional) A detailed description of the service.
        description: |
            <description>

        # (Optional) Control whether the service is started automatically when
        # Pebble starts. Default is "disabled".
        startup: enabled | disabled

        # (Optional) A list of other services in the plan that this service
        # should start after.
        after:
            - <other service name>

        # (Optional) A list of other services in the plan that this service
        # should start before.
        before:
            - <other service name>

        # (Optional) A list of other services in the plan that this service
        # requires in order to start correctly.
        requires:
            - <other service name>

        # (Optional) A list of key/value pairs defining environment variables
        # that should be set in the context of the process.
        environment:
            <env var name>: <env var value>

        # (Optional) Username for starting service as a different user. It is
        # an error if the user doesn't exist.
        user: <username>

        # (Optional) User ID for starting service as a different user. If both
        # user and user-id are specified, the user's UID must match user-id.
        user-id: <uid>

        # (Optional) Group name for starting service as a different user. It is
        # an error if the group doesn't exist.
        group: <group name>

        # (Optional) Group ID for starting service as a different user. If both
        # group and group-id are specified, the group's GID must match group-id.
        group-id: <gid>

        # (Optional) Working directory to run command in. By default, the
        # command is run in the service manager's current directory.
        working-dir: <directory>

        # (Optional) Defines what happens when the service exits with a zero
        # exit code. Possible values are:
        #
        # - restart (default): restart the service after the backoff delay
        # - shutdown: shut down and exit the Pebble daemon (with exit code 0)
        # - failure-shutdown: shut down and exit Pebble with exit code 10
        # - ignore: do nothing further
        on-success: restart | shutdown | failure-shutdown | ignore

        # (Optional) Defines what happens when the service exits with a nonzero
        # exit code. Possible values are:
        #
        # - restart (default): restart the service after the backoff delay
        # - shutdown: shut down and exit the Pebble daemon (with exit code 10)
        # - success-shutdown: shut down and exit Pebble with exit code 0
        # - ignore: do nothing further
        on-failure: restart | shutdown | success-shutdown | ignore

        # (Optional) Defines what happens when each of the named health checks
        # fail. Possible values are:
        #
        # - restart (default): restart the service once
        # - shutdown: shut down and exit the Pebble daemon (with exit code 11)
        # - success-shutdown: shut down and exit Pebble with exit code 0
        # - ignore: do nothing further
        on-check-failure:
            <check name>: restart | shutdown | success-shutdown | ignore

        # (Optional) Initial backoff delay for the "restart" exit action.
        # Default is half a second ("500ms").
        backoff-delay: <duration>

        # (Optional) After each backoff, the backoff delay is multiplied by
        # this factor to get the next backoff delay. Must be greater than or
        # equal to one. Default is 2.0.
        backoff-factor: <factor>

        # (Optional) Limit for the backoff delay: when multiplying by
        # backoff-factor to get the next backoff delay, if the result is
        # greater than this value, it is capped to this value. Default is
        # half a minute ("30s").
        backoff-limit: <duration>

        # (Optional) The amount of time afforded to this service to handle
        # SIGTERM and exit gracefully before SIGKILL terminates it forcefully.
        # Default is 5 seconds ("5s").
        kill-delay: <duration>

# (Optional) A list of health checks managed by this configuration layer.
checks:

    <check name>:

        # (Required) Control how this check definition is combined with any
        # other pre-existing definition with the same name in the Pebble plan.
        #
        # The value 'merge' will ensure that values in this layer specification
        # are merged over existing definitions, whereas 'replace' will entirely
        # override the existing check spec in the plan with the same name.
        override: merge | replace

        # (Optional) Check level, which can be used for filtering checks when
        # calling the checks API or health endpoint.
        #
        # For the health endpoint, ready implies alive. In other words, if all
        # the "ready" checks are succeeding and there are no "alive" checks,
        # the /v1/health API will return success for level=alive.
        level: alive | ready

        # (Optional) Check is run every time this period (time interval)
        # elapses. Must not be zero. Default is "10s".
        period: <duration>

        # (Optional) If this time elapses before a single check operation has
        # finished, it is cancelled and considered an error. Must be less
        # than the period, and must not be zero. Default is "3s".
        timeout: <duration>

        # (Optional) Number of times in a row the check must error to be
        # considered a failure (which triggers the on-check-failure action).
        # Default 3.
        threshold: <failure threshold>

        # Configures an HTTP check, which is successful if a GET to the
        # specified URL returns a 2xx status code.
        #
        # Only one of "http", "tcp", or "exec" may be specified.
        http:
            # (Required) URL to fetch, for example "https://example.com/foo".
            url: <full URL>

            # (Optional) Map of HTTP headers to send with the request.
            headers:
                <name>: <value>

        # Configures a TCP port check, which is successful if the specified
        # TCP port is listening and we can successfully open it. Nothing is
        # sent to the port.
        #
        # Only one of "http", "tcp", or "exec" may be specified.
        tcp:
            # (Required) Port number to open.
            port: <port number>

            # (Optional) Host name or IP address to use. Default is "localhost".
            host: <host name>

        # Configures a command execution check, which is successful if running
        # the specified command returns a zero exit code.
        #
        # Only one of "http", "tcp", or "exec" may be specified.
        exec:
            # (Required) Command line to execute. The command is executed
            # directly, not interpreted by a shell.
            command: <command>

            # (Optional) Run the command in the context of this service.
            # Specifically, inherit its environment variables, user/group
            # settings, and working directory. The check's context (the
            # settings below) will override the service's; the check's
            # environment map will be merged on top of the service's.
            service-context: <service-name>

            # (Optional) A list of key/value pairs defining environment
            # variables that should be set when running the command.
            environment:
                <name>: <value>

            # (Optional) Username for starting command as a different user. It
            # is an error if the user doesn't exist.
            user: <username>

            # (Optional) User ID for starting command as a different user. If
            # both user and user-id are specified, the user's UID must match
            # user-id.
            user-id: <uid>

            # (Optional) Group name for starting command as a different user.
            # It is an error if the group doesn't exist.
            group: <group name>

            # (Optional) Group ID for starting command as a different user. If
            # both group and group-id are specified, the group's GID must
            # match group-id.
            group-id: <gid>

            # (Optional) Working directory to run command in. By default, the
            # command is run in the service manager's current directory.
            working-dir: <directory>

# (Optional) A list of remote log receivers, to which service logs can be sent.
log-targets:

  <log target name>:

    # (Required) Control how this log target definition is combined with
    # other pre-existing definitions with the same name in the Pebble plan.
    #
    # The value 'merge' will ensure that values in this layer specification
    # are merged over existing definitions, whereas 'replace' will entirely
    # override the existing target spec in the plan with the same name.
    override: merge | replace

    # (Required) The type of log target, which determines the format in
    # which logs will be sent. The supported types are:
    #
    # - loki: Use the Grafana Loki protocol. A "pebble_service" label is
    #   added automatically, with the name of the Pebble service as its value.
    type: loki

    # (Required) The URL of the remote log target.
    # For Loki, this needs to be the fully-qualified URL of the push API,
    # including the API endpoint, e.g.
    #     http://<ip-address>:3100/loki/api/v1/push
    location: <url>

    # (Optional) A list of services whose logs will be sent to this target.
    # Use the special keyword 'all' to match all services in the plan.
    # When merging log targets, the 'services' lists are appended. Prefix a
    # service name with a minus (e.g. '-svc1') to remove a previously added
    # service. '-all' will remove all services.
    services: [<service names>]

    # (Optional) A list of key/value pairs defining labels which should be set
    # on the outgoing logs. The label values may contain $ENV_VARS, which will
    # be substituted using the environment for the corresponding service.
    labels:
      <label name>: <label value>

API and clients

The Pebble daemon exposes an API (HTTP over a unix socket) to allow remote clients to interact with the daemon: clients can start and stop services, add configuration layers to the plan, and so on.

There is currently no official documentation for the API at the HTTP level (apart from the code itself!); most users will interact with it via the Pebble command line interface or by using the Go or Python clients.

The Go client is used primarily by the CLI, but is importable and can be used by other tools too. See the reference documentation and examples at pkg.go.dev.
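
As a minimal sketch (the socket path here is an example, and the exact method names should be checked against the pkg.go.dev reference), starting a service from Go might look like this:

package main

import (
    "fmt"
    "log"

    "github.com/canonical/pebble/client"
)

func main() {
    // Connect to the Pebble daemon over its unix socket
    // (located at $PEBBLE/.pebble.socket).
    pebble, err := client.New(&client.Config{
        Socket: "/var/lib/pebble/default/.pebble.socket",
    })
    if err != nil {
        log.Fatal(err)
    }

    // Start a service; this creates a change and returns its ID.
    changeID, err := pebble.Start(&client.ServiceOptions{
        Names: []string{"srv1"},
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("started services in change", changeID)
}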

We try never to change the underlying HTTP API in a backwards-incompatible way; however, in rare cases we may change the Go client in a backwards-incompatible way.

In addition to the Go client, there's also a Python client for the Pebble API that's part of the ops library used by Juju charms (documentation here).

Hacking / Development

See HACKING.md for information on how to run and hack on the Pebble codebase during development. In short, use go run ./cmd/pebble.

Contributing

We welcome quality external contributions. We have good unit tests for much of the code, and a thorough code review process. Please note that unless it's a trivial fix, it's generally worth opening an issue to discuss before submitting a pull request.

Before you contribute a pull request, you should sign the Canonical contributor agreement -- it's the easiest way for you to give us permission to use your contributions.

Have fun!

... and enjoy the rest of the year!


pebble's Issues

Pebble should support reuse of a container's defined ENTRYPOINT or CMD

When writing new sidecar charms, it is convenient to reuse container images already in the wild. Unfortunately, having to specify the command to execute in the Pebble service sends developers down the rabbit hole of reconstructing what that command is. In some cases, when one has access to the Dockerfile, this is not an insurmountable burden. But things get far more complex when the only thing at hand is the Docker image itself, requiring specialized tools like dive to introspect the image.

config-changed hook is triggered before pebble ready

If pebble is not ready, the charm container cannot talk to the app container, and there is nothing the charm can do.
It seems to make no sense to trigger config-changed or any other hook before the on_pebble_ready event.

However, I noticed that when I run juju deploy or juju upgrade-charm, the config-changed hook is triggered
before pebble is ready, more than once. Here is the log from my pihole charm:

...
unit-pihole-0: 22:29:33 INFO unit.pihole/0.juju-log Running legacy hooks/start.
unit-pihole-0: 22:29:33 WARNING unit.pihole/0.juju-log config-changed hook triggered while pebble is not ready, defer event
unit-pihole-0: 22:29:33 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)
unit-pihole-0: 22:29:36 WARNING unit.pihole/0.juju-log config-changed hook triggered while pebble is not ready, defer event
unit-pihole-0: 22:29:36 INFO unit.pihole/0.juju-log on_pihole_pebble_ready triggered
...

At first, I kept getting pebble errors in the config-changed hook, until I realized that pebble might not be ready.
I can see these logs now because I caught and handled the exception.
This also implies that in the config-changed hook, every charm will need to repeat the code to check the pebble connection, catch ops.pebble.ConnectionError, and defer the event.

What I expect:

if pebble is not ready, no hook should be triggered.

Replan crashes with panic when there are no active services

To reproduce, start pebble run with this config:

services:
    test:
        override: replace
        command: sleep 300

And then execute pebble replan. The server will panic. Here is the stack trace:

2021/12/13 12:26:59 http: panic serving pid=55794;uid=1000;socket=/home/ben/pebble/.pebble.socket;: runtime error: index out of range [0] with length 0
goroutine 50 [running]:
net/http.(*conn).serve.func1()
	/home/ben/sdk/go1.17/src/net/http/server.go:1801 +0xb9
panic({0x7ac960, 0xc000288030})
	/home/ben/sdk/go1.17/src/runtime/panic.go:1047 +0x266
github.com/canonical/pebble/internal/daemon.v1PostServices(0xa9c840, 0xc0002aa200, 0xc000282090)
	/home/ben/w/pebble/internal/daemon/api_services.go:186 +0x1072
github.com/canonical/pebble/internal/daemon.(*Command).ServeHTTP(0xa9c840, {0x856eb0, 0xc0002900c0}, 0xc0002aa200)
	/home/ben/w/pebble/internal/daemon/daemon.go:251 +0x27a
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0000b8000, {0x856eb0, 0xc0002900c0}, 0xc0002aa000)
	/home/ben/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0x1cf
github.com/canonical/pebble/internal/daemon.logit.func1({0x8570c0, 0xc0002b4000}, 0xc0002aa000)
	/home/ben/w/pebble/internal/daemon/daemon.go:322 +0xdd
net/http.HandlerFunc.ServeHTTP(0x0, {0x8570c0, 0xc0002b4000}, 0xc000290078)
	/home/ben/sdk/go1.17/src/net/http/server.go:2046 +0x2f
net/http.serverHandler.ServeHTTP({0x8562e8}, {0x8570c0, 0xc0002b4000}, 0xc0002aa000)
	/home/ben/sdk/go1.17/src/net/http/server.go:2878 +0x43b
net/http.(*conn).serve(0xc00028c0a0, {0x8593a0, 0xc0002140c0})
	/home/ben/sdk/go1.17/src/net/http/server.go:1929 +0xb08
created by net/http.(*Server).Serve
	/home/ben/sdk/go1.17/src/net/http/server.go:3033 +0x4e8

Service reload command

I think it would be very useful to be able to define a service reload command (or maybe some other method like a signal) and be able to reload through pebble. Having to always restart can lower availability.

Pebble is not always waiting for finished processes

There is pebble#6, which raises the question of how pebble should handle system signals.

I've noticed that, over time, zombie processes start to show up in my app container.

I am running on top of an Ubuntu:20.04 image, and I can get a prompt on my container and see:

# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0 713020 10452 ?        Ssl  10:04   0:00 /charm/bin/pebble run --create-dirs --hold --verbose
postgres      58  0.0  0.0      0     0 ?        Zs   10:04   0:00 [postgres] <defunct>
postgres      93  0.0  0.1 215584 27032 ?        S    10:04   0:00 postgres
postgres     114  0.0  0.0 215696  8148 ?        Ss   10:04   0:00 postgres: checkpointer   
postgres     115  0.0  0.0 215584  5556 ?        Ss   10:04   0:00 postgres: background writer   
postgres     116  0.0  0.0 215584  9772 ?        Ss   10:04   0:00 postgres: walwriter   
postgres     117  0.0  0.0 216140  8428 ?        Ss   10:04   0:00 postgres: autovacuum launcher   
postgres     118  0.0  0.0  69948  4540 ?        Ss   10:04   0:00 postgres: archiver   
postgres     119  0.0  0.0  69948  4540 ?        Ss   10:04   0:00 postgres: stats collector   
postgres     120  0.0  0.0 216004  6516 ?        Ss   10:04   0:00 postgres: logical replication launcher   
root         137  0.0  0.0   7244  3812 pts/0    Ss   10:16   0:00 bash
root         144  0.0  0.0   8900  3232 pts/0    R+   10:16   0:00 ps aux

Checking the parent PIDs, all of them have pebble (PID 1) as their parent.

So, I collected some straces, and I noticed that pebble does handle SIGCHLD in some cases:

[pid 1294529] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=207, si_uid=999, si_status=0, si_utime=5, si_stime=8} ---
[pid 1295049] futex(0xc78eb8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 1294529] rt_sigreturn({mask=[]} <unfinished ...>
[pid 1295049] <... futex resumed>)      = 1
[pid 1294529] <... rt_sigreturn resumed>) = 1
[pid 1294524] <... futex resumed>)      = 0
[pid 1295049] wait4(207,  <unfinished ...>

However, there are other occasions where the SIGCHLD seems to be ignored. If I run strace early enough on the pebble process, I can see that at the beginning, SIGCHLD will be ignored.

[pid 1404940] 16:06:59 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=273, si_uid=999, si_status=1, si_utime=2, si_stime=5} ---
[pid 1414197] 16:06:59 clone(child_stack=NULL, flags=CLONE_VM|CLONE_VFORK|SIGCHLDstrace: Process 1414287 attached
[pid 1414287] 16:06:59 rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=~[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x465190}, NULL, 8) = 0
[pid 1414287] 16:06:59 rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
[pid 1414287] 16:06:59 rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, 8) = 0
[pid 1414287] 16:06:59 rt_sigaction(SIGCHLD, {sa_handler=0x55bbedb74aa0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, 8) = 0
[pid 1414287] 16:06:59 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 1414288 attached
[pid 1414287] 16:06:59 rt_sigaction(SIGCHLD, {sa_handler=0x55bbedb74aa0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210},  <unfinished ...>
[pid 1414288] 16:06:59 rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, {sa_handler=0x55bbedb74aa0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, 8) = 0
[pid 1414288] 16:06:59 rt_sigaction(SIGCHLD, {sa_handler=0x55bbedb74aa0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, 8) = 0
[pid 1414288] 16:06:59 rt_sigaction(SIGCHLD, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, {sa_handler=0x55bbedb74aa0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f66aeeff210}, 8) = 0

CombineLayers panics when merging layer with checks onto one without

gbeuzeboc described this issue on Mattermost -- opening a GitHub issue here to track.

Hello, I have been giving Pebble a try to see if we could use it to orchestrate some robotics applications, and I have a few questions.
The first one is regarding checks: I have one service and one check (each defined in a different layer). If my check fails I want my service to restart, but pebble seems to crash:

# 001-layer.yaml
summary: Simple layer

description: |
    A better description for a simple layer.

services:
    srv1:
        override: replace
        command: bash -c 'while :; do echo "srv1 is running"; sleep 1; done'
        startup: enabled
# 002-layer2.yaml
summary: Simple layer

description: |
    Another one

services:
    srv1:
        override: merge
        on-check-failure:
            mycustomcheck: restart

checks:
    mycustomcheck:
        override: replace
        exec:
            command: bash -c 'sleep 1; exit 1'

If I run pebble over this config I get:

2022-08-08T17:08:36.114Z [pebble] Started daemon.
2022/08/08 19:08:36 http: panic serving pid=3934413;uid=1000;socket=/tmp/pebble/.pebble.socket;: assignment to entry in nil map
goroutine 51 [running]:
net/http.(*conn).serve.func1(0xc00023a000)
        /usr/lib/go-1.13/src/net/http/server.go:1767 +0x139
panic(0x8bc980, 0x9ee260)
        /usr/lib/go-1.13/src/runtime/panic.go:679 +0x1b2
github.com/canonical/pebble/internal/plan.(*Service).Merge(0xc0002783c0, 0xc000278140)
        /home/guillaume/code/misc/pebble/internal/plan/plan.go:160 +0x5d0
github.com/canonical/pebble/internal/plan.CombineLayers(0xc000212710, 0x2, 0x2, 0x2, 0x2, 0x0)
        /home/guillaume/code/misc/pebble/internal/plan/plan.go:420 +0x3de
github.com/canonical/pebble/internal/plan.ReadDir(0xc0000223c7, 0xb, 0x521ed2, 0x890e20, 0x0)
        /home/guillaume/code/misc/pebble/internal/plan/plan.go:841 +0x1d6
github.com/canonical/pebble/internal/overlord/servstate.(*ServiceManager).reloadPlan(0xc00015e900, 0xc0000eb400, 0x510d5a)
        /home/guillaume/code/misc/pebble/internal/overlord/servstate/manager.go:100 +0x38
github.com/canonical/pebble/internal/overlord/servstate.(*ServiceManager).acquirePlan(0xc00015e900, 0xc00021e330, 0x16, 0x0)
        /home/guillaume/code/misc/pebble/internal/overlord/servstate/manager.go:216 +0xe9
github.com/canonical/pebble/internal/overlord/servstate.(*ServiceManager).DefaultServiceNames(0xc00015e900, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/guillaume/code/misc/pebble/internal/overlord/servstate/manager.go:316 +0x62
github.com/canonical/pebble/internal/daemon.v1PostServices(0xd11da0, 0xc00021c300, 0x0, 0x0, 0x0)
        /home/guillaume/code/misc/pebble/internal/daemon/api_services.go:78 +0x1734
github.com/canonical/pebble/internal/daemon.(*Command).ServeHTTP(0xd11da0, 0x9ff060, 0xc0002360e0, 0xc00021c300)
        /home/guillaume/code/misc/pebble/internal/daemon/daemon.go:260 +0x607
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001900c0, 0x9ff060, 0xc0002360e0, 0xc00021c100)
        /home/guillaume/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
github.com/canonical/pebble/internal/daemon.logit.func1(0x9ff3e0, 0xc00024c000, 0xc00021c100)
        /home/guillaume/code/misc/pebble/internal/daemon/daemon.go:331 +0xde
net/http.HandlerFunc.ServeHTTP(0xc00013d700, 0x9ff3e0, 0xc00024c000, 0xc00021c100)
        /usr/lib/go-1.13/src/net/http/server.go:2007 +0x44
net/http.serverHandler.ServeHTTP(0xc0001e2000, 0x9ff3e0, 0xc00024c000, 0xc00021c100)
        /usr/lib/go-1.13/src/net/http/server.go:2802 +0xa4
net/http.(*conn).serve(0xc00023a000, 0xa00b20, 0xc000222040)
        /usr/lib/go-1.13/src/net/http/server.go:1890 +0x875
created by net/http.(*Server).Serve
        /usr/lib/go-1.13/src/net/http/server.go:2928 +0x384
2022-08-08T17:08:36.115Z [pebble] Cannot start default services: cannot communicate with server: Post http://localhost/v1/services: EOF

And then I cannot ctrl-c anymore; I have to send a SIGKILL. Is my config supposed to be "legal"?

Structured logs

Pebble logs entries as plain text with a fixed pattern. While that is functional for human consumption, it is far more error prone to process plain-text log entries than structured ones, e.g., JSON. One of the use cases for processing Pebble logs in the observability domain would be to correlate downtime with issues reported by Pebble, e.g., services that won't start.

Charm workloads may not receive a SIGTERM during charm upgrade

Issue

I find that in some cases a charm workload may not get a SIGTERM signal when it is shut down prior to being upgraded.

I ran into this issue while trying to develop this integration test for the Alertmanager k8s charm, as implemented in this git branch. The purpose of the integration test is to check the persistence of silences across a charm upgrade. I have checked, using just the Alertmanager standalone binary (version 0.23, the same as the charm's rock), that Alertmanager does indeed flush silences stored in memory to disk whenever it receives either a SIGTERM or a SIGINT signal.

I have also checked that the Alertmanager charm does get a bound persistent volume claim (used to store silences under the path /alertmanager) which is the same across the charm upgrade, and that this volume is empty before and after the charm upgrade, even when a silence has been set and is visible in the Alertmanager dashboard. On manually sending either a SIGTERM or a SIGINT (using juju ssh or kubectl exec) to the Alertmanager charm workload prior to doing a charm upgrade, the silences do get flushed to the persistent volume and are visible across the charm upgrade. The integration test does this using a Juju action to send a SIGINT. Please note that the current edge release of Alertmanager does not have this action, which is required by the test. However, if the Alertmanager charm from the linked branch (with the action) is deployed, the action invoked, and the charm upgraded, silences do persist.

As I understand it, Pebble sends a SIGTERM and then a SIGKILL to a workload process when stopping it. Hence I would have expected the Alertmanager charm workload to persist silences to disk any time it is stopped prior to an upgrade, without the need to send this signal using a Juju action.

Impacts

If the charm workload (in this case Alertmanager) does not get a trappable signal like SIGTERM, this may lead to data loss, since the workload is unable to flush in-memory structures to disk and exit gracefully. A workaround may be to provide the charm with a Juju action to issue such a signal prior to a charm upgrade. However, any such workaround is prone to human error, relying on someone noticing what needs to be done prior to the upgrade.

Forward stdout from services

Feature Request.

For integration with Kubernetes and Docker, it would be helpful if the stdout from services could be forwarded to the stdout for the main process, perhaps prefixed with the name of the process. To be concrete, from a test running pebble in a container, I get the following output:

$ docker run --rm --name test test
daemon.go:318: Started daemon.
daemon.go:289: POST /v1/services 2.114401ms 202
cmd_run.go:154: Started default services with change 1.
cmd_run.go:162: Exiting on terminated signal.

But I can't tell what the stdout from my service is. I'm sure there is a way I can get this from the pebble service, but what about providing something like the following as well:

$ docker run --rm --name test test
[PEBBLE] Started daemon.
[PEBBLE] POST /v1/services 2.114401ms 202
[PEBBLE] Started default services with change 1.
[A_SVC] My web app has started!
[B_SVC] Started B service
[A_SVC] Serving on port 80
[A_SVC] Shutting down
[PEBBLE] Exiting on terminated signal.

One issue is that if a pod is terminated, I really only have kubernetes logs to look at, which by default will only be the Pebble output and won't tell me about what's happening in my service.

Apologies if this is already in progress or if there's a flag I missed.

autostart fails if service has already started

This happens when pebble_ready is deferred, but another event (re)starts the service. When pebble_ready re-enters, I get this error:

Traceback (most recent call last):
  File "./src/charm.py", line 443, in <module>
    main(AlertmanagerCharm)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/main.py", line 406, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/main.py", line 140, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 278, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 722, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/framework.py", line 767, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 67, in wrapped
    func(self, event, *args, **kwargs)
  File "./src/charm.py", line 213, in _on_alertmanager_pebble_ready
    container.autostart()
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/model.py", line 1042, in autostart
    self._pebble.autostart_services()
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/pebble.py", line 791, in autostart_services
    return self._services_action('autostart', [], timeout, delay)
  File "/var/lib/juju/agents/unit-alertmanager-0/charm/venv/ops/pebble.py", line 831, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "alertmanager" (service "alertmanager" was previously started)

This is expected when trying to start the same service twice, but with autostart I expected this to work. Is this a bug or by design?

Environments should be a plain dictionary

I was the one proposing it as a list, mainly because if we want to use interpolation of variables we need to know the order in which it takes place. With that said, looking at code in the wild that has Python dictionaries inside a Python list inside a Python dict made me cringe a bit, and I fear we'll regret that decision, which cannot be undone once 2.9 is out.

As such, I suggest some quick action on our end to undo that both in the operator and in pebble, so that becomes just a plain map (dict) everywhere.

To solve the ordering issue, we can do a slightly more complex explicit dependency check across the strings, to make sure that the ordering is right and to prevent clients from having to deal with unusual ordered dictionaries.
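
For illustration, a sketch of such a dependency check (the helper name expandOrder and the $VAR syntax are assumptions, not a concrete proposal): find references between entries of a plain map and derive a safe interpolation order, rejecting cycles.

package main

import (
    "fmt"
    "regexp"
)

var ref = regexp.MustCompile(`\$([A-Za-z_][A-Za-z0-9_]*)`)

// expandOrder returns an order in which the variables of env can be
// interpolated, or an error if they reference each other in a cycle.
func expandOrder(env map[string]string) ([]string, error) {
    const (
        white = iota // not visited yet
        grey         // visit in progress
        black        // done
    )
    state := make(map[string]int)
    var order []string
    var visit func(name string) error
    visit = func(name string) error {
        switch state[name] {
        case grey:
            return fmt.Errorf("cycle involving $%s", name)
        case black:
            return nil
        }
        state[name] = grey
        for _, m := range ref.FindAllStringSubmatch(env[name], -1) {
            if _, ok := env[m[1]]; ok {
                if err := visit(m[1]); err != nil {
                    return err
                }
            }
        }
        state[name] = black
        order = append(order, name)
        return nil
    }
    for name := range env {
        if err := visit(name); err != nil {
            return nil, err
        }
    }
    return order, nil
}

func main() {
    env := map[string]string{"A": "$B/bin", "B": "/opt"}
    order, err := expandOrder(env)
    fmt.Println(order, err) // [B A] <nil>
}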

How does that sound, @benhoyt?

juju unit IP will change on each charm upgrade

How to reproduce:

  1. deploy any k8s charm to a microk8s juju env; you will see the unit IP in juju status
  2. run a charm upgrade: juju upgrade-charm --path ./$APP.charm $APP
  3. when juju settles, you will notice the unit IP has changed.

I understand the juju unit in a k8s env is a pod, and this makes sense in the k8s context.
However, in the juju context, for a classic juju unit/charm, the IP won't change on charm upgrade.

I am not sure whether this is a bug or not; raising the issue for discussion.

IP address is not available or is incorrect on pebble_ready

Environment

microk8s in multipass (4 cpus, 8G ram):

$ juju --version
2.9.0-ubuntu-amd64

$ microk8s kubectl version
Client Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.7-34+df7df22a741dbc", GitCommit:"df7df22a741dbc18dc3de3000b2393a1e3c32d36", GitTreeState:"clean", BuildDate:"2021-05-12T21:08:20Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}

Description

When I deploy and then add 3 units (alertmanager, in my case), occasionally the ip address is not ready.
This does not happen if I manually (slowly) add the units one by one.

Expected vs actual

Expected: An IP address is ready (and correct) by the time pebble_ready fires.
Actual: Occasionally, when adding multiple units at once, IP address is not available (or old) when queried from within pebble_ready.

Reproducible scenarios

While adding 3 units

Very consistently, the following code assigns None to bind_address, for 1-2 of the added units:

relation = self.model.get_relation("replicas")
bind_address = self.model.get_binding(relation).network.bind_address

Similarly, unit-get occasionally returns an empty string under the same circumstances:

bind_address = check_output(["unit-get", "private-address"]).decode().strip()

When restarting the machine

I have 4 units running, and then suddenly I sudo reboot. When the application (alertmanager) starts up, bind_address returns the IP address from the previous boot.

Add Go client and CLI commands for files API

For completeness, we need to go back and implement the Go client and CLI commands for the files API. Our previously discussed approach is shown in the spec. See there for details, but in summary:

# Go client. For now, we're not proposing to support multi-file push/pull (same as Python client).
func (*Client) Pull(opts *PullOptions) (PullResult, error)
func (*Client) Push(opts *PushOptions) (PushResult, error)
func (*Client) ListFiles(opts *ListFilesOptions) (ListFilesResult, error)
func (*Client) MakeDir(opts *MakeDirOptions) (MakeDirResult, error)
func (*Client) RemovePath(opts *RemovePathOptions) (RemovePathResult, error)

# Pebble CLI.
pebble push file.txt /path [-m|--mode=644 --user=USER|uid --group=GROUP|gid]
pebble pull /path file.txt
pebble cat /path # same as pull, but write to stdout
pebble mkdir /path [-p|--parents -m|--mode=755 --user=USER|uid --group=GROUP|gid]
pebble rm /path [-r|--recursive]
pebble ls '/etc/*.conf'  # path can be dir, file, or glob

@tunahanertekin's original message:

Hi there, Pebble's files API usage with the Juju operator framework is explained here. Is there a way to use Pebble's files API from Go? I've tried to find it in Pebble's Go client but I couldn't.

Pebble not passing configured environment variables to services

When testing out the config of a new Pebble-based charm, it seems to not be passing env variables to child processes.

A simple test (these files are uploaded to this issue - including the built pebble binary I'm using) :

Dockerfile:

FROM ubuntu:focal

ENV PEBBLE /config

COPY pebble /usr/bin/pebble
COPY pebble-test /usr/bin/pebble-test
COPY 000-test-layer.yaml /config/layers/000-test-layer.yaml

CMD ["/usr/bin/pebble", "run"]

pebble-test:

#!/bin/bash
if [[ -z "${MYVAR:-}" ]]; then
  echo "Variable MYVAR is unset" > /tmp/out.log
else
  echo "The value of MYVAR is: ${MYVAR}" > /tmp/out.log
fi

000-test-layer.yaml

---
summary: Test Layer

services:
  pebbletest:
    override: replace
    summary: Simple test layer
    command: /usr/bin/pebble-test
    default: start
    environment:
      - MYVAR: my-very-important-value

The following commands then illustrate the issue:

$ docker build -t jnsgruk/pebble-test:latest .
$ docker run -d --name pbl --rm -it jnsgruk/pebble-test:latest
$ docker exec -it pbl cat /tmp/out.log
Variable MYVAR is unset

# Cleanup: docker stop pbl

Thanks :)

pebble-env-issue-test.zip

fire an event when the managed process dies

I ssh-ed into the workload container and killed the process that was started by the layer's service command, and... all green and no event fired.
Would it be possible to have pebble fire an event when the process it's managing dies?

Wait for child processes to terminate when stopping a service

Per discussion on Tuesday, we should update the stop-service API to wait for all child processes in the process group to terminate before returning (or sending SIGKILL).

Currently when doing "stop" we send SIGTERM to the process group, which won't change. But then we only wait for the immediate PID to exit. We should change that to use https://linux.die.net/man/2/waitpid and wait for all processes in the process group to finish. That will do the right thing for services that start other commands, for example a shell script /bin/sh -c 'foo bar'.
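
Roughly, the waiting loop could look like the following sketch (assuming the service was started in its own process group; this is not the final implementation):

import "syscall"

// reapGroup blocks until every child in the given process group has
// been reaped. Note that waitpid only sees our own children; orphaned
// grandchildren are reparented to PID 1, so this catches them too
// when Pebble runs as PID 1.
func reapGroup(pgid int) error {
    for {
        var ws syscall.WaitStatus
        _, err := syscall.Wait4(-pgid, &ws, 0, nil)
        if err == syscall.EINTR {
            continue // interrupted by a signal; retry
        }
        if err == syscall.ECHILD {
            return nil // no remaining children in the group
        }
        if err != nil {
            return err
        }
    }
}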

Use entrypoint of a Docker Image as default Command

Currently if you want to start a container through Pebble you will have to provide the command to start the container in the command field.

This command seems to be equivalent to the ENTRYPOINT field of a Dockerfile, for example.

In the case of a container definition in a Pod in Kubernetes, the command field is optional and defaults to the entrypoint if none is stated. Similar default behavior in Pebble would make sense. Additionally, an arguments field to extend the entrypoint would also be very useful for further configuration.

More context on how this is done in Kubernetes: https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/

Use case of a pod_spec charm where no command option is provided, but an arguments option is: https://github.com/charmed-kubernetes/kubernetes-dashboard-operator/blob/main/charms/kubernetes-dashboard/src/charm.py#L79

Add a prometheus-compatible metrics endpoint format to pebble check URL

As documented in https://juju.is/docs/sdk/pebble#heading--check-health-endpoint-and-probes:

"""
As of Juju version 2.9.26, Pebble includes an HTTP /v1/health endpoint that allows a user to query the health of configured checks, optionally filtered by check level with the query string ?level=<level>. This endpoint returns an HTTP 200 status if the checks are healthy, HTTP 502 otherwise.
"""

It would be great if there was an option to expose this in a format that prometheus could ingest. If so, we could easily integrate charms with COS by configuring them to hit this pebble endpoint and ingest the data exposed there.
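
For example, the check data could be exposed in the Prometheus text exposition format, roughly like this (the metric name and labels here are made up purely for illustration):

# HELP pebble_check_up Whether the configured check is currently passing.
# TYPE pebble_check_up gauge
pebble_check_up{check="svc1-up",level="alive"} 1
pebble_check_up{check="svc1-ready",level="ready"} 0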

`pebble logs -f` blocks shutdown of pebble

Sending SIGINT/SIGTERM to the pebble daemon while pebble logs -f is active does not behave as expected.

The Pebble daemon waits for the /v1/logs?follow=true request to be cancelled.

Allow to specify working directory for exec commands

When executing a command, especially within containers that do not have a shell (e.g., Distroless), it is very useful to be able to optionally specify the working directory for running the command. When a working directory is specified but does not exist, or the resolved path is not a directory, the exec command should fail.
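
For reference, os/exec already exposes a working directory via the Cmd.Dir field; a sketch of the suggested semantics (the runIn helper and its validation are illustrative, not existing Pebble code):

import (
    "fmt"
    "os"
    "os/exec"
)

// runIn executes a command with the given working directory, failing
// up front if the path does not exist or is not a directory.
func runIn(workingDir, name string, args ...string) error {
    info, err := os.Stat(workingDir)
    if err != nil || !info.IsDir() {
        return fmt.Errorf("working directory %q does not exist or is not a directory", workingDir)
    }
    cmd := exec.Command(name, args...)
    cmd.Dir = workingDir // standard library: the child starts in this directory
    return cmd.Run()
}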

Pebble `replan`, restart, and any other invocation does not seem to re-invoke with new env variables

Grafana is commonly provisioned using environment variables to override defaults rather than using the config file. It doesn't support SIGHUP in any case.

When the environment for a service is changed, the new Pebble plan accurately reflects it. When the service is restarted (or replanned), it's restarted as normal.

But the new environment variables are not present.

After a new config value was set in the charm which updated the environment:

root@grafana-0:/usr/share/grafana# /charm/bin/pebble plan                                                                                                                                                                                       
services:                                                                                                                                                                                                                                       
    grafana:                                                                                                                                                                                                                                    
        summary: grafana-k8s service                                                                                                                                                                                                            
        startup: enabled                                                                                                                                                                                                                        
        override: replace
        command: grafana-server -config /etc/grafana/grafana-config.ini
        environment:
            GF_DATABASE_TYPE: sqlite3
            GF_LOG_LEVEL: info
            GF_PATHS_PROVISIONING: /etc/grafana/provisioning 
            GF_SECURITY_ADMIN_PASSWORD: rIMaoW7Fml26
            GF_SECURITY_ADMIN_USER: admin
            GF_SERVER_HTTP_PORT: "3000"
            GF_SERVER_ROOT_URL: http://0.0.0.0:3000/grafana
            GF_SERVER_SERVE_FROM_SUB_PATH: "true"
root@grafana-0:/usr/share/grafana# xargs -0 -L1 -a /proc/45/environ | sort
GF_DATABASE_TYPE=sqlite3
GF_LOG_LEVEL=info
GF_PATHS_CONFIG=/etc/grafana/grafana.ini
GF_PATHS_DATA=/var/lib/grafana
GF_PATHS_HOME=/usr/share/grafana
GF_PATHS_LOGS=/var/log/grafana
GF_PATHS_PLUGINS=/var/lib/grafana/plugins
GF_PATHS_PROVISIONING=/etc/grafana/provisioning
GF_SECURITY_ADMIN_PASSWORD=rIMaoW7Fml26
GF_SECURITY_ADMIN_USER=admin
GF_SERVER_HTTP_PORT=3000
GRAFANA_PORT=tcp://10.152.183.68:65535
GRAFANA_PORT_65535_TCP=tcp://10.152.183.68:65535
GRAFANA_PORT_65535_TCP_ADDR=10.152.183.68
GRAFANA_PORT_65535_TCP_PORT=65535
GRAFANA_PORT_65535_TCP_PROTO=tcp
GRAFANA_SERVICE_HOST=10.152.183.68
GRAFANA_SERVICE_PORT=65535
GRAFANA_SERVICE_PORT_PLACEHOLDER=65535
HOME=/root
HOSTNAME=grafana-0
JUJU_CONTAINER_NAME=grafana
KUBERNETES_PORT=tcp://10.152.183.1:443
KUBERNETES_PORT_443_TCP=tcp://10.152.183.1:443
KUBERNETES_PORT_443_TCP_ADDR=10.152.183.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_HOST=10.152.183.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
MODELOPERATOR_PORT=tcp://10.152.183.153:17071
MODELOPERATOR_PORT_17071_TCP=tcp://10.152.183.153:17071
MODELOPERATOR_PORT_17071_TCP_ADDR=10.152.183.153
MODELOPERATOR_PORT_17071_TCP_PORT=17071
MODELOPERATOR_PORT_17071_TCP_PROTO=tcp
MODELOPERATOR_SERVICE_HOST=10.152.183.153
MODELOPERATOR_SERVICE_PORT=17071
PATH=/usr/share/grafana/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PEBBLE_SOCKET=/charm/container/pebble.socket
TZ=UTC

Where are GF_SERVER_SERVE_FROM_SUB_PATH and GF_SERVER_ROOT_URL?

Easier support for debugging container pebble

So we have support for "export PEBBLE=..." when you want to play with pebble as an application manager and interact with it. (you can set PEBBLE for both the client and the server and it will work).

However, if you want to juju ssh into a workload and debug against pebble, you actually have to use the undocumented PEBBLE_SOCKET environment variable: PEBBLE always appends .pebble.socket, while in containers the actual path is /charm/containers/$CONTAINERNAME/pebble.socket, so something like PEBBLE_SOCKET=/charm/containers/$CONTAINERNAME/pebble.socket pebble plan is needed.

Ideally it would be something like
pebble -c container get-plan

and 'juju ssh' could set an environment variable pointing at /charm or some such.

At a minimum, we need to officially document PEBBLE_SOCKET and how to interact with sidecars and their pebble.

Pebble's not robust to wrong input

If pebble is fed a layer which has a service with no command specified, it will panic upon starting that service. The daemon will instead crash immediately upon start if the service's dictionary is empty.

Steps to reproduce

git clone https://github.com/canonical/pebble.git
cd pebble
make build
export PEBBLE=`pwd`/pebble
mkdir -p $PEBBLE/layers
cat >$PEBBLE/layers/001-config.yaml <<EOF
services:
    my-service:
        override: replace
EOF
# NOTE no command on the my-service service
_build/_/bin/pebble run -v --create-dirs

In another terminal:

_build/_/bin/pebble start my-service

That command produces the following error in the terminal where the pebble daemon is running:

panic: runtime error: slice bounds out of range [1:0]

goroutine 52 [running]:
github.com/canonical/pebble/internal/overlord/servstate.(*ServiceManager).doStart(0xc00014c1e0, 0xc00014b320, 0xc00021c0f0, 0x0, 0x0)
	/home/facu/canonical/sidecar-charm/pebble/internal/overlord/servstate/handlers.go:89 +0x177d
github.com/canonical/pebble/internal/overlord/state.(*TaskRunner).run.func1(0x0, 0x0)
	/home/facu/canonical/sidecar-charm/pebble/internal/overlord/state/taskrunner.go:195 +0xca
gopkg.in/tomb%2ev2.(*Tomb).run(0xc00021c0f0, 0xc000212300)
	/home/facu/go/pkg/mod/gopkg.in/[email protected]/tomb.go:163 +0x3c
created by gopkg.in/tomb%2ev2.(*Tomb).Go
	/home/facu/go/pkg/mod/gopkg.in/[email protected]/tomb.go:159 +0x112

If the service's dictionary is empty:

cat >$PEBBLE/layers/001-config.yaml <<EOF
services:
    my-service: ~
EOF
_build/_/bin/pebble run -v --create-dirs

That'll produce:

2021-06-23T10:21:00.044Z [pebble] Started daemon.
2021/06/23 12:21:00 http: panic serving pid=2083067;uid=1000;socket=/home/facu/canonical/sidecar-charm/pebble/pebble/.pebble.socket;: runtime error: invalid memory address or nil pointer dereference
goroutine 50 [running]:
net/http.(*conn).serve.func1(0xc000208000)
	/usr/lib/go-1.13/src/net/http/server.go:1767 +0x149
panic(0xa032a0, 0xe21550)
	/usr/lib/go-1.13/src/runtime/panic.go:679 +0x1e0
github.com/canonical/pebble/internal/plan.ParseLayer(0x1, 0xc00020208c, 0x6, 0xc000246000, 0x1c, 0x21c, 0x0, 0x0, 0x0)
	/home/facu/canonical/sidecar-charm/pebble/internal/plan/plan.go:279 +0x697
github.com/canonical/pebble/internal/plan.ReadLayersDir(0xc0001fe100, 0x37, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/facu/canonical/sidecar-charm/pebble/internal/plan/plan.go:340 +0xe5d
github.com/canonical/pebble/internal/plan.ReadDir(0xc000028087, 0x30, 0x0, 0x0, 0x0)
	/home/facu/canonical/sidecar-charm/pebble/internal/plan/plan.go:362 +0x27d
github.com/canonical/pebble/internal/overlord/servstate.(*ServiceManager).reloadPlan(0xc000122140, 0x0, 0x0)
	/home/facu/canonical/sidecar-charm/pebble/internal/overlord/servstate/manager.go:60 +0x64
github.com/canonical/pebble/internal/overlord/servstate.(*ServiceManager).acquirePlan(0xc000122140, 0x0, 0x0, 0x0)
	/home/facu/canonical/sidecar-charm/pebble/internal/overlord/servstate/manager.go:174 +0x11e
github.com/canonical/pebble/internal/overlord/servstate.(*ServiceManager).DefaultServiceNames(0xc000122140, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/facu/canonical/sidecar-charm/pebble/internal/overlord/servstate/manager.go:258 +0x7c
github.com/canonical/pebble/internal/daemon.v1PostServices(0xe28520, 0xc00021c200, 0x0, 0x0, 0x0)
	/home/facu/canonical/sidecar-charm/pebble/internal/daemon/api_services.go:73 +0x5f3
github.com/canonical/pebble/internal/daemon.(*Command).ServeHTTP(0xe28520, 0xb2a740, 0xc0002000c0, 0xc00021c200)
	/home/facu/canonical/sidecar-charm/pebble/internal/daemon/daemon.go:248 +0x547
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001a2000, 0xb2a740, 0xc0002000c0, 0xc00021c200)
	/home/facu/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0x1e6
github.com/canonical/pebble/internal/daemon.logit.func1(0xb2ab00, 0xc000222000, 0xc00021c000)
	/home/facu/canonical/sidecar-charm/pebble/internal/daemon/daemon.go:308 +0x143
net/http.HandlerFunc.ServeHTTP(0xc000127200, 0xb2ab00, 0xc000222000, 0xc00021c000)
	/usr/lib/go-1.13/src/net/http/server.go:2007 +0x44
net/http.serverHandler.ServeHTTP(0xc0001b4000, 0xb2ab00, 0xc000222000, 0xc00021c000)
	/usr/lib/go-1.13/src/net/http/server.go:2802 +0x20f
net/http.(*conn).serve(0xc000208000, 0xb2bec0, 0xc0001f4040)
	/usr/lib/go-1.13/src/net/http/server.go:1890 +0x1716
created by net/http.(*Server).Serve
	/usr/lib/go-1.13/src/net/http/server.go:2928 +0x931

Expected behavior

Pebble daemon shouldn't crash, and ideally should return to the client/user some indication that the missing command is an issue or that the service's dictionary is invalid.

Also, pebble's README.md lists command as an optional field, which it apparently is not.

Add API to send a signal to an active service

This is described in spec JU029. Basically a simple API at /v1/signals to send a signal to the given services (and similar functionality in the Python Operator Framework). Just opening this issue as a placeholder for the feature.

Ability to run one-shot commands

At present, Pebble is only capable of running daemon style processes. If a short-running process is specified in a layer, an error is returned stating the process exited too quickly.

I can see two use cases for an API that would allow the running of arbitrary commands:

  1. Initial setup scripts/tasks prior to starting the "main" service in a container

A nice way to expose this could be to include a type field under services, something like the following:

services:
  setup:
    # summary, override, startup fields omitted
    type: command  # or perhaps 'one-shot'?
    before:
      - mysql
    command: /setup.sh
  
  mysql:
    # summary, override, startup fields omitted
    type: daemon
    after:
      - setup
    command: /usr/bin/mysqld

Should this be introduced, it should perhaps assume daemon if not specified, so that existing charms written using Pebble will continue to function as they currently do.

  2. Throughout the charm's lifecycle, perhaps as part of an action, an API to run commands in the workload container would be helpful (and could easily be used to obviate the feature above by allowing the developer to just run commands in their pebble_ready handler):
# ...
def _on_mysql_pebble_ready(self, event):
  container = event.workload
  result = container.run(["/path/to/bin", "--arg1", "--arg2"])
  logger.info("return code: %d", result.return_code)
  logger.info("stdout: %s ", result.stdout)
  logger.info("stderr: %s ", result.stderr)
  # ...

This would undoubtedly help with the sort of issues being experienced in this PR for the mysql charm, and is related to LP#1923822.

Start a service with a custom user

While migrating mysql-operator to pebble, I discovered that some services (mysql, for instance) refuse to start as the root user.

It would be nice if we had the chance to set a custom user to run the service, for example:

        layer = {
            "summary": "MySQL layer",
            "description": "Pebble layer configuration for MySQL",
            "services": {
                "mysql": {
                    "override": "merge",
                    "summary": "mysql daemon",
                    "command": "docker-entrypoint.sh mysqld",
                    "command_user": "mysql", 
                    "startup": "enabled",
                    "environment": {
                        "MYSQL_ROOT_PASSWORD": "SuperSecretPassword",
                    },
                },
            },
        }

Forward signals to child processes

Assuming pebble runs as PID 1 in a container, it needs to take care of some tasks, most notably forwarding signals to child processes and dealing with zombie and orphan processes. Handling signals is especially key, as this is required to allow processes to shut down cleanly, e.g. databases to finish writes and logging, web apps to handle final requests. At the moment, signal forwarding doesn't seem to be happening by default at least.
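
For reference, minimal PID-1 style signal forwarding in Go could look roughly like this (a sketch, not Pebble's implementation; it assumes the child is started in its own process group):

package main

import (
    "os"
    "os/exec"
    "os/signal"
    "syscall"
)

func main() {
    cmd := exec.Command("/test.sh")
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    // Put the child in its own process group so the whole group can
    // be signalled at once.
    cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
    if err := cmd.Start(); err != nil {
        panic(err)
    }

    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
    go func() {
        for sig := range sigs {
            // Relay the signal to the child's process group.
            syscall.Kill(-cmd.Process.Pid, sig.(syscall.Signal))
        }
    }()

    _ = cmd.Wait()
}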

To test this I used the following Dockerfile:

FROM debian:latest

COPY pebble /pebble
COPY test.sh /test.sh
RUN mkdir --parents /.pebble/layers
COPY 01.yaml /.pebble/layers/
ENV PEBBLE /.pebble

CMD ["/pebble", "run"]

With the pebble config:

summary: Test Layer

services:

    srv1:
        override: replace
        summary: Service summary
        command: /test.sh
        default: start

And test.sh:

#!/bin/bash

function shutdown()
{
    echo "Graceful shutdown" >> /log
    exit
}

trap shutdown SIGTERM

echo "Started" >> /log
while true
do
    echo Sleeping >> /log
    sleep 10
done

Build this image and run it using Docker:

$ docker build -t test .
...
$ docker run -d --name test test

Now run docker stop which will send a SIGTERM to the container and then look at the logfile:

$ docker stop test
test
$ docker cp test:/log ./log
$ cat log
Started
Sleeping
Sleeping

Note that "Graceful shutdown" does not appear, indicating that the signal is not being forwarded to the child process. If we test the same script with tini we can observe the correct behaviour:

$ cat Dockerfile.tini
FROM debian:latest

ENV TINI_VERSION v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
COPY test.sh /test.sh

CMD ["/tini", "--", "/test.sh"]

$ docker build -f Dockerfile.tini -t test-tini .
...
$ docker rm test-tini                     
test-tini
$ docker run -d --name test-tini test-tini 
853db290cfeeb797d21203ccfa07221b7e2f6ef34b0f755a6e97bd4471ad8d16
$ docker stop test-tini
test-tini

$ docker cp test-tini:/log ./log
$ cat log
Started
Sleeping
Graceful shutdown

In the future, signal handling could be a great place for pebble to innovate, for example adding hooks to allow extra processing on signals - I can imagine writing extra messages to logs or sending shutdown warnings to dependents.

Strip timestamp from service logs to avoid double-timestamp issue

Currently when a service outputs timestamps itself we get double timestamps in the log, because Pebble also adds a timestamp-and-service-name prefix to every log line. We want to trim/strip the service's timestamp to avoid the double-up.

Some notes per discussion with Harry and Gustavo:

  • Add a new "time-trim" field per service: defaults to "auto", can also be "disabled" (or a format string -- maybe later)
  • How to do prefix parsing (a hypothetical time.ParsePrefix)? Go's time.Parse doesn't support parsing a timestamp as a prefix -- see the sketch after this list
  • Auto-detect might detect the format once at startup, or re-detect every so often
  • Should we use the parsed time in place of the Pebble timestamp, or just drop it?
  • Feedback from Thomas: fair bit of overhead, do performance testing (logs/s)
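
As a starting point on the prefix-parsing question, a sketch (all names hypothetical; the real design is still open) that treats the first whitespace-separated token of each line as a candidate timestamp:

import (
    "strings"
    "time"
)

// Layouts to try against the first token of each log line.
var layouts = []string{time.RFC3339Nano, time.RFC3339}

// trimTimestamp drops a leading timestamp from line if the first
// whitespace-separated token parses as one of the known layouts.
// (Real auto-detection would need to handle multi-token formats too.)
func trimTimestamp(line string) string {
    first, rest, ok := strings.Cut(line, " ")
    if !ok {
        return line
    }
    for _, layout := range layouts {
        if _, err := time.Parse(layout, first); err == nil {
            return rest
        }
    }
    return line
}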

'pebble run CUSTOMDIR'

This is a feature request to support instantiating a pebble instance with a custom directory, rather than /var/lib/pebble/default.
This is mostly to make it easy to test against pebble, without it actually being PID 1.

I guess you can export PEBBLE=$HOME/test, so maybe that should just be documented as part of pebble run help text?

Container.push() won't use passed user:group to create directories.

When calling container.push(make_dirs=True, user=user, group=group), the directories are not created with the given user:group. This can lead to the file being unreachable when Pebble is running a workload with a specific user:group.

I created a custom path for a file on /mydir, pushing a file with:

container.push(path, rendered, permissions=0o660, make_dirs=True, user="redis", group="redis")

After container is running, I can inspect the permissions of the directory:

root@redis-k8s-0:/mydir# ls -la                         
total 12                                                
drwxr-xr-x 2 root  root  4096 Jul 15 06:56 .            
drwxr-xr-x 1 root  root  4096 Jul 15 06:56 ..           
-rw-r----- 1 redis redis  539 Jul 15 06:56 sentinel.conf

Only the file has the correct user:group.

Expose Pod and Container fields to a running Container

I'm doing a proof of concept conversion of the k8s postgresql charm to the new sidecar/pebble approach, but I'm getting an error because I'm trying to implement what the old charm does, which is to pass the following via pod-spec-set:

config_fields = {
    "JUJU_NODE_NAME": "spec.nodeName",
    "JUJU_POD_NAME": "metadata.name",
    "JUJU_POD_NAMESPACE": "metadata.namespace",
    "JUJU_POD_IP": "status.podIP",
    "JUJU_POD_SERVICE_ACCOUNT": "spec.serviceAccountName",
}
env_config = {k: {"field": {"path": p, "api-version": "v1"}} for k, p in config_fields.items()}

The env_config above can be passed to pod-spec-set as envConfig, but can't be passed to pebble due to unmarshal errors (pebble is expecting a dictionary of key/value pairs).

This is related to https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#the-downward-api. Other relevant links are https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#pod-v1-core and https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#envvarsource-v1-core.

Make exec checks consistently not inherit parent environment

Currently exec checks inherit the parent/Pebble environment if the check configuration has no environment variables set, but do not inherit it if there are one or more environment fields. This is inconsistent and unexpected -- an artifact of how we're setting up exec.Command.Env, which inherits os.Environ() if nil.
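
The mechanics, roughly (a sketch of the os/exec pattern involved, not Pebble's exact code):

import "os/exec"

func checkCommand(checkEnv map[string]string) *exec.Cmd {
    cmd := exec.Command("/bin/check") // hypothetical check command
    for k, v := range checkEnv {
        // Any explicit entry makes cmd.Env non-nil, so the parent
        // environment is NOT inherited for this command.
        cmd.Env = append(cmd.Env, k+"="+v)
    }
    // With no entries, cmd.Env stays nil and os/exec falls back to
    // os.Environ() -- hence the inconsistency. Always assigning Env
    // (from os.Environ() or an empty slice, once decided) would make
    // the behaviour uniform.
    return cmd
}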

I'm not sure whether checkers should inherit the parent environment or not.

Services do inherit, and one-shot commands do not. I think both of those are by design. I remember discussing with Gustavo that one-shot commands probably should not inherit. But we need to decide whether or not checkers should inherit the environment. I suspect it'll be less of a breaking change if we have them inherit, because most people probably aren't setting environment variables for checkers. Are there any security or other concerns with doing so?

Either way we should be consistent, and not have inheritance depend on whether or not there are any environment variables in the config.

See also #125. CC @hpidcock @jnsgruk.

Pebble Connection Issue

Using this charm, a user reported an issue connecting to the pebble socket. Feel free to close if this has already been reported.

pebble exec: sending stdin to "use-terminal" process doesn't work

Per discussion on PR #61, there's a bug in the one-shot commands handling where you can't send stdin to an exec process with "use-terminal" true.

We were aware of this issue before merging #61, but wanted to get that PR merged and fix this later, as it isn't a critical issue.

It's easy to reproduce:

$ echo foo | PEBBLE=~/pebble go run ./cmd/pebble exec -t -- cat  # -t is the default
exit status 129

# You either get that error, or sometimes this one:
$ echo foo | PEBBLE=~/pebble go run ./cmd/pebble exec -t -- cat
error: cannot perform the following tasks:
- exec command "cat" (fork/exec /usr/bin/cat: input/output error)
exit status 1

# In non-terminal mode ("use-terminal" false), it works fine:
$ echo foo | PEBBLE=~/pebble go run ./cmd/pebble exec -T -- cat
foo

On the server end, it's receiving some kind of POLLNVAL poll event, and prints that in the logs:

2021-09-30T20:55:14.332Z [pebble] POST /v1/exec 14.740358ms 202
2021-09-30T20:55:14.345Z [pebble] GET /v1/tasks/1153/websocket/control 12.786948ms 200
2021-09-30T20:55:14.346Z [pebble] GET /v1/tasks/1153/websocket/stdio 54.774µs 200
2021-09-30T20:55:14.346Z [pebble] Detected poll(POLLNVAL) event: exiting.
2021-09-30T20:55:14.375Z [pebble] GET /v1/changes/1153/wait 29.210628ms 200

Add a way to detect and introspect check failures

Per thread at #86 (comment), I previously had LastError and ErrorDetails fields on the CheckInfo API struct. However, this was a bit ad-hoc and didn't fit our existing constructs like task logs. I've pulled those out of that PR for now, but we should design a better approach.

It would seem quite heavy to just use Pebble "changes" and "tasks" for checks. If we did that, I think each check would have to be a change (with one task?) and that would flood the changes list. Besides, these aren't really state changes, they're checks. ("Change represents a tracked modification to the system state.")

So instead I suggest (idea from @niemeyer) using the same "log" concept that a task has, which is a Log []string field, and adding that to the CheckInfo object. We could keep the last 10 like a Task does. Each log string would be a timestamp-formatted string with a prefix like ERROR and a message, like so:

2022-02-22T11:40:53+13:00 ERROR timed out after 100ms: context deadline exceeded
2022-02-22T11:50:53+13:00 ERROR timed out after 100ms: context deadline exceeded

This field would be in the CheckInfo struct for a particular named check, so no further context is needed. The log would contain check failure errors and the details that were previously in ErrorDetails (truncated HTTP body for http checks, last few lines of command output for exec checks).

The /v1/checks API would return this information as a list of strings. The CLI pebble checks could return the (possibly truncated) first log line by default, and the full log if you specify pebble checks --verbose. Something like this:

$ pebble checks
Check  Level  Status  Failures  Log
chk1   -      up      0/3
chk2   alive  down    2/2       2022-02-22T11:40:53+13:00 ERROR timed out after 100ms...

$ pebble checks --verbose
Check  Level  Status  Failures
chk1   -      up      0/3
chk2   alive  down    2/2
    2022-02-22T11:40:53+13:00 ERROR timed out after 100ms: context deadline exceeded 
    2022-02-22T11:50:53+13:00 ERROR timed out after 100ms: context deadline exceeded 

It's an open question whether the log would be cleared/reset when the check became healthy again. Probably not, as it's a log/history?

Pebble creates a large number of threads

In testing @benhoyt's snappass charm, I noticed that pebble was creating (and not terminating) an ever-growing number of threads.

This can be replicated like so (assuming Juju is bootstrapped on Microk8s)

$ juju add-model snappass
$ juju deploy snappass-test --resource snappass-image=benhoyt/snappass-test --resource redis-image=redis
# Wait for the app to start - watch in `juju status` output
$ kubectl -n snappass exec -it snappass-test-0 -c snappass -- /bin/bash
$ watch -n1 "cat /proc/1/status | grep Thread"

After running the workload for ~1 hour there were over 800 threads (albeit sleeping, but still present) in each container. The climbing number of threads seems to correlate with calls to the /v1/system-info endpoint. I only came to this conclusion by watching the growing number of threads, which seemed to track additions to the container logs as observed with:

kubectl logs -n snappass -c snappass snappass-test-0 -f

When starting a service with a non-root user, Pebble environment variables still belong to root

We are creating a Pebble charm that creates a service with user and group: "appuser". That user exists in the image.

layer_config = {
            "summary": "lcm layer",
            "description": "pebble config layer for nbi",
            "services": {
                self.service_name: {
                    "override": "replace",
                    "summary": "lcm service",
                    "command": "python3 -m osm_lcm.lcm",
                    "startup": "enabled",
                    "user": "appuser",
                    "group": "appuser",
                    "environment": environments,
                }
            },
        }

The process in the container runs as "appuser", so that's fine:

root@lcm-0:/app/osm_lcm# ps -fe
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 14:15 ?        00:00:01 /charm/bin/pebble run --create-dirs --hold --http :38813 --verbose
appuser       16       1  1 14:15 ?        00:00:21 python3 -m osm_lcm.lcm

But the code we execute uses the $HOME variable, and instead of getting the home from "appuser" we get the home from "root", so we get a "Permission denied" error:

2022-07-21T14:17:25.355Z [lcm] PermissionError: [Errno 13] Permission denied: '/root/.ssh'

Is that the expected behaviour?
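
One possible fix on Pebble's side (a sketch using the standard library os/user package; not Pebble's current behaviour) would be to derive HOME and USER from the configured service user and merge them into the child's environment:

import "os/user"

// userEnv returns HOME and USER entries for the configured service
// user, which could be merged into the child's environment.
func userEnv(username string) ([]string, error) {
    u, err := user.Lookup(username)
    if err != nil {
        return nil, err
    }
    return []string{"HOME=" + u.HomeDir, "USER=" + u.Username}, nil
}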

[RFE] Add Stop command to the pebble layer

Hi,

I am currently writing a charm for postgresql on top of k8s and I am having some issues when it comes to the stop routine.

Pebble does not allow me to choose how to stop my services; according to the code referenced in pebble#62, the stop routine in handlers.go is as follows:

  1. Send a SIGTERM
  2. Sleep for 250 msecs
  3. Send a SIGKILL

However, that is not the best way to deal with postgresql:

It is best not to use SIGKILL to shut down the server. Doing so will prevent the server from releasing shared memory and semaphores. Furthermore, SIGKILL kills the postgres process without letting it relay the signal to its subprocesses, so it might be necessary to kill the individual subprocesses by hand as well.

Therefore, we should have a stop command similar to the command field in the pebble layer, where we define a routine for stopping the application:

        pebble_layer = {
            "summary": "postgresql layer",
            "description": "pebble config layer for httpbin",
            "services": {
                "postgresql": {
                    "override": "replace",
                    "summary": "postgresql unit",
                    "command": "/usr/local/bin/docker-entrypoint.sh postgres",
                    "stopCommand": "pg_ctl stop -D /var/lib/postgresql/data", <<<-----------
                    "startup": "enabled"
                }
            },
        }

That command would be executed whenever the operator issues a stop action to pebble.

Right now, the way I am dealing with it is by manually stopping PostgreSQL services before calling the actual service stop:

                # Stop service so we can run the replication
                self._stop_app()
                # Run the stop to inform pebble
                container.stop("postgresql")

Add a way to introspect failing/restarting services

Per discussion at #86 (comment), it wasn't obvious when ServiceInfo.Restarts was reset to 0 (if at all), and whether this was the right thing to expose. Knowing whether Pebble has restarted a service 1000 times vs 1001 isn't particularly useful, for example.

On the other hand, when debugging a f(l)ailing service, we want the user to be able to see how often it's restarting. We kind of want a differential, like "restarts per minute" or "restarts in the last 10 minutes". But that might be just as hard to explain, so maybe the absolute number is simplest, and the user can see how fast it's going up by querying over time, or by knowing when Pebble was started.

I lean towards keeping this field -- it seems quite useful to me -- and having it never reset (until Pebble restarts). It would also be exposed in pebble services.

"stop" does not check errors when sending SIGTERM and SIGKILL

In internal/overlord/servstate/handlers.go, doStop does not check the error return from syscall.Kill when sending the SIGTERM and SIGKILL signals. It's unlikely to error, especially given that Pebble is what started the process, but I think we should return the error and bubble it up (or at least log it).

We should also clean up the commented-out calls to cmd.Process.Signal (determine which one is the better approach, or just choose one).

	syscall.Kill(-cmd.Process.Pid, syscall.SIGTERM)
	//cmd.Process.Signal(syscall.SIGTERM)
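
An error-checked variant might look something like this (using the standard library log package purely for illustration; Pebble's own logging would differ):

if err := syscall.Kill(-cmd.Process.Pid, syscall.SIGTERM); err != nil {
    log.Printf("cannot send SIGTERM to process group %d: %v", cmd.Process.Pid, err)
}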

I noticed this when digging into the code to reply to this Discourse thread.

Simplify RingBuffer closeChan handling to avoid double-close issues

Per discussion here, we'd like to get rid of the closeChan on the iterator -- probably combine it into one channel on the RingBuffer itself. This will avoid the double-close issues. To quote:

I've discussed with @hpidcock, and there is a better way to structure the RingBuffer to avoid the double close, as well as combine the individual Iterator.closeChan into one (semantically it should probably be one channel on the ring buffer). This change isn't hurting, so I'll leave it here, but we plan to refactor that in a subsequent commit.

Pebble should let the container crash when a service fails to auto-start

If Pebble has one or more pre-defined services to automatically run inside a container (as opposed to being started by a Juju sidecar), by default it should make the container crash if any of them fails to start. This behavior would be in line with Docker's ENTRYPOINT; Pebble not letting containers crash plays poorly on Kubernetes by hiding crash loops and the associated K8s events. Accustomed as I am to Docker and K8s, the current behavior of Pebble seems to leave containers in a zombie state by default and to invalidate key operational aspects of K8s pods and kubectl.

Additionally, it may be desirable to have optional services, explicitly flagged as such (more or less the equivalent of optional containers in an ECS task), which would keep the behavior Pebble currently uses for services.
