
pg_prometheus's Introduction

SUNSET NOTICE

We'll be sunsetting this project in the coming months as we focus on a new implementation with additional functionality and better support for new TimescaleDB features (such as compression). You can find the new project at https://github.com/timescale/promscale.

More details can be found in our design document for the new project.

This project will continue only in maintenance mode.

Prometheus metrics for PostgreSQL

pg_prometheus is an extension for PostgreSQL that defines a Prometheus metric samples data type and provides several storage formats for storing Prometheus data.

Related packages to install:

Running from Docker

A PostgreSQL docker image with both pg_prometheus and TimescaleDB installed is available in Docker Hub at timescale/pg_prometheus.

Example usage:

docker run --name pg_prometheus -d -p 5432:5432 timescale/pg_prometheus:latest-pg11 postgres \
      -csynchronous_commit=off

Note that this image inherits from the official postgres image, so all options documented there are applicable to this image as well. Especially important for users who wish to persist data outside of Docker volumes is the PGDATA environment variable and its accompanying volume mount.
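For docker-compose users, the same flag can be passed via the command key. A minimal sketch (the image tag, password, and settings are illustrative, not prescriptive); spelling out the postgres executable as the first token avoids ambiguity about how the entrypoint interprets the arguments:

```yaml
version: "3"
services:
  pg_prometheus:
    image: timescale/pg_prometheus:latest-pg11
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: secret
    # First token is the executable; server flags follow it.
    command: postgres -c synchronous_commit=off
```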

Installation

Requirements

  • Install PostgreSQL libraries and headers for C language backend development (https://www.postgresql.org/download/)
  • Make sure you have PostgreSQL bin in your PATH and the postgresql-devel package for your version of PostgreSQL before compiling from source

To install from source, do:

make
make install # Might require super user permissions

Edit postgresql.conf to include the pg_prometheus extension:

shared_preload_libraries = 'pg_prometheus'

Start PostgreSQL and install the extension as a superuser using the psql CLI:

CREATE EXTENSION pg_prometheus;

Optionally grant permissions to the database user (prometheus) that will own the Prometheus data:

-- Create the role
CREATE ROLE prometheus WITH LOGIN PASSWORD 'secret';

-- Grant access to the schema
GRANT ALL ON SCHEMA prometheus TO prometheus;

This also requires superuser privileges.

Integrating with Prometheus

To quickly connect Prometheus to pg_prometheus, simply connect the Prometheus PostgreSQL adapter to a database that has pg_prometheus installed.

For more technical details, or to use pg_prometheus without Prometheus, read below.

Creating the Prometheus tables

To create the appropriate Prometheus tables use:

SELECT create_prometheus_table('metrics');

This will create a metrics table for inserting data in the Prometheus exposition format using the Prometheus data type. It will also create a metrics_view to easily query data.

Other supporting tables may also be created depending on the storage format (see below).

Inserting data

With either storage format, data can be inserted in Prometheus format into the main table (e.g. metrics in our running example). Data should be formatted according to the Prometheus exposition format.

INSERT INTO metrics VALUES ('cpu_usage{service="nginx",host="machine1"} 34.6 1494595898000');
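Sample lines like the one above can be assembled programmatically before insertion. A minimal Python sketch of the exposition format (the function name and signature are illustrative, not part of pg_prometheus):

```python
def prom_sample(name, labels, value, ts_ms):
    """Format a metric as a Prometheus exposition-format sample line:
    name{label="value",...} value timestamp_ms"""
    label_str = ",".join('%s="%s"' % (k, v) for k, v in sorted(labels.items()))
    return "%s{%s} %s %d" % (name, label_str, value, ts_ms)

line = prom_sample("cpu_usage",
                   {"service": "nginx", "host": "machine1"},
                   34.6, 1494595898000)
print(line)  # cpu_usage{host="machine1",service="nginx"} 34.6 1494595898000
```

The resulting string is what goes into the single-column INSERT shown above.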

Since metrics is a view, and PostgreSQL does not allow COPY to views, we create a specialized table to be the target of copy commands for normalized tables (raw tables could write directly to the underlying _sample table). By default, copy tables have a _copy suffix.

One interesting usage is to scrape a Prometheus endpoint (e.g. http://localhost:8080/metrics) directly (without using Prometheus):

curl http://localhost:8080/metrics | grep -v "^#" | psql -h localhost -U postgres -p 5432 -c "COPY metrics_copy FROM STDIN"

Querying data

The metrics view has the following schema:

  Column |           Type           | Modifiers
 --------+--------------------------+-----------
  time   | timestamp with time zone |
  name   | text                     |
  value  | double precision         |
  labels | jsonb                    |

An example query would be

SELECT time, value
FROM metrics
WHERE time > NOW() - interval '10 min' AND
      name = 'cpu_usage' AND
      labels @> '{ "service": "nginx"}';
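The `@>` operator in the query above tests JSONB containment: a row matches if its labels include every key/value pair of the right-hand document. For intuition, the flat (top-level object) case can be sketched in Python; this is an illustration of the semantics, not how PostgreSQL implements it:

```python
def jsonb_contains(doc, sub):
    """True if every key/value pair in `sub` appears in `doc`
    (the top-level-object case of PostgreSQL's @> operator)."""
    return all(doc.get(k) == v for k, v in sub.items())

labels = {"service": "nginx", "host": "machine1"}
jsonb_contains(labels, {"service": "nginx"})   # matches: row is returned
jsonb_contains(labels, {"service": "apache"})  # no match: row filtered out
```

Note that the real operator also handles nested objects and arrays recursively.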

Storage formats

pg_prometheus allows two main ways of storing Prometheus metrics: raw and normalized (the default). With raw, a table simply stores all the Prometheus samples in a single column of type prom_sample. The normalized storage format separates the labels out into a separate table. The advantage of the normalized format is disk space savings when labels are long and repetitive.

Note that the metrics view can be used to query and insert data regardless of the storage format and serves to hide the underlying storage from the user.

Raw format

In raw format, data is stored in a table with one column of type prom_sample. To define a raw table, pass normalized_tables=>false to create_prometheus_table. This will also create appropriate indexes on the raw table. The schema is:

  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 sample    | prom_sample              |

Normalized format

In the normalized format, data is stored in two tables. The values table holds the data values with a foreign key to the labels. It has the following schema:

  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 time      | timestamp with time zone |
 value     | double precision         |
 labels_id | integer                  |

Labels are stored in a companion table called labels (note that metric_name is in its own column since it is always present):

   Column    |  Type   |                          Modifiers
-------------+---------+-------------------------------------------------------------
 id          | integer | not null default nextval('metrics_labels_id_seq'::regclass)
 metric_name | text    | not null
 labels      | jsonb   |
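On insert, the normalized path does a get-or-create against the labels table: look up an existing (metric_name, labels) row, insert one if absent, and reference its id from the values table. The logic can be sketched in Python with an in-memory dict standing in for the table (all names here are illustrative):

```python
class LabelsTable:
    """In-memory sketch of the labels get-or-create performed on insert."""
    def __init__(self):
        self._ids = {}      # (metric_name, frozen labels) -> id
        self._next_id = 1   # mimics the id sequence

    def get_or_create(self, metric_name, labels):
        # Labels are frozen into a hashable key, like the jsonb equality check.
        key = (metric_name, tuple(sorted(labels.items())))
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]

t = LabelsTable()
a = t.get_or_create("cpu_usage", {"host": "m1"})
b = t.get_or_create("cpu_usage", {"host": "m1"})  # same label set: row reused
assert a == b == 1
```

Because repeated samples share one labels row, long repetitive label sets are stored once, which is where the disk savings of the normalized format come from.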

Use with TimescaleDB

TimescaleDB scales PostgreSQL for time-series data workloads (of which metrics are one example). If TimescaleDB is installed, pg_prometheus will use it by default. To install TimescaleDB, follow the instructions here. You can explicitly control whether or not to use TimescaleDB with the use_timescaledb parameter to create_prometheus_table.

For example, the following will force pg_prometheus to use Timescale (and will error out if it isn't installed):

SELECT create_prometheus_table('metrics', use_timescaledb=>true);

Contributing

We welcome contributions to this extension, which like TimescaleDB is released under the Apache2 Open Source License. The same Contributors Agreement applies; please sign the Contributor License Agreement (CLA) if you're a new contributor.

pg_prometheus's People

Contributors

antekresic, atanasovskib, cevian, erimatnor, gdcrocx, inolddays, jesperpedersen, josephreynolds, kiefersmith, leehampton, mfreed, niksajakovljevic, robatticus, spolcyn, svenklemm, thiagorp, tylerfontaine


pg_prometheus's Issues

how to specify "synchronous_commit=OFF" in docker-compose

I cannot start up pg_prometheus using docker-compose.

ERROR: for pg_prometheus  Cannot start service pg_prometheus: OCI runtime create failed: container_linux.go:344: starting container process caused "exec: \"-c\": executable file not found in $PATH": unknown

This line caused trouble. Any thoughts?

command: -c synchronous_commit=OFF

Project status unclear, especially regarding container image releases

Hi,

I'm currently investigating potential backends for Prometheus and noticed timescale + pg_prometheus being an option.

However, I'm unsure about the project status. The last release on Docker Hub was in January, which correlates with the 0.2.1 release, but isn't up to date with timescaledb releases at all. It's not very clear to me which timescaledb release is going to be included with that image, for example. Tags are very different from the main project releases. The Docker Hub page doesn't have any documentation either.

Is the Docker image considered actively maintained? Is building it from source with an updated timescaledb version the recommended way for production usage?

Continuous aggregates creation not supported by the normalized storage

I am not 100% sure that what I will write is completely true, but we have been using pg_prometheus with normalized storage for some time now. Today we wanted to create some basic continuous aggregate views so that our dashboards are more responsive when looking at long-term data.

Unfortunately we constantly fail to do so. Even the simplest aggregate, like:

prometheus=# CREATE VIEW basic
prometheus-# WITH (timescaledb.continuous)
prometheus-# AS
prometheus-# SELECT
prometheus-# time_bucket('1 day', time),
prometheus-# sum(value) AS "total"
prometheus-# FROM metrics
prometheus-# GROUP BY time_bucket('1 day', time);
ERROR:  invalid SELECT query for continuous aggregate

fails with the invalid select query error message. The select query itself works without any problem of course.

As I understand this, the creation most likely fails because metrics is not a table but a view. If this is the case, how can we perform continuous aggregates with the normalized storage? Would I need to go directly to the metrics_labels and metrics_values tables?

Any suggestions would be great here.

Unique metrics

In production, we have processes uploading metrics via an API I created, which causes duplicated data.

I did a workaround locally (using database migrations in the API) to alter the structure a bit:

  1. Add a unique constraint:

ALTER TABLE metrics_values ADD CONSTRAINT unique_time_value_labels_id UNIQUE (time, value, labels_id);

  2. Add an ON CONFLICT clause within the function prometheus.insert_view_normal(), at line 149, transforming

    EXECUTE format('INSERT INTO %I (time, value, labels_id) VALUES (%L, %L, %L)',

    into

EXECUTE format('INSERT INTO %I (time, value, labels_id) VALUES (%L, %L, %L) ON CONFLICT DO NOTHING',

so the unique constraint allows metrics to be pushed with no duplicates found afterwards.

Does this qualify to be proposed as a PR?

problem while installing from source

Environment:
CentOS7
pgsql10.6
pgsqlserver10.2

while installing from source
[root@node1 pg_prometheus-0.2]# make
Makefile:47: /usr/lib64/pgsql/pgxs/src/makefiles/pgxs.mk: No such file or directory
make: *** No rule to make target `/usr/lib64/pgsql/pgxs/src/makefiles/pgxs.mk'. Stop.

I'm confused about what I should do with the source code: extract the tarball directly into "/usr/pgsql-10/share/extension/", or run make && make install?

Lack of update path

When attempting to update pg_prometheus:

ERROR:  extension "pg_prometheus" has no update path from version "0.0.1" to version "0.2"

I had to place an empty 0.0.1--0.2.sql file in the extension directory as suggested by @niksajakovljevic to work around it.

Timescale prometheus, missing hypertable

Hello,

I'm trying to use timescale with prometheus, I followed the tutorial without issue
https://docs.timescale.com/latest/tutorials/tutorial-setup-timescale-prometheus

  • Ubuntu 20.04 (docker)
  • PostgreSQL 12.2 Ubuntu 12.2-4
  • TimescaleDB 1.7.2

However, there is something odd:

 \dx
                                       List of installed extensions
     Name      | Version |   Schema   |                            Description                            
---------------+---------+------------+-------------------------------------------------------------------
 pg_prometheus | 0.2.2   | public     | Prometheus metrics for PostgreSQL
 plpgsql       | 1.0     | pg_catalog | PL/pgSQL procedural language
 timescaledb   | 1.7.2   | public     | Enables scalable inserts and complex queries for time-series data
(3 rows)

postgres=# select * from _timescaledb_catalog.hypertable;
 id | schema_name | table_name | associated_schema_name | associated_table_prefix | num_dimensions | chunk_sizing_func_schema | chunk_sizing_func_name | chunk_target_size | compressed | comp
ressed_hypertable_id 
----+-------------+------------+------------------------+-------------------------+----------------+--------------------------+------------------------+-------------------+------------+-----
---------------------
(0 rows)


postgres=# SELECT * from hypertable_approximate_row_count();
 schema_name | table_name | row_estimate 
-------------+------------+--------------
(0 rows)

Maybe I missed something?

Version inconsistencies

The versions listed on hub.docker.com (tags), github.com (releases), and in the code itself (Dockerfile) don't seem to line up.

Docker Hub
latest/master: pg_prometheus 0.0.1, timescaledb 0.7.0, postgresql 9.6

If pg_prometheus 0.1 is released and stable I would expect "latest" to point to this version. It seems like this tag wasn't created with the Makefile since latest/master/commit are not updated.

Github
latest (only) release: pg_prometheus 0.1 (?), timescaledb 0.8.0, postgresql 10.2

This release seems ok except for the Dockerfile (see next section).

Dockerfile
master branch: pg_prometheus 0.0.1, timescaledb 0.8.0, postgresql 10.2

Even though the latest released version is pg_prometheus 0.1 the Dockerfile has "ENV PG_PROMETHEUS_VERSION 0.0.1"

implicit declaration of function ‘PG_GETARG_JSONB’; did you mean ‘PG_GETARG_JSONB_P’? [-Wimplicit-function-declaration] Jsonb *jb = PG_GETARG_JSONB(3);

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -fPIC -DINCLUDE_PACKAGE_SUPPORT=0 -MMD -I. -I./ -I/usr/include/postgresql/11/server -I/usr/include/postgresql/internal -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o src/prom.o src/prom.c
src/prom.c: In function ‘prom_construct’:
src/prom.c:407:17: warning: implicit declaration of function ‘PG_GETARG_JSONB’; did you mean ‘PG_GETARG_JSONB_P’? [-Wimplicit-function-declaration]
Jsonb *jb = PG_GETARG_JSONB(3);
^~~~~~~~~~~~~~~
PG_GETARG_JSONB_P
src/prom.c:407:17: warning: initialization makes pointer from integer without a cast [-Wint-conversion]
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -fPIC -DINCLUDE_PACKAGE_SUPPORT=0 -MMD -I. -I./ -I/usr/include/postgresql/11/server -I/usr/include/postgresql/internal -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o src/parse.o src/parse.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -fPIC -DINCLUDE_PACKAGE_SUPPORT=0 -MMD -I. -I./ -I/usr/include/postgresql/11/server -I/usr/include/postgresql/internal -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -c -o src/utils.o src/utils.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer -fPIC -DINCLUDE_PACKAGE_SUPPORT=0 -MMD -shared -o pg_prometheus.so src/prom.o src/parse.o src/utils.o -L/usr/lib/x86_64-linux-gnu -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -L/usr/lib/llvm-6.0/lib -L/usr/lib/x86_64-linux-gnu/mit-krb5 -Wl,--as-needed
/usr/bin/clang-6.0 -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/usr/include/postgresql/11/server -I/usr/include/postgresql/internal -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -flto=thin -emit-llvm -c -o src/prom.bc src/prom.c
src/prom.c:407:17: warning: implicit declaration of function 'PG_GETARG_JSONB' is invalid in C99 [-Wimplicit-function-declaration]
Jsonb *jb = PG_GETARG_JSONB(3);
^
src/prom.c:407:12: warning: incompatible integer to pointer conversion initializing 'Jsonb *' with an expression of type 'int' [-Wint-conversion]
Jsonb *jb = PG_GETARG_JSONB(3);
^ ~~~~~~~~~~~~~~~~~~
2 warnings generated.
/usr/bin/clang-6.0 -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/usr/include/postgresql/11/server -I/usr/include/postgresql/internal -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -flto=thin -emit-llvm -c -o src/parse.bc src/parse.c
/usr/bin/clang-6.0 -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2 -I. -I./ -I/usr/include/postgresql/11/server -I/usr/include/postgresql/internal -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5 -flto=thin -emit-llvm -c -o src/utils.bc src/utils.c

Unable to configure High Availability of Prometheus with timescaleDB

I am trying to set up high availability for Prometheus using TimescaleDB with the configurations below.

Node exporter

docker run -d -p 9100:9100 quay.io/prometheus/node-exporter

Prometheus

  • prometheus-1
    docker run -it -p 9090:9090 -v /root/prometheus/prometheus1.yml:/etc/prometheus/prometheus.yml prom/prometheus
  • prometheus1.yml

global:
  scrape_interval: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['10.128.15.221:9100']
remote_write:
  - url: "http://10.128.15.221:9201/write"
remote_read:
  - url: "http://10.128.15.221:9201/read"
    read_recent: true

  • prometheus-2
    docker run -it -p 9091:9090 -v /root/prometheus/prometheus2.yml:/etc/prometheus/prometheus.yml prom/prometheus
  • prometheus2.yml

global:
  scrape_interval: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['10.128.15.221:9100']
remote_write:
  - url: "http://10.128.15.221:9202/write"
remote_read:
  - url: "http://10.128.15.221:9202/read"
    read_recent: true

Prometheus adapter

  • prometheus-adapter-1
    docker run -it -p 9201:9201 timescale/prometheus-postgresql-adapter:latest -pg-host=10.128.15.221 -pg-password=secret -leader-election-pg-advisory-lock-id=2 -leader-election-pg-advisory-lock-prometheus-timeout=7s

  • prometheus-adapter-2
    docker run -it -p 9202:9201 timescale/prometheus-postgresql-adapter:latest -pg-host=10.128.15.221 -pg-password=secret -leader-election-pg-advisory-lock-id=2 -leader-election-pg-advisory-lock-prometheus-timeout=7s

pg_prometheus

docker run --name pg_prometheus -e POSTGRES_PASSWORD=secret -it -p 5432:5432 timescale/pg_prometheus:latest-pg11 postgres -csynchronous_commit=off

When I spin up, everything is working fine with the below status.

  • prometheus-adapter-1 -> Leader
    log

{"caller":"log.go:27","count":100,"duration":0.007022144,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:01.146Z"}
{"caller":"log.go:27","count":100,"duration":0.007113201,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.119Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:02:06.119Z"}
{"caller":"log.go:27","count":100,"duration":0.006514815,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.128Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":200,"ts":"2020-03-09T10:02:06.128Z"}
{"caller":"log.go:27","count":100,"duration":0.00611504,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.136Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:02:06.136Z"}
{"caller":"log.go:27","count":100,"duration":0.006294438,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.144Z"}

  • prometheus-adapter-2 -> Not a leader
    log

{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.135Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:33.135Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.138Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:33.138Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.140Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:38.133Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:38.133Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:38.135Z"}

But when I stop prometheus-1, prometheus-adapter-2 is not picking up leadership. Please find the logs for both adapters below.

prometheus-adapter-1

{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:56.513Z"}
{"caller":"log.go:27","count":93,"duration":0.005575618,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:35","level":"warn","msg":"Prometheus timeout exceeded","timeout":"7s","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:35","level":"warn","msg":"Scheduled election is paused. Instance is removed from election pool.","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:31","level":"info","msg":"Instance is no longer a leader","ts":"2020-03-09T10:30:06.962Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:10.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:15.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:20.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:25.958Z"}

prometheus-adapter-2

{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:30:55.047Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.042Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.046Z"}

But when I stop the prometheus-adapter-1 then prometheus-adapter-2 is picking the leadership.

Another interesting thing is that when I start prometheus-1 again, I see "Election id 2: Instance is not a leader. Can't write data" in the prometheus-adapter-1 log. Please see the below log.

{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.566Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":93,"ts":"2020-03-09T10:33:34.571Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.578Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.579Z"}
{"caller":"log.go:31","level":"info","msg":"Prometheus seems alive. Resuming scheduled election.","ts":"2020-03-09T10:33:34.959Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.550Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.555Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.556Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.558Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.554Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.554Z"}

So, did I follow a wrong step while setting this up, or is this a bug?

Please help me to resolve this issue.

Strange behaviour of queries with 'irate' function

[Screenshots: Grafana dashboards (report parser, RabbitMQ) illustrating the irate issue]

Hello,
We are using the prometheus-postgresql-adapter and pg_prometheus services for keeping Prometheus metrics in database storage. I'm not sure which one is causing the problem, but after switching from Prometheus local persistent storage to those two services, all queries containing the 'irate' function seem to behave incorrectly. It is especially visible in Grafana dashboards: graphs with 'irate' show a 'no data' error, or no charts are visible (only values are shown), or charts are visible for only ~2 seconds and then disappear again. After changing 'irate' to the 'rate' function there are no problems and everything works just fine.

how to create this extension for postgresql manual

My company has a PostgreSQL product, and I have the username and password. Can I use it to create pg_prometheus? I don't want to create my own PostgreSQL with Docker. I want to use SQL to create this extension. How can I do it?

persistent volume data

Hi, where is the persistent volume for pg_prometheus located?

Is it the default like PostgreSQL, in /var/lib/postgresql?

thanks

Building from source

What are the steps to install the extension from source?

root@instance-1 pg_prometheus]# make install
make: pg_config: Command not found
make: Nothing to be done for `install'.
[root@instance-1 pg_prometheus]# pwd
/var/lib/pgsql/pg_prometheus
[root@instance-1 pg_prometheus]#

Multiple schema support for pg_prometheus

Hi,

We are trying to use the pg_prometheus extension, segregating each product as a schema. Each schema will hold the full set of pg_prometheus tables, views, types, and functions. For example:

Schema: service_a
Tables: metrics_labels, metrics_values, metrics_copy
Views: metrics
Schema: service_b
Tables: metrics_labels, metrics_values, metrics_copy
Views: metrics
and so on

We enabled the pg_prometheus extension as a superuser in public and then created the schema as follows:

create schema service_a AUTHORIZATION  prometheus;
grant usage on schema service_a to prometheus;
ALTER DEFAULT PRIVILEGES IN SCHEMA service_a GRANT all privileges ON tables TO prometheus;
ALTER DEFAULT PRIVILEGES IN SCHEMA service_a GRANT all privileges ON functions TO prometheus;
ALTER DEFAULT PRIVILEGES IN SCHEMA service_a GRANT all privileges ON sequences TO prometheus;
ALTER DEFAULT PRIVILEGES IN SCHEMA service_a GRANT all privileges ON types TO prometheus;
set search_path = service_a ,"$user", public;
SELECT create_prometheus_table('metrics');

I can see the tables are created under service_a. But during insert, we get an error that "metrics_labels" is not found. Investigating, we found that it fails in the trigger function "prometheus.insert_view_normal()".

To avoid this problem, we added TG_TABLE_SCHEMA to the trigger functions and modified their source as below:

CREATE OR REPLACE FUNCTION prometheus.insert_view_normal()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
DECLARE
    metric_labels     JSONB = prom_labels(NEW.sample);
    metric_labels_id  INTEGER;
    labels_table      NAME;
    values_table      NAME;
BEGIN
    IF TG_NARGS != 2 THEN
        RAISE EXCEPTION 'insert_view_normal requires 2 parameters';
    END IF;

    values_table := TG_ARGV[0];
    labels_table := TG_ARGV[1];

    -- Insert labels
    EXECUTE format('SELECT id FROM %I.%I l WHERE %L = l.labels AND %L = l.metric_name',
          TG_TABLE_SCHEMA, labels_table, metric_labels, prom_name(NEW.sample)) INTO metric_labels_id;

    IF metric_labels_id IS NULL THEN
      EXECUTE format(
          $$
          INSERT INTO %I.%I (metric_name, labels) VALUES (%L, %L) RETURNING id
          $$,
          TG_TABLE_SCHEMA,
          labels_table,
          prom_name(NEW.sample),
          metric_labels
      ) INTO STRICT metric_labels_id;
    END IF;

    EXECUTE format('INSERT INTO %I.%I (time, value, labels_id) VALUES (%L, %L, %L)',
          TG_TABLE_SCHEMA, values_table, prom_time(NEW.sample), prom_value(NEW.sample), metric_labels_id);

    RETURN NULL;
END
$function$
;
CREATE OR REPLACE FUNCTION prometheus.insert_view_sample()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
DECLARE
    sample_table      NAME;
BEGIN
    IF TG_NARGS != 1 THEN
        RAISE EXCEPTION 'insert_view_sample requires 1 parameter';
    END IF;

    sample_table := TG_ARGV[0];

    EXECUTE format('INSERT INTO %I.%I (sample) VALUES (%L)',
          TG_TABLE_SCHEMA, sample_table, NEW.sample);

    RETURN NULL;
END
$function$
;

Will there be any problems with this setup? If not, could these changes be added to pg_prometheus to support schemas?
Please advise.

Thanks.

Request for a new version

It has been a while since timescale/timescaledb:0.9.2 was released.

I have been using a Docker image built from a Dockerfile with several modified lines for about 3 weeks, and it seems to be working well.
Like this:

FROM postgres:10.4-alpine

ENV PG_MAJOR 10.4
ENV TIMESCALEDB_VERSION 0.9.2

I would be happy if pg_prometheus tracked timescaledb 0.9.2.

Specify build dependencies

Hello there!

I think it makes sense to specify in README.md the dependencies needed to compile the package on the RHEL and Debian families. Specific versions would also be helpful, since CentOS in particular requires third-party repos to get up-to-date versions of LLVM, Clang, and GCC.

Note: I specifically use version 11 of PostgreSQL because of TimescaleDB.

For now I am using the following packages on Debian-based systems:

  • git
  • make
  • gcc
  • libpq-dev
  • postgresql-server-dev-11

and on Red Hat based systems:

CentOS 7 (needs epel-release and centos-release-scl):

  • centos-release-scl
  • epel-release
  • git
  • make
  • gcc
  • libpqxx-devel
  • postgresql11-devel
  • devtoolset-7
  • llvm-toolset-7

On CentOS 8 (needs only epel-release):

  • git
  • make
  • gcc
  • libpqxx-devel
  • postgresql11-devel
  • clang
  • redhat-rpm-config
  • ccache
  • llvm

If everyone agrees, I can open a PR with the content above.

Adding both pg_prometheus and timescaledb, starting PostgreSQL throws an error

I use the Prometheus adapter to store data with TimescaleDB.
In my postgresql.conf:

shared_preload_libraries ='pg_prometheus'

shared_preload_libraries ='timescaledb'

Both the pg_prometheus and timescaledb extensions can be used,
but in psql, show shared_preload_libraries displays only timescaledb.
Why? Can someone help me?
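A likely explanation: when the same setting appears twice in postgresql.conf, the last assignment wins, so the second shared_preload_libraries line silently replaces the first. Both libraries can be listed in a single setting:

```
shared_preload_libraries = 'pg_prometheus,timescaledb'
```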

Not able to write metrics on remote postgres database

Hi,
We are trying to set up Prometheus and Postgres on different machines: Prometheus on one machine, the Postgres database on another, with a third machine monitored via a node exporter. But we are failing to do so. Could anyone please suggest a way forward for writing the Prometheus data to the Postgres database on a different machine? Thanks in advance.

Make package

Currently, make package fails due to a missing pg_prometheus.so:

[root@db7a59fabec1 pg_prometheus]# make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC -DINCLUDE_PACKAGE_SUPPORT=0 -MMD -I. -I./ -I/usr/pgsql-9.6/include/server -I/usr/pgsql-9.6/include/internal -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include  -c -o src/prom.o src/prom.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC -DINCLUDE_PACKAGE_SUPPORT=0 -MMD -I. -I./ -I/usr/pgsql-9.6/include/server -I/usr/pgsql-9.6/include/internal -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include  -c -o src/parse.o src/parse.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC -DINCLUDE_PACKAGE_SUPPORT=0 -MMD -I. -I./ -I/usr/pgsql-9.6/include/server -I/usr/pgsql-9.6/include/internal -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include  -c -o src/utils.o src/utils.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC -DINCLUDE_PACKAGE_SUPPORT=0 -MMD -shared -o pg_prometheus.so src/prom.o src/parse.o src/utils.o -L/usr/pgsql-9.6/lib  -Wl,--as-needed  -L/usr/lib64 -Wl,--as-needed -Wl,-rpath,'/usr/pgsql-9.6/lib',--enable-new-dtags  
[root@db7a59fabec1 pg_prometheus]# make package
rm -f pg_prometheus.so   libpg_prometheus.a  libpg_prometheus.pc
rm -f src/prom.o src/parse.o src/utils.o
rm -rf sql/pg_prometheus--0.2.1.sql src/prom.d src/parse.d src/utils.d
rm -rf test/results/ test/regression.diffs test/regression.out tmp_check/ log/
/bin/sh /usr/pgsql-9.6/lib/pgxs/src/makefiles/../../config/install-sh -c -m 755 pg_prometheus.so 'package/lib/pg_prometheus.so'
/usr/pgsql-9.6/lib/pgxs/src/makefiles/../../config/install-sh: pg_prometheus.so does not exist.
make: *** [package] Error 1

Consider changing:

package: clean $(EXT_SQL_FILE)

to:

package: $(EXT_SQL_FILE) $(EXTENSION).so

Additional question: is there a pre-built version of pg_prometheus available? I was looking specifically for an RPM, not the Docker image.

Duplicate key values when inserting values from multiple connections

I have two connections to the database using pg_prometheus, both of which insert data of the form test_id{sender=nina} id timestamp, with different id values and timestamps. Sometimes, perhaps two times out of five, I get the following exception:

IntegrityError: duplicate key value violates unique constraint "metrics_labels_metric_name_labels_key" DETAIL: Key (metric_name, labels)=(test_id, {"sender": "nina1"}) already exists. CONTEXT: SQL statement " INSERT INTO metrics_labels (metric_name, labels) VALUES ('test_id', '{"sender": "nina1"}') RETURNING id

from the following function: PL/pgSQL function prometheus.insert_view_normal() line 20 at EXECUTE

Worth noting: I am using Python 3 with the psycopg2 library, calling execute_values() with an upsert that does nothing on conflict.

Is the problem on your side, or is it bad timing, my setup, or the library I'm using?

Thanks in advance~
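For context, a hypothetical race-safe variant of the labels insert: the SELECT-then-INSERT sequence in insert_view_normal can race when two connections both find no existing row at the same moment. On PostgreSQL 9.5+ an ON CONFLICT clause absorbs the duplicate instead of raising:

```sql
-- Sketch only, using the table/column names from the error message above.
-- ON CONFLICT DO NOTHING makes the duplicate insert a no-op, and the
-- follow-up SELECT returns the id regardless of which connection won.
INSERT INTO metrics_labels (metric_name, labels)
VALUES ('test_id', '{"sender": "nina1"}')
ON CONFLICT (metric_name, labels) DO NOTHING;

SELECT id FROM metrics_labels
WHERE metric_name = 'test_id' AND labels = '{"sender": "nina1"}';
```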

Modelling Prometheus with two tables (`values` and `labels`) works an order of magnitude slower than Prometheus on a basic query

The design with two tables, values and labels, is suggested here, in the "Pg_Prometheus" section.

Schema

The part of our implementation of this design, corresponding to the "values" table in the documentation:

CREATE TABLE IF NOT EXISTS series_values (
    time TIMESTAMPTZ NOT NULL, 
    value DOUBLE PRECISION NOT NULL, 
    series_id INTEGER NOT NULL,
    seq BIGSERIAL
);

SELECT create_hypertable('series_values', 'time');

ALTER TABLE series_values SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'series_id',
  timescaledb.compress_orderby = 'time DESC, seq DESC'
);

CREATE INDEX IF NOT EXISTS series_values_series_id_idx ON series_values USING BTREE (series_id, time desc);

Our data has about 250 unique series, all with 1 data point per second, i. e. about 250 data points per second in total.

Problem no. 1 - Hash Join when selecting just 190 indexed series_id is suboptimal

EXPLAIN ANALYZE SELECT series_id, avg("value") from series_values where series_id IN (select id from series where metric = 'some_metric_name') and time between '2020-09-19 09:00:00' and '2020-09-19 11:00:00' group by series_id;

                                                                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=65310.35..65386.85 rows=200 width=12) (actual time=24017.786..24023.399 rows=192 loops=1)
   Group Key: _hyper_1_40_chunk.series_id
   ->  Gather Merge  (cost=65310.35..65381.35 rows=600 width=36) (actual time=24017.712..24106.440 rows=768 loops=1)
         Workers Planned: 3
         Workers Launched: 3
         ->  Sort  (cost=64310.31..64310.81 rows=200 width=36) (actual time=23901.632..23901.857 rows=192 loops=4)
               Sort Key: _hyper_1_40_chunk.series_id
               Sort Method: quicksort  Memory: 42kB
               Worker 0:  Sort Method: quicksort  Memory: 42kB
               Worker 1:  Sort Method: quicksort  Memory: 42kB
               Worker 2:  Sort Method: quicksort  Memory: 42kB
               ->  Partial HashAggregate  (cost=64300.67..64302.67 rows=200 width=36) (actual time=23892.381..23893.048 rows=192 loops=4)
                     Group Key: _hyper_1_40_chunk.series_id
                     ->  Hash Join  (cost=15.29..62614.73 rows=337187 width=12) (actual time=5.554..22098.783 rows=325488 loops=4)
                           Hash Cond: (_hyper_1_40_chunk.series_id = series.id)
                           ->  Parallel Append  (cost=0.43..61307.99 rows=484707 width=12) (actual time=4.074..20498.857 rows=349109 loops=4)
                                 ->  Parallel Index Scan using _hyper_1_40_chunk_series_values_time_idx on _hyper_1_40_chunk  (cost=0.43..58884.45 rows=484707 width=12) (actual time=4.070..19998.534 rows=349109 loops=4)
                                       Index Cond: (("time" >= '2020-09-19 09:00:00+00'::timestamp with time zone) AND ("time" <= '2020-09-19 11:00:00+00'::timestamp with time zone))
                           ->  Hash  (cost=12.45..12.45 rows=192 width=4) (actual time=1.217..1.218 rows=192 loops=4)
                                 Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                 ->  Seq Scan on series  (cost=0.00..12.45 rows=192 width=4) (actual time=0.143..0.865 rows=192 loops=4)
                                       Filter: (metric = 'some_metric_name'::text)
                                       Rows Removed by Filter: 84
 Planning Time: 3.240 ms
 Execution Time: 24109.908 ms

The query planner decided to do a Hash Join with just 192 rows from the series table, which doesn't make sense.

When I replace the nested SELECT query with series_id IN (21,22,...212), i.e. literally all the selected values, TimescaleDB does the same (I don't post the query and the result because the query string is very long).

But I can make TimescaleDB do an Index Scan instead of a Hash Join by using a series_id BETWEEN 21 AND 212 clause. This is 40% faster:

EXPLAIN ANALYZE SELECT series_id, avg("value") from series_values where series_id BETWEEN 21 and 212 and time between '2020-09-19 09:00:00' and '2020-09-19 11:00:00' group by series_id;
                                                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=66793.95..66870.45 rows=200 width=12) (actual time=14220.642..14227.494 rows=192 loops=1)
   Group Key: _hyper_1_40_chunk.series_id
   ->  Gather Merge  (cost=66793.95..66864.95 rows=600 width=36) (actual time=14220.567..14299.768 rows=768 loops=1)
         Workers Planned: 3
         Workers Launched: 3
         ->  Sort  (cost=65793.91..65794.41 rows=200 width=36) (actual time=14098.952..14099.158 rows=192 loops=4)
               Sort Key: _hyper_1_40_chunk.series_id
               Sort Method: quicksort  Memory: 42kB
               Worker 0:  Sort Method: quicksort  Memory: 42kB
               Worker 1:  Sort Method: quicksort  Memory: 42kB
               Worker 2:  Sort Method: quicksort  Memory: 42kB
               ->  Partial HashAggregate  (cost=65784.27..65786.27 rows=200 width=36) (actual time=14096.069..14096.652 rows=192 loops=4)
                     Group Key: _hyper_1_40_chunk.series_id
                     ->  Parallel Append  (cost=0.43..63525.22 rows=451809 width=12) (actual time=4.246..12750.656 rows=325488 loops=4)
                           ->  Parallel Index Scan using _hyper_1_40_chunk_series_values_time_idx on _hyper_1_40_chunk  (cost=0.43..61266.18 rows=451809 width=12) (actual time=4.241..12429.488 rows=325488 loops=4)
                                 Index Cond: (("time" >= '2020-09-19 09:00:00+00'::timestamp with time zone) AND ("time" <= '2020-09-19 11:00:00+00'::timestamp with time zone))
                                 Filter: ((series_id >= 21) AND (series_id <= 212))
                                 Rows Removed by Filter: 23621
 Planning Time: 11.483 ms
 Execution Time: 14304.284 ms

Problem no. 2 (deeper) - data layout is inefficient

If you look carefully at the plan of the query above (taking 14 seconds), you realise that it is still utterly inefficient for the data problem at hand. Data points belonging to the same series_id are about 250 rows away from each other, all interleaved in the main table (because there are about 250 series, all with the same 1-second frequency). So the Index Scan is much closer to random access than to sequential access over the columns of the series_values table.

Prometheus does the same query on the same data in just 1.5 seconds, i.e. an order of magnitude faster. I think this is because Prometheus always groups the data points by metric. TimescaleDB can do the same, but only on compressed chunks, via the compress_segmentby option. (Note that the sole chunk appearing in the queries above is uncompressed.)

Another way to group the data points belonging to the same series in Timescale is partitioning via add_dimension(). However, this solution is actively discouraged in the docs:

TimescaleDB does not benefit from a very large number of space partitions (such as the number of unique items you expect in partition field). A very large number of such partitions leads both to poorer per-partition load balancing (the mapping of items to partitions using hashing), as well as much increased planning latency for some types of queries.

I didn't try this yet, but indeed I doubt that PostgreSQL/Timescale will always do projection pushdowns properly, and that query planning will not become super slow with thousands of chunks per table. In fact, we already had problems with query planning latency with just hundreds of chunks (by time only), so we had to increase the chunking interval.
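For completeness, the discouraged approach would presumably look something like the following (untested here, for the reasons above; the partition count is an arbitrary illustrative value):

```sql
-- Hypothetical sketch: hash-partition the hypertable on series_id as a
-- space dimension; 16 partitions is a made-up number for illustration.
SELECT add_dimension('series_values', 'series_id', number_partitions => 16);
```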

COPY "metrics_copy" FROM STDIN hogs a lot of CPU

32-core CPU, 256 GB RAM, RAID 10 of HDDs for TimescaleDB (single tablespace), pg_prometheus extension.

archive_command = 'envdir /etc/patroni/patroni.env.d wal-e wal-push %p'
archive_mode = 'on'
archive_timeout = '1800s'
bgwriter_delay = '200ms'
bgwriter_flush_after = '0'
bgwriter_lru_maxpages = '100'
bgwriter_lru_multiplier = '2.0'
checkpoint_completion_target = '0.9'
checkpoint_timeout = '15 min'
cluster_name = 'poclts'
effective_cache_size = '179 GB'
effective_io_concurrency = '0'
hot_standby = 'on'
huge_pages = 'off'
listen_addresses = '127.0.0.1,sprvrt001.zone'
log_destination = 'csvlog'
log_filename = 'postgresql'
log_min_duration_statement = '1000'
log_statement = 'ddl'
log_timezone = 'UTC'
logging_collector = 'True'
maintenance_work_mem = '1920 MB'
max_connections = '100'
max_locks_per_transaction = '64'
max_parallel_workers = '24'
max_parallel_workers_per_gather = '12'
max_prepared_transactions = '0'
max_replication_slots = '10'
max_wal_senders = '10'
max_wal_size = '10240 MB'
max_worker_processes = '24'
min_wal_size = '5120 MB'
pg_stat_statements.max = '5000'
pg_stat_statements.save = 'True'
pg_stat_statements.track = 'all'
port = '5432'
shared_buffers = '65536 MB'
shared_preload_libraries = 'pg_stat_statements,timescaledb'
superuser_reserved_connections = '3'
track_commit_timestamp = 'off'
wal_buffers = '-1'
wal_compression = 'True'
wal_keep_segments = '1290'
wal_level = 'replica'
wal_log_hints = 'on'
work_mem = '128 MB'

A single Prometheus 2.x server collecting metrics from 180 node-exporters.

remote_write:
- url: http://prometheus_postgresql_adapter:9201/write
  remote_timeout: 30s
  queue_config:
    capacity: 100000
    max_shards: 100
    max_samples_per_send: 5000
    batch_send_deadline: 5s
    max_retries: 10
    min_backoff: 30ms
    max_backoff: 100ms

(htop screenshot)

In just over 1 hour we got ~15 GB of data inside PostgreSQL. Just by adding a second Prometheus server, CPU usage increased by another 20%, and we will have many more Prometheus servers. We also need to keep 2 years of data.
I found https://wiki.postgresql.org/wiki/COPY but no answer.
This is completely unacceptable.

Maybe you have a hunch about what we are doing wrong?

Disk space much larger than table sizes

I'm running the pg_prometheus Docker image (0.2.2-pg11) connected to Prometheus (2.15.2) via the prometheus-postgresql-adapter (0.6.0). I've been running like this for a couple of weeks and just looked at the disk usage: it's at 900 GB, but the tables (metrics*) add up to only 1 GB.
I saw that autovacuum wasn't running, so I manually ran VACUUM FULL. It cleaned up about 40 GB. I then set autovacuum_vacuum_scale_factor to 0.01 to get it to run more often, but it didn't run all weekend.
Does anybody know why the disk usage is so much higher than the table size, or why vacuuming doesn't seem to be doing much of anything?
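One thing worth checking, as a guess: autovacuum_vacuum_scale_factor can also be set as a per-table storage parameter, which rules out the global setting being overridden elsewhere. The table names below follow the default pg_prometheus layout:

```sql
-- Hypothetical per-table override of the autovacuum trigger threshold;
-- metrics_values / metrics_labels are the default pg_prometheus tables.
ALTER TABLE metrics_values SET (autovacuum_vacuum_scale_factor = 0.01);
ALTER TABLE metrics_labels SET (autovacuum_vacuum_scale_factor = 0.01);
```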
