Check that compress_chunk_interval is used correctly

If timescaledb.compress_chunk_interval is used the time column should be first in the timescaledb.compress_orderby list for the compression settings and also be marked as ascending.

Check that all hypertable have valid users for chunks and compressed chunks

Background

It is possible that a user is dropped without removing the user from hypertables nor compressed hypertables and that an ACL item with just the OID is present in the compressed hypertable.

For example:

production=> \z _timescaledb_internal._hyper_65_579660_chunk
			  Access privileges
        Schema         |          Name          | Type  |  Access privileges   | Column privileges | Policies 
-----------------------+------------------------+-------+----------------------+-------------------+----------
 _timescaledb_internal | _hyper_65_579660_chunk | table | red=r/admin         +|                   | 
                       |                        |       | green=r/admin       +|                   | 
                       |                        |       | admin=arwdDxt/admin +|                   | 
                       |                        |       | blue=r/admin        +|                   | 
                       |                        |       | yellow=r/admin      +|                   | 
                       |                        |       | 230452=r/admin      +|                   | 
                       |                        |       | mauve=r/admin       +|                   | 
(1 row)

Problem

Since permissions and users are copied when creating the compressed chunks, this will cause the compress_chunk function to fail with the error role 230452 was concurrently dropped.

Check

For a hypertable, check that all users that have assigned permissions actually exist and that both the chunks, compressed hypertable, and all compressed chunks have users that exist.

Provide summary table for suggestions

Background

Right now, the tool emits a message for each object that matches a rule. The message is usually very elaborate, for example:

mats@fury:~/work/timescale/doctor$ timescale-doctor 
Table might benefit from being transformed to a hypertable.

    1. The table 'rides' has a column 'pickup_datetime' of timestamp type 'timestamp'
    2. The table 'rides' is not partitioned
    3. There are index scans done on 'rides'
    4. There are rows in 'rides'
    5. There are more than 10 pages allocated to 'rides'.

Problem

With a lot of tables under consideration, it can be difficult to get a good summary of the available tables and/or queries, what rules they match, and what the recommendation is.

Solution

Print a summary table with reference to the name of the rule. Something like this

Rule	Object	Description
hypertable_candidate	rides

The tool can then print the full recommendation using a option and a rule name to provide more information.

Check that all chunks are accounted for

Some old installations can have extra chunks that are not accounted for, that is, tables that are children of the hypertable but are not registered in the metadata. Check that:

All children of the hypertable have a corresponding entry in _timescaledb_catalog.chunk with the correct hypertable.
All chunks that are not marked dropped in _timescaledb_catalog.chunk exists as children to the correct hypertable.

Error when timescaledb not installed in source database

When run against a database without timescaledb:

> timescale-doctor postgres://postgres:1@localhost:5432
Traceback (most recent call last):
  File "/home/james/workspace/doctor/venv/bin/timescale-doctor", line 8, in <module>
    sys.exit(main())
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/cli.py", line 57, in main
    check_rules(dsn)
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 83, in check_rules
    for report in rule.execute(conn, rule.message):
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 62, in execute
    cursor.execute(self.query) # pylint: disable=E1101
  File "/nix/store/khvw06lqds60i0npy58zs27xpwb8x0d9-python3.10-psycopg2-2.9.5/lib/python3.10/site-packages/psycopg2/extras.py", line 236, in execute
    return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "_timescaledb_catalog.hypertable" does not exist
LINE 5:       FROM _timescaledb_catalog.hypertable ht
                   ^

Support PostgreSQL URLs for connection

Passing a PostgreSQL URL as a connection string.

timescale-doctor postgres://mats@localhost/foobar

Add unit tests for rules

Add unit tests for each rule so that we can check the case that it detects.

This means refactoring the code so that we can run each unit tests for a single rule only on a table or schema that we create based on some existing code. Since we want to avoid getting false positives if messages are changed, we should be able to get a "report" from a rule run that just contains the rule id and the parameters resulting from the SELECT. That way, we can just make sure that the rule triggers (or does not trigger) for particular objects in the database.

Example

[Add an example unit tests here.]

Tasks

Beta Give feedback

Refactor to be able to run an individual rule (#22)
Add unit tests for compression rules (#22)
Add TestCase class for setting up database for unit tests (#34)
Add rule identifier for each report
feat: check for duplicate indexes #43

enhancement
test: add test for unused index #45
Options

Refactor rules definition

Right now, rules can be defined so that a single function is executed for the rule and the documentation for the function is used as a template for the message. This works well for small rules and small number of objects, but there is no way to provide recommendations, more details, for a rule.

Refactoring the rule to use a class so that you can define brief message, detailed message with the reasoning, hints for how eliminate the recommendation would make the system more useful and open opportunities for more helpful usage.

Find chunks that do not have same permissions as hypertable

For users that created hypertables prior to us fixing the permission handling, you might end up with a case where chunks have different permissions from the hypertable they belong to.

Identify this case and list the chunks that need to have their permissions updated.

Generate messages using template

To be able to generate customized reports, add the ability to read a Jinja template file and generate a report from that.

Idea: create an internal structure of a report consisting of all the failing rules and add iteration over is so that a Jinja template can write things like:

{% for item in report['compression'].items() -%}
 - {{ item.detail }}
{% endfor %}

Check for continuous aggregates which haven't been migrated from finalized format

Timescale 2.7 introduced the new storage format (non-finalized) for continuous aggregates. 2.8 Introduced functions to migrate from the old format to the new format, and with 2.10 it was announced that support for old-format caggs would be removed in the near future.

Error when timescaledb database contains dropped chunks with tombstone

When run against a database with a dropped chunk tombstone, that is entries matching:

SELECT * FROM _timescaledb_catalog.chunk WHERE dropped;

> psql postgres://postgres:1@localhost:5433 -c "SELECT * FROM _timescaledb_catalog.chunk WHERE dropped;"
 id | hypertable_id |      schema_name      |    table_name     | compressed_chunk_id | dropped | status | osm_chunk
----+---------------+-----------------------+-------------------+---------------------+---------+--------+-----------
 20 |             2 | _timescaledb_internal | _hyper_2_20_chunk |                     | t       |      0 | f
(1 row)

timescale-doctor fails:

> timescale-doctor postgres://postgres:1@localhost:5433
Traceback (most recent call last):
  File "/home/james/workspace/doctor/venv/bin/timescale-doctor", line 8, in <module>
    sys.exit(main())
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/cli.py", line 57, in main
    check_rules(dsn)
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 83, in check_rules
    for report in rule.execute(conn, rule.message):
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 62, in execute
    cursor.execute(self.query) # pylint: disable=E1101
  File "/nix/store/khvw06lqds60i0npy58zs27xpwb8x0d9-python3.10-psycopg2-2.9.5/lib/python3.10/site-packages/psycopg2/extras.py", line 236, in execute
    return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "_timescaledb_internal._hyper_2_20_chunk" does not exist

List available rules and their description

Since it is not clear what rules are available it would be good to have an option to list rules and their descriptions, something like

timescale-doctor --list=* --show=brief

Option --list would take a glob pattern to match the rules to list, so * here means match all (and would be the default).

Option --show would select how much to show for each rule: just the description, brief or detailed messages, or even the query.

Recommend suitable segmentby and orderby columns for compression

Background

With an existing hypertable, it is very easy to turn on compression and start compressing the data. Compression can be tailored using timescaledb.compress_segmentby and timescaledb.compress_orderby and will improve performance if used the right way.

Problem

Turning on compression for a hypertable can have unexpected effects in query performance and compression depending on what segmentby and orderby columns are selected.

Question

Can we create recommendations based on the schema and the statistics for the table that we want to enable compression for?

Check for duplicate indexes

Check if there are duplicate indexes for a table, that is, two different indexes that are for the same relation and have the same set of key columns.

Improve error handling

The tool is just throwing an exception on errors, but we should capture them and write better error messages. For example, without support for PostgreSQL URLs proposed in #5, we get this error:

mats@fury:~/work/timescale/doctor$ timescale-doctor postgres://mats@localhost/hax0r
Traceback (most recent call last):
  File "/home/mats/.local/bin/timescale-doctor", line 8, in <module>
    sys.exit(main())
  File "/home/mats/.local/lib/python3.10/site-packages/doctor/__init__.py", line 39, in main
    rules.check_rules(user=args.user, dbname=args.dbname, port=args.port, host=args.host)
  File "/home/mats/.local/lib/python3.10/site-packages/doctor/rules/__init__.py", line 113, in check_rules
    conn = psycopg2.connect(dbname=dbname, user=user,
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  database "postgres://mats@localhost/hax0r" does not exist

timescale / doctor Goto Github PK

doctor's People

Contributors

Stargazers

Watchers

Forkers

doctor's Issues

Background

Problem

Check

Background

Problem

Solution

Example

Tasks

Background

Problem

Question

Recommend Projects

Recommend Topics

Recommend Org