Coder Social home page Coder Social logo

doctor's People

Contributors

github-actions[bot] avatar jamesguthrie avatar mkindahl avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

doctor's Issues

Recommend suitable segmentby and orderby columns for compression

Background

With an existing hypertable, it is very easy to turn on compression and start compressing the data. Compression can be tailored using timescaledb.compress_segmentby and timescaledb.compress_orderby and will improve performance if used the right way.

Problem

Turning on compression for a hypertable can have unexpected effects in query performance and compression depending on what segmentby and orderby columns are selected.

Question

Can we create recommendations based on the schema and the statistics for the table that we want to enable compression for?

Generate messages using template

To be able to generate customized reports, add the ability to read a Jinja template file and generate a report from that.

Idea: create an internal structure of a report consisting of all the failing rules and add iteration over is so that a Jinja template can write things like:

{% for item in report['compression'].items() -%}
 - {{ item.detail }}
{% endfor %}

Check that all hypertable have valid users for chunks and compressed chunks

Background

It is possible that a user is dropped without removing the user from hypertables nor compressed hypertables and that an ACL item with just the OID is present in the compressed hypertable.

For example:

production=> \z _timescaledb_internal._hyper_65_579660_chunk
			  Access privileges
        Schema         |          Name          | Type  |  Access privileges   | Column privileges | Policies 
-----------------------+------------------------+-------+----------------------+-------------------+----------
 _timescaledb_internal | _hyper_65_579660_chunk | table | red=r/admin         +|                   | 
                       |                        |       | green=r/admin       +|                   | 
                       |                        |       | admin=arwdDxt/admin +|                   | 
                       |                        |       | blue=r/admin        +|                   | 
                       |                        |       | yellow=r/admin      +|                   | 
                       |                        |       | 230452=r/admin      +|                   | 
                       |                        |       | mauve=r/admin       +|                   | 
(1 row)

Problem

Since permissions and users are copied when creating the compressed chunks, this will cause the compress_chunk function to fail with the error role 230452 was concurrently dropped.

Check

For a hypertable, check that all users that have assigned permissions actually exist and that both the chunks, compressed hypertable, and all compressed chunks have users that exist.

Find chunks that do not have same permissions as hypertable

For users that created hypertables prior to us fixing the permission handling, you might end up with a case where chunks have different permissions from the hypertable they belong to.

Identify this case and list the chunks that need to have their permissions updated.

Check for duplicate indexes

Check if there are duplicate indexes for a table, that is, two different indexes that are for the same relation and have the same set of key columns.

Refactor rules definition

Right now, rules can be defined so that a single function is executed for the rule and the documentation for the function is used as a template for the message. This works well for small rules and small number of objects, but there is no way to provide recommendations, more details, for a rule.

Refactoring the rule to use a class so that you can define brief message, detailed message with the reasoning, hints for how eliminate the recommendation would make the system more useful and open opportunities for more helpful usage.

Improve error handling

The tool is just throwing an exception on errors, but we should capture them and write better error messages. For example, without support for PostgreSQL URLs proposed in #5, we get this error:

mats@fury:~/work/timescale/doctor$ timescale-doctor postgres://mats@localhost/hax0r
Traceback (most recent call last):
  File "/home/mats/.local/bin/timescale-doctor", line 8, in <module>
    sys.exit(main())
  File "/home/mats/.local/lib/python3.10/site-packages/doctor/__init__.py", line 39, in main
    rules.check_rules(user=args.user, dbname=args.dbname, port=args.port, host=args.host)
  File "/home/mats/.local/lib/python3.10/site-packages/doctor/rules/__init__.py", line 113, in check_rules
    conn = psycopg2.connect(dbname=dbname, user=user,
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  database "postgres://mats@localhost/hax0r" does not exist

Check that all chunks are accounted for

Some old installations can have extra chunks that are not accounted for, that is, tables that are children of the hypertable but are not registered in the metadata. Check that:

  • All children of the hypertable have a corresponding entry in _timescaledb_catalog.chunk with the correct hypertable.
  • All chunks that are not marked dropped in _timescaledb_catalog.chunk exists as children to the correct hypertable.

Add unit tests for rules

Add unit tests for each rule so that we can check the case that it detects.

This means refactoring the code so that we can run each unit tests for a single rule only on a table or schema that we create based on some existing code. Since we want to avoid getting false positives if messages are changed, we should be able to get a "report" from a rule run that just contains the rule id and the parameters resulting from the SELECT. That way, we can just make sure that the rule triggers (or does not trigger) for particular objects in the database.

Example

[Add an example unit tests here.]

Tasks

Provide summary table for suggestions

Background

Right now, the tool emits a message for each object that matches a rule. The message is usually very elaborate, for example:

mats@fury:~/work/timescale/doctor$ timescale-doctor 
Table might benefit from being transformed to a hypertable.

    1. The table 'rides' has a column 'pickup_datetime' of timestamp type 'timestamp'
    2. The table 'rides' is not partitioned
    3. There are index scans done on 'rides'
    4. There are rows in 'rides'
    5. There are more than 10 pages allocated to 'rides'.

Problem

With a lot of tables under consideration, it can be difficult to get a good summary of the available tables and/or queries, what rules they match, and what the recommendation is.

Solution

Print a summary table with reference to the name of the rule. Something like this

Rule Object Description
hypertable_candidate rides

The tool can then print the full recommendation using a option and a rule name to provide more information.

Error when timescaledb not installed in source database

When run against a database without timescaledb:

> timescale-doctor postgres://postgres:1@localhost:5432
Traceback (most recent call last):
  File "/home/james/workspace/doctor/venv/bin/timescale-doctor", line 8, in <module>
    sys.exit(main())
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/cli.py", line 57, in main
    check_rules(dsn)
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 83, in check_rules
    for report in rule.execute(conn, rule.message):
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 62, in execute
    cursor.execute(self.query) # pylint: disable=E1101
  File "/nix/store/khvw06lqds60i0npy58zs27xpwb8x0d9-python3.10-psycopg2-2.9.5/lib/python3.10/site-packages/psycopg2/extras.py", line 236, in execute
    return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "_timescaledb_catalog.hypertable" does not exist
LINE 5:       FROM _timescaledb_catalog.hypertable ht
                   ^

Error when timescaledb database contains dropped chunks with tombstone

When run against a database with a dropped chunk tombstone, that is entries matching:

SELECT * FROM _timescaledb_catalog.chunk WHERE dropped;
> psql postgres://postgres:1@localhost:5433 -c "SELECT * FROM _timescaledb_catalog.chunk WHERE dropped;"
 id | hypertable_id |      schema_name      |    table_name     | compressed_chunk_id | dropped | status | osm_chunk
----+---------------+-----------------------+-------------------+---------------------+---------+--------+-----------
 20 |             2 | _timescaledb_internal | _hyper_2_20_chunk |                     | t       |      0 | f
(1 row)

timescale-doctor fails:

> timescale-doctor postgres://postgres:1@localhost:5433
Traceback (most recent call last):
  File "/home/james/workspace/doctor/venv/bin/timescale-doctor", line 8, in <module>
    sys.exit(main())
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/cli.py", line 57, in main
    check_rules(dsn)
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 83, in check_rules
    for report in rule.execute(conn, rule.message):
  File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 62, in execute
    cursor.execute(self.query) # pylint: disable=E1101
  File "/nix/store/khvw06lqds60i0npy58zs27xpwb8x0d9-python3.10-psycopg2-2.9.5/lib/python3.10/site-packages/psycopg2/extras.py", line 236, in execute
    return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "_timescaledb_internal._hyper_2_20_chunk" does not exist

List available rules and their description

Since it is not clear what rules are available it would be good to have an option to list rules and their descriptions, something like

timescale-doctor --list=* --show=brief

Option --list would take a glob pattern to match the rules to list, so * here means match all (and would be the default).

Option --show would select how much to show for each rule: just the description, brief or detailed messages, or even the query.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.