timescale / doctor Goto Github PK
View Code? Open in Web Editor NEWRule-based recommendations about your timeseries database.
License: Apache License 2.0
Rule-based recommendations about your timeseries database.
License: Apache License 2.0
With an existing hypertable, it is very easy to turn on compression and start compressing the data. Compression can be tailored using timescaledb.compress_segmentby
and timescaledb.compress_orderby
and will improve performance if used the right way.
Turning on compression for a hypertable can have unexpected effects in query performance and compression depending on what segmentby and orderby columns are selected.
Can we create recommendations based on the schema and the statistics for the table that we want to enable compression for?
If timescaledb.compress_chunk_interval
is used the time column should be first in the timescaledb.compress_orderby
list for the compression settings and also be marked as ascending.
To be able to generate customized reports, add the ability to read a Jinja template file and generate a report from that.
Idea: create an internal structure of a report consisting of all the failing rules and add iteration over is so that a Jinja template can write things like:
{% for item in report['compression'].items() -%}
- {{ item.detail }}
{% endfor %}
It is possible that a user is dropped without removing the user from hypertables nor compressed hypertables and that an ACL item with just the OID is present in the compressed hypertable.
For example:
production=> \z _timescaledb_internal._hyper_65_579660_chunk
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
-----------------------+------------------------+-------+----------------------+-------------------+----------
_timescaledb_internal | _hyper_65_579660_chunk | table | red=r/admin +| |
| | | green=r/admin +| |
| | | admin=arwdDxt/admin +| |
| | | blue=r/admin +| |
| | | yellow=r/admin +| |
| | | 230452=r/admin +| |
| | | mauve=r/admin +| |
(1 row)
Since permissions and users are copied when creating the compressed chunks, this will cause the compress_chunk
function to fail with the error role 230452 was concurrently dropped
.
For a hypertable, check that all users that have assigned permissions actually exist and that both the chunks, compressed hypertable, and all compressed chunks have users that exist.
For users that created hypertables prior to us fixing the permission handling, you might end up with a case where chunks have different permissions from the hypertable they belong to.
Identify this case and list the chunks that need to have their permissions updated.
Timescale 2.7 introduced the new storage format (non-finalized) for continuous aggregates. 2.8 Introduced functions to migrate from the old format to the new format, and with 2.10 it was announced that support for old-format caggs would be removed in the near future.
Check if there are duplicate indexes for a table, that is, two different indexes that are for the same relation and have the same set of key columns.
Right now, rules can be defined so that a single function is executed for the rule and the documentation for the function is used as a template for the message. This works well for small rules and small number of objects, but there is no way to provide recommendations, more details, for a rule.
Refactoring the rule to use a class so that you can define brief message, detailed message with the reasoning, hints for how eliminate the recommendation would make the system more useful and open opportunities for more helpful usage.
Passing a PostgreSQL URL as a connection string.
timescale-doctor postgres://mats@localhost/foobar
The tool is just throwing an exception on errors, but we should capture them and write better error messages. For example, without support for PostgreSQL URLs proposed in #5, we get this error:
mats@fury:~/work/timescale/doctor$ timescale-doctor postgres://mats@localhost/hax0r
Traceback (most recent call last):
File "/home/mats/.local/bin/timescale-doctor", line 8, in <module>
sys.exit(main())
File "/home/mats/.local/lib/python3.10/site-packages/doctor/__init__.py", line 39, in main
rules.check_rules(user=args.user, dbname=args.dbname, port=args.port, host=args.host)
File "/home/mats/.local/lib/python3.10/site-packages/doctor/rules/__init__.py", line 113, in check_rules
conn = psycopg2.connect(dbname=dbname, user=user,
File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: database "postgres://mats@localhost/hax0r" does not exist
Some old installations can have extra chunks that are not accounted for, that is, tables that are children of the hypertable but are not registered in the metadata. Check that:
_timescaledb_catalog.chunk
with the correct hypertable._timescaledb_catalog.chunk
exists as children to the correct hypertable.Add unit tests for each rule so that we can check the case that it detects.
This means refactoring the code so that we can run each unit tests for a single rule only on a table or schema that we create based on some existing code. Since we want to avoid getting false positives if messages are changed, we should be able to get a "report" from a rule run that just contains the rule id and the parameters resulting from the SELECT. That way, we can just make sure that the rule triggers (or does not trigger) for particular objects in the database.
[Add an example unit tests here.]
Right now, the tool emits a message for each object that matches a rule. The message is usually very elaborate, for example:
mats@fury:~/work/timescale/doctor$ timescale-doctor
Table might benefit from being transformed to a hypertable.
1. The table 'rides' has a column 'pickup_datetime' of timestamp type 'timestamp'
2. The table 'rides' is not partitioned
3. There are index scans done on 'rides'
4. There are rows in 'rides'
5. There are more than 10 pages allocated to 'rides'.
With a lot of tables under consideration, it can be difficult to get a good summary of the available tables and/or queries, what rules they match, and what the recommendation is.
Print a summary table with reference to the name of the rule. Something like this
Rule | Object | Description |
---|---|---|
hypertable_candidate | rides |
The tool can then print the full recommendation using a option and a rule name to provide more information.
When run against a database without timescaledb:
> timescale-doctor postgres://postgres:1@localhost:5432
Traceback (most recent call last):
File "/home/james/workspace/doctor/venv/bin/timescale-doctor", line 8, in <module>
sys.exit(main())
File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/cli.py", line 57, in main
check_rules(dsn)
File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 83, in check_rules
for report in rule.execute(conn, rule.message):
File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 62, in execute
cursor.execute(self.query) # pylint: disable=E1101
File "/nix/store/khvw06lqds60i0npy58zs27xpwb8x0d9-python3.10-psycopg2-2.9.5/lib/python3.10/site-packages/psycopg2/extras.py", line 236, in execute
return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "_timescaledb_catalog.hypertable" does not exist
LINE 5: FROM _timescaledb_catalog.hypertable ht
^
When run against a database with a dropped chunk tombstone, that is entries matching:
SELECT * FROM _timescaledb_catalog.chunk WHERE dropped;
> psql postgres://postgres:1@localhost:5433 -c "SELECT * FROM _timescaledb_catalog.chunk WHERE dropped;"
id | hypertable_id | schema_name | table_name | compressed_chunk_id | dropped | status | osm_chunk
----+---------------+-----------------------+-------------------+---------------------+---------+--------+-----------
20 | 2 | _timescaledb_internal | _hyper_2_20_chunk | | t | 0 | f
(1 row)
timescale-doctor fails:
> timescale-doctor postgres://postgres:1@localhost:5433
Traceback (most recent call last):
File "/home/james/workspace/doctor/venv/bin/timescale-doctor", line 8, in <module>
sys.exit(main())
File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/cli.py", line 57, in main
check_rules(dsn)
File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 83, in check_rules
for report in rule.execute(conn, rule.message):
File "/home/james/workspace/doctor/venv/lib/python3.10/site-packages/doctor/__init__.py", line 62, in execute
cursor.execute(self.query) # pylint: disable=E1101
File "/nix/store/khvw06lqds60i0npy58zs27xpwb8x0d9-python3.10-psycopg2-2.9.5/lib/python3.10/site-packages/psycopg2/extras.py", line 236, in execute
return super().execute(query, vars)
psycopg2.errors.UndefinedTable: relation "_timescaledb_internal._hyper_2_20_chunk" does not exist
Since it is not clear what rules are available it would be good to have an option to list rules and their descriptions, something like
timescale-doctor --list=* --show=brief
Option --list
would take a glob pattern to match the rules to list, so *
here means match all (and would be the default).
Option --show
would select how much to show for each rule: just the description, brief or detailed messages, or even the query.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.