jobinau / pg_gather

Scan a PostgreSQL instance for potential problems. pg_gather is a SQL-only script leveraging the built-in features of psql.

License: Other

Awk 1.87% HTML 75.06% Shell 7.52% PLpgSQL 15.56%
database performance-analysis postgres postgresql scanner

pg_gather's People

Contributors

arronax, jobinau, psvampa


pg_gather's Issues

Create a wrapper script to execute pg_gather

There are two versions of the pg_gather script: one for 9.5 and 9.6, and one for newer versions. This creates unnecessary overhead for the operator. Also, when pg_gather is executed (or requested to be executed), the PostgreSQL version is sometimes not known in advance.

There should be a standard way to execute pg_gather without thinking about the underlying complexity: something like pg_gather.sh -h -P -p -d ... that takes care of checking the target instance's version and runs the required SQL script.
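The version dispatch could even live inside psql itself, keeping the wrapper thin. A minimal sketch, assuming the two scripts are named gather.sql and gather_old.sql (illustrative names, not necessarily the actual file layout):

-- Let psql pick the right script based on the server version.
-- \if requires a psql client of version 10 or newer.
SELECT current_setting('server_version_num')::int >= 100000 AS modern \gset
\if :modern
\i gather.sql
\else
\i gather_old.sql
\endif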

Rarely, users reported that an additional empty column appears along with the wait-event data

A few users contacted me directly and reported that they sometimes experience data import failures due to an additional empty column appearing in the wait-event data imported into pg_pid_wait. Even though the issue was reported on a very old version (v7), it was not reproducible, so no one has filed a bug yet.
So I am creating this bug report on behalf of all of them.

Databases where blks_fetch is zero also need to be present in the report

As per the earlier design, databases with no blks_fetch information were omitted from the report.
Generally, these are small, newly created databases, so missing them from the report was not a major concern.
However, when we share the report with end users, databases missing from the list could be a problem.
So it is important to list them as well, for the completeness of the report; a sketch of an inclusive query is below.
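A minimal sketch, assuming blks_fetch is derived from blks_read + blks_hit in pg_stat_database (an assumption, not necessarily how pg_gather computes it):

-- LEFT JOIN from pg_database guarantees every database appears,
-- showing zero instead of being dropped from the list
SELECT d.datname,
       COALESCE(s.blks_read + s.blks_hit, 0) AS blks_fetch
FROM pg_database d
LEFT JOIN pg_stat_database s ON s.datid = d.oid
ORDER BY d.datname;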

When generate_report fails, it may leave a docker container running

When generate_report.sh fails, it may leave a running docker container. For example, that happens when the target report file is not found.

$ ./generate_report.sh $PWD/CS0018462_out.txt
13: Pulling from library/postgres
Digest: sha256:117c3ea384ce21421541515edfb11f2997b2c853d4fdd58a455b77664c1adc20
Status: Image is up to date for postgres:13
docker.io/library/postgres:13
Docker container is 035c3073ad9ea3f5dd0dc9a46326caa8bd31ff04c517c693d5f8c6c9e137956a; will wait 3 seconds before proceeding
SET
DROP TABLE
...
CREATE TABLE
sed: can't read /home/skuzmichev/projects/pg_gather/CS0018462_out.txt: No such file or directory
$ docker ps
CONTAINER ID   IMAGE         COMMAND                  CREATED          STATUS          PORTS                 NAMES
035c3073ad9e   postgres:13   "docker-entrypoint.s…"   18 seconds ago   Up 17 seconds   5432/tcp              musing_liskov
...

Column leader_pid in the pg_get_activity table is too small

Hitting this error when running the gather_schema.sql script:

psql:<stdin>:71: ERROR:  value "3610288274858318246" is out of range for type integer
CONTEXT:  COPY pg_get_activity, line 1, column leader_pid: "3610288274858318246"

Changing the column type to bigint fixed it:

diff --git a/gather_schema.sql b/gather_schema.sql
index f465c32..8bb666f 100644
--- a/gather_schema.sql
+++ b/gather_schema.sql
@@ -68,7 +68,7 @@ CREATE UNLOGGED TABLE pg_get_activity (
     gss_auth boolean,
     gss_princ text,
     gss_enc boolean,
-    leader_pid integer
+    leader_pid bigint
 )

Automate running pg_gather against all databases

Currently, pg_gather requires a connection to a specific database in a cluster to get most details. However, which specific database should be targeted may not be immediately clear, and moreover there may be dozens of databases in the cluster.

Since we've started planning for a wrapper script to run pg_gather, perhaps we can also add a flag like --all-dbs to run pg_gather against every database automatically?

There are global and non-global parts in the resulting report, so we can also discuss separating them if we're going to aim for multiple sub-reports. To begin with, however, just having multiple regular reports generated automatically would be a good quality-of-life improvement.
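For the enumeration itself, a standard catalog query should be enough (how the wrapper iterates over the result is the open part):

-- candidate databases for an --all-dbs run
SELECT datname FROM pg_database
WHERE datallowconn AND NOT datistemplate;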

@jobinau, what's your take on this?

Bulk History import feature sometimes results in ERROR

\Importing :2021-09-16 02:30:01.492215+00
Partial
Pager usage is off.
COPY 1
psql::39: ERROR: invalid input syntax for type timestamp with time zone: "16405"
CONTEXT: COPY pg_get_activity, line 20, column collect_ts: "16405"
COPY 6198
COPY 9
COPY 0
COPY 0
COPY 1
COPY 1
COPY 8196

function max(pg_lsn) does not exist

Thank you very much for your work!

However, I get this error when trying to run the report.

Is this normal?

Thanks.

-bash-4.2$ /usr/pgsql-11/bin/psql -X -f gather_report.sql > GatherReport.html
psql:gather_report.sql:180: ERROR: function max(pg_lsn) does not exist
LINE 1: ...TEST((SELECT(current_wal) FROM pg_gather),(SELECT MAX(sent_l...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
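For context: the min()/max() aggregates over pg_lsn were only added in recent PostgreSQL releases, so older servers need a workaround. A sketch of an aggregate-free equivalent, assuming the report wants the highest sent_lsn from pg_stat_replication:

-- same result as MAX(sent_lsn), without the max(pg_lsn) aggregate
SELECT sent_lsn FROM pg_stat_replication
ORDER BY sent_lsn DESC LIMIT 1;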

Gather per-table tuple statistics

When analyzing vacuum-related issues, it's important to know the load profile of the table: is it insert-heavy? Is it update-heavy? How many updates have been done since the last vacuum or analyze? Are most updates HOT?
pg_gather should gather and provide stats similar to what the pg_stat_user_tables view provides:

                                  View "pg_catalog.pg_stat_user_tables"
       Column        |           Type           | Collation | Nullable | Default | Storage | Description
---------------------+--------------------------+-----------+----------+---------+---------+-------------
 relid               | oid                      |           |          |         | plain   |
 schemaname          | name                     |           |          |         | plain   |
 relname             | name                     |           |          |         | plain   |
...
 n_tup_ins           | bigint                   |           |          |         | plain   |
 n_tup_upd           | bigint                   |           |          |         | plain   |
 n_tup_del           | bigint                   |           |          |         | plain   |
 n_tup_hot_upd       | bigint                   |           |          |         | plain   |
 n_live_tup          | bigint                   |           |          |         | plain   |
 n_dead_tup          | bigint                   |           |          |         | plain   |
 n_mod_since_analyze | bigint                   |           |          |         | plain   |
...
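For instance, a collection query along these lines (a sketch, not a final column list):

-- per-table tuple activity, worst dead-tuple offenders first
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd,
       n_live_tup, n_dead_tup, n_mod_since_analyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;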

New feature request: MultiXactId reporting

Hello Jobin,
Currently, pg_gather reports the database/relation age. Lately, I have run into a few MultiXactId scenarios. I would like pg_gather to report this as well, if possible.
Best Regards,
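One possible query for this, analogous to the existing XID-age reporting (a sketch; mxid_age() is available since PostgreSQL 9.5):

-- per-database multixact age
SELECT datname, mxid_age(datminmxid) AS minmxid_age
FROM pg_database
ORDER BY minmxid_age DESC;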

Timestamp information in the report should be the same as in PG logs

It is very difficult to correlate information from the pg_gather report with PostgreSQL logs when they display timestamps in different time zones.
So it is important that the pg_gather report shows timestamp information in the same time zone as the PostgreSQL server's logs.
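A sketch of the conversion, assuming the report renders its timestamps at query time:

-- render a timestamp in the server's log time zone
SELECT now() AT TIME ZONE current_setting('log_timezone') AS log_local_time;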

Timestamp from the moment when pg_gather is executed

It would be very good to have the time when pg_gather started collecting data logged and displayed in the header of the HTML report. Something as simple as:

select now();
show log_timezone;

which we could later use to match entries in the PostgreSQL log and/or other monitoring systems.

ERRORs while running: syntax error at or near ":", invalid command \if, invalid command \endif

A sample of the pg_gather (gather.sql) output:

psql:/app/gather.sql:5: ERROR:  syntax error at or near ":"
LINE 1: SELECT ( :SERVER_VERSION_NUM > 120000 ) AS pg12, ( :SERVER_V...
                 ^
psql:/app/gather.sql:7: invalid command \if
SELECT (SELECT count(*) > 1 FROM pg_srvr) AS conlines \gset
\if :conlines
\echo SOMETHING WRONG, EXITING
SOMETHING WRONG, EXITING;
\q
\endif
psql:/app/gather.sql:15: invalid command \endif
Tuples only is on.
\t
\r
psql:/app/gather.sql:27: invalid command \if
COPY pg_srvr FROM stdin;

Multiple page re-renders causing CPU spikes in the browser

When the report is loaded, the HTML DOM with CSS is displayed first, as-is.
The analysis data comes later, on top of that. Currently, jQuery is used to manipulate the DOM, applying CSS for highlighting, and mouseover hints (element.title) are inserted directly into the DOM elements.
All these dynamic changes cause the browser to re-render the HTML page multiple times, leading to CPU usage spikes and sometimes delays. The problem is more visible when the report is large.

Catch per-table overrides

It is possible to override various parameters, like the vacuum scale factor, per table. pg_gather should detect and report on every table that has custom reloptions.
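A sketch of the detection query:

-- every relation carrying custom reloptions
SELECT relnamespace::regnamespace AS schema_name, relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL;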

New feature: prepared_xacts

Having the xmin age from prepared_xacts could help.
I'm thinking of:

select *, now() - prepared AS duration, age(transaction) AS trx_age
from pg_prepared_xacts order by age(transaction) desc;

Report header goes missing if timezone info is not available

When we analyze a partial data collection from continuous data gathering, parameter values are not collected each time, because that adds unwanted overhead.
But this has an unwanted consequence: the header data goes missing, because it is produced by a join with the parameter values.

Showing WaitEvents as percentages

Hello Jobin!

The Session Details section currently presents WaitEvent values as cardinal numbers, e.g., WalWriterMain:1937, WALWrite:40, WALSync:16, CPU:7.
What do you think about presenting those cardinal numbers as percentages? Wouldn't it be more human-readable/friendly?
Since the numbers are counts over the data-collection time, we could say something like, "This session was statistically spending ##% of its time doing this, this, and that."
It adds no new data, but it can still help in understanding the output; see the sketch below.
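A sketch of the conversion, assuming the wait-event samples land in a table like pg_pid_wait with pid and wait_event columns (column names assumed):

-- share of samples per wait event, per session
SELECT pid, wait_event,
       round(100.0 * count(*) / sum(count(*)) OVER (PARTITION BY pid), 1) AS pct
FROM pg_pid_wait
GROUP BY pid, wait_event
ORDER BY pid, pct DESC;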

Best regards,

Improvement: Eliminate the sed utility from the data-import requirements

There were historical reasons why Unix/Linux sed was used to clean up the collected data before importing it.
Many of those cases have gone away over time, across multiple revisions.
Now it is conceptually possible to make the collected data clean enough for importing,
so the use of the sed utility can be eliminated.

Provide a list of extensions

A feature request: provide a list of extensions that are installed and that are available.
Right now, the only way to guess what's installed/available is by looking at the value of shared_preload_libraries.
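The standard catalog view already exposes both; a sketch:

-- installed_version is NULL for extensions that are available but not installed
SELECT name, default_version, installed_version
FROM pg_available_extensions
ORDER BY name;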
