Coder Social home page Coder Social logo

pgsql-bloat-estimation's People

Contributors

ioguix avatar pgiraud avatar tilkow avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pgsql-bloat-estimation's Issues

replacing JOIN pg_catalog.pg_stats with LEFT JOIN

Currently, the script joins with pg_stats (and limits infos about the index only if a corresponding entry is found in pgstats). How about if we do a left join instead? then we need to adjust the code to avoid division by zero and we are done.

-- WARNING: executed with a non-superuser role, the query inspect only index on tables you are granted to read.
-- WARNING: rows with is_na = 't' are known to have bad statistics ("name" type is not supported).
-- This query is compatible with PostgreSQL 8.2 and after
SELECT current_database(), nspname AS schemaname, tblname, idxname, bs*(relpages)::bigint AS real_size,
  bs*(relpages-est_pages)::bigint AS extra_size,
  100 * (relpages-est_pages)::float / relpages AS extra_ratio,
  fillfactor,
  CASE WHEN relpages > est_pages_ff
    THEN bs*(relpages-est_pages_ff)
    ELSE 0
  END AS bloat_size,
  100 * (case when relpages > est_pages_ff then relpages-est_pages_ff else 0 end)::float / relpages AS bloat_ratio,
  is_na
  -- , 100-(pst).avg_leaf_density AS pst_avg_bloat, est_pages, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples, relpages -- (DEBUG INFO)
FROM (
  SELECT 
      coalesce(1 +
         ceil(reltuples/greatest(floor(greatest(bs-pageopqdata-pagehdr,1)/(4+nulldatahdrwidth)::float), 1)), 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
      ) AS est_pages,
      coalesce(1 +
         ceil(reltuples/greatest(floor(greatest(bs-pageopqdata-pagehdr,1)*fillfactor/(100*(4+nulldatahdrwidth)::float)), 1)), 0
      ) AS est_pages_ff,
      bs, nspname, tblname, idxname, relpages, fillfactor, is_na
  FROM (
      SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, idxoid, fillfactor,
            ( index_tuple_hdr_bm +
                maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
                  WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
                  ELSE index_tuple_hdr_bm%maxalign
                END
              + nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
                  WHEN nulldatawidth = 0 THEN 0
                  WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
                  ELSE nulldatawidth::integer%maxalign
                END
            )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
            -- , index_tuple_hdr_bm, nulldatawidth -- (DEBUG INFO)
      FROM (
          SELECT n.nspname, i.tblname, i.idxname, i.reltuples, i.relpages,
              i.idxoid, i.fillfactor, current_setting('block_size')::numeric AS bs,
              CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
                WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
                ELSE 4
              END AS maxalign,
              /* per page header, fixed size: 20 for 7.X, 24 for others */
              24 AS pagehdr,
              /* per page btree opaque data */
              16 AS pageopqdata,
              /* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
              CASE WHEN max(coalesce(s.null_frac,0)) = 0
                  THEN 2 -- IndexTupleData size
                  ELSE 2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( max num filed per index + 8 - 1 /8)
              END AS index_tuple_hdr_bm,
              /* data len: we remove null values save space using it fractionnal part from stats */
              sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024)) AS nulldatawidth,
              max( CASE WHEN i.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na
          FROM (
              SELECT ct.relname AS tblname, ct.relnamespace, ic.idxname, ic.attpos, ic.indkey, ic.indkey[ic.attpos], ic.reltuples, ic.relpages, ic.tbloid, ic.idxoid, ic.fillfactor,
                  coalesce(a1.attnum, a2.attnum) AS attnum, coalesce(a1.attname, a2.attname) AS attname, coalesce(a1.atttypid, a2.atttypid) AS atttypid,
                  CASE WHEN a1.attnum IS NULL
                  THEN ic.idxname
                  ELSE ct.relname
                  END AS attrelname
              FROM (
                  SELECT idxname, reltuples, relpages, tbloid, idxoid, fillfactor, indkey,
                      pg_catalog.generate_series(1,indnatts) AS attpos
                  FROM (
                      SELECT ci.relname AS idxname, ci.reltuples, ci.relpages, i.indrelid AS tbloid,
                          i.indexrelid AS idxoid,
                          coalesce(substring(
                              array_to_string(ci.reloptions, ' ')
                              from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor,
                          i.indnatts,
                          pg_catalog.string_to_array(pg_catalog.textin(
                              pg_catalog.int2vectorout(i.indkey)),' ')::int[] AS indkey
                      FROM pg_catalog.pg_index i
                      JOIN pg_catalog.pg_class ci ON ci.oid = i.indexrelid
                      WHERE ci.relam=(SELECT oid FROM pg_am WHERE amname = 'btree')
                      AND ci.relpages > 0
                  ) AS idx_data
              ) AS ic
              JOIN pg_catalog.pg_class ct ON ct.oid = ic.tbloid
              LEFT JOIN pg_catalog.pg_attribute a1 ON
                  ic.indkey[ic.attpos] <> 0
                  AND a1.attrelid = ic.tbloid
                  AND a1.attnum = ic.indkey[ic.attpos]
              LEFT JOIN pg_catalog.pg_attribute a2 ON
                  ic.indkey[ic.attpos] = 0
                  AND a2.attrelid = ic.idxoid
                  AND a2.attnum = ic.attpos
            ) i
            JOIN pg_catalog.pg_namespace n ON n.oid = i.relnamespace
             left JOIN pg_catalog.pg_stats s ON s.schemaname = n.nspname
                                      AND s.tablename = i.attrelname
                                      AND s.attname = i.attname
            GROUP BY 1,2,3,4,5,6,7,8,9,10,11
      ) AS rows_data_stats
  ) AS rows_hdr_pdg_stats
) AS relation_stats
--ORDER BY nspname, tblname, idxname;
order by real_size desc

Big bloat discrepencies compared to pgstattuple

Using the table bloat checker here I get this on my database (PG 14):

 current_database | schemaname |          tblname           |  real_size  | extra_size  |     extra_pct      | fillfactor | bloat_size  |     bloat_pct      | is_na
------------------+------------+----------------------------+-------------+-------------+--------------------+------------+-------------+--------------------+-------
 cc_perf          | cc_merge   | ccx_ex_contactdrpservices  | 29351370752 | 14890582016 | 50.732151972784294 |        100 | 14890582016 | 50.732151972784294 | f

But with pgstattuple I get this:
SELECT tuple_percent, dead_tuple_percent, free_percent FROM pgstattuple('cc_merge.ccx_ex_contactdrpservices');

 tuple_percent | dead_tuple_percent | free_percent
---------------+--------------------+--------------
         94.65 |                  0 |         3.68

Why the big discrepency?

When fillfactor is 100%, bloat size is a negative value

according to the following line,
"Line 24 : FROM '%fillfactor=#"__#"%' FOR '#')::smallint, 100) AS fillfactor,",
It returns 10 when the fillfactor's value is 100.
So, there is a case where bloat size has negative value.
I wonder why you did not parse it to three digits.
Thanks.

Toasted text

As discussed today on daliconsultants:

In pg_stats, toasted data is exactly 18 bytes when data is totally toasted. So the query computes that the bloat is higher than real.

Perhaps is it possible to exclude text columns with 18.0 bytes and an extended/external storage ? That would not solve the case for columns with very different sizes, where avg_width is a mix of 18 and real sizes.

Why `IndexTupleData` size is 2

Hi, I noticed when you compute the b-tree bloat, the IndexTupleData size is 2. But I couldn't understand why the size of t_tid + t_info is 2 after I read the source code the 35th line in src/include/access/itup.h?

your sql:

 CASE WHEN max(coalesce(s.stanullfrac,0)) = 0
                  THEN 2 -- IndexTupleData size
                  ELSE 2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( max num filed per index + 8 - 1 /8)
              END AS index_tuple_hdr_bm,

the source code:

typedef struct IndexTupleData
{
	ItemPointerData t_tid;		/* reference TID to heap tuple */

	/* ---------------
	 * t_info is laid out in the following fashion:
	 *
	 * 15th (high) bit: has nulls
	 * 14th bit: has var-width attributes
	 * 13th bit: AM-defined meaning
	 * 12-0 bit: size of tuple
	 * ---------------
	 */

	unsigned short t_info;		/* various info about tuple*/

} IndexTupleData;				/* MORE DATA FOLLOWS AT END OF STRUCT */

B-tree bloat estimation in PostgreSQL 13

Hi there,
I've faced with an issue while trying to estimate b-tree indexes bloat on newest PostgreSQL 13.
Old sql-queries gives irrelevant result on the same database while with PG 12 they work fine.
Could you please tell me do you have any plans to maintain compatibility with PG 13?

Thank you in advance

Alignement issues

As far as I understand the query, alignement is not taken care of. It is probably a bit far of the actual concern of the query but it might prove handy to tell users: such columns ordering means you lose that much space on disk because of alignment.

Table bloat query failing on PG 12

Hi, currently table bloat query breaks due to removal of pg_class.relhasoids.
I think best would be to just remove this part as it's been a deprecated feature for long time so only precision for system catalogs could suffer a bit, but they should be small anyways.

`real_size` in `pgsql-bloat-estimation/btree/btree_bloat.sql`

according to the definition of real_size: the real size of the index. the real_size is one-page size * the number of pages, but the 4th code, real_size = bs*(relpages). the relpages is Size of the on-disk representation of this table in pages (of size BLCKSZ), bs is also the size.

Question : is_na column for jsonb in postgreSQL 9.5

Hi @ioguix ,
Thank you for this useful tools.

I run table/table_bloat.sql against RDS Postgres instance running postgres 9.5.15.

Reading the output, only pg_catalog.pg_description table have is_na value f

Reading the README I found

any table which lacks some statistics. For example, the json type had no statistics available before 9.5

Checking my table schema I found that one of the column use jsonb type.

I'm confused is the is_na value in ouput is correct because my table use jsonb data type so the calculation is not correct and cannot be used as reference or the is_na value is incorrect since I use postgresql 9.5.15.

Thank you in advance.

bloat_ratio isn't well-documented

I'm new to this stuff. The docs say

bloat_ratio: estimated ratio of the real size used by bloat_size.

Should this read "percentage of the real table size estimated to be bloat"? So a value of 50.2 means "50.2%"?

(thanks for sharing these queries!)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.