Comments (1)
This happens because we sample partitions (100 by default) and extrapolate the table-level stats from the stats of the partitions that were fetched. When the sample size is raised above the actual number of partitions, the table stats match those of the unpartitioned counterpart:
presto:tpcds_sf1_parquet> set session hive.partition_statistics_sample_size=1825;
SET SESSION
presto:tpcds_sf1_parquet> show stats for hive.tpcds_sf1000_parquet_varchar_part.store_sales;
 column_name           | data_size | distinct_values_count | nulls_fraction       | row_count     | low_value | high_value | histogram
-----------------------+-----------+-----------------------+----------------------+---------------+-----------+------------+-----------
 ss_sold_time_sk       | NULL      | 47961.0               | 0.04499776111740666  | NULL          | 28800     | 75599      | NULL
 ss_item_sk            | NULL      | 297612.0              | 0.0                  | NULL          | 1         | 300000     | NULL
 ss_customer_sk        | NULL      | 1.2124495E7           | 0.04499698125304584  | NULL          | 1         | 12000000   | NULL
 ss_cdemo_sk           | NULL      | 1890006.0             | 0.04500093578341331  | NULL          | 1         | 1920800    | NULL
 ss_hdemo_sk           | NULL      | 7082.0                | 0.044998298272422764 | NULL          | 1         | 7200       | NULL
 ss_addr_sk            | NULL      | 5947530.0             | 0.044996645487757815 | NULL          | 1         | 6000000    | NULL
 ss_store_sk           | NULL      | 513.0                 | 0.04499048261485481  | NULL          | 1         | 1000       | NULL
 ss_promo_sk           | NULL      | 1483.0                | 0.04499917917887129  | NULL          | 1         | 1500       | NULL
 ss_ticket_number      | NULL      | 1.0071624E8           | 0.0                  | NULL          | 1         | 240000000  | NULL
 ss_quantity           | NULL      | 100.0                 | 0.04499472152140729  | NULL          | 1         | 100        | NULL
 ss_wholesale_cost     | NULL      | 10091.0               | 0.04499681007177697  | NULL          | 1.0       | 100.0      | NULL
 ss_list_price         | NULL      | 19495.0               | 0.04499915278987244  | NULL          | 1.0       | 200.0      | NULL
 ss_sales_price        | NULL      | 19348.0               | 0.044999514249711985 | NULL          | 0.0       | 200.0      | NULL
 ss_ext_discount_amt   | NULL      | 529048.0              | 0.04500334759901894  | NULL          | 0.0       | 19778.0    | NULL
 ss_ext_sales_price    | NULL      | 600147.0              | 0.04499642951463563  | NULL          | 0.0       | 19972.0    | NULL
 ss_ext_wholesale_cost | NULL      | 388752.0              | 0.04499845764808689  | NULL          | 1.0       | 10000.0    | NULL
 ss_ext_list_price     | NULL      | 731384.0              | 0.04499803472965792  | NULL          | 1.0       | 20000.0    | NULL
 ss_ext_tax            | NULL      | 115731.0              | 0.04499627500010287  | NULL          | 0.0       | 1797.48    | NULL
 ss_coupon_amt         | NULL      | 529048.0              | 0.04500334759901894  | NULL          | 0.0       | 19778.0    | NULL
 ss_net_paid           | NULL      | 786675.0              | 0.0449999816127706   | NULL          | 0.0       | 19972.0    | NULL
 ss_net_paid_inc_tax   | NULL      | 1098165.0             | 0.04500332989061181  | NULL          | 0.0       | 21769.48   | NULL
 ss_net_profit         | NULL      | 1023699.0             | 0.044990789213354636 | NULL          | -10000.0  | 9986.0     | NULL
 ss_sold_date_sk       | NULL      | 1823.0                | 0.04500048022595944  | NULL          | 2450816   | 2452642    | NULL
 NULL                  | NULL      | NULL                  | NULL                 | 2.879987999E9 | NULL      | NULL       | NULL
(24 rows)
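To make the sampling effect concrete, here is a minimal Python sketch of the idea, not Presto's actual implementation. It assumes the simplest possible extrapolation (average row count of the sampled partitions times the total number of partitions); the function name `estimate_row_count`, the seeded random sampling, and the synthetic partition sizes are all hypothetical and only serve the illustration.

```python
import random


def estimate_row_count(partition_row_counts, sample_size, seed=0):
    """Extrapolate a table row count from a sample of per-partition counts.

    Hypothetical model: scale the sampled total by (partitions / sample).
    This is an illustration of sampling plus extrapolation in general,
    not the formula Presto uses internally.
    """
    total_partitions = len(partition_row_counts)
    if total_partitions == 0:
        return 0
    # A sample size larger than the partition count simply covers everything.
    k = min(sample_size, total_partitions)
    sampled = random.Random(seed).sample(partition_row_counts, k)
    return round(sum(sampled) * total_partitions / k)


# Synthetic, deliberately skewed table with 1823 partitions (made-up sizes).
_rng = random.Random(42)
partitions = [_rng.randint(1_000, 2_000_000) for _ in range(1823)]

exact = sum(partitions)
sampled_estimate = estimate_row_count(partitions, 100)   # default-like sample
full_estimate = estimate_row_count(partitions, 1825)     # sample > partitions

print("exact row count      :", exact)
print("estimate, sample=100 :", sampled_estimate)
print("estimate, sample=1825:", full_estimate)
```

With a sample of 100 the estimate depends on which partitions happen to be picked, so it can drift from the true total when partition sizes are uneven. Once the sample covers every partition there is nothing left to extrapolate, which is consistent with the stats above lining up with the unpartitioned table.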