Comments (6)
@lei-su-awx Yes, you are correct. The schema file grows with every commit, and grows exponentially, which causes this kind of issue. We need to redesign the schema commit file.
Created a tracking JIRA for this: https://issues.apache.org/jira/browse/HUDI-7481
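To confirm that the schema history is growing on a particular table, a minimal sketch like the one below lists the files in a local copy of the table's schema directory together with their sizes. It assumes the history lives under `.hoodie/.schema` inside the table base path and that file names sort by instant time; both are assumptions to check against your Hudi version. Tables on object stores such as S3 would need that store's own listing API instead of `os.listdir`.

```python
import os
from typing import List, Tuple

# Assumed location of the schema history inside a Hudi table base path (verify for your version).
SCHEMA_SUBDIR = os.path.join(".hoodie", ".schema")


def list_schema_files(table_path: str) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for each regular file in the schema directory.

    Entries are sorted by file name, which is assumed to follow instant-time order.
    Subdirectories are skipped.
    """
    schema_dir = os.path.join(table_path, SCHEMA_SUBDIR)
    entries = []
    for name in sorted(os.listdir(schema_dir)):
        full_path = os.path.join(schema_dir, name)
        if os.path.isfile(full_path):
            entries.append((name, os.path.getsize(full_path)))
    return entries


def is_growing(entries: List[Tuple[str, int]]) -> bool:
    """True if no file is smaller than the one before it and the last file is
    strictly larger than the first, i.e. the stored history keeps accumulating."""
    sizes = [size for _, size in entries]
    if len(sizes) < 2:
        return False
    non_decreasing = all(a <= b for a, b in zip(sizes, sizes[1:]))
    return non_decreasing and sizes[-1] > sizes[0]
```

Usage: `is_growing(list_schema_files("/path/to/table"))`. A steadily increasing size across files, even when no columns were added or retyped, matches the behavior described in this thread.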
from hudi.
@ad1happy2go Got it, thanks for your reply.
Hi @ad1happy2go, is there any configuration to compact the schemacommit file, or to store only the latest schema? Or is there any other workaround?
@lei-su-awx
How many versions of internalSchema are stored in the schemacommit file?
A new schema version is generated only when the schema actually changes. Does this job have frequent schema changes?
Normally, a table should not have very frequent schema changes.
If no field types have changed, you can directly clear the .schema directory.
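For the workaround of clearing the .schema directory, a cautious sketch is to move the files aside instead of deleting them, so they can be restored if something goes wrong. This is not an official Hudi tool: it assumes the `.hoodie/.schema` layout, a local or mounted filesystem, and that no writer is running on the table. The function name `clear_schema_dir` and the backup-directory approach are illustrative choices, and taking a full backup of `.hoodie` first is advisable.

```python
import os
import shutil
from typing import List

# Assumed location of the schema history inside a Hudi table base path (verify for your version).
SCHEMA_SUBDIR = os.path.join(".hoodie", ".schema")


def clear_schema_dir(table_path: str, backup_dir: str, dry_run: bool = True) -> List[str]:
    """Move every entry out of the table's schema directory into backup_dir.

    With dry_run=True (the default) nothing is moved; the function only returns
    the sorted names that would be moved. Stop all writers before calling it
    with dry_run=False.
    """
    schema_dir = os.path.join(table_path, SCHEMA_SUBDIR)
    names = sorted(os.listdir(schema_dir))
    if not dry_run:
        os.makedirs(backup_dir, exist_ok=True)
        for name in names:
            # Moving (rather than deleting) keeps a restorable copy of each file.
            shutil.move(os.path.join(schema_dir, name), os.path.join(backup_dir, name))
    return names
```

Running it once with the default `dry_run=True` shows exactly what would be moved before anything on the table changes.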
@xiarixiaoyao Even without schema changes, this grows indefinitely, possibly because the sync time changes with every update.
@ad1happy2go @lei-su-awx I will take a look at this for Hudi 0.14.0.
We use Hudi 0.11.0 and have not seen this problem.
Related Issues (20)
- [SUPPORT] Pulsar connection error for Hoodie Streamer
- Failed insert schema compatibility mismatch issue
- [SUPPORT] Datadog Metrics reporter fails with null pointer exception using hudi 0.14.0
- HUDI 0.14.1 and AWS GLUE 4.0 issues with schema evolution
- [logical delete data] How to use flink-cdc to logical delete the hudi data
- [SUPPORT] Flink bucket index partitioner may cause data skew
- [SUPPORT] Failed to parse HoodieCommitMetadata
- [SUPPORT] NPE when using PySpark with release-0.15.0
- org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Cannot encode decimal with precision 14 as max precision 13
- [SUPPORT] Failed to upsert for commit time xxxx ,HUDI 0.14.1 & Glue 4.0
- [SUPPORT] - Partial update of the MOR table after compaction with Hudi Streamer
- [SUPPORT] Spark-Hudi: Unable to perform Hard delete using Pyspark on HUDI table from AWS Glue
- [SUPPORT] Issue with RECORD_INDEX Initialization Falling Back to GLOBAL_SIMPLE
- duplicated records when use insert overwrite
- [SUPPORT] CVE problems in latest 0.14.1
- [SUPPORT] using spark's observe feature on dataframes saved by hudi is stuck
- Corrupted parquet file in hudi partition | Deletion of partition in Hudi
- [SUPPORT] Multi Writer Jobs with OCC (U1 and U2) with Async Cleaner
- [SUPPORT] how to migrate exist bloom index table to bucket table
- [SUPPORT] Unable to Use DynamoDB Based Lock with Hudi PySpark Job Locally