The following is the gist of the changes:
- All the low-level code for creating a commit lived in HoodieClient, which made it hard to share that code with a compaction commit.
- HoodieTableMetadata mixed metadata access with file filtering. (Also, a few operations required a FileSystem to be passed in because they were called from task executors, while others used FileSystem as a global variable.) Merge-on-read needs much of that code, but will change slightly how it operates on the metadata and how it filters files, so the two sets of operations are split into HoodieTableMetaClient and TableFileSystemView.
- Everything in HoodieTableMetaClient (active commits, archived commits, the cleaner log, the savepoint log and, in the future, delta and compaction commits) is a HoodieTimeline. A timeline is a series of instants with a built-in concept of inflight and completed commit markers.
- A timeline can be queried for ranges and for whether it contains an instant, and it can also be used to create new data points (e.g. a new commit). Creating and deleting commits (and all the metadata above) is streamlined through the timeline.
- Multiple timelines can be merged into a single timeline, giving us an audit trail of everything that happened in a hoodie dataset. This also helps with #55.
- Move to Java 8 and use the more succinct Java 8 syntax in the refactored code.
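The timeline idea described above can be illustrated with a small, self-contained sketch. It is not Hudi's code: the class and method names (`Instant`, `Timeline`, `createInflight`, `complete`, `findInRange`, `merge`) are hypothetical, instants are assumed to be identified by an action name plus a lexicographically sortable timestamp string, and state is held in memory rather than as marker files on a FileSystem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TimelineSketch {

    // Hypothetical: an instant is either still running (INFLIGHT) or done (COMPLETED).
    enum State { INFLIGHT, COMPLETED }

    static final class Instant {
        final String action;    // illustrative action names: "commit", "clean", "savepoint"
        final String timestamp; // assumed sortable as a string, e.g. "20170101120000"
        final State state;

        Instant(String action, String timestamp, State state) {
            this.action = action;
            this.timestamp = timestamp;
            this.state = state;
        }

        @Override
        public String toString() {
            return action + "@" + timestamp + "(" + state + ")";
        }
    }

    // An immutable, timestamp-ordered series of instants; every operation returns a new Timeline.
    static final class Timeline {
        private static final Comparator<Instant> ORDER =
            Comparator.comparing((Instant i) -> i.timestamp).thenComparing((Instant i) -> i.action);

        private final List<Instant> instants;

        Timeline(List<Instant> instants) {
            List<Instant> sorted = new ArrayList<>(instants);
            sorted.sort(ORDER);
            this.instants = sorted;
        }

        // Start a new action by recording an inflight marker.
        Timeline createInflight(String action, String ts) {
            List<Instant> next = new ArrayList<>(instants);
            next.add(new Instant(action, ts, State.INFLIGHT));
            return new Timeline(next);
        }

        // Flip an existing inflight marker to completed; fail if there is none.
        Timeline complete(String action, String ts) {
            boolean found = instants.stream().anyMatch(i -> isInflight(i, action, ts));
            if (!found) {
                throw new IllegalStateException("no inflight " + action + " at " + ts);
            }
            return new Timeline(instants.stream()
                .map(i -> isInflight(i, action, ts) ? new Instant(action, ts, State.COMPLETED) : i)
                .collect(Collectors.toList()));
        }

        private static boolean isInflight(Instant i, String action, String ts) {
            return i.action.equals(action) && i.timestamp.equals(ts) && i.state == State.INFLIGHT;
        }

        // Keep only completed instants.
        Timeline completed() {
            return new Timeline(instants.stream()
                .filter(i -> i.state == State.COMPLETED)
                .collect(Collectors.toList()));
        }

        // "contains" query: is there a completed instant at this timestamp?
        boolean containsCompleted(String ts) {
            return instants.stream().anyMatch(i -> i.state == State.COMPLETED && i.timestamp.equals(ts));
        }

        // Range query: instants with startExclusive < timestamp <= endInclusive.
        Timeline findInRange(String startExclusive, String endInclusive) {
            return new Timeline(instants.stream()
                .filter(i -> i.timestamp.compareTo(startExclusive) > 0
                          && i.timestamp.compareTo(endInclusive) <= 0)
                .collect(Collectors.toList()));
        }

        // Merge several timelines (commits, cleans, savepoints, ...) into one audit timeline.
        static Timeline merge(Timeline... timelines) {
            return new Timeline(Arrays.stream(timelines)
                .flatMap(t -> t.instants.stream())
                .collect(Collectors.toList()));
        }

        List<String> describe() {
            return instants.stream().map(Instant::toString).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        Timeline commits = new Timeline(new ArrayList<>())
            .createInflight("commit", "001").complete("commit", "001")
            .createInflight("commit", "003"); // still running
        Timeline cleans = new Timeline(new ArrayList<>())
            .createInflight("clean", "002").complete("clean", "002");

        Timeline audit = Timeline.merge(commits, cleans);
        System.out.println(audit.describe());
        // prints [commit@001(COMPLETED), clean@002(COMPLETED), commit@003(INFLIGHT)]
        System.out.println(commits.containsCompleted("003")); // prints false
        System.out.println(audit.findInRange("001", "003").completed().describe());
        // prints [clean@002(COMPLETED)]
    }
}
```

Returning a new `Timeline` from every operation keeps each instance immutable, so a timeline passed to other code cannot change underneath it; this is a design choice of the sketch, not something the change summary states about HoodieTimeline.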