Comments (12)
Thanks @soumilshah1995 for the details. Can you share the full Hudi Streamer command/code you were using?
from hudi.
It's here: https://github.com/soumilshah1995/DeltaHudiTransformations
I tried --op insert_overwrite. I guess it's not natively supported; the docs also say the supported operations are insert | bulk_insert | upsert.
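To make the limitation above concrete, here is a minimal sketch that checks an operation against the three values this thread says the docs list before assembling streamer arguments. The helper name `build_streamer_args` and the upper-casing of the value are assumptions for illustration; only the `--op`, `--target-base-path`, and `--target-table` flags are taken as given.

```python
# Operations this thread says the docs list for the streamer's --op flag.
DOCUMENTED_OPS = {"insert", "bulk_insert", "upsert"}


def build_streamer_args(op, base_path, table_name):
    """Return a list of streamer CLI arguments, rejecting undocumented ops.

    Hypothetical helper: it only assembles strings and does not call Hudi.
    """
    normalized = op.strip().lower()
    if normalized not in DOCUMENTED_OPS:
        raise ValueError(
            f"--op {op!r} is not among the documented operations: "
            f"{sorted(DOCUMENTED_OPS)}"
        )
    return [
        # Assumption: the flag expects the enum-style upper-case name.
        "--op", normalized.upper(),
        "--target-base-path", base_path,
        "--target-table", table_name,
    ]


if __name__ == "__main__":
    print(build_streamer_args("upsert", "/tmp/hudi/orders", "orders"))
    try:
        build_streamer_args("insert_overwrite", "/tmp/hudi/orders", "orders")
    except ValueError as err:
        print(err)
```

The resulting list would be appended to the rest of the spark-submit command, which is not shown here.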
@soumilshah1995 I am just wondering whether insert_overwrite even makes sense for streaming workloads. Do you have any such use case?
For sources like Kafka, it certainly doesn't make sense at all.
The only use case I can think of is table maintenance activity, maybe running it in run_once mode.
Could you confirm if DeltaStreamer supports "insert_overwrite"? If not, I'm interested in understanding why. The reason for this inquiry is that in scenarios where I'm utilizing SQLSource and need to rectify an entire partition from which I'm reading, I would prefer to use "insert_overwrite" as it facilitates index lookup, akin to what "upsert" would accomplish. Ideally, having support for "insert_overwrite" in DeltaStreamer would prove immensely beneficial.
Adding insert_overwrite can also help to build a gold zone: I can read data from multiple Hudi tables and insert_overwrite into gold aggregated tables.
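Outside the streamer, the gold-zone pattern described above can be sketched with the Spark datasource writer, which accepts insert_overwrite as a write operation. This is a rough sketch under assumptions: the helper `gold_overwrite_options`, the table name, and the field names are hypothetical, and the function only builds the options dictionary.

```python
def gold_overwrite_options(table_name, record_key, partition_field, precombine_field):
    """Build Hudi datasource write options for an insert_overwrite write.

    Hypothetical helper: the table and field names are supplied by the caller.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "insert_overwrite",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


if __name__ == "__main__":
    # Hypothetical gold table aggregated from several source Hudi tables.
    opts = gold_overwrite_options("gold_sales_daily", "sale_id", "sale_date", "updated_at")
    for key, value in opts.items():
        print(f"{key}={value}")
```

With a Spark session and the Hudi bundle on the classpath, these options would typically be passed as `df.write.format("hudi").options(**opts).mode("append").save(path)`, where `df` is the aggregated DataFrame and `path` is the gold table's base path.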
@soumilshah1995 This makes sense. Created a JIRA to track this as well - https://issues.apache.org/jira/browse/HUDI-7558
As Sudha suggested, can you also send a mail to the dev list and point to the conversation here? It would be good to hear thoughts on this from others.
Roger that
Do you want me to close this?
I'll send an email to [email protected]
I'll close this thread.