Comments (7)
Slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1718691646054409
This issue arises when attempting to use the transformer with:
--source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource \
--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
--hoodie-conf 'hoodie.deltastreamer.transformer.sql=SELECT * FROM '
Throws an error even when using SELECT *.
from hudi.
@ashwinagalcha-ps The SQL should have <SRC> also, right?
"--hoodie-conf", "hoodie.streamer.transformer.sql=SELECT *, extract(year from a.created_at) as year FROM <SRC> a",
https://hudi.apache.org/docs/transforms/
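For context on why <SRC> is mandatory: the transformer substitutes that placeholder with the name of a temp view registered for the incoming batch before running the SQL. A hypothetical minimal sketch of that substitution (simplified illustration, not the actual Hudi implementation; the view name is made up):

```python
# Hypothetical sketch of why <SRC> is required in the transformer SQL:
# the incoming batch is registered as a temp view, and <SRC> in the
# configured SQL is replaced with that view's name before execution.
SRC_PLACEHOLDER = "<SRC>"

def build_transformer_query(sql_template: str, tmp_view_name: str) -> str:
    """Substitute the <SRC> placeholder with the registered temp view name."""
    return sql_template.replace(SRC_PLACEHOLDER, tmp_view_name)

# A query without <SRC> has nothing to substitute, so Spark is asked to
# select from a table that does not exist.
print(build_transformer_query("SELECT * FROM <SRC>", "HOODIE_SRC_TMP_VIEW"))
# → SELECT * FROM HOODIE_SRC_TMP_VIEW
```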
@ad1happy2go I missed adding it here, but we did test with <SRC> in the query. Sorry for the confusion, I'll update the description.
@ad1happy2go It was already added here, but since it was plain text and not code, GitHub stripped the <> while saving.
Hey Aditya, I personally verified that the SQL transformer throws an error.
Here is the lab: https://github.com/soumilshah1995/universal-datalakehouse-postgres-ingestion-deltastreamer
I changed the streamer command:
========================================================
spark-submit \
--class org.apache.hudi.utilities.streamer.HoodieStreamer \
--packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0,org.apache.hadoop:hadoop-aws:3.3.2' \
--properties-file spark-config.properties \
--master 'local[*]' \
--executor-memory 1g \
/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
--table-type COPY_ON_WRITE \
--target-base-path 's3a://warehouse/database=default/table_name=salestest' \
--target-table salestest \
--op UPSERT \
--source-limit 4000000 \
--source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource \
--min-sync-interval-seconds 10 \
--continuous \
--source-ordering-field _event_origin_ts_ms \
--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
--hoodie-conf bootstrap.servers=localhost:7092 \
--hoodie-conf schema.registry.url=http://localhost:8081 \
--hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/hive.public.sales-value/versions/latest \
--hoodie-conf hoodie.deltastreamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer \
--hoodie-conf hoodie.deltastreamer.source.kafka.topic=hive.public.sales \
--hoodie-conf auto.offset.reset=earliest \
--hoodie-conf 'hoodie.deltastreamer.transformer.sql=SELECT * FROM <SRC>' \
--hoodie-conf hoodie.datasource.write.recordkey.field=salesid \
--hoodie-conf 'hoodie.datasource.write.partitionpath.field=' \
--hoodie-conf hoodie.datasource.write.precombine.field=_event_origin_ts_ms
As you can see, we just tried a simple SELECT *:
--hoodie-conf 'hoodie.deltastreamer.transformer.sql=SELECT * FROM <SRC>' \
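One quoting note (an aside, not from the thread): the whole conf is single-quoted above so the shell does not parse < and > in <SRC> as redirection operators, and the placeholder reaches DeltaStreamer intact. A quick check:

```shell
# Single quotes prevent the shell from treating < and > in <SRC> as
# redirection; the placeholder survives verbatim.
CONF='hoodie.deltastreamer.transformer.sql=SELECT * FROM <SRC>'
echo "$CONF"
```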
Thanks @soumilshah1995. Will check it out.
aye aye captain