Comments (7)
Slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1718691646054409
This issue arises when attempting to use the transformer with:
--source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource \
--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
--hoodie-conf 'hoodie.deltastreamer.transformer.sql=SELECT * FROM '
Throws an error even when using SELECT *.
from hudi.
@ashwinagalcha-ps The SQL should have <SRC> also, right?
"--hoodie-conf", "hoodie.streamer.transformer.sql=SELECT *, extract(year from a.created_at) as year FROM <SRC> a",
https://hudi.apache.org/docs/transforms/
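For context on why <SRC> is mandatory: the transformer substitutes that placeholder with the name of a temp view registered for the incoming batch before running the SQL. A hypothetical minimal sketch of that substitution (simplified illustration, not the actual Hudi implementation; the view name is made up):

```python
# Hypothetical sketch of why <SRC> is required in the transformer SQL:
# the incoming batch is registered as a temp view, and <SRC> in the
# configured SQL is replaced with that view's name before execution.
SRC_PLACEHOLDER = "<SRC>"

def build_transformer_query(sql_template: str, tmp_view_name: str) -> str:
    """Substitute the <SRC> placeholder with the registered temp view name."""
    return sql_template.replace(SRC_PLACEHOLDER, tmp_view_name)

# A query without <SRC> has nothing to substitute, so Spark is asked to
# select from a table that does not exist.
print(build_transformer_query("SELECT * FROM <SRC>", "HOODIE_SRC_TMP_VIEW"))
# → SELECT * FROM HOODIE_SRC_TMP_VIEW
```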
@ad1happy2go I missed adding it here, but we did test with <SRC> in the query. Sorry for the confusion, I'll update the description.
@ad1happy2go It was already added here, but since it was plain text and not code, GitHub stripped the <> while saving.
Hey Aditya, I personally verified that the SQL transformer throws an error.
Here is the lab: https://github.com/soumilshah1995/universal-datalakehouse-postgres-ingestion-deltastreamer
I changed the streamer command:
========================================================
spark-submit \
--class org.apache.hudi.utilities.streamer.HoodieStreamer \
--packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0,org.apache.hadoop:hadoop-aws:3.3.2' \
--properties-file spark-config.properties \
--master 'local[*]' \
--executor-memory 1g \
/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
--table-type COPY_ON_WRITE \
--target-base-path 's3a://warehouse/database=default/table_name=salestest' \
--target-table salestest \
--op UPSERT \
--source-limit 4000000 \
--source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource \
--min-sync-interval-seconds 10 \
--continuous \
--source-ordering-field _event_origin_ts_ms \
--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
--hoodie-conf bootstrap.servers=localhost:7092 \
--hoodie-conf schema.registry.url=http://localhost:8081 \
--hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/hive.public.sales-value/versions/latest \
--hoodie-conf hoodie.deltastreamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer \
--hoodie-conf hoodie.deltastreamer.source.kafka.topic=hive.public.sales \
--hoodie-conf auto.offset.reset=earliest \
--hoodie-conf 'hoodie.deltastreamer.transformer.sql=SELECT * FROM <SRC>' \
--hoodie-conf hoodie.datasource.write.recordkey.field=salesid \
--hoodie-conf 'hoodie.datasource.write.partitionpath.field=' \
--hoodie-conf hoodie.datasource.write.precombine.field=_event_origin_ts_ms
As you can see, we just tried a simple SELECT *:
--hoodie-conf 'hoodie.deltastreamer.transformer.sql=SELECT * FROM <SRC>' \
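One quoting note (an aside, not from the thread): the whole conf is single-quoted above so the shell does not parse < and > in <SRC> as redirection operators, and the placeholder reaches DeltaStreamer intact. A quick check:

```shell
# Single quotes prevent the shell from treating < and > in <SRC> as
# redirection; the placeholder survives verbatim.
CONF='hoodie.deltastreamer.transformer.sql=SELECT * FROM <SRC>'
echo "$CONF"
```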
Thanks @soumilshah1995. Will check it out.
aye aye captain