Coder Social home page Coder Social logo

Comments (7)

soumilshah1995 avatar soumilshah1995 commented on August 16, 2024

slack chat Thread https://apache-hudi.slack.com/archives/C4D716NPQ/p1718691646054409

this issue arises when attempting to use transformer with
--source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource \

--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
--hoodie-conf 'hoodie.deltastreamer.transformer.sql=SELECT * FROM '

Throws an issue even when using Select *

from hudi.

ad1happy2go avatar ad1happy2go commented on August 16, 2024

@ashwinagalcha-ps The sql should have SRC also, right?
"--hoodie-conf", "hoodie.streamer.transformer.sql=SELECT *, extract(year from a.created_at) as year FROM a",

https://hudi.apache.org/docs/transforms/

from hudi.

ashwinagalcha-ps avatar ashwinagalcha-ps commented on August 16, 2024

@ad1happy2go I missed to add here. But we did test with in the query. Sorry for the confusion, i'll update the description.

from hudi.

ashwinagalcha-ps avatar ashwinagalcha-ps commented on August 16, 2024

@ad1happy2go It was already added here but since it was plain text and not code github skipped <> while saving.

from hudi.

soumilshah1995 avatar soumilshah1995 commented on August 16, 2024

Hey Aditya I personally verified SQL transformer throws error

Here is lab : https://github.com/soumilshah1995/universal-datalakehouse-postgres-ingestion-deltastreamer

I changed the streamer code


========================================================
spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0,org.apache.hadoop:hadoop-aws:3.3.2' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
     /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 's3a://warehouse/database=default/table_name=salestest'  \
    --target-table salestest \
    --op UPSERT \
    --source-limit 4000000 \
    --source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource \
    --min-sync-interval-seconds 10 \
    --continuous \
    --source-ordering-field _event_origin_ts_ms \
    --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
    --hoodie-conf bootstrap.servers=localhost:7092 \
    --hoodie-conf schema.registry.url=http://localhost:8081 \
    --hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/hive.public.sales-value/versions/latest \
    --hoodie-conf hoodie.deltastreamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer \
    --hoodie-conf hoodie.deltastreamer.source.kafka.topic=hive.public.sales \
    --hoodie-conf auto.offset.reset=earliest \
    --hoodie-conf 'hoodie.deltastreamer.transformer.sql=SELECT * FROM <SRC>' \
    --hoodie-conf hoodie.datasource.write.recordkey.field=salesid \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=' \
    --hoodie-conf hoodie.datasource.write.precombine.field=_event_origin_ts_ms


if you see we just tried simple select *

    --hoodie-conf 'hoodie.deltastreamer.transformer.sql=SELECT * FROM <SRC>' \

from hudi.

ad1happy2go avatar ad1happy2go commented on August 16, 2024

Thanks @soumilshah1995 . Will checkout.

from hudi.

soumilshah1995 avatar soumilshah1995 commented on August 16, 2024

aye aye captain

from hudi.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.