
Comments (4)

hpgrahsl commented on June 10, 2024

Ok, I see. The short answer is no, not out-of-the-box with the current version. However, many parts of the sink connector implementation were written with customizability in mind. You could implement your own write model behaviour by implementing the corresponding interfaces and/or extending existing classes with custom ones that override the default behaviour.
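To make the difference between the two write-model semantics concrete, here is a minimal sketch. It is in Python rather than the connector's actual Java interfaces, and all names and shapes are illustrative, not the connector's API: the default behaviour corresponds to a replace-with-upsert model, while a custom write model could instead emit a partial update that only `$set`s the fields that changed.

```python
def replace_model(doc):
    """Default connector semantics (sketch): replace the whole document,
    upserting it if it does not exist yet."""
    return {"filter": {"_id": doc["_id"]}, "replacement": doc, "upsert": True}

def update_model(old, new):
    """Hypothetical custom semantics: $set only the fields that changed,
    so downstream change streams report an 'update' with updatedFields."""
    changed = {k: v for k, v in new.items() if k != "_id" and old.get(k) != v}
    return {"filter": {"_id": new["_id"]}, "update": {"$set": changed}}

old = {"_id": 1, "name": "Alice", "city": "Seoul"}
new = {"_id": 1, "name": "Alice", "city": "Busan"}

print(replace_model(new)["replacement"])  # the full document is written
print(update_model(old, new)["update"])   # {'$set': {'city': 'Busan'}}
```

With the replace model, MongoDB sees a whole-document write regardless of how small the change was; with the update model, only the delta reaches the server.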

One more question, just to make sure I fully understand. Instead of feeding your real-time processing application from the change streams topic that results after sinking the data into MongoDB, why can't you source it directly from the original Kafka topic produced by the Postgres CDC?

from kafka-connect-mongodb.

hpgrahsl commented on June 10, 2024

Hi @HyunSangHan!

thanks for reaching out. You are right that the MongoDB SinkConnector is implemented such that it applies a replace operation on the matched document. This is by design and allows the connector to employ upsert semantics for both the insert and update operations it processes from the Kafka topic. A CDC message from Kafka therefore always results in updating the full document on the MongoDB side; partial updates aren't supported, because you don't really need them for a CDC pipeline use case.

So maybe you can elaborate a bit: why would you need a differentiation in that regard? The end result of processing the CDC records with the currently imposed write model semantics should give you the correct document state on the MongoDB side. But it might be that I misunderstood your question.


HyunSangHan commented on June 10, 2024

@hpgrahsl

I'm so glad to get your quick reply!! Thank you :)

So maybe you can elaborate a bit, why would you need to have a differentiation in that regard?

I am planning to build more of the pipeline after consuming the messages.
Let me explain below.

When I said "Postgres --> Kafka --> MongoDB", what that means exactly is

Postgres --> Kafka --> (MongoSinkConnector) --> MongoDB

but a part was omitted. My whole plan is:

Postgres --> Kafka --> (MongoSinkConnector) --> MongoDB --> (Mongo change streams) --> (Real-time processing application) --> (Kafka) --> (Many applications as consumers) --> ...

As a result, I will use document changes to produce Kafka messages again via Mongo change streams.
When using change streams, I can read updateDescription.updatedFields, which shows which fields were updated, if the event has update as its operationType. However, if the operationType is replace, there is no such field, as described in the MongoDB docs.
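For illustration, here are simplified sketches of the two change stream event shapes (field names follow the MongoDB change event format, but the events are hand-written examples, not real server output):

```python
# An 'update' event carries updateDescription, telling the consumer
# exactly which fields changed.
update_event = {
    "operationType": "update",
    "documentKey": {"_id": 1},
    "updateDescription": {
        "updatedFields": {"city": "Busan"},  # only the changed fields
        "removedFields": [],
    },
}

# A 'replace' event carries the full document but no updateDescription,
# so the consumer cannot tell which fields changed.
replace_event = {
    "operationType": "replace",
    "documentKey": {"_id": 1},
    "fullDocument": {"_id": 1, "name": "Alice", "city": "Busan"},
}

def updated_fields(event):
    """Return the changed fields for update events, None otherwise."""
    if event["operationType"] == "update":
        return event["updateDescription"]["updatedFields"]
    return None

print(updated_fields(update_event))   # {'city': 'Busan'}
print(updated_fields(replace_event))  # None
```

This is exactly the gap described above: a downstream consumer of the change stream loses the per-field delta whenever the sink writes with replace semantics.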

Finally, I need to get updateDescription.updatedFields from Mongo change streams, and that's why I want the MongoSinkConnector to update documents with the update operationType.
Is there any way to make the MongoSinkConnector do this with an update operation?


HyunSangHan commented on June 10, 2024

@hpgrahsl
That's a good question! I think my explanation was not sufficient.

There are a few reasons:

  1. First of all, I need to sink the data from Postgres to MongoDB anyway, so it can be reused from MongoDB later. (So I cannot skip this sink process.)
  2. If I sourced the data directly from the original Kafka topic, I couldn't guarantee that the data was also saved to MongoDB successfully, because that would split the pipeline in two (Kafka --> MongoDB as well as Kafka --> real-time processing application). That's why I want to consume the data only after it has been successfully saved (Kafka --> MongoDB --> real-time processing application).
  3. The application would no longer need to depend on something like Schema Registry when consuming the data.

