
Comments (4)

hpgrahsl commented on May 19, 2024

Hi @abhisheksingh87 - thanks for reaching out.

  1. exactly-once semantics: in general, sink connectors cannot provide exactly-once behaviour for arbitrary data and/or configurations. what is possible with e.g. my sink connector is to write against the sink (mongodb collection) with upsert semantics which, when combined with a unique attribute found in the kafka records, gives you idempotent write behaviour. so if you make sure that your data in kafka exhibits a unique attribute in the first place, you can achieve exactly-once semantics against the sink given a proper configuration (see the DocumentIdAdder strategy options in the README and the config sketch after this list)

  2. when it comes to retries there is currently a very simple logic based on the following 2 config options: mongodb.max.num.retries (default 3) & mongodb.retries.defer.timeout (default 5000ms). this means that if MongoDB is down for more than roughly 15 secs (the combined time of the 3 retries), the retries are exhausted, the sink task fails, and it waits for manual intervention. the connector would of course continue its work if MongoDB comes back up during the retries. you can raise both config settings as you see fit (they also appear in the sketch after this list). what's currently missing is a better strategy such as exponential backoff; there is an open feature request #61, so feel free to help enhance this :)
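
to make both points concrete, here is a minimal sketch of a sink config that combines an id strategy (for idempotent, upsert-based writes) with the retry settings above. the connector class and property names follow the README; the topic/collection names, the connection uri, and the choice of ProvidedInValueStrategy are placeholder assumptions for illustration:

```properties
# minimal sketch, not a verified production config
name=mongodb-sink
connector.class=at.grahsl.kafka.connect.mongodb.MongoDbSinkConnector
topics=orders
mongodb.connection.uri=mongodb://localhost:27017/kafkaconnect
mongodb.collection=orders
# derive the _id from a unique attribute in the record value so that
# redelivered records overwrite the existing document instead of duplicating it
mongodb.document.id.strategy=at.grahsl.kafka.connect.mongodb.processor.id.strategy.ProvidedInValueStrategy
# retry behaviour discussed in point 2 (defaults shown)
mongodb.max.num.retries=3
mongodb.retries.defer.timeout=5000
```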

I currently can't comment on the connect framework's general DLQ feature since I haven't used it myself yet. be aware though that it may not be available to you if you are running a Kafka version that predates it.
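
for completeness, the framework-level DLQ (added with KIP-298 in Apache Kafka 2.0) is enabled per sink connector with properties along these lines; a sketch only, the topic name is a placeholder and I haven't verified this together with my connector:

```properties
# tolerate record-level failures and route the failed records to a DLQ topic
errors.tolerance=all
errors.deadletterqueue.topic.name=dlq-mongodb-sink
errors.deadletterqueue.topic.replication.factor=1
# include failure context (topic, partition, exception) as record headers
errors.deadletterqueue.context.headers.enable=true
```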

please let me know if this helps or if you need anything else. thx.


abhisheksingh87 commented on May 19, 2024

Hi @hpgrahsl,
Thanks for your reply. We have a few more questions regarding the connector, as listed below.

  1. Does the connector guarantee at-least-once semantics?
  2. Is it possible to filter events based on a given parameter? We have use cases where we need to filter the write operations to the mongodb collection based on an attribute in the event message.
  3. How is offset management handled by the connector?


hpgrahsl commented on May 19, 2024
  1. yes of course. at-least-once semantics you basically get out of the box, without taking any special care with the sink connector configuration. be advised though that in my experience this is very rarely what you want. as I said, you can make sure to have idempotent writes by doing key-based upserts just by configuring things accordingly, and thereby get exactly-once semantics.

  2. neither the kafka connect framework itself nor the sink connector is supposed to do filtering on records. while you could achieve this by implementing a custom write model for the sink connector, I would discourage you from doing that. most likely the better way to go here is to have a stream processor (kstreams or KSQL) that takes care of the filtering, and then let the sink connector process the already filtered topic (see the sketch after this list).

  3. there is no explicit offset management done by the sink connector. it relies on what is configured at the framework level, i.e. connect commits the offsets at regular (configurable) intervals; see the worker setting below.
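
regarding point 2, here is a minimal kafka streams sketch of that approach: filter the original topic into a new one and point the sink connector at the filtered topic. the topic names, the attribute being checked, and the plain string serdes are all illustrative assumptions:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class EventFilterApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // plain string serdes purely for illustration; real records would
        // typically use JSON or Avro serdes and inspect the field properly
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("events");

        // keep only records whose value carries the attribute of interest,
        // then write them to the topic the sink connector consumes
        events.filter((key, value) -> value != null && value.contains("\"status\":\"SHIPPED\""))
              .to("events-filtered");

        new KafkaStreams(builder.build(), props).start();
    }
}
```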
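
and for point 3, the relevant knob lives in the connect worker config, not in the connector config:

```properties
# how often the connect framework commits offsets for its tasks (default 60000 ms)
offset.flush.interval.ms=60000
```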

hope this clarifies your questions! if you plan to keep using my sink connector I'd be happy to learn about your concrete use case. ideally you're willing to share it "publicly" as a user voice/testimonial in the README. let me know :)


hpgrahsl commented on May 19, 2024

@abhisheksingh87 getting back to you about this. if everything is clarified I'd be happy if you could close the issue. THX!
