
Comments (5)

ewencp commented on August 27, 2024

@13h3r The HDFS connector manages offsets itself, so I suspect you're interpreting the last part of step 2 incorrectly. You are right that offsets are committed for the consumer because the framework does this and we haven't yet exposed a way for the connector to disable it. We have a JIRA KAFKA-3462 for that and it even has a patch, but I wasn't able to review it and merge it before the 0.10.0.0 deadline.

When a task starts, it figures out what data has been delivered to HDFS (by looking at files in the directory and the WAL). Ultimately this is what tells it where to start. You can see that if it is in an initial state, it starts by trying to run a recovery process. As part of that process, it will reset the current offset to the offset where data was actually delivered.
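A rough sketch of that start-up step, in Python rather than the connector's actual Java, may help. It assumes committed files are named `topic+partition+startOffset+endOffset.ext`; the function name `next_offset` and the sample file names are hypothetical and only for illustration. It ignores the WAL and looks only at committed files.

```python
import re

# Assumed naming scheme for committed files: topic+partition+startOffset+endOffset.ext
COMMITTED = re.compile(
    r"^(?P<topic>.+)\+(?P<partition>\d+)\+(?P<start>\d+)\+(?P<end>\d+)\.\w+$"
)


def next_offset(filenames, topic, partition):
    """Return the offset to resume from, or None when no committed file exists."""
    last_end = None
    for name in filenames:
        m = COMMITTED.match(name)
        if m is None:
            continue  # temp files and anything else that is not a committed file
        if m.group("topic") != topic or int(m.group("partition")) != partition:
            continue  # belongs to another topic partition
        end = int(m.group("end"))
        if last_end is None or end > last_end:
            last_end = end
    # Resume right after the last record that was actually delivered.
    return None if last_end is None else last_end + 1


files = [
    "logs+0+0000000000+0000000099.avro",
    "logs+0+0000000100+0000000249.avro",
    "logs+1+0000000000+0000000010.avro",
    "in_progress.tmp",
]
resume_at = next_offset(files, "logs", 0)
```

With the sample listing, `resume_at` is the offset following the highest end offset found for partition 0. When nothing has been committed for a partition the function returns `None`, and the caller has to decide where to start by some other means.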

The key to all this is that in step 1, the data is only being written to a temp file. The WAL is used to indicate that the temp file should be delivered to its final location. Even if we crash after the WAL entry is written but before the file is moved to its final location, the recovery process can complete the move.
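The temp file plus WAL idea can be sketched on a local filesystem as follows. This is a minimal Python illustration under stated assumptions, not the connector's implementation: the WAL is modeled as a JSON-lines file, the move as `os.replace`, and all function names (`write_temp`, `log_intent`, `apply_wal`, `discard_uncommitted`, `recover`) are hypothetical.

```python
import json
import os
import tempfile


def write_temp(directory, name, records):
    """Step 1: write records to a temp file only; nothing is committed yet."""
    tmp_path = os.path.join(directory, name + ".tmp")
    with open(tmp_path, "w") as f:
        f.write("\n".join(records))
    return tmp_path


def log_intent(wal_path, tmp_path, final_path):
    """Step 2: record in the WAL that tmp_path should be delivered to final_path."""
    with open(wal_path, "a") as wal:
        wal.write(json.dumps({"tmp": tmp_path, "final": final_path}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())


def apply_wal(wal_path):
    """Complete every logged move that has not happened yet."""
    if not os.path.exists(wal_path):
        return
    with open(wal_path) as wal:
        for line in wal:
            entry = json.loads(line)
            # If the temp file is gone, the move already completed earlier.
            if os.path.exists(entry["tmp"]):
                os.replace(entry["tmp"], entry["final"])


def discard_uncommitted(directory):
    """Temp files without a WAL entry were never committed; drop them."""
    for name in os.listdir(directory):
        if name.endswith(".tmp"):
            os.remove(os.path.join(directory, name))


def recover(directory, wal_path):
    """Recovery on task start: finish logged moves, then discard leftovers."""
    apply_wal(wal_path)
    discard_uncommitted(directory)
```

A crash after `log_intent` leaves a temp file that `recover` moves into place; a crash before `log_intent` leaves a temp file that `recover` simply deletes, and the records in it have to be read from Kafka again.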

In your scenario, the temporary data is simply discarded, but we have not actually lost any data. The next task that resumes processing that topic partition will correctly pick up where the last one left off.

from kafka-connect-hdfs.

ewencp commented on August 27, 2024

By the way, I'd love to get a full design section into our documentation -- we'd just need to find enough time to get that written up.

13h3r commented on August 27, 2024

@ewencp thanks for the detailed explanation.

But when we have no files, we are unable to reset the offset, and messages that have been read before are lost. This can be solved by disabling offset committing to Kafka, but what about providing an API to commit offsets from the connector instead of disabling it? This may be useful for monitoring delivery lag.

Ishiihara commented on August 27, 2024

@13h3r We have a PR in Kafka that allows disabling offset commits: apache/kafka#1139

Ishiihara commented on August 27, 2024

Closing this because it is not an issue.
