Comments (3)

zako commented on July 24, 2024

There has not been a comment on this issue yet, but based on the behavior I have experienced, this is critical for anyone using a non-default partitioner.

I was able to create a temporary fix by reverting the changes made on 20151125, which delete the WAL and tmp files instead of flushing and persisting them. Now a graceful shutdown or rebalance commits in-flight data, and consumption from Kafka resumes from the previous consumer offset. Obviously this is not ideal if Kafka Connect is killed, someone nukes the persisted files in the commit directory, or the consumer offsets are disrupted.
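As a rough illustration of the two close behaviors described above (discarding the tmp/WAL data versus flushing and committing it), here is a minimal Java sketch. Everything in it is hypothetical: `CloseOnRebalanceSketch`, `Writer`, and `ClosePolicy` are invented names, in-memory lists stand in for the tmp/WAL file and the commit directory, and a plain `long` stands in for the persisted consumer offset. It is not the kafka-connect-hdfs implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not kafka-connect-hdfs code): contrasts two ways a
 * writer can handle in-flight records when its partition is closed.
 */
public class CloseOnRebalanceSketch {

    enum ClosePolicy { DISCARD_TMP, FLUSH_AND_COMMIT }

    static final class Writer {
        private final ClosePolicy policy;
        private final List<String> tmpFile = new ArrayList<>();  // stand-in for the tmp/WAL file
        final List<String> committedFiles = new ArrayList<>();    // stand-in for the commit directory
        long nextOffsetToConsume;                                 // stand-in for the persisted consumer offset
        private long tmpEndOffset;                                // offset just past the last buffered record

        Writer(ClosePolicy policy, long startOffset) {
            this.policy = policy;
            this.nextOffsetToConsume = startOffset;
            this.tmpEndOffset = startOffset;
        }

        void write(long offset, String record) {
            tmpFile.add(record);
            tmpEndOffset = offset + 1;
        }

        /** Called on graceful shutdown or partition revocation. */
        void close() {
            if (policy == ClosePolicy.FLUSH_AND_COMMIT && !tmpFile.isEmpty()) {
                committedFiles.addAll(tmpFile);     // persist the in-flight data
                nextOffsetToConsume = tmpEndOffset; // resume after what was persisted
            }
            // In both policies the tmp file is cleared; with DISCARD_TMP its
            // records would have to be re-read from nextOffsetToConsume.
            tmpFile.clear();
            tmpEndOffset = nextOffsetToConsume;
        }
    }

    public static void main(String[] args) {
        for (ClosePolicy p : ClosePolicy.values()) {
            Writer w = new Writer(p, 100);
            w.write(100, "a");
            w.write(101, "b");
            w.close();
            System.out.println(p + ": committed=" + w.committedFiles + ", resumeAt=" + w.nextOffsetToConsume);
        }
    }
}
```

Running `main` should print `DISCARD_TMP: committed=[], resumeAt=100` followed by `FLUSH_AND_COMMIT: committed=[a, b], resumeAt=102`. The flush-and-commit variant resumes after the persisted records, while the discard variant relies on re-consuming them from the earlier offset.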

from kafka-connect-hdfs.

cotedm commented on July 24, 2024

@zako I'd like to revisit this issue with you if you are still interested. Would you be able to share some more details here? The expected behavior of the state machine is that consumer offsets are not committed to Kafka until the write to the WAL file has been confirmed. This is meant to guarantee that no data is marked as "read" from Kafka until it has been confirmed as "written" to HDFS in the form of the WAL file. This should be independent of your Partitioner implementation, since all of this logic is done in TopicPartitionWriter.

The changes from 20151125 should ensure that any WAL file that has not yet been confirmed as written and closed by HDFS is discarded upon rebalance, so that you end up with a clean transition. If you have a reproducible test case, I would be interested in trying it out.
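The ordering described in the two paragraphs above can be sketched as a small Java model. It is a simplified, hypothetical illustration: `WalOffsetInvariantSketch` and `PartitionWriter` are invented names, not the `TopicPartitionWriter` API, and lists of offsets stand in for WAL entries. The offset exposed for commit only advances when WAL entries are confirmed, and on rebalance any unconfirmed entries are dropped so consumption resumes from the last committable offset.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the invariant: a Kafka offset becomes committable
 * only after its WAL entry is confirmed, and unconfirmed WAL entries are
 * discarded on rebalance. Names are illustrative, not the connector's API.
 */
public class WalOffsetInvariantSketch {

    static final class PartitionWriter {
        private final List<Long> pendingWal = new ArrayList<>();   // appended, not yet confirmed
        private final List<Long> confirmedWal = new ArrayList<>(); // confirmed as written and closed
        private long committableOffset;                             // next offset safe to commit to Kafka

        PartitionWriter(long startOffset) {
            this.committableOffset = startOffset;
        }

        /** Appends a record's offset to the WAL; assumes offsets arrive in increasing order. */
        void append(long offset) {
            pendingWal.add(offset);
        }

        /** Simulates HDFS confirming that all pending WAL entries were written and closed. */
        void confirmPending() {
            for (long offset : pendingWal) {
                confirmedWal.add(offset);
                committableOffset = offset + 1;
            }
            pendingWal.clear();
        }

        /** Offset the consumer may commit: never ahead of what the WAL has confirmed. */
        long offsetToCommit() {
            return committableOffset;
        }

        /** On rebalance, drop unconfirmed entries; they are re-consumed from the returned offset. */
        long onPartitionRevoked() {
            pendingWal.clear();
            return committableOffset;
        }

        int confirmedCount() {
            return confirmedWal.size();
        }
    }

    public static void main(String[] args) {
        PartitionWriter w = new PartitionWriter(0);
        w.append(0);
        w.append(1);
        System.out.println("before confirm: commit " + w.offsetToCommit());
        w.confirmPending();
        System.out.println("after confirm: commit " + w.offsetToCommit());
        w.append(2);
        System.out.println("rebalance resumes at " + w.onPartitionRevoked());
    }
}
```

Running `main` should print `before confirm: commit 0`, `after confirm: commit 2`, and `rebalance resumes at 2`. Because the committable offset never runs ahead of confirmed WAL entries, discarding the unconfirmed entry on rebalance should not lose data; offset 2 is simply read from Kafka again.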

cotedm commented on July 24, 2024

@zako please let us know if you have a reproducible test case here as we would like to get this fixed, but without a test case we can't see the circumstances that lead to the problem. I'll reopen this if you are able to provide such a test case. Thanks!
