Comments (3)
There has not been a comment on this issue yet, but this is critical for anyone using a non-default partitioner based on the behavior I have experienced.
I was able to create a temporary fix by reverting changes made on 20151125 which would delete the WAL and tmp files instead of flushing and persisting them. Now a graceful shutdown or rebalance will commit inflight data and resume consuming data from Kafka based on its previous consumer offset. Obviously this is not ideal in case Kafka Connect is killed, someone nukes the persisted files in the commit directory or the consumer offsets are disrupted.
from kafka-connect-hdfs.
@zako I'd like to revisit this issue with you if you are still interested. Would you be able to share some more details here? The expected behavior of the state machine is such that the consumer offsets on the Kafka side will not be committed until there has been a confirmation of the write to the WAL file. In this way we are meant to guarantee that no data is marked as "read" from Kafka until it has been confirmed as "written" to HDFS in the form of the WAL file. This should be independent of your implementation of Partitioner, as this logic is all done in TopicPartitionWriter.
The changes from 20151125 should make it so that any WAL file that has not yet been confirmed as written and closed by HDFS is discarded upon rebalance, so that you end up with a clean transition. If you have a reproducible test case, I would be interested in trying it out.
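To make the intended ordering concrete, here is a minimal sketch of the two rules described above: an offset only becomes committable after the WAL write is confirmed, and anything unconfirmed is thrown away on rebalance. This is not the connector's actual code. The class `WalOffsetSketch` and all of its method names are hypothetical, and in-memory lists stand in for the WAL and HDFS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not TopicPartitionWriter.
public class WalOffsetSketch {
    // Records buffered but whose WAL write has not been confirmed yet.
    private final List<String> unconfirmed = new ArrayList<>();
    // Records whose WAL write has been confirmed.
    private final List<String> durable = new ArrayList<>();
    private long lastBufferedOffset = -1L;
    // The offset that would be reported back to Kafka as consumed.
    private long committedOffset = -1L;

    // Accept a record from Kafka. The committed offset does not move here.
    public void buffer(long offset, String record) {
        unconfirmed.add(record);
        lastBufferedOffset = offset;
    }

    // Called once the WAL write is confirmed: only now does the offset advance.
    public void confirmWalWrite() {
        durable.addAll(unconfirmed);
        unconfirmed.clear();
        committedOffset = lastBufferedOffset;
    }

    // On rebalance, discard unconfirmed data and return the offset to resume from.
    public long onRebalance() {
        unconfirmed.clear();
        lastBufferedOffset = committedOffset;
        return committedOffset + 1;
    }

    public long committedOffset() {
        return committedOffset;
    }

    public List<String> durableRecords() {
        return durable;
    }
}
```

In this sketch, a record that was buffered but never confirmed is consumed again after the rebalance, because the resume offset is derived only from confirmed writes. A reproducible case where the real connector departs from that ordering is what would help narrow this down.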
@zako please let us know if you have a reproducible test case here as we would like to get this fixed, but without a test case we can't see the circumstances that lead to the problem. I'll reopen this if you are able to provide such a test case. Thanks!