
Comments (10)

ewencp avatar ewencp commented on July 24, 2024

@xianzhen 0.9 is the version of the broker, but which version of the connector are you using? It looks like the assignment and set of TopicPartitionWriters somehow got out of sync, which should not be possible.

from kafka-connect-hdfs.

xianzhen avatar xianzhen commented on July 24, 2024

@ewencp Thanks. The Confluent version is 2.0.0. I checked the source; one possible reason is that WorkerSinkTask clears all TopicPartitionWriters before the ConsumerSinkTask is closed. I am not sure.


ewencp avatar ewencp commented on July 24, 2024

@xianzhen This has been fixed as of the 3.0.0 release. I think the issue was that onPartitionsRevoked was removing the TopicPartitionWriters but then close() was trying to use them. The newer version relies on the fact that the framework guarantees it will revoke the partitions before finally stopping the task.
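The pre-3.0.0 race can be illustrated with a small sketch (hypothetical, simplified names; the real classes in Kafka Connect and kafka-connect-hdfs are considerably more involved):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch of the pre-3.0.0 bug: onPartitionsRevoked()
// removed the TopicPartitionWriters, but close() still tried to look them up.
public class RevokeRaceSketch {
    static class TopicPartitionWriter {
        void commit() { /* flush pending records to HDFS */ }
    }

    final Map<String, TopicPartitionWriter> writers = new HashMap<>();

    void onPartitionsRevoked(String tp) {
        writers.remove(tp);              // writer discarded here (pre-3.0.0)
    }

    void close(String tp) {
        // Pre-3.0.0 behaviour: no null check, so a revoked partition
        // caused a NullPointerException here.
        writers.get(tp).commit();
    }

    public static void main(String[] args) {
        RevokeRaceSketch task = new RevokeRaceSketch();
        task.writers.put("test-0", new TopicPartitionWriter());
        task.onPartitionsRevoked("test-0"); // framework revokes first...
        try {
            task.close("test-0");           // ...then close() trips over it
        } catch (NullPointerException e) {
            System.out.println("NPE: writer already removed by revocation");
        }
    }
}
```

The 3.0.0 fix described above amounts to not touching the writers again in close(): since the framework guarantees revocation happens before the task is finally stopped, cleanup can live entirely in the revocation path.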


raju-divakaran avatar raju-divakaran commented on July 24, 2024

I could see the same error popping up again, and we are using Confluent version 3.0.0!

This is mainly noticed in our staging setup, where we run 3 Kafka Connect instances. If I stop all 3 instances and then start them one by one, this error comes up: when I start the second one, the first dies; when I restart the first, the second dies. I usually go through that cycle a couple of times before things stabilize.

It also seems to happen at a particular stage: if I start all three together, the error appears, but when I start them one by one very quickly, at times it does not occur.

[2016-07-07 13:22:13,242] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:183)
java.lang.NullPointerException
at org.apache.kafka.connect.runtime.WorkerSinkTask.stop(WorkerSinkTask.java:119)
at org.apache.kafka.connect.runtime.Worker.stopTask(Worker.java:397)
at org.apache.kafka.connect.runtime.Worker.stopTasks(Worker.java:373)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$RebalanceListener.onRevoked(DistributedHerder.java:1064)
at org.apache.kafka.connect.runtime.distributed.WorkerCoordinator.onJoinPrepare(WorkerCoordinator.java:237)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:212)
at org.apache.kafka.connect.runtime.distributed.WorkerGroupMember.poll(WorkerGroupMember.java:147)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:286)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:176)
at java.lang.Thread.run(Thread.java:745)
[2016-07-07 13:22:13,247] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:68)


ewencp avatar ewencp commented on July 24, 2024

@raju-divakaran Are there any other error messages or stack traces earlier in the log? It looks like the null pointer is due to the consumer not being allocated yet. We protect against this in WorkerSinkTask.close() but not in WorkerSinkTask.stop(). As far as I can tell, though, this shouldn't be a problem: given the order in which we create and initialize the WorkerSinkTask (the latter half of which creates the consumer) and only then add it to the collection of tasks, any call to Worker.stopTasks shouldn't see the task until the consumer is already allocated. The only path I can see where this wouldn't happen is if there was an exception during WorkerSinkTask.initialize, in which case there should be an error message like "Task {} failed initialization and will not be started." and a corresponding stack trace.
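The distinction between the guarded and unguarded stop paths can be sketched like this (hypothetical names and structure; the real WorkerSinkTask code differs):

```java
// Hypothetical sketch: a stop() that dereferences the consumer without a
// null check throws an NPE for a task whose initialize() failed, because
// the consumer was never created.
public class StopGuardSketch {
    static class Consumer {
        void wakeup() { /* interrupt a blocking poll() */ }
    }

    Consumer consumer;  // stays null if initialization never completed

    // Unguarded version: throws NullPointerException when consumer == null
    void stopUnsafe() {
        consumer.wakeup();
    }

    // Guarded version: safe even for a partially initialized task
    boolean stopGuarded() {
        if (consumer != null) {
            consumer.wakeup();
            return true;
        }
        return false;   // nothing to stop
    }

    public static void main(String[] args) {
        StopGuardSketch task = new StopGuardSketch();
        boolean threw = false;
        try {
            task.stopUnsafe();
        } catch (NullPointerException e) {
            threw = true;
        }
        System.out.println("unsafe stop threw NPE: " + threw);
        System.out.println("guarded stop acted: " + task.stopGuarded());
    }
}
```

This is only meant to show why an earlier initialization failure in the log would explain the stack trace above, not how the framework actually fixed it.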


 avatar commented on July 24, 2024

I have a similar issue. Confluent version 2.0.1.

  1. Start the worker in distributed mode, interactively (not as a daemon).
  2. Add a connector with this configuration:

     {
       "name": "hdfs-test-ign",
       "config": {
         "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
         "tasks.max": "6",
         "topics": "test_ign",
         "hdfs.url": "hdfs://hdfsd:8020",
         "hadoop.conf.dir": "/confluent/hadoop/conf",
         "flush.size": "100",
         "partitioner.class": "io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner",
         "locale": "us",
         "partition.duration.ms": "86400000",
         "path.format": "YYYY-MM-dd",
         "timezone": "US/Eastern",
         "rotate.interval.ms": "60000"
       }
     }

  3. Send some messages.
  4. Stop the worker (Ctrl+C).
  5. Start the worker again.

After some restarts I get the error, and the only way to make it work again is to delete the config and offset topics.
The worker config and the output log are attached.
connect-avro-distributed.properties.txt
distributed_consumer_fail_log.txt


 avatar commented on July 24, 2024

I found out what causes this error: the config topic had more than one partition. Once I recreated the topic with a single partition, the problem disappeared.
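For anyone hitting the same symptom, the partition count of the config topic can be checked and, if wrong, the topic recreated with a single partition. This is an ops sketch using the stock kafka-topics tool; the topic name is assumed to be the default from connect-distributed.properties and must match your config.storage.topic setting, and the replication factor should match your cluster:

    # Check how many partitions the Connect config topic has
    kafka-topics --zookeeper localhost:2181 --describe --topic connect-configs

    # Create it correctly: exactly one partition, log-compacted
    kafka-topics --zookeeper localhost:2181 --create --topic connect-configs \
      --partitions 1 --replication-factor 3 --config cleanup.policy=compact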


blbradley avatar blbradley commented on July 24, 2024

@Tseretyan Thanks for sharing. I believe your information helped with some trouble I had using connect-standalone.


negi-tribhuwan avatar negi-tribhuwan commented on July 24, 2024

I am getting the same error on Kafka Connect start. I have created the config topic with a single partition.
I checked ZooKeeper and Kafka; I can produce and consume messages on the "test" topic.
My setup is a single server, with ZooKeeper, Kafka, Schema Registry, and Kafka Connect all on the same host:

I get the following exception in the log:

2016-10-20 04:42:15,915 - ERROR DistributedHerder - Uncaught exception in herder work thread, exiting:
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
2016-10-20 04:42:15,916 - INFO Thread-1 - Kafka Connect stopping
2016-10-20 04:42:15,916 - INFO Thread-1 - Stopping REST server
2016-10-20 04:42:15,917 - DEBUG Thread-2 - stopping org.eclipse.jetty.server.Server@347d46d4


cotedm avatar cotedm commented on July 24, 2024

@tnegi7519 your issue looks a bit different from the one reported here. It looks more like your worker is unable to fetch topic metadata, which is different from a NullPointerException caused by TopicPartitionWriters getting out of sync. If you are still having trouble with a timeout fetching topic metadata, please open a new issue with more context around the logging and the worker/connector configs, and we'll see if we can help with that issue.


