kafka-connect-transform-xml's Introduction

Introduction

This project provides transformations for Kafka Connect that convert XML text to a Kafka Connect Struct based on the configured XML schema. The transformation works by dynamically generating JAXB sources with XJC, with the xjc-kafka-connect-plugin loaded. This allows the transformation to efficiently convert XML to structured data for Kafka Connect.

Use it in conjunction with a source connector that reads XML data, such as from an HTTP REST endpoint.
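The transformation itself generates JAXB bindings with XJC, but the core idea — turning an XML document into named, typed fields — can be sketched by hand. Below is a minimal, hand-rolled illustration of that idea (not the plugin's actual code; the class and element names are invented):

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlToStructSketch {
    // Convert a flat XML element's children into a field -> value map,
    // loosely mimicking the fields a Kafka Connect Struct would hold.
    static Map<String, String> toStruct(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> struct = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                struct.put(n.getNodeName(), n.getTextContent());
            }
        }
        return struct;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><title>Kafka</title><pages>300</pages></book>";
        System.out.println(toStruct(xml)); // {title=Kafka, pages=300}
    }
}
```

The real transformation goes further: the XSD drives code generation, so field names and types come from the schema rather than from ad-hoc traversal.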

Transformations

FromXML(Key)

This transformation is used to transform XML in the Key of the input into a JSON struct based on the provided XSD.

Configuration

schema.path (List, Importance: High) - URLs of the schemas to load; http and https paths are supported.
xjc.options.automatic.name.conflict.resolution.enabled (Boolean, Default: False)
xjc.options.strict.check.enabled (Boolean, Default: True)
xjc.options.verbose.enabled (Boolean, Default: False)

Standalone Example

transforms=xml_key
transforms.xml_key.type=com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Key
# The following values must be configured.
transforms.xml_key.schema.path = http://web.address/my.xsd

Distributed Example

"transforms": "xml_key",
"transforms.xml_key.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Key",
"transforms.xml_key.schema.path": "http://web.address/my.xsd"

FromXML(Value)

This transformation is used to transform XML in the Value of the input into a JSON struct based on the provided XSD.

Configuration

schema.path (List, Importance: High) - URLs of the schemas to load; http and https paths are supported.
xjc.options.automatic.name.conflict.resolution.enabled (Boolean, Default: False)
xjc.options.strict.check.enabled (Boolean, Default: True)
xjc.options.verbose.enabled (Boolean, Default: False)

Standalone Example

transforms=xml_value
transforms.xml_value.type=com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value
# The following values must be configured.
transforms.xml_value.schema.path = http://web.address/my.xsd

Distributed Example

"transforms": "xml_value",
"transforms.xml_value.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
"transforms.xml_value.schema.path": "< Configure me >"

kafka-connect-transform-xml's People

Contributors

jcustenborder, rmoff


kafka-connect-transform-xml's Issues

Java NullPointerException while using XML transform

Hi, we are trying to use the Kafka Connect XML transform (https://www.confluent.io/hub/jcustenborder/kafka-connect-transform-xml) so that XML messages can be parsed as Avro in Confluent.
We use Confluent 5.5.0 and Java jdk1.8.0_252.
We started from a simple XML file, but we encountered an error with this stack trace:
[2021-02-05 15:05:24,715] ERROR Failed to start task XML_TEST_TRANSFORM-0 (org.apache.kafka.connect.runtime.Worker)
org.apache.kafka.connect.errors.ConnectException: java.lang.NullPointerException
at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:264)
at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:520)
at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:472)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1147)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:126)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1162)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1158)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:146)
at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:261)
... 10 more

No additional trace is available in the log (root logger already set to TRACE).
I wonder what causes this error; any idea regarding this error, or how to get more verbosity, would be appreciated.
Attachments: Config, xmlfile, xsdschema

Unable to compile project

Hi,

I'm trying to compile the project (master branch) but am unable to do so. With a fresh Maven installation the error says:
[ERROR] Failed to execute goal on project kafka-connect-transform-xml: Could not resolve dependencies for project com.github.jcustenborder.kafka.connect:kafka-connect-transform-xml:jar:0.1.0-SNAPSHOT: Failed to collect dependencies at org.apache.kafka:connect-api:jar:2.2.1-cp1: Failed to read artifact descriptor for org.apache.kafka:connect-api:jar:2.2.1-cp1: Could not transfer artifact org.apache.kafka:connect-api:pom:2.2.1-cp1 from/to maven-default-http-blocker (http://0.0.0.0/): Blocked mirror for repositories: [confluent (http://packages.confluent.io/maven/, default, releases+snapshots)] -> [Help 1]

This seems to be caused by an old parent pom. I've tried several newer versions without success. I assume the way forward is to pick the latest version, so I tried to get that one to work. The XSDCompiler seems to depend on Guava 24+, but the project has version 18.0 attached. After fixing the code, some test cases fail; I'm unsure whether this is due to my code change or whether the test cases didn't run in the first place. Skipping the test cases results in a failed javadoc plugin.

I'm willing to create a PR and try to fix these issues, but I need some help to make sure I'm heading in the right direction. Any advice would be appreciated.
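If the Guava mismatch described above really is the culprit, one way to experiment (an assumption, not a confirmed fix for this build) is to override the dependency version in the project's pom so that XSDCompiler's Guava 24+ calls resolve:

```xml
<!-- pom.xml fragment: pin Guava to a version that has the APIs
     XSDCompiler appears to need (version number is an assumption) -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>24.1.1-jre</version>
</dependency>
```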

java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory

Hi,

I get this exception. Any ideas what could be the problem?

Thanks!
[2021-10-19 08:07:53,320] ERROR Failed to start task mq-source_test_02-0 (org.apache.kafka.connect.runtime.Worker)
connect | java.lang.AssertionError: javax.xml.bind.JAXBException: Implementation of JAXB-API has not been found on module path or classpath.
connect | - with linked exception:
connect | [java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory]
connect | at com.sun.tools.xjc.reader.xmlschema.bindinfo.BindInfo.getCustomizationContext(BindInfo.java:356)
connect | at com.sun.tools.xjc.reader.xmlschema.bindinfo.BindInfo.getCustomizationUnmarshaller(BindInfo.java:362)
connect | at com.sun.tools.xjc.reader.xmlschema.bindinfo.AnnotationParserFactoryImpl$1.(AnnotationParserFactoryImpl.java:85)
connect | at com.sun.tools.xjc.reader.xmlschema.bindinfo.AnnotationParserFactoryImpl.create(AnnotationParserFactoryImpl.java:84)
connect | at com.sun.xml.xsom.impl.parser.NGCCRuntimeEx.createAnnotationParser(NGCCRuntimeEx.java:401)

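com.sun.xml.internal.bind.v2.ContextFactory lives inside the JDK 8 runtime; on Java 11+ the JAXB reference implementation is no longer bundled with the JDK. A common workaround (an assumption here, not verified against this plugin) is to put a JAXB API and runtime on the worker's classpath, e.g. via Maven:

```xml
<!-- pom.xml fragment: supply the JAXB API and runtime that were
     removed from the JDK in Java 11+ (versions are typical choices) -->
<dependency>
  <groupId>javax.xml.bind</groupId>
  <artifactId>jaxb-api</artifactId>
  <version>2.3.1</version>
</dependency>
<dependency>
  <groupId>com.sun.xml.bind</groupId>
  <artifactId>jaxb-impl</artifactId>
  <version>2.3.3</version>
</dependency>
```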

Premature end of file

Premature end of file on the XML file.

ERROR [xml-source|task-0] WorkerSourceTask{id=xml-source-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:186)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
        at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
        at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:320)
        at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:245)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Exception thrown while processing xml
        at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.processString(FromXml.java:82)
        at com.github.jcustenborder.kafka.connect.utils.transformation.BaseKeyValueTransformation.process(BaseKeyValueTransformation.java:152)
        at com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value.apply(FromXml.java:172)
        at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
        at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
        ... 11 more
Caused by: javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 40; Premature end of file.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:578)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:264)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:229)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:157)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:214)
        at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.processString(FromXml.java:79)
        ... 16 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 40; Premature end of file.
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:258)
        ... 20 more

XML File : https://github.com/jcustenborder/kafka-connect-examples/blob/master/activemq-xml/books.xml
XSD File : https://raw.githubusercontent.com/jcustenborder/kafka-connect-examples/master/activemq-xml/books.xsd

Connector config :
{
  "name": "xml-source",
  "config": {
    "connector.class": "FileStreamSource",
    "name": "xml-source",
    "kafka.topic": "output",
    "confluent.topic.bootstrap.servers": "kafka:9092",
    "confluent.topic.replication.factor": "1",
    "tasks.max": "1",
    "file": "/books.xml",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "topic": "output",
    "transforms": "FromXml",
    "transforms.FromXml.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
    "transforms.FromXml.schema.path": "file:///books.xsd"
  }
}
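One plausible cause (not confirmed in the thread): FileStreamSource emits one record per line, so a multi-line XML file reaches the transform one fragment at a time. Parsing just the first line — typically the XML declaration, which lines up with the reported lineNumber 1, columnNumber 40 — reproduces the same SAXParseException. A minimal sketch:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class PrematureEndOfFileDemo {
    // Parse a string as a complete XML document; return the parser's
    // fatal-error message, or null if the document parsed cleanly.
    static String parseError(String xml) throws Exception {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Only the first line of a multi-line XML file -- the XML
        // declaration with no root element -- reaches the parser.
        System.out.println(parseError("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"));
    }
}
```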

NullPointerException after reading a large XSD

Versions:
jcustenborder/kafka-connect-transform-xml:0.1.0.18

  2020-10-08 16:59:03,806 ERROR Failed to start task kafka.ibm-mq.source.connector.tomo-0 (org.apache.kafka.connect.runtime.Worker) [pool-5-thread-8]
  org.apache.kafka.connect.errors.ConnectException: java.lang.NullPointerException
  	at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:264)
  	at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:513)
  	at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:465)
  	at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1140)
  	at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1700(DistributedHerder.java:125)
  	at org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:1155)
  	at org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:1151)
  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  	at java.lang.Thread.run(Thread.java:748)
  Caused by: java.lang.NullPointerException
  	at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:146)
  	at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
  	at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:261)
  	... 10 more

Trace with:

log4j.logger.com.github.jcustenborder: TRACE
log4j.logger.org.apache.kafka.connect: TRACE

File: ibm-mq-xml-transformation.log

Using [ibm-mq-source-connector](https://github.com/ibm-messaging/kafka-connect-mq-source):
curl -i -X PUT -H "Accept:application/json" \
    -H  "Content-Type:application/json" http://localhost:8083/connectors/kafka.ibm-mq.source.connector.tomo/config \
    -d ' {
        "connector.class":"MQSourceConnector",
        "mq.channel.name":"KAFKA.SVRCONN.01",
        "mq.connection.name.list":"server.domain.com",
        "mq.message.body.jms":"true",
        "mq.password":"*****",
        "mq.queue.manager":"IN101D",
        "mq.queue":"KAFKA.TEST",
        "mq.record.builder":"com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder",
        "mq.user.name":"someUser",
        "tasks.max":"1",
        "topic":"kafka.tomo.2",
        "transforms.fromxml.schema.path":"file:///tmp/LeistungNotification.xsd",
        "transforms.fromxml.type":"com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
        "transforms":"fromxml",
        "value.converter":"StringConverter"
    }'

Referenced XSDs

Hello how are you?

We have a business scenario where we need to read an XML file, validate it against a set of referenced schemas (not a single schema), and transform and stream it into Kafka. Basically, we need to pick up the XML file as it arrives and write it to a Kafka topic (Avro format). How can this be achieved with this code? Also, it would be great if you could provide us a test sample for this code.

Thanks,
Christopher
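For reference, schema.path is documented above as a List, so multiple schema URLs can in principle be supplied comma-separated (the URLs below are placeholders, and whether this covers cross-schema references is not confirmed here):

```properties
transforms=FromXml
transforms.FromXml.type=com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value
transforms.FromXml.schema.path=http://web.address/common-types.xsd,http://web.address/my.xsd
```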

java.lang.NoSuchMethodError: com.google.common.io.Files.fileTreeTraverser()Lcom/google/common/collect/TreeTraverser;

@jcustenborder

I am trying to do the below POC:

  1. Source RabbitMQ Message - XML Payload
  2. Transform XML message as Avro/JSON into a Kafka Topic.
  3. Push Kafka topic message into Elastic Search, for a real-time report using Kibana.

Below are the configurations to read the RabbitMQ payload into a Kafka topic.

connect-standalone.properties
bootstrap.servers=localhost:9092
key.deserializer=org.apache.kafka.common.serialization.KafkaAvroSerializer
value.deserializer=org.apache.kafka.common.serialization.KafkaAvroSerializer
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8084
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8084
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/share/java,/usr/share/java/confluent-common

xmlsource.properties
name=RMQXML
connector.class=io.confluent.connect.rabbitmq.RabbitMQSourceConnector
tasks.max=1
kafka.topic=Hello
rabbitmq.queue=testq
rabbitmq.automatic.recovery.enabled = true
rabbitmq.connection.timeout.ms = 60000
rabbitmq.handshake.timeout.ms = 10000
rabbitmq.host = localhost
rabbitmq.network.recovery.interval.ms = 10000
rabbitmq.password = admin123
rabbitmq.port = 5672
rabbitmq.prefetch.count = 500
rabbitmq.prefetch.global = false
rabbitmq.requested.channel.max = 0
rabbitmq.requested.frame.max = 0
rabbitmq.requested.heartbeat.seconds = 60
rabbitmq.shutdown.timeout.ms = 10000
rabbitmq.topology.recovery.enabled = true
rabbitmq.username = guest
rabbitmq.virtual.host = /
transforms=messageField,FromXml
transforms.messageField.type=org.apache.kafka.connect.transforms.ExtractField$Value
#value of the 'Payload' field is the xml
transforms.messageField.field=Payload
transforms.FromXml.type=com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value
transforms.FromXml.schema.path=file:/etc/kafka-connect-xml/test.xsd

Error while trying to pull an XML payload from RabbitMQ into a Kafka topic:

[2019-07-19 16:48:38,397] INFO Set up the value converter class io.confluent.connect.avro.AvroConverter for task RMQXML-0 using the worker config (org.apache.kafka.connect.runtime.Worker:430)
[2019-07-19 16:48:38,397] INFO Set up the header converter class org.apache.kafka.connect.storage.SimpleHeaderConverter for task RMQXML-0 using the worker config (org.apache.kafka.connect.runtime.Worker:436)
[2019-07-19 16:48:38,407] INFO FromXmlConfig values:
        schema.path = [file:/etc/kafka-connect-xml/test.xsd]
 (com.github.jcustenborder.kafka.connect.transform.xml.FromXmlConfig:279)
[2019-07-19 16:48:38,418] INFO Loading schema from file:/etc/kafka-connect-xml/test.xsd (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:242)
[2019-07-19 16:48:38,449] INFO compileContext() - Generating source for file:/etc/kafka-connect-xml/test.xsd (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:76)
xsd50861ee5f6c9ec7ee82f69ef31004d01d2a501e3/ObjectFactory.java
xsd50861ee5f6c9ec7ee82f69ef31004d01d2a501e3/Student.java

2019-07-19 15:53:11,440] ERROR Failed to start task RMQXML-0 (org.apache.kafka.connect.runtime.Worker:445)
java.lang.NoSuchMethodError: com.google.common.io.Files.fileTreeTraverser()Lcom/google/common/collect/TreeTraverser;
        at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:117)
        at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:95)
        at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:256)
        at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:485)
        at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:441)
        at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.createConnectorTasks(StandaloneHerder.java:307)
        at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.updateConnectorTasks(StandaloneHerder.java:332)
        at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:210)
        at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:113)

Any help or direction would be appreciated. Does any configuration need to be addressed?

Thanks,
Kesavan

Class not found although plugin loaded

Thanks for your work on this plugin! I'm trying the XML transform but am unable to get it running. Any ideas what I might be doing wrong?

I POST this to http://my-connect:8083/connectors:

{
    "name": "file-source",
    "config": {
        "connector.class": "FileStreamSource",
        "tasks.max": "1",
        "file": "/tmp/test.xml",
        "name": "file-source",
        "topic": "mytopic",
        "value.converter": "org.apache.kafka.connect.storage.StringConverter",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "transforms": "FromXml",
        "transforms.FromXml.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
        "transforms.FromXml.schema.path": "https://my-schema/schema.xsd"
    }
}

This results in the following response:

{
    "error_code": 400,
    "message": "Connector configuration is invalid and contains the following 2 error(s):\nInvalid value com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value for configuration transforms.FromXml.type: Class com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value could not be found.\nInvalid value null for configuration transforms.FromXml.type: Not a Transformation\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"
}

I'm installing the plugin like this in a Dockerfile:

FROM confluentinc/cp-kafka-connect:5.3.1

RUN confluent-hub install --no-prompt jcustenborder/kafka-connect-transform-xml:0.1.0.12

In the logs, I can see the plugin loading:

[2019-10-10 10:23:11,721] INFO Loading plugin from: /usr/share/confluent-hub-components/jcustenborder-kafka-connect-transform-xml (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,993] INFO Registered loader: PluginClassLoader{pluginLocation=file:/usr/share/confluent-hub-components/jcustenborder-kafka-connect-transform-xml/} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,993] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.PatternRename$Key' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,993] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.BytesToString$Value' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,993] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.ExtractNestedField$Value' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,993] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.ChangeCase$Value' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,993] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.ChangeCase$Key' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,993] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Key' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,993] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Key' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,993] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,994] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Value' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,994] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.BytesToString$Key' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,994] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.ExtractTimestamp$Value' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,994] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.ExtractNestedField$Key' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,994] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.ChangeTopicCase' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:11,994] INFO Added plugin 'com.github.jcustenborder.kafka.connect.transform.common.PatternRename$Value' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
[2019-10-10 10:23:13,227] INFO Added alias 'ChangeTopicCase' to plugin 'com.github.jcustenborder.kafka.connect.transform.common.ChangeTopicCase' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)

If I try to validate the config, the transform is listed as a recommended value, so it looks like the transform is loaded:

            "value": {
                "name": "transforms.FromXml.type",
                "value": null,
                "recommended_values": [
                    "com.github.jcustenborder.kafka.connect.transform.common.BytesToString$Key",
                    "com.github.jcustenborder.kafka.connect.transform.common.BytesToString$Value",
                    "com.github.jcustenborder.kafka.connect.transform.common.ChangeCase$Key",
                    "com.github.jcustenborder.kafka.connect.transform.common.ChangeCase$Value",
                    "com.github.jcustenborder.kafka.connect.transform.common.ChangeTopicCase",
                    "com.github.jcustenborder.kafka.connect.transform.common.ExtractNestedField$Key",
                    "com.github.jcustenborder.kafka.connect.transform.common.ExtractNestedField$Value",
                    "com.github.jcustenborder.kafka.connect.transform.common.ExtractTimestamp$Value",
                    "com.github.jcustenborder.kafka.connect.transform.common.PatternRename$Key",
                    "com.github.jcustenborder.kafka.connect.transform.common.PatternRename$Value",
                    "com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Key",
                    "com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Value",
                    "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Key",
                    "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
                    "org.apache.kafka.connect.transforms.Cast$Key",
                    "org.apache.kafka.connect.transforms.Cast$Value",
                    "org.apache.kafka.connect.transforms.ExtractField$Key",
                    "org.apache.kafka.connect.transforms.ExtractField$Value",
                    "org.apache.kafka.connect.transforms.Flatten$Key",
                    "org.apache.kafka.connect.transforms.Flatten$Value",
                    "org.apache.kafka.connect.transforms.HoistField$Key",
                    "org.apache.kafka.connect.transforms.HoistField$Value",
                    "org.apache.kafka.connect.transforms.InsertField$Key",
                    "org.apache.kafka.connect.transforms.InsertField$Value",
                    "org.apache.kafka.connect.transforms.MaskField$Key",
                    "org.apache.kafka.connect.transforms.MaskField$Value",
                    "org.apache.kafka.connect.transforms.RegexRouter",
                    "org.apache.kafka.connect.transforms.ReplaceField$Key",
                    "org.apache.kafka.connect.transforms.ReplaceField$Value",
                    "org.apache.kafka.connect.transforms.SetSchemaMetadata$Key",
                    "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
                    "org.apache.kafka.connect.transforms.TimestampConverter$Key",
                    "org.apache.kafka.connect.transforms.TimestampConverter$Value",
                    "org.apache.kafka.connect.transforms.TimestampRouter",
                    "org.apache.kafka.connect.transforms.ValueToKey"
                ],
                "errors": [
                    "Invalid value com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value for configuration transforms.FromXml.type: Class com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value could not be found.",
                    "Invalid value null for configuration transforms.FromXml.type: Not a Transformation"
                ],
                "visible": true
            }
        },

Thanks!

Many XSDs Referenced

Hi people,

First, I want to say thanks for this connector.

I have a question about it.

I have many XML payloads arriving in a Kafka topic, and I need to transform them into JSON to send to Elasticsearch with a sink connector.

The problem is that I have many schemas, not just one.

Does the connector solve this problem?

Thanks a lot

Unable to parse XSD file. Getting exceptions

(screenshot of the stack trace attached: WhatsApp Image 2020-07-17 at 11 06 08 AM)

Hi Dev Team,
I am pretty new to the Kafka world. I have a requirement where we are reading data from IBM MQ topics. This data has some header and footer values, and the body contains XML data. Using the IBM MQ source connector we are able to read the XML as a string. Now we want to convert this into Avro format. I have cloned your repository and am trying to run the JUnit tests, but I am getting an error while parsing the XML file. Please find attached the stack trace for the same.

org.xml.sax.SAXParseException; Exception thrown while processing: Exception thrown while building field / org.apache.kafka.connect.errors.ConnectException: Schema compiler could not bind schema

Using this XML and an auto-generated XSD, I get a transform failure.

Fails

        transforms = [xml]
        transforms.xml.negate = false
        transforms.xml.package = com.github.jcustenborder.kafka.connect.transform.xml.model
        transforms.xml.predicate =
        transforms.xml.schema.path = [https://rmoff.net/files/livecyclehireupdates.xsd]
        transforms.xml.type = class com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value
        transforms.xml.xjc.options.automatic.name.conflict.resolution.enabled = true
        transforms.xml.xjc.options.strict.check.enabled = true
        transforms.xml.xjc.options.verbose.enabled = true
        value.converter = null
 (org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig:354)
[2020-09-28 14:00:03,127] INFO [source-http-xml-02|task-0] FromXmlConfig values:
        package = com.github.jcustenborder.kafka.connect.transform.xml.model
        schema.path = [https://rmoff.net/files/livecyclehireupdates.xsd]
        xjc.options.automatic.name.conflict.resolution.enabled = true
        xjc.options.strict.check.enabled = true
        xjc.options.verbose.enabled = true
 (com.github.jcustenborder.kafka.connect.transform.xml.FromXmlConfig:354)
[2020-09-28 14:00:03,128] INFO [source-http-xml-02|task-0] compileContext() - Generating source for https://rmoff.net/files/livecyclehireupdates.xsd (com.github.jcustenborder.kafka.connect.t
ransform.xml.XSDCompiler:99)
[2020-09-28 14:00:03,593] ERROR [source-http-xml-02|task-0] Error (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:31)
org.xml.sax.SAXParseException; Exception thrown while processing: Exception thrown while building field 'station'. Stations
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.run(KafkaConnectPlugin.java:486)
        at com.sun.tools.xjc.model.Model.generateCode(Model.java:292)
        at com.sun.tools.xjc.api.impl.s2j.SchemaCompilerImpl.bind(SchemaCompilerImpl.java:284)
        at com.sun.tools.xjc.api.impl.s2j.SchemaCompilerImpl.bind(SchemaCompilerImpl.java:95)
        at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:106)
        at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
        at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:285)
        at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:605)
        at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:555)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1251)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1700(DistributedHerder.java:127)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1266)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1262)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: Exception thrown while building field 'station'. Stations
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.field(KafkaConnectPlugin.java:420)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.fields(KafkaConnectPlugin.java:452)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.fields(KafkaConnectPlugin.java:463)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.run(KafkaConnectPlugin.java:477)
        ... 16 more
Caused by: java.util.ConcurrentModificationException
        at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1134)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.field(KafkaConnectPlugin.java:401)
        ... 19 more
[2020-09-28 14:00:03,594] ERROR [source-http-xml-02|task-0] Failed to start task source-http-xml-02-0 (org.apache.kafka.connect.runtime.Worker:560)
org.apache.kafka.connect.errors.ConnectException: org.apache.kafka.connect.errors.ConnectException: Schema compiler could not bind schema.
        at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:296)
        at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:605)
        at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:555)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1251)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1700(DistributedHerder.java:127)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1266)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1262)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: Schema compiler could not bind schema.
        at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:109)
        at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
        at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:285)
        ... 10 more
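For what it's worth, the `ConcurrentModificationException` at `HashMap.computeIfAbsent` in the trace is consistent with the plugin's type cache re-entering itself while resolving a nested type: on JDK 9 and later, `HashMap.computeIfAbsent` throws if the mapping function itself modifies the map. A minimal sketch of that failure mode (the names here are illustrative, not the plugin's actual code):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class CmeDemo {
    static final Map<String, String> cache = new HashMap<>();

    // Recursively "resolves" a type name, memoizing results in `cache`.
    // The nested computeIfAbsent mutates the map while the outer call is
    // still in flight, which HashMap detects on JDK 9+ and rejects.
    static String resolve(String type) {
        return cache.computeIfAbsent(type, t ->
                t.isEmpty() ? "leaf" : resolve(t.substring(1)));
    }

    public static void main(String[] args) {
        try {
            resolve("ab");
            System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```

The fix in such code is usually to check the cache and insert into it as two separate steps instead of using a single re-entrant `computeIfAbsent`.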

If you install the HTTP source connector you can run this:

curl -i -X PUT -H "Accept:application/json" \
    -H  "Content-Type:application/json" http://localhost:8083/connectors/source-http-xml-02/config \
    -d ' {
        "connector.class": "com.github.castorm.kafka.connect.http.HttpSourceConnector",
        "tasks.max": "1",
        "http.request.url": "https://tfl.gov.uk/tfl/syndication/feeds/cycle-hire/livecyclehireupdates.xml",
        "http.timer.interval.millis": "600000",
        "kafka.topic": "livecyclehireupdates",
        "transforms": "xml",
        "transforms.xml.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
        "transforms.xml.schema.path": "https://rmoff.net/files/livecyclehireupdates.xsd",
        "transforms.xml.xjc.options.verbose.enabled": "true",
        "transforms.xml.xjc.options.automatic.name.conflict.resolution.enabled":"true"
    }'

Versions:

  • Confluent Platform 6.0
  • jcustenborder/kafka-connect-transform-xml:0.1.0.18

java.lang.StackOverflowError when using a large XSD file for transformation from XML to JSON

I am trying to start a new connector that reads XML from an MQTT broker, transforms it into JSON, and sends the JSON to a Kafka topic.

Connector configuration:
name=mqtt-xml-to-kafka-json
connector.class=io.confluent.connect.mqtt.MqttSourceConnector
tasks.max=1
mqtt.server.uri=tcp://localhost:1883
mqtt.topics=aws/+/DATA/+/json
kafka.topic=aws-data-json
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
transforms=xml
transforms.xml.type=com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value
transforms.xml.schema.path=file:///home/dev/schema.xsd
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

Starting the connector with the below command:
sh connect-standalone.sh /home/Softwares/kafka/kafka_2.13-2.7.0/config/connect-standalone.properties /home/Softwares/kafka/kafka_2.13-2.7.0/dev/mqtt-xml-to-kafka-json.properties

The XSD file is 137 KB.
The connector works fine with small XSD files: the XML gets converted to JSON and arrives at the Kafka topic.
But when the connector is started with this XSD file, it fails with the StackOverflowError below.
I have validated the XML against the XSD and they are compatible.

[2021-08-27 14:13:01,306] INFO EnrichedConnectorConfig values:
config.action.reload = restart
connector.class = io.confluent.connect.mqtt.MqttSourceConnector
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = all
header.converter = null
key.converter = null
name = mqtt-xml-to-kafka-json
predicates = []
tasks.max = 1
topic.creation.groups = []
transforms = [xml]
transforms.xml.negate = false
transforms.xml.package = com.github.jcustenborder.kafka.connect.transform.xml.model
transforms.xml.predicate =
transforms.xml.schema.path = [file:///home/dev/schema.xsd]
transforms.xml.type = class com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value
transforms.xml.xjc.options.automatic.name.conflict.resolution.enabled = false
transforms.xml.xjc.options.strict.check.enabled = true
transforms.xml.xjc.options.verbose.enabled = false
value.converter = class org.apache.kafka.connect.json.JsonConverter
(org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig:361)
[2021-08-27 14:13:01,307] INFO FromXmlConfig values:
package = com.github.jcustenborder.kafka.connect.transform.xml.model
schema.path = [file:///home/dev/schema.xsd]
xjc.options.automatic.name.conflict.resolution.enabled = false
xjc.options.strict.check.enabled = true
xjc.options.verbose.enabled = false
(com.github.jcustenborder.kafka.connect.transform.xml.FromXmlConfig:361)
[2021-08-27 14:13:01,330] INFO compileContext() - Generating source for file:/home/dev/schema.xsd (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:99)
[2021-08-27 14:13:02,383] ERROR Failed to start task mqtt-xml-to-kafka-json-0 (org.apache.kafka.connect.runtime.Worker:560)
java.lang.StackOverflowError
at com.sun.codemodel.JNarrowedClass.fullName(JNarrowedClass.java:110)
at com.sun.codemodel.JNarrowedClass.fullName(JNarrowedClass.java:118)
at com.sun.codemodel.JNarrowedClass.equals(JNarrowedClass.java:212)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:312)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.lambda$type$1(KafkaConnectPlugin.java:323)
at java.util.HashMap.computeIfAbsent(HashMap.java:1127)

Is there a way to increase the thread stack size when starting Kafka Connect? I could not find any such configuration.
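For reference, `connect-standalone.sh` (via `kafka-run-class.sh`) honors the `KAFKA_OPTS` environment variable, so extra JVM flags such as `-Xss` can be passed that way. Note the repeating frames in the trace look cyclic, so a bigger stack may only delay the failure if the schema's type graph genuinely self-references; the `8m` value below is just an illustrative starting point, not a recommendation:

```shell
# Illustrative workaround, not a guaranteed fix: raise the JVM thread stack
# size for the Connect worker before starting it. XJC walks the schema's
# type graph recursively, so a deep (but non-cyclic) XSD needs more stack.
export KAFKA_OPTS="-Xss8m"
sh connect-standalone.sh connect-standalone.properties mqtt-xml-to-kafka-json.properties
```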

SAXParseException : Premature end of file

I am getting a SAXParseException saying "Premature end of file".

  • I have tried different schemas and XML data; however, I get the same error every time.
  • I have also run it on Docker and on a Confluent Platform setup.
  • I ran Docker on Windows and Confluent Platform on WSL Ubuntu.

Link To Docker Compose That I used

https://github.com/confluentinc/demo-scene.git

data.xml

<?xml version="1.0" encoding="UTF-8"?>
<addresses xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <address>
    <name>Joe Tester</name>
    <street>Baker street 5</street>
  </address>
</addresses>

data.xsd

<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Liquid Technologies Online Tools 1.0 (https://www.liquid-technologies.com) -->
<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="address">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string" />
              <xs:element name="street" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Error Logs

Caused by: javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 39; Premature end of file.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:310)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:578)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:264)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:229)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:140)
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:189)
        at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.processString(FromXml.java:78)
        ... 16 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 39; Premature end of file.
        at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:204)
        at java.xml/com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:178)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1471)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1013)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
        at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:258)
        ... 20 more

Does anyone know what the issue is?
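One likely cause (an assumption, since the source connector config isn't shown in this report): line-oriented sources such as `FileStreamSourceConnector` emit one record per line, so a pretty-printed XML file reaches the SMT one line at a time. Column 39 is exactly one past the end of the `<?xml ...?>` declaration in data.xml, i.e. the parser saw only the first line. The symptom can be reproduced in isolation:

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrematureEndDemo {
    public static void main(String[] args) throws Exception {
        // If a line-oriented source connector hands the transform only the
        // first line of a pretty-printed file, the parser sees a document
        // that ends right after the XML declaration.
        String firstLineOnly = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(firstLineOnly)),
                           new DefaultHandler());
        } catch (SAXParseException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that is the cause, collapsing the XML onto a single line, or using a source that emits the whole document as one record, should make the error go away.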

Errors using books.xml: `Exception thrown while processing: Exception thrown while building field 'book'. BooksForm`

version 0.1.0.18

Using the unit test file books.xml and accompanying books.xsd the SMT throws an error.

Config:

curl -i -X PUT -H  "Content-Type:application/json" \
    http://localhost:8083/connectors/source-file-01/config \
    -d '{
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/data/books.xml",
    "topic": "xmltest",
    "transforms": "xml",
    "transforms.xml.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
    "transforms.xml.schema.path": "file:///data/books.xsd",
    "transforms.xml.xjc.options.verbose.enabled":"true"
    }'

Error

[2020-10-02 09:42:54,358] TRACE [source-file-01b|task-0] Retrieving loaded class 'com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value' from 'PluginClassLoader{pluginLocation=file:/usr/share/confluent-hub-components/jcustenborder-kafka-connect-transform-xml/}' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:403)
[2020-10-02 09:42:54,359] TRACE [source-file-01b|task-0] Retrieving loaded class 'com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value' from 'PluginClassLoader{pluginLocation=file:/usr/share/confluent-hub-components/jcustenborder-kafka-connect-transform-xml/}' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:403)
[2020-10-02 09:42:54,361] INFO [source-file-01b|task-0] EnrichedConnectorConfig values:
        config.action.reload = restart
        connector.class = org.apache.kafka.connect.file.FileStreamSourceConnector
        errors.log.enable = false
        errors.log.include.messages = false
        errors.retry.delay.max.ms = 60000
        errors.retry.timeout = 0
        errors.tolerance = none
        header.converter = null
        key.converter = null
        name = source-file-01b
        predicates = []
        tasks.max = 1
        topic.creation.groups = []
        transforms = [xml]
        transforms.xml.negate = false
        transforms.xml.package = com.github.jcustenborder.kafka.connect.transform.xml.model
        transforms.xml.predicate =
        transforms.xml.schema.path = [file:///data/books.xsd]
        transforms.xml.type = class com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value
        transforms.xml.xjc.options.automatic.name.conflict.resolution.enabled = false
        transforms.xml.xjc.options.strict.check.enabled = true
        transforms.xml.xjc.options.verbose.enabled = true
        value.converter = null
 (org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig:354)
[2020-10-02 09:42:54,363] INFO [source-file-01b|task-0] FromXmlConfig values:
        package = com.github.jcustenborder.kafka.connect.transform.xml.model
        schema.path = [file:///data/books.xsd]
        xjc.options.automatic.name.conflict.resolution.enabled = false
        xjc.options.strict.check.enabled = true
        xjc.options.verbose.enabled = true
 (com.github.jcustenborder.kafka.connect.transform.xml.FromXmlConfig:354)
[2020-10-02 09:42:54,364] INFO [source-file-01b|task-0] compileContext() - Generating source for file:/data/books.xsd (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:99)
[2020-10-02 09:42:54,396] TRACE [source-file-01b|task-0] run - BooksForm (com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin:475)
[2020-10-02 09:42:54,396] TRACE [source-file-01b|task-0] field() - processing name = 'book' type = 'List<BookForm>' (com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin:381)
[2020-10-02 09:42:54,397] ERROR [source-file-01b|task-0] Error (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:31)
org.xml.sax.SAXParseException; Exception thrown while processing: Exception thrown while building field 'book'. BooksForm
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.run(KafkaConnectPlugin.java:486)
        at com.sun.tools.xjc.model.Model.generateCode(Model.java:292)
        at com.sun.tools.xjc.api.impl.s2j.SchemaCompilerImpl.bind(SchemaCompilerImpl.java:284)
        at com.sun.tools.xjc.api.impl.s2j.SchemaCompilerImpl.bind(SchemaCompilerImpl.java:95)
        at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:106)
        at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
        at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:285)
        at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:605)
        at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:555)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1251)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1700(DistributedHerder.java:127)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1266)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1262)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: Exception thrown while building field 'book'. BooksForm
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.field(KafkaConnectPlugin.java:420)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.fields(KafkaConnectPlugin.java:452)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.fields(KafkaConnectPlugin.java:463)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.run(KafkaConnectPlugin.java:477)
        ... 16 more
Caused by: java.util.ConcurrentModificationException
        at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1134)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.type(KafkaConnectPlugin.java:293)
        at com.github.jcustenborder.kafka.connect.xml.KafkaConnectPlugin.field(KafkaConnectPlugin.java:401)
        ... 19 more
[2020-10-02 09:42:54,398] ERROR [source-file-01b|task-0] Failed to start task source-file-01b-0 (org.apache.kafka.connect.runtime.Worker:560)
org.apache.kafka.connect.errors.ConnectException: org.apache.kafka.connect.errors.ConnectException: Schema compiler could not bind schema.
        at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:296)
        at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:605)
        at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:555)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1251)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1700(DistributedHerder.java:127)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1266)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$10.call(DistributedHerder.java:1262)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.ConnectException: Schema compiler could not bind schema.
        at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:109)
        at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
        at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:285)
        ... 10 more

ClassCastExceptions at runtime in the kafka-connect-transform-xml module

Jeremy,

I would like to document two issues that I encountered using the latest code from kafka-connect-transform-xml, built locally.

Issue #1: our XSD uses complex types, and the source classes generated from the compilation unit information supplied in XSDCompiler.java throw an exception during transformation, as shown below.

connect | [2018-10-16 15:31:47,514] INFO WorkerSourceTask{id=my-simple-connector-0} Source task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSourceTask)
connect | [2018-10-16 15:31:47,620] WARN JDBC type -101 (TIMESTAMP WITH TIME ZONE) not currently supported (io.confluent.connect.jdbc.dialect.OracleDatabaseDialect)
connect | [2018-10-16 15:31:47,789] ERROR Error encountered in task my-simple-connector-0. Executing stage 'TRANSFORMATION' with class 'com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value'. (org.apache.kafka.connect.runtime.errors.LogReporter)
connect | java.lang.ClassCastException: xsd67e0b5325f0161e74b01ae5c883765296c12e5de.EventInfoChanged cannot be cast to javax.xml.bind.JAXBElement
connect | at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.processString(FromXml.java:65)
connect | at com.github.jcustenborder.kafka.connect.transform.common.BaseTransformation.process(BaseTransformation.java:141)
connect | at com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value.apply(FromXml.java:136)
connect | at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:44)
connect | at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
connect | at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
connect | at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
connect | at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:44)
connect | at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:292)
connect | at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:228)
connect | at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
connect | at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
connect | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
connect | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
connect | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
connect | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
connect | at java.lang.Thread.run(Thread.java:748)

Issue #2:

In order to mitigate the ClassCastException at JAXBElement described in issue #1, I updated two methods in the abstract class FromXml.java to use JAXBIntrospector, making them generic enough to handle both byte and String input:

    @Override
    protected SchemaAndValue processBytes(R record, org.apache.kafka.connect.data.Schema inputSchema, byte[] input) {
      try (InputStream inputStream = new ByteArrayInputStream(input)) {
        try (Reader reader = new InputStreamReader(inputStream)) {
          // JAXBIntrospector.getValue() unwraps a JAXBElement, or returns the
          // object unchanged if the unmarshaller produced a bare instance.
          JAXBIntrospector jaxbIntrospector = context.createJAXBIntrospector();
          Object anyObject = this.unmarshaller.unmarshal(reader);
          Connectable jaxbElement = (Connectable) jaxbIntrospector.getValue(anyObject);
          Struct struct = jaxbElement.toConnectStruct();
          return new SchemaAndValue(struct.schema(), struct);
        }
      } catch (IOException | JAXBException e) {
        throw new DataException("Exception thrown while processing xml", e);
      }
    }

The same lines were added to the processString() method of FromXml.java.
With these changes a new error came up saying the element cannot be cast to the Connectable interface:

connect | [2018-10-16 16:15:59,455] INFO WorkerSourceTask{id=my-simple-connector-0} Source task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSourceTask)
connect | [2018-10-16 16:15:59,600] WARN JDBC type -101 (TIMESTAMP WITH TIME ZONE) not currently supported (io.confluent.connect.jdbc.dialect.OracleDatabaseDialect)
connect | [2018-10-16 16:15:59,798] ERROR Error encountered in task my-simple-connector-0. Executing stage 'TRANSFORMATION' with class 'com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value'. (org.apache.kafka.connect.runtime.errors.LogReporter)
connect | java.lang.ClassCastException: xsd67e0b5325f0161e74b01ae5c883765296c12e5de.EventInfoChanged cannot be cast to com.github.jcustenborder.kafka.connect.xml.Connectable
connect | at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.processString(FromXml.java:66)
connect | at com.github.jcustenborder.kafka.connect.transform.common.BaseTransformation.process(BaseTransformation.java:141)
connect | at com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value.apply(FromXml.java:137)
connect | at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:44)
connect | at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
connect | at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
connect | at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
connect | at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:44)
connect | at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:292)
connect | at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:228)
connect | at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
connect | at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
connect | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
connect | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
connect | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
connect | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
connect | at java.lang.Thread.run(Thread.java:748)

Can you please verify against complex XSD types to make sure these ClassCastExceptions can be avoided?

NullPointerException when the Java compiler is not installed

Great Project :-)

When running in an environment where javac is not installed, a NullPointerException is thrown on the following line:

try (StandardJavaFileManager fileManager = javaCompiler.getStandardFileManager(diagnostics, locale, null)) {

It would be great to throw a more meaningful exception instead, e.g. "the Java compiler is not installed, therefore the schema cannot be translated to Java classes".
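The NPE happens because `ToolProvider.getSystemJavaCompiler()` returns null on a JRE-only install. A possible guard, as a sketch (`requireCompiler` is a hypothetical helper, not an existing method in XSDCompiler):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompilerCheck {
    // Fails fast with a clear message when only a JRE (no javac) is
    // present, instead of letting a null compiler reference propagate
    // into getStandardFileManager() and blow up as an NPE.
    static JavaCompiler requireCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException(
                "No system Java compiler found. A JDK (not just a JRE) is "
                + "required so the XSD can be translated to Java classes.");
        }
        return compiler;
    }

    public static void main(String[] args) {
        // Running on a JDK, this prints true; on a JRE it would throw
        // IllegalStateException with the message above.
        System.out.println(requireCompiler() != null);
    }
}
```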

java.io.IOException: Map failed while load testing with Kafka 3.6.0

I am experiencing a runtime exception while load testing with 10 million records on my Kafka cluster (3 brokers), and a broker crashes every time this issue is encountered. In my initial analysis I thought the cause could be the default value of vm.max_map_count (65K), so I increased it to 400K. I retested the load, and a broker crashes again mid-run, somewhere between 3 and 4 million records.

I have better monitoring in place for process and OS sampling and am not seeing any performance anomalies during the load testing. The servers also have a large amount of disk space left in the data directories. Heap memory is set to 6 GB per broker (Xmx and Xms) on Linux servers with 60 GB of RAM.

I started seeing this after the major Kafka version upgrade to 3.x. Has anyone experienced this issue, and do I need to do any performance tuning for 3.x when testing such a high load?
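One quick way to sanity-check the vm.max_map_count theory is to compare the broker process's live mapping count against the limit while the load test runs, since an `OutOfMemoryError: Map failed` from `FileChannelImpl.map` commonly means the per-process mmap limit was hit (Kafka maps index files per log segment). A Linux-only sketch; in practice, substitute the broker's PID for `"self"`, which is used here only so the snippet runs as-is:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class MapCount {
    // The kernel's per-process limit on memory-mapped regions.
    static long mapLimit() throws Exception {
        return Long.parseLong(
            Files.readString(Path.of("/proc/sys/vm/max_map_count")).trim());
    }

    // Number of active mappings for the given PID; each line in
    // /proc/<pid>/maps is one mapped region.
    static long mapsInUse(String pid) throws Exception {
        return Files.lines(Path.of("/proc", pid, "maps")).count();
    }

    public static void main(String[] args) throws Exception {
        String pid = args.length > 0 ? args[0] : "self";
        System.out.println(mapsInUse(pid) + " of " + mapLimit() + " maps in use");
    }
}
```

If the in-use count approaches the limit before a crash, raising vm.max_map_count further (or reducing segment count via larger `log.segment.bytes`) would be the next thing to try.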

Java Version:

openjdk version "11.0.8" 2020-07-14 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.8+10-LTS)

Error while appending records to LoadTestingTopic in dir /broder/log (org.apache.kafka.storage.internals.log.LogDirFailureChannel)
java.io.IOException: Map failed
at java.base/sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:1016)
at org.apache.kafka.storage.internals.log.AbstractIndex.createMappedBuffer(AbstractIndex.java:466)
at org.apache.kafka.storage.internals.log.AbstractIndex.createAndAssignMmap(AbstractIndex.java:104)
at org.apache.kafka.storage.internals.log.AbstractIndex.&lt;init&gt;(AbstractIndex.java:82)
at org.apache.kafka.storage.internals.log.OffsetIndex.&lt;init&gt;(OffsetIndex.java:69)
at org.apache.kafka.storage.internals.log.LazyIndex.loadIndex(LazyIndex.java:239)
at org.apache.kafka.storage.internals.log.LazyIndex.get(LazyIndex.java:179)
at kafka.log.LogSegment.offsetIndex(LogSegment.scala:67)
at kafka.log.LogSegment.canConvertToRelativeOffset(LogSegment.scala:130)
at kafka.log.LogSegment.ensureOffsetInRange(LogSegment.scala:177)
at kafka.log.LogSegment.append(LogSegment.scala:157)
at kafka.log.LocalLog.append(LocalLog.scala:439)
at kafka.log.UnifiedLog.append(UnifiedLog.scala:911)
at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:719)
at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1313)
at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1301)
at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:1210)
at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
at scala.collection.mutable.HashMap.map(HashMap.scala:35)
at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:1198)
at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:754)
at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:686)
at kafka.server.KafkaApis.handle(KafkaApis.scala:180)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:149)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.OutOfMemoryError: Map failed
at java.base/sun.nio.ch.FileChannelImpl.map0(Native Method)
at java.base/sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:1013)

ERROR: White spaces are required between publicId and systemId

Version 0.1.0.18

Installed using:

confluent-hub install --no-prompt jcustenborder/kafka-connect-transform-xml:0.1.0.18

Config:

curl -i -X PUT -H  "Content-Type:application/json" http://localhost:8083/connectors/source-file-01/config \
    -d '{
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp.xml",
    "topic": "xmltest",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
        "transforms": "xml",
        "transforms.xml.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
        "transforms.xml.schema.path": "http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd"
    }'

Transform failed with error org.xml.sax.SAXParseException; systemId: http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.

[2020-09-08 14:08:41,647] INFO [source-file-01|task-0] FromXmlConfig values:
   package = com.github.jcustenborder.kafka.connect.transform.xml.model
   schema.path = [http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd]
   xjc.options.automatic.name.conflict.resolution.enabled = false
   xjc.options.strict.check.enabled = true
   xjc.options.verbose.enabled = false
 (com.github.jcustenborder.kafka.connect.transform.xml.FromXmlConfig:347)
[2020-09-08 14:08:41,699] INFO [source-file-01|task-0] compileContext() - Generating source for http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:99)
[2020-09-08 14:08:42,278] ERROR [source-file-01|task-0] fatalError (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:36)
org.xml.sax.SAXParseException; systemId: http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
   at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
   at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanExternalID(XMLScanner.java:1072)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.scanDoctypeDecl(XMLDocumentScannerImpl.java:642)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:924)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
   at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
   at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
   at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
   at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:395)
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:275)
   at com.sun.tools.xjc.api.impl.s2j.SchemaCompilerImpl.parseSchema(SchemaCompilerImpl.java:158)
   at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:103)
   at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
   at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:264)
   at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:515)
   at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:467)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1186)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:127)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1201)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1197)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
org.xml.sax.SAXParseException; systemId: http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:395)
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:275)
   at com.sun.tools.xjc.api.impl.s2j.SchemaCompilerImpl.parseSchema(SchemaCompilerImpl.java:158)
   at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:103)
   at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
   at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:264)
   at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:515)
   at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:467)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1186)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:127)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1201)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1197)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)

Source XML file:

tmp.xml.zip
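Note that this fatal error comes from scanning a DOCTYPE declaration at line 1, column 50 of whatever `schema.path` actually returned, which suggests the URL may be serving an HTML page (redirect or error page with a malformed DOCTYPE) rather than the XSD itself. The message can be reproduced in isolation with the default JAXP/Xerces parser; the malformed DOCTYPE below is constructed for illustration (no whitespace between the public and system literals):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DoctypeRepro {
    // Feeding a DOCTYPE whose public and system literals are not separated
    // by whitespace to an XML parser yields the same fatal error as the log.
    static String reproduce() throws Exception {
        String badDoctype =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\""
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html/>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(badDoctype)),
                       new DefaultHandler());
            return "no error";
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reproduce());
    }
}
```

Fetching the schema URL directly and inspecting the first line of the response would confirm what the parser actually received; downloading the XSD and serving it locally is a possible workaround.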

No schema file

Would this transform work if I only have the .xml files (I don't have an .xsd file)?
