Comments (17)
Rather than trying to have all these options on a single operator, maybe provide Kafka Producer/Consumer operators that handle the low-level properties, and MessageHub Producer/Consumer wrappers that are only intended to support MessageHub and have their configuration focussed on that.
So one set focused on connecting directly to Kafka servers and one on connecting to MessageHub.
from streamsx.kafka.
+1 to the idea of MessageHub-only wrappers. This would help make a virtually fool-proof operator for that case.
That wasn't what I meant, but instead the app should be able to have the last word in setting properties, so if it sets some_kafka_property='a' then that wins.
Let me see if I understand your point. You are suggesting that the app config should primarily be used for setting things that will be the same across applications, such as broker URLs or credentials. For other properties that affect how the application works (e.g. acks=all or buffer.size=whatever), the application should be allowed to set (or override) these properties as needed.
This boils down to a couple of things then:
- The property precedence should be changed such that properties set at the application level take precedence over the properties set at the app config level
- Documentation on the best practices when using these operators
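The proposed precedence can be sketched in plain Java (this is illustrative only, not the toolkit's actual code): properties from the shared app config are loaded first, and anything the application sets directly on the operator overwrites them.

```java
import java.util.Properties;

// Sketch of the proposed precedence: per-operator settings win over
// values coming from the shared application configuration.
public class KafkaPropsResolver {

    public static Properties resolve(Properties appConfig, Properties operatorParams) {
        Properties merged = new Properties();
        merged.putAll(appConfig);      // lowest precedence: shared app config
        merged.putAll(operatorParams); // highest precedence: set by the application
        return merged;
    }

    public static void main(String[] args) {
        Properties appConfig = new Properties();
        appConfig.setProperty("bootstrap.servers", "broker1:9092");
        appConfig.setProperty("acks", "1");

        Properties operatorParams = new Properties();
        operatorParams.setProperty("acks", "all"); // the application has the last word

        Properties merged = resolve(appConfig, operatorParams);
        System.out.println(merged.getProperty("acks"));              // all
        System.out.println(merged.getProperty("bootstrap.servers")); // broker1:9092
    }
}
```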
What would the format of the submission time value be?
What would be the method for providing the username/password required in the jaas.conf file for the Message Hub login? One solution that I haven't tried out might be to leverage the new sasl.jaas.config Kafka Property. Discussion about this is here.
- For the submission-time value, I was planning to keep it as a list of key=value pairs (same as the existing Kafka operators)
- As for the MessageHub wrappers, could you provide a little more info on how you think this should work? Would the MessageHub ops be Java primitive operators that extend the "low-level" Kafka operators, or would they be SPL composites?
@Alex-Cook4 Sorry, I forgot to mention the JAAS config in the above proposal. My plan is to eliminate the jaasFile and jaasFilePropName parameters and use the sasl.jaas.config property that you mentioned. I already prototyped this in my workspace and I am able to connect to MessageHub the same as before. More interestingly, I am able to construct the value for this property using only the information provided in the MessageHub creds. This means users will no longer have to deal with JAAS files when connecting to MessageHub.
Users connecting to a secured on-prem Kafka cluster will still need to provide JAAS information, but they can pass in that info using the sasl.jaas.config property rather than as a separate file.
Edit: I added a section to the above proposal regarding SASL/JAAS.
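As a sketch of that idea (the SASL PLAIN mechanism and the credential fields used here are assumptions, not taken from the toolkit's code), the inline JAAS value for the sasl.jaas.config property can be built from just the username and password found in the credentials:

```java
// Sketch: build the sasl.jaas.config client property value directly from a
// username/password pair, so no separate JAAS file is needed. Assumes SASL
// PLAIN; Kafka clients >= 0.10.2 accept the login module configuration
// inline via this property.
public class JaasConfigBuilder {

    public static String saslJaasConfig(String username, String password) {
        return "org.apache.kafka.common.security.plain.PlainLoginModule required"
                + " username=\"" + username + "\""
                + " password=\"" + password + "\";";
    }

    public static void main(String[] args) {
        // Placeholder credentials, purely for illustration.
        System.out.println(saslJaasConfig("token", "secret"));
    }
}
```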
Excellent! Yes, I didn't realize that everything is in the credentials file now. That will make this so much easier.
I think @ddebrunner has done this before with the MQTT operators, but this would allow us to upload .sab files that just take a submission-time parameter or AppConfig to connect to MessageHub, then just have a publish/subscribe way for other Streams apps to get and write to MessageHub. Maybe that's obvious, but I find it exciting!
> For the submission-time value, I was planning to keep it as a list of key=value pairs (same as the existing Kafka operators)

Just seems like that's going to be pretty ugly to enter as a submission-time value.
I can see why a submission-time value might be an idea, but at least for distributed/Bluemix, using an application configuration is a better approach. Then one has a submission parameter that provides the app config name rather than the actual configuration.
I think we need to be careful about trying to cover every possible situation and making the configuration too complex, thus making the operator harder to use. Having five levels of finding config information, with some having multiple entries, might be overthinking the problem.
It's also important to note that there are (I think) two types of kafka configuration items:
- Information to identify the Kafka broker
- Configuration for the application, e.g. the setting for ensuring the message makes it to all brokers.
The second set should not be overwritten by configuration since it changes the application behaviour, e.g. breaking consistent region guarantees.
As @Alex-Cook4 pointed out, for MessageHub I want a simple foolproof way to connect to it, ideally a single parameter, which app config to use, maybe even defaulting to one named "message-hub".
Java primitive or composite is really an implementation detail, though if the operator has knowledge of submission time parameters it has to be a composite.
I would be somewhat wary of adding support for submission time values in an operator; it means the operator always exposes ways to change its configuration, so that is no longer left up to the application to decide. A microservice that exposes a submission time value is fine; for an operator I'm not so sure, since submission time values are really the application's responsibility.
> Just seems like that's going to be pretty ugly to enter as a submission-time value.
> I can see why a submission-time value might be an idea, but at least for distributed/Bluemix, using an application configuration is a better approach. Then one has a submission parameter that provides the app config name rather than the actual configuration.
@ddebrunner I agree that app configs make more sense. The use cases I have in mind for this are:
- Running an app in standalone
- Using the operators in a Java topology that is submitted using an Embedded context
In both cases, the submission-time parameter provides a straightforward way to dynamically/programmatically set things like brokers or to test different property values.
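For illustration, parsing such a key=value list could look like the sketch below (not the operators' actual implementation). Note that splitting on the first '=' only matters here, since property values such as sasl.jaas.config themselves contain '=' characters.

```java
import java.util.Properties;

// Sketch: turn a list of "key=value" entries (the format the existing Kafka
// operators use for submission-time property values) into Properties.
public class SubmissionTimeProps {

    public static Properties parse(String[] entries) {
        Properties props = new Properties();
        for (String entry : entries) {
            // Split on the first '=' only; the value may contain '=' itself.
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + entry);
            }
            props.setProperty(entry.substring(0, eq), entry.substring(eq + 1));
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = parse(new String[] {"bootstrap.servers=broker1:9092", "acks=all"});
        System.out.println(p.getProperty("acks")); // all
    }
}
```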
> I think we need to be careful about trying to cover every possible situation and making the configuration too complex, thus making the operator harder to use. Having five levels of finding config information with some having multiple entries might be overthinking the problem.

That's a good point. I think we'll start without the kafkaProperty parameter, and if someone presents a valid need for it, we can add it back in. It is always easier to add something to the operators than to remove it.
> It's also important to note that there are (I think) two types of kafka configuration items:
> - Information to identify the Kafka broker
> - Configuration for the application, e.g. the setting for ensuring the message makes it to all brokers.
The problem is that Kafka doesn't make this sort of distinction...everything is a property, whether it's setting the brokers, setting up creds or changing app behaviour. I worry about trying to invent categories (connectivity vs app config) for the different properties and then treating those properties in a different way.
@cancilla Maybe the same case for multiple configuration items, e.g. multiple application configurations. What would you expect to be in each app config?
> I worry about trying to invent categories (connectivity vs app config) for the different properties

That wasn't what I meant, but instead the app should be able to have the last word in setting properties, so if it sets some_kafka_property='a' then that wins.
Thanks for the feedback! As per the discussion, the operators will:
- allow properties to be specified either via an appConfig or a propertiesFile
- properties specified via a propertiesFile will override any properties specified in an appConfig
- A separate repo and toolkit have been created for the MessageHub operators
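The agreed precedence can be sketched in plain Java (illustrative only, not the toolkit's code): appConfig entries load first, then propertiesFile entries overwrite any duplicates.

```java
import java.io.StringReader;
import java.util.Properties;

// Sketch of the agreed precedence: entries from the properties file
// overwrite entries with the same key from the app config.
public class FinalPropertyPrecedence {

    public static Properties merge(Properties appConfig, Properties fromPropertiesFile) {
        Properties merged = new Properties();
        merged.putAll(appConfig);
        merged.putAll(fromPropertiesFile); // propertiesFile wins on conflicts
        return merged;
    }

    public static void main(String[] args) throws Exception {
        Properties appConfig = new Properties();
        appConfig.setProperty("acks", "1");
        appConfig.setProperty("bootstrap.servers", "broker1:9092");

        // A properties file would normally be loaded from disk;
        // a StringReader stands in for it here.
        Properties fileProps = new Properties();
        fileProps.load(new StringReader("acks=all\n"));

        Properties merged = merge(appConfig, fileProps);
        System.out.println(merged.getProperty("acks"));              // all
        System.out.println(merged.getProperty("bootstrap.servers")); // broker1:9092
    }
}
```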