Comments (7)
Hi @towhans, can you please expand on the use case? Generally speaking, you don't want to pass the data through multiple processes, as that incurs copying. So our concern with "connecting pipelines" is that users will end up using pipelines for code organization purposes instead of modelling runtime concerns.
So can you describe why you would need to pass the data around? Thanks!
from broadway.
transformer1 -> processor1 -> batcher1
transformer2 -> processor2 -> batcher2
The case is that `transformer2` is to be applied after `processor1`. `processor1` is stateful. `transformer2` is stateless. If we make:
transformer1 -> processor1 |> transformer2 |> processor2 -> batcher
then we can't specify a different level of parallelism for `transformer2`.
So the case is about interleaving stateful and stateless transformations.
> So the case is about interleaving stateful and stateless transformations.
Which kind of transformations, though? What is stateful and what isn't?
In theory, the only benefit of creating new pipelines / new stages is when different parts of those stages depend on different IO resources, and we plan to support that as part of #39. Stateful or stateless should not matter. :)
Sorry for taking so long to respond. I had to think it through again. I get your point about avoiding the anti-pattern of using GenStage stages for code organization. In our case transformers are stateless and processors are stateful. But that doesn't really matter. The important realization for me is that the "chain of pipelines" is a higher-level thing that can be assembled into one single Broadway pipeline. So I retract the proposal and thank you for your replies. They were very helpful.
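As a rough illustration of assembling the chain inside one pipeline, the steps can be plain functions composed within a single process. This is only a minimal sketch using the Elixir standard library: the module `ChainSketch` and its step functions are hypothetical stand-ins for `transformer1`, `processor1` and `transformer2`, and it does not model any state a real processor might hold. In an actual Broadway pipeline, a function like `run/1` would presumably be called from the `handle_message/3` callback.

```elixir
defmodule ChainSketch do
  # Hypothetical steps standing in for transformer1, processor1 and transformer2.
  def transformer1(data), do: String.trim(data)
  def processor1(data), do: String.upcase(data)
  def transformer2(data), do: data <> "!"

  # Run every step in order inside the calling process, so the data is
  # never sent to (and copied into) another process between steps.
  def run(data) do
    data
    |> transformer1()
    |> processor1()
    |> transformer2()
  end
end

IO.inspect(ChainSketch.run("  hello "))
# prints "HELLO!"
```

The steps stay separate for code organization, while the runtime shape remains a single stage.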
Thanks for following up! The unnecessary creation of processes/stages is exactly what we want to avoid, so when we add multiple processors, we have to be really careful to document those concerns!
I have a use case for this, I think.
Stream of user ids -> batch lookup of profiles for users -> partition / filter profiles -> do something with batches of profiles.
I can sort of make this work by moving the profile lookup into the producer, but then I need to build that convenient batching logic myself.
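To make the shape of that flow concrete, here is a minimal sequential sketch using only `Stream` and `Enum` from the Elixir standard library. It is not Broadway code and gives none of Broadway's concurrency, back-pressure or acknowledgement; `ProfileSketch`, `lookup_profiles/1` and the `active` field are hypothetical names invented for the illustration.

```elixir
defmodule ProfileSketch do
  # Hypothetical batch lookup: a real one would query a database or API
  # once per batch of ids. Here, even ids are marked as active.
  def lookup_profiles(user_ids) do
    Enum.map(user_ids, fn id -> %{id: id, active: rem(id, 2) == 0} end)
  end

  def run(user_ids, batch_size) do
    user_ids
    # Group ids so that each lookup handles a whole batch.
    |> Stream.chunk_every(batch_size)
    |> Stream.flat_map(&lookup_profiles/1)
    # Filter step: keep only the profiles of interest.
    |> Stream.filter(& &1.active)
    # Re-batch the surviving profiles for the final step.
    |> Stream.chunk_every(batch_size)
    |> Enum.map(fn batch -> Enum.map(batch, & &1.id) end)
  end
end

IO.inspect(ProfileSketch.run(1..10, 3))
# prints [[2, 4, 6], [8, 10]]
```

The two `chunk_every` calls are the "batching logic" in question: one before the lookup and one after the filter.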
Hi @kwando!
Thanks for the feedback.
I believe you'll be able to achieve that after we implement #39.