Proposal: chained pipelines (broadway issue, closed)

dashbitco commented on July 17, 2024
Proposal: chained pipelines

from broadway.

Comments (7)

josevalim commented on July 17, 2024

Hi @towhans, can you please expand on the use case? Generally speaking, you don't want to pass the data through multiple processes, as that incurs copying. So our concern with "connecting pipelines" is that users will end up using pipelines for code organization purposes instead of modelling runtime concerns.

So can you describe why you would need to pass the data around? Thanks!

towhans commented on July 17, 2024

transformer1 -> processor1 -> batcher1
transformer2 -> processor2 -> batcher2

The case is that transformer2 has to be applied after processor1. processor1 is stateful; transformer2 is stateless. If we make:

transformer1 -> processor1 |> transformer2 |> processor2 -> batcher

then we can't specify different parallelization for transformer2.

So the case is about interleaving stateful and stateless transformations.

josevalim commented on July 17, 2024

So the case is about interleaving stateful and stateless transformations.

Which kind of transformations though? What is stateful and what isn't?

In theory, the only benefit of creating new pipelines / new stages is if different parts of those stages depend on different IO resources, and we plan to do it as part of #39. Stateful or stateless should not matter. :)

towhans commented on July 17, 2024

Sorry for taking so long to respond. I had to think it through again. I get your point about avoiding the anti-pattern of using gen_stages for code organization. In our case the transformers are stateless and the processors are stateful, but that doesn't really matter. The important realization for me is that the "chain of pipelines" is a higher-level thing that can be assembled into one single Broadway pipeline. So I retract the proposal, and thank you for your replies. They were very helpful.
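
To illustrate that realization, here is a minimal sketch of the chain collapsed into plain function composition. The module name `ChainedSteps` and the four step bodies are hypothetical stand-ins (the thread never shows the real transformers or processors), and the stateful part of processor1 is not modelled here. In a Broadway pipeline, a composition like this would presumably live inside a single `handle_message/3` callback, for example via `Broadway.Message.update_data/2`.

```elixir
defmodule ChainedSteps do
  # Hypothetical steps standing in for transformer1, processor1,
  # transformer2 and processor2 from the thread. They are pure functions
  # here; any state kept by the real processors is not modelled.
  def transformer1(raw), do: String.trim(raw)
  def processor1(data), do: String.upcase(data)
  def transformer2(data), do: String.split(data, ",")
  def processor2(parts), do: Enum.map(parts, &String.trim/1)

  # The whole "chain of pipelines" as one function: each step runs in the
  # same process, so the data is not copied between stages.
  def run(raw) do
    raw
    |> transformer1()
    |> processor1()
    |> transformer2()
    |> processor2()
  end
end

# Example call with made-up input.
IO.inspect(ChainedSteps.run("  a, b ,c \n"))
```

The trade-off is the one raised earlier in the thread: all steps share the concurrency configured for the one processor stage, so transformer2 cannot be parallelized separately.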

josevalim commented on July 17, 2024

Thanks for following up! The unnecessary creation of processes/stages is exactly what we want to avoid, so when we add multiple processors, we have to be really careful in documenting those concerns!

kwando commented on July 17, 2024

I have a use case for this, I think.

Stream of user ids -> batch lookup of profiles for users -> partition / filter profiles -> do something with batches of profiles.

I can sort of make this work by moving the profile lookup into the producer, but then I need to build out that convenient batching logic myself instead.

msaraiva commented on July 17, 2024

Hi @kwando!

Thanks for the feedback.

I believe you'll be able to achieve that after we implement #39.
