
massive's People

Contributors

vicentemundim


massive's Issues

Improve parallelization of processes

Currently, Massive steps check the number of items to be processed and create jobs accordingly. This works well when all you need is to parallelize the processing of one file. But what happens when two processes run concurrently (overlapping in time, though not necessarily started at the same moment)?

The first process's step enqueues its jobs (say 100 of them), which are processed by some workers (3, for example). In this scenario the workers process 3 jobs at a time, one each. Then the second process starts and enqueues 50 more jobs. Since all workers are busy, this process has to wait until a worker becomes free to process its jobs.

Worse, since Resque pushes the first process's 100 jobs onto the queue (with RPUSH) and pops them in order (with LPOP), the second process's jobs will only start after every job of the first process has been taken off the queue.
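To make the ordering concrete, here is a small Python simulation of the single-queue case. It is not Resque code: a deque stands in for the Redis list (appending to the tail plays the role of RPUSH, popleft the role of LPOP), and the function name and the ("P1", n) / ("P2", n) job labels are made up for illustration.

```python
from collections import deque

def fifo_start_order(first_jobs, second_jobs, popped_before_second):
    """Order in which jobs are popped from one shared FIFO queue.

    Models RPUSH (append to tail) / LPOP (take from head).
    popped_before_second: jobs already taken by the busy workers when the
    second process enqueues (3 in the example, one per worker).
    """
    # The first process pushes all of its jobs onto the tail.
    queue = deque(("P1", n) for n in range(1, first_jobs + 1))
    order = []
    # The busy workers have already popped jobs from the head.
    for _ in range(min(popped_before_second, len(queue))):
        order.append(queue.popleft())
    # The second process pushes its jobs behind the remaining first-process jobs.
    queue.extend(("P2", n) for n in range(1, second_jobs + 1))
    while queue:
        order.append(queue.popleft())
    return order

order = fifo_start_order(first_jobs=100, second_jobs=50, popped_before_second=3)
first_p2 = next(i for i, (proc, _) in enumerate(order) if proc == "P2")
print(first_p2)  # 100: all first-process jobs are popped before any second-process job
```

The number of workers only affects how many jobs run at once, not the pop order, which is why the sketch only needs to know how many jobs were already taken when the second process enqueues.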

One solution would be to use multiple queues with different priorities and spread each process's jobs across those queues.

For example, in the scenario above, suppose the first process enqueues its jobs in 100 different queues named massive-1, massive-2, massive-3, ..., massive-100. The second process would also spread its jobs across queues, in massive-1, massive-2, massive-3, ..., massive-50. The workers would need to be started with a splat (*) as their queue list, so that Resque processes these dynamically created queues.

Now, when one of the 3 workers finishes a job from the first process, it takes its next job from the highest-priority non-empty queue, checking massive-1 first, then massive-2, and so on. Since the second process has jobs waiting in those low-numbered queues, they are processed before the first process's remaining jobs in the higher-numbered queues.
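The enqueue-and-reserve idea can be sketched in Python as follows. A dict of deques stands in for the Redis lists, and reserve scans the queues from massive-1 upward, standing in for a worker started with the splat; enqueue_spread, queue_number and reserve are hypothetical names, and whether a real Resque worker orders wildcard queues this way is an assumption to verify. The numeric sort is deliberate: compared as plain strings, massive-10 would come before massive-2, so if a worker orders queue names as strings, zero-padded names (e.g. massive-002) would be needed to keep the intended priority.

```python
from collections import deque

def queue_number(name):
    """Numeric priority of a queue named massive-<n> (lower number = higher priority)."""
    return int(name.rsplit("-", 1)[1])

def enqueue_spread(queues, process, job_count):
    """Put job n of a process into queue massive-n, one job per queue."""
    for n in range(1, job_count + 1):
        queues.setdefault(f"massive-{n}", deque()).append((process, n))

def reserve(queues):
    """Pop the oldest job from the highest-priority non-empty queue, or return None."""
    for name in sorted(queues, key=queue_number):  # numeric order, not string order
        if queues[name]:
            return name, queues[name].popleft()
    return None

queues = {}
enqueue_spread(queues, "P1", 100)
busy = [reserve(queues) for _ in range(3)]  # 3 workers take P1 jobs from massive-1..3
enqueue_spread(queues, "P2", 50)            # second process arrives
next_job = reserve(queues)                  # what the first freed worker picks up
print(next_job)  # ('massive-1', ('P2', 1))
```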

With 3 workers, and assuming every job takes about the same time, it would go like this:

  • The first process enqueues 100 jobs in the massive-N queues, one per queue (massive-1 to massive-100)
  • The workers process the first process's jobs from the massive-1, massive-2 and massive-3 queues
  • The second process enqueues 50 jobs in massive-1 to massive-50
  • After the job in massive-1 is processed, the second process's job in massive-1 starts processing
  • After the job in massive-2 is processed, the second process's job in massive-2 starts processing
  • After the job in massive-3 is processed, the second process's job in massive-3 starts processing
  • After the second process's job in massive-1 is processed, the first process's job in massive-4 starts processing
  • After the second process's job in massive-2 is processed, the second process's job in massive-4 starts processing
  • After the second process's job in massive-3 is processed, the first process's job in massive-5 starts processing
  • And so on, until there are no jobs left to process.
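The walkthrough above can be reproduced with a small Python simulation. It assumes every job takes the same time (so the oldest running job always finishes first), models each massive-N queue as a FIFO deque keyed by N, and has a freed worker scan the queues from massive-1 upward. The function name and job labels are hypothetical; this models the scheduling idea, not Resque's actual worker loop.

```python
from collections import deque

def simulate(first_jobs=100, second_jobs=50, workers=3):
    """Return (finished_job, next_job) pairs in completion order.

    A job is (process, n), where n is the number of the massive-n queue it sits in.
    Assumes equal job durations: the oldest running job always finishes first.
    """
    queues = {}  # n -> FIFO of jobs waiting in massive-n

    def enqueue(process, count):
        for n in range(1, count + 1):
            queues.setdefault(n, deque()).append((process, n))  # job n goes to massive-n

    def reserve():
        for n in sorted(queues):  # massive-1 has the highest priority
            if queues[n]:
                return queues[n].popleft()
        return None  # nothing left to process

    enqueue("P1", first_jobs)
    running = deque()
    for _ in range(workers):  # every worker picks up a first-process job
        job = reserve()
        if job is not None:
            running.append(job)
    enqueue("P2", second_jobs)  # second process arrives while all workers are busy

    log = []
    while running:
        done = running.popleft()  # oldest running job finishes
        nxt = reserve()           # the freed worker takes the next job, if any
        if nxt is not None:
            running.append(nxt)
        log.append((done, nxt))
    return log

for done, nxt in simulate()[:6]:
    print(f"{done[0]} massive-{done[1]} done -> {nxt[0]} massive-{nxt[1]} starts")
# P1 massive-1 done -> P2 massive-1 starts
# P1 massive-2 done -> P2 massive-2 starts
# P1 massive-3 done -> P2 massive-3 starts
# P2 massive-1 done -> P1 massive-4 starts
# P2 massive-2 done -> P2 massive-4 starts
# P2 massive-3 done -> P1 massive-5 starts
```

The two processes alternate only while both have jobs in the same range of queues; past massive-50, only first-process jobs remain.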

In this way each process gets a fair share of processing when processes run concurrently. By adding more workers we can process more jobs in parallel, including jobs from different processes.

The developer could specify how many queues to use, and whether to split jobs across queues at all. For example, we could limit it to only 10 queues. For a process with more jobs than that, several of its jobs would have to share each queue, so a concurrent process's jobs would wait behind them and the processes would not be run in parallel as evenly; for processes with at most 10 jobs, it would work just as described above.
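A sketch of how such options might map jobs to queues, assuming jobs beyond the queue limit wrap around round-robin (job 11 goes back to massive-1 when there are 10 queues). The names queue_for, max_queues and split are hypothetical, not part of Massive, and the single unsplit queue name "massive" is also an assumption.

```python
from collections import Counter

def queue_for(job_index, max_queues=None, split=True):
    """Queue name for the job_index-th job (1-based) of a process.

    split=False: every job goes to one shared queue (hypothetical name "massive").
    max_queues=None: one queue per job (massive-1, massive-2, ...).
    Otherwise jobs wrap around round-robin over massive-1..massive-<max_queues>.
    """
    if not split:
        return "massive"
    if max_queues is None:
        return f"massive-{job_index}"
    return f"massive-{(job_index - 1) % max_queues + 1}"

print([queue_for(i, max_queues=10) for i in (1, 10, 11, 25)])
# ['massive-1', 'massive-10', 'massive-1', 'massive-5']

# With 100 jobs and 10 queues, each queue holds 10 jobs of the same process,
# so a later process's job in massive-1 waits behind whichever of them are still queued.
counts = Counter(queue_for(i, max_queues=10) for i in range(1, 101))
print(len(counts), set(counts.values()))  # 10 {10}
```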
