This project designs and implements a distributed system that processes large computational jobs across a fleet of workers. The primary task is to compute, for each index, the mean of the values at that index across multiple files, each containing a set of random numbers.
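The core computation can be sketched as follows (a minimal illustration, not the repo's actual worker code): given F files each holding C numbers, the result is a length-C vector whose i-th entry is the mean of the i-th numbers across all files.

```python
def index_means(files):
    """Compute, for each index i, the mean of the i-th value across all files.

    `files` is a list of equal-length lists of numbers (one list per file).
    """
    if not files:
        return []
    count = len(files)
    # zip(*files) groups values by index; sum each group, then divide
    # by the number of files to get the per-index mean.
    sums = [sum(values) for values in zip(*files)]
    return [s / count for s in sums]
```

For example, with two files `[1, 2]` and `[3, 4]`, the result is `[2.0, 3.0]`.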
Check out the deployed version: Distributed Average
- Dynamic Worker Allocation: Ability to spawn a user-defined number of worker processes (W).
- File Processing: Compute the mean across F files, each with C random numbers.
- Resource Management: Each worker can process a maximum of 5 files concurrently, ensuring efficient memory usage.
- Queue Management: Jobs are queued and processed asynchronously, allowing for multiple jobs to be submitted and processed in parallel.
- Worker Orchestration: Optimized to minimize worker idle time and handle varying processing speeds.
- Real-Time Monitoring: The UI displays live updates on the system's state, including worker status, job completion, and queue information through RabbitMQ's management UI.
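The five-file cap from the resource-management rule can be sketched as follows (a minimal illustration, not the repo's actual worker code; `process_one` stands in for whatever downloads a file from S3 and sums it):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_FILES = 5  # per-worker cap described above

def process_files(file_ids, process_one):
    """Process a task's files with at most MAX_CONCURRENT_FILES in flight.

    The pool size itself enforces the cap: the executor never runs more
    than MAX_CONCURRENT_FILES calls to `process_one` concurrently, which
    bounds how many files' contents sit in memory at once.
    """
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_FILES) as pool:
        # map preserves input order, so results line up with file_ids.
        return list(pool.map(process_one, file_ids))
```

The same effect can also be achieved at the queue level, e.g. by setting a consumer prefetch limit of 5 so a worker never holds more than five unacknowledged file messages.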
- UI: React with JavaScript
- API: Node.js with Express and TypeScript
- Worker Processes: Python
- Queue System: RabbitMQ
- Storage: AWS S3
- Node.js
- Python
- React
- Docker
- AWS
- Rust (if you want fast local file generation)
- Clone the repository:
git clone [repo-link]
- Set up the environment; see the example env file for the required variables.
- Run
docker compose up
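For orientation, the compose topology roughly wires the services together like this (a hypothetical sketch only; service names, build paths, and ports are assumptions, not the repo's actual docker-compose.yml):

```yaml
# Illustrative topology sketch -- see the repo's docker-compose.yml
# for the real service definitions.
services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"    # AMQP, used by the API and workers
      - "15672:15672"  # management UI used for queue monitoring
  api:
    build: ./api
    environment:
      - RABBITMQ_URL=amqp://rabbitmq
    depends_on:
      - rabbitmq
  ui:
    build: ./ui
    depends_on:
      - api
```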
The services are hosted on DigitalOcean's Kubernetes service across three nodes. Upon receiving the desired number of workers from the user, the API spawns the corresponding Kubernetes pods using the provided Dockerfile.
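The pod-spawning step amounts to building one pod manifest per requested worker and submitting it to the Kubernetes API. A sketch of the manifest-building side (the image name, labels, and env-var names are illustrative placeholders, not the repo's actual values):

```python
def build_worker_pod_manifest(job_id, worker_index,
                              image="registry.example.com/worker:latest"):
    """Build a Kubernetes Pod manifest for one worker.

    The image, labels, and env-var names are hypothetical; the real
    system would fill these in from its own configuration.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"worker-{job_id}-{worker_index}",
            "labels": {"app": "worker", "job": job_id},
        },
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "worker",
                "image": image,
                # The worker learns which job to pull tasks for via env.
                "env": [{"name": "JOB_ID", "value": job_id}],
            }],
        },
    }

def build_worker_pods(job_id, num_workers):
    """One manifest per requested worker (W)."""
    return [build_worker_pod_manifest(job_id, i) for i in range(num_workers)]
```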
To address variations in worker processing speeds and minimize idle time during each summation phase, a speculative execution strategy is employed. The Master maintains statistics on the average completion time for tasks and periodically reschedules tasks based on certain conditions:
- The job is close to completion (over 75% of tasks are complete).
- Some tasks have been running for longer than 1.2 times the average duration.
These criteria help ensure that worker nodes stay fully utilized and that straggler tasks do not unduly delay a summation phase. The specific constants used in this strategy may require tuning to optimize system performance.
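The rescheduling check above can be sketched as a small pure function (assuming both conditions must hold, which is how the criteria read; the real Master's bookkeeping is more involved):

```python
def tasks_to_reschedule(completed, running, now, threshold=0.75, slowdown=1.2):
    """Pick straggler tasks to speculatively re-run on another worker.

    completed: list of finished task durations (seconds)
    running:   dict of task_id -> start time (seconds)
    Returns task ids whose elapsed time exceeds `slowdown` times the
    average completed duration, but only once more than `threshold` of
    all tasks in the phase are done.
    """
    total = len(completed) + len(running)
    if total == 0 or not completed:
        return []  # no statistics to base a decision on yet
    if len(completed) / total <= threshold:
        return []  # job not yet close enough to completion
    avg = sum(completed) / len(completed)
    return [tid for tid, start in running.items() if now - start > slowdown * avg]
```

With four tasks finished at 10 s each and one still running, the fifth becomes a candidate only after it has run longer than 12 s.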
Given the RAM constraints, a straightforward choice for the partition size would be
For instance, if a worker
A potential downside to this approach lies in its interaction with our speculative execution strategy. If the number of files in a task is too large, there's a risk of substantial recomputation when a faster node takes over a task. Currently, our system does not support the utilization of partial results, which could mitigate this issue.
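One way partial results could work, if support were added (a hypothetical sketch, not something the current system does): a task periodically checkpoints its running index-wise sums and the count of files consumed, so a takeover node resumes from the checkpoint instead of recomputing everything.

```python
def resume_task(files, checkpoint=None):
    """Sum index-wise over `files`, optionally resuming from a checkpoint.

    checkpoint: (partial_sums, files_done) saved by a slower node; the
    takeover node skips the first `files_done` files. Illustrative only --
    today a speculative re-run restarts the task from scratch.
    """
    sums, done = checkpoint if checkpoint else (None, 0)
    for values in files[done:]:
        if sums is None:
            sums = list(values)
        else:
            sums = [s + v for s, v in zip(sums, values)]
        done += 1
    return sums, done
```

A faster node handed `resume_task`'s return value as the checkpoint produces the same sums as a single node processing every file, so the recomputation cost is bounded by the checkpoint interval rather than the task size.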
As is probably obvious from the looks of the UI, I am not a frontend engineer. I apologize, I know it sucks. I promise I don't have bad taste.
Thanks to the creators of this takehome, it was quite interesting to think about and implement.