
maestro's Introduction

Hey there 👋

Welcome to my corner of GitHub!
I'm squerez, just a regular tech enthusiast who's been diving into this area for about 5 years now.

About me

I've always been captivated by tech and programming.
When I first joined GitHub, just building a simple Python script to print messages in text boxes was quite the challenge for me. Fast-forward to today: I've made some progress, but hey, there's always more to explore and learn.

My contributions

So far, I've dipped my toes into several projects, mainly personal, hitting these milestones:

  • I've committed 172 times to a bunch of different repos;
  • Opened 10 issues and pushed 43 pull requests;
  • My personal "pet" projects have scored around 4 stars, spread across 20 repositories;
  • And contributed to 1 public repository.

Languages I use

My playground involves various languages, but these are my go-tos:

Python HTML Scala PowerShell Java Shell HCL Other

Tech I'm learning

I'm on a quest to level up my skills in:

  • Rust;
  • Go;
  • Ansible;
  • Flux.

When I'm not here, you might find me wandering in a forest of repositories, where I've gotten so lost, I've set up campsites in unfinished projects, in each a sign saying, ⚠️ under construction (forever).

You know, just your typical getaway from the world of 'done'.

Anyways, if you're still curious, check out my commit snake animation right here in my profile.
It's a cool visual of my coding journey, constantly changing with each contribution.

github contribution grid snake animation

My creative collection

Behold, my digital creations:

  • maestro - a Python-based, metadata-driven data engineering framework. Currently in a testing phase, evolving as I learn more in the DE field;
  • shamir - a simple OTP API designed for team use, born from my curious mind exploring API design concepts;
  • rustsnake - a customizable snake game crafted in Rust, where I've been playing with both the language and game development;
  • init.lua - my tangled web of Neovim configurations, ever-evolving to suit my whims as a lazy dev.

Feel free to explore and see what I've been up to.


Thanks for stopping by and checking out my arts & crafts. If you've got questions, ideas, or want to collaborate, drop a line via an issue on this repo.

Until next time! 👋


maestro's Issues

Refactor - CLI

The CLI, the main user interface, should allow a user to:

  • Start and stop tasks
  • Get the status of tasks
  • See the state of machines (i.e. the workers)
  • Start the manager
  • Start the worker
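
A rough sketch of what this command surface could look like with Python's argparse (command and flag names here are placeholders, not the final interface):

```python
# cli.py - hypothetical sketch of the maestro CLI surface
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="maestro")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="start a task")
    run.add_argument("task_spec", help="path to the task definition")

    stop = sub.add_parser("stop", help="stop a running task")
    stop.add_argument("task_id")

    status = sub.add_parser("status", help="get the status of tasks")
    status.add_argument("task_id", nargs="?", help="omit to list all tasks")

    sub.add_parser("nodes", help="show the state of worker machines")

    manager = sub.add_parser("manager", help="start the manager")
    manager.add_argument("--workers", nargs="+", default=[], help="worker addresses")

    worker = sub.add_parser("worker", help="start the worker")
    worker.add_argument("--port", type=int, default=5556)

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)  # dispatching to the manager/worker code would happen here
```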


Refactor - job

The job is an aggregation of tasks: it has one or more tasks that typically form a larger logical grouping to perform a set of functions. In Kubernetes, a job is its own resource type (Job); something similar may be implemented here in the future.

A job should specify, at a high level, details that apply to all the tasks it defines:

  • Each task that makes up the job
    
  • How many instances of each task should run
    
  • The type of the job (should it be running continuously or will it run to completion and stop?)
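
A possible shape for this as a Python dataclass, assuming a Task class like the one sketched in the task issue below (field names and the JobType split are assumptions, not the final design):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class JobType(Enum):
    SERVICE = "service"  # should run continuously
    BATCH = "batch"      # runs to completion and stops


@dataclass
class Job:
    name: str
    job_type: JobType
    task_count: int = 1                                 # how many instances of each task should run
    tasks: List["Task"] = field(default_factory=list)   # each task that makes up the job
```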
    

Refactor - task

Implement a new class called Task - a task is the smallest unit of work in an orchestration system.

A task should specify the following:

  • The amount of memory, CPU, and disk it needs to run effectively (need to refine this);
    
  • What the orchestrator should do in case of failures, typically called a restart policy;
    
  • The name of the container image used to run the task (need to refine this)
    

Task definitions may specify additional details, but these are the core requirements.
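
A rough sketch of those core requirements as a Python dataclass (field names, units, and defaults are assumptions that still need refining, as noted above):

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    image: str                 # container image used to run the task (to refine)
    memory: int = 256          # MiB the task needs to run effectively (to refine)
    cpu: float = 0.5           # CPU cores
    disk: int = 1              # GiB
    restart_policy: str = ""   # what the orchestrator should do on failure:
                               # "", "always", "unless-stopped", or "on-failure"
```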


The first thing we want to think about is the states a task will go through during its life.

First, a user submits a task to the system. At this point, the task has been enqueued but is waiting to be scheduled. Let’s call this initial state Pending.

Once the system has figured out where to run the task, we can say it has been moved into a state of Scheduled. The scheduled state means the system has determined there is a machine that can run the task, but it is in the process of sending the task to the selected machine or the selected machine is in the process of starting the task.

Next, if the selected machine successfully starts the task, it moves into the Running state.

Upon a task completing its work successfully, or being stopped by a user, the task moves into a state of Completed.

If at any point the task crashes or stops working as expected, the task then moves into a state of Failed.
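
As a sketch, these states could be modeled as an enum together with a map of the legal transitions implied by the lifecycle above (the exact structure is an assumption):

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"        # submitted and enqueued, waiting to be scheduled
    SCHEDULED = "scheduled"    # a machine was selected; task is being sent or started
    RUNNING = "running"        # the selected machine started the task successfully
    COMPLETED = "completed"    # finished its work or was stopped by a user
    FAILED = "failed"          # crashed or stopped working as expected


# legal transitions implied by the lifecycle described above
VALID_TRANSITIONS = {
    State.PENDING: [State.SCHEDULED],
    State.SCHEDULED: [State.RUNNING, State.FAILED],
    State.RUNNING: [State.COMPLETED, State.FAILED],
    State.COMPLETED: [],
    State.FAILED: [],
}


def can_transition(src: State, dst: State) -> bool:
    return dst in VALID_TRANSITIONS[src]
```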


In order to run our tasks as containers, they need a configuration. For a task in our orchestration system, we’ll describe its configuration using the Config class. This class encapsulates all the necessary bits of information about a task’s configuration:

  • The Name field will be used to identify a task in our orchestration system, and it will perform double duty as the name of the running container.
  • The Image field, as you probably guessed, holds the name of the image the container will run. Remember, an image can be thought of as a package: it contains the collection of files and instructions necessary to run a program.
  • The Memory and Disk fields will serve two purposes. The scheduler will use them to find a node in the cluster capable of running a task. They will also be used to tell the Docker daemon the amount of resources a task requires.
  • The Env field allows a user to specify environment variables that will get passed into the container.
  • Finally, the RestartPolicy field tells the Docker daemon what to do in the event a container dies unexpectedly. This field is one of the mechanisms that provides resilience in our orchestration system. The acceptable values are an empty string, always, unless-stopped, or on-failure. Setting this field to always will, as its name implies, restart a container if it stops. Setting it to unless-stopped will restart a container unless it has been stopped (e.g. by docker stop). Setting it to on-failure will restart the container if it exits due to an error (i.e. a non-zero exit code).
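
Putting those fields together, a sketch of the Config class as a Python dataclass could look like this (types and defaults are assumptions):

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Config:
    name: str                  # identifies the task; doubles as the running container's name
    image: str                 # image the container will run
    memory: int = 256          # MiB; used by the scheduler and passed to the Docker daemon
    disk: int = 1              # GiB; same dual purpose as memory
    env: Dict[str, str] = field(default_factory=dict)  # environment variables for the container
    restart_policy: str = ""   # "", "always", "unless-stopped", or "on-failure"
```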

Refactor - worker

The worker provides the muscles of an orchestrator.

It is responsible for running the tasks assigned to it by the manager. If a task fails for any reason, it must attempt to restart the task. The worker also makes metrics about its tasks and its overall machine health available for the manager to poll.

The worker is responsible for the following:

  • Running tasks as Docker containers.
    
  • Accepting tasks to run from a manager.
    
  • Providing relevant statistics to the manager for the purpose of scheduling tasks.
    
  • Keeping track of its tasks and their state.
    


Like the manager, it too has an API, though it serves a different purpose. The primary user of this API is the manager. The API provides the means for the manager to send tasks to the worker, to tell the worker to stop tasks, and to retrieve metrics about the worker’s state. Next, the worker has a task runtime, which in our case will be Docker. Like the manager, the worker also keeps track of the work it is responsible for, which is done in the Task Storage layer. Finally, the worker provides metrics about its own state, which it makes available via its API.
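
A skeletal sketch of the worker's shape, matching the components above (method names and internals are assumptions, not the existing code):

```python
from queue import Queue
from typing import Dict


class Worker:
    """Runs tasks as Docker containers and exposes state/metrics for the manager to poll."""

    def __init__(self, name: str):
        self.name = name
        self.queue: Queue = Queue()              # tasks accepted from the manager, in order
        self.db: Dict[str, "Task"] = {}          # task storage layer: task id -> Task
        self.task_count = 0

    def add_task(self, task: "Task") -> None:
        """Accept a task to run from the manager (called by the worker API)."""
        self.queue.put(task)

    def run_task(self) -> None:
        """Pop the next task off the queue and start or stop it via the Docker runtime."""
        raise NotImplementedError

    def stop_task(self, task_id: str) -> None:
        raise NotImplementedError

    def collect_stats(self) -> dict:
        """Metrics about tasks and overall machine health, made available via the API."""
        return {"name": self.name, "task_count": self.task_count}
```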

Refactor - manager

The manager is the brain of an orchestrator and the main entry point for users.

In order to run jobs in the orchestration system, users submit their jobs to the manager. The manager, using the scheduler, then finds a machine where the job’s tasks can run. The manager also periodically collects metrics from each of its workers, which are used in the scheduling process.

The manager should do the following:

  • Accept requests from users to start and stop tasks.
    
  • Schedule tasks onto worker machines.
    
  • Keep track of tasks, their states, and the machine on which they run.
    


We will also need to implement the API, which is the primary mechanism for interacting with maestro.
Users submit jobs and request jobs be stopped via the API. A user can also query the API to get information about job and worker status.

We will also need to implement some kind of storage. The manager must keep track of all the jobs in the system in order to make good scheduling decisions, as well as to provide answers to user queries about job and worker statuses. The manager also needs to keep track of worker metrics, such as the number of jobs a worker is currently running, how much memory it has available, how much load the CPU is under, and how much disk space is free. This data, like the data in the job storage layer, is used for scheduling.
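
A skeletal sketch of the manager's shape, covering the API, storage, and metrics responsibilities above (names and internals are assumptions):

```python
from typing import Dict, List


class Manager:
    """Accepts jobs from users, schedules tasks onto workers, and tracks their state."""

    def __init__(self, workers: List[str]):
        self.workers = workers                   # addresses of known workers
        self.pending: List["Task"] = []          # tasks accepted via the API, not yet scheduled
        self.task_db: Dict[str, "Task"] = {}     # job/task storage layer
        self.worker_stats: Dict[str, dict] = {}  # metrics polled from each worker

    def submit(self, task: "Task") -> None:
        """API entry point: accept a user request to start a task."""
        self.pending.append(task)

    def select_worker(self, task: "Task") -> str:
        """Ask the scheduler for the best worker for this task."""
        raise NotImplementedError

    def update_worker_stats(self) -> None:
        """Periodically poll each worker's metrics (running jobs, memory, CPU load, disk)."""
        raise NotImplementedError
```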

Refactor - scheduler

The scheduler decides what machine can best host the tasks defined in the job.
We will need to define the decision-making process:

  • The decision-making process can be as simple as selecting a node from a set of machines in a round-robin fashion, or as complex as the EPVM scheduler (used as part of Google’s Borg scheduler), which calculates a score based on a number of variables and then selects a node with the "best" score.

The scheduler should perform these functions:

  • Determine a set of candidate machines on which a task could run.
    
  • Score the candidate machines from best to worst.
    
  • Pick the machine with the best score.
    

A scheduler contains three main phases that represent the order in which it moves through the process of scheduling tasks onto workers: feasibility, scoring, and picking.

  • **Feasibility**: This phase assesses whether it’s even possible to schedule a task onto a worker. There will be cases where a task cannot be scheduled onto any worker; there will also be cases where a task can be scheduled, but only onto a subset of workers. We can think of this phase as similar to choosing which car to buy. My budget is $10,000, but depending on which car lot I go to, all the cars on the lot could cost more than $10,000, or there may only be a subset of cars that fit into my price range.
    
  • **Scoring**: This phase takes the workers identified by the feasibility phase and gives each one a score. This stage is the most important and can be accomplished any number of ways. For example, to continue our car purchase analogy, I might give a score to each of three cars that fit within my budget based on variables like fuel efficiency, color, and safety rating.
    
  • **Picking**: This phase is the simplest. From the list of scores, the scheduler picks the best one. This will be either the highest or lowest score.
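
Put together, a minimal round-robin scheduler is one possible implementation of the three phases; the interface below is an assumption, reusing the Worker and Task sketches from the issues above:

```python
from typing import Dict, List, Optional


class RoundRobinScheduler:
    """Simplest possible scheduler: feasibility, scoring, and picking via round-robin."""

    def __init__(self):
        self.last_worker = -1

    def select_candidate_nodes(self, task: "Task", workers: List["Worker"]) -> List["Worker"]:
        # Feasibility: keep only workers that could run the task at all.
        # A real check would compare the task's memory/disk needs to each worker's free resources.
        return [w for w in workers]

    def score(self, task: "Task", candidates: List["Worker"]) -> Dict[str, float]:
        # Scoring: the next worker in the rotation gets the best (lowest) score.
        if not candidates:
            return {}
        self.last_worker = (self.last_worker + 1) % len(candidates)
        return {
            w.name: (0.1 if i == self.last_worker else 1.0)
            for i, w in enumerate(candidates)
        }

    def pick(self, scores: Dict[str, float], candidates: List["Worker"]) -> Optional["Worker"]:
        # Picking: choose the candidate with the lowest score, if any candidates exist.
        return min(candidates, key=lambda w: scores[w.name]) if candidates else None
```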
    

