openfreeenergy / exorcist

Daemonless campaign-scale simulation orchestration

License: MIT License

Python 100.00%

exorcist's Issues

TaskStatusDB: Set up an empty database

Empty database includes two tables:

tasks with columns:
- taskid: str
- status: int
- last_modified: datetime
- tries: int
dependencies with columns:
- to: str (FK on tasks.taskid)
- from: str (FK on tasks.taskid)

Main user-facing methods to implement:

__init__(self, engine: sqla.Engine)
from_filename(cls, filename: os.PathLike)
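The two tables above can be sketched in plain SQL. This sketch uses the standard-library `sqlite3` module rather than SQLAlchemy, purely to stay dependency-free; the column types, the `NOT NULL` constraints, and the helper name `create_empty_db` are assumptions, not the project's actual implementation.

```python
import sqlite3

# Hypothetical DDL mirroring the two tables described above.
# "to" and "from" are SQL keywords, so they must be quoted.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    taskid TEXT PRIMARY KEY,
    status INTEGER NOT NULL,
    last_modified TIMESTAMP,
    tries INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS dependencies (
    "to" TEXT NOT NULL REFERENCES tasks(taskid),
    "from" TEXT NOT NULL REFERENCES tasks(taskid)
);
"""


def create_empty_db(filename=":memory:"):
    """Open (or create) a database and make sure both tables exist."""
    conn = sqlite3.connect(filename)
    # SQLite leaves foreign-key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_empty_db()
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

A `from_filename` classmethod would presumably build the engine from the path and hand it to `__init__`, in the same way `create_empty_db` takes a filename here.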

DISCUSS: TaskStatusDB: Switch to storing status name instead of value?

In the original development, I stored the integer value associated with a given status in the TaskStatus enum. The question here is whether to use the string name from the enum instead. At this point I don't have a strong preference for one over the other, although I'm leaning a bit toward using the string name. Here are the advantages I see to each choice:

Why to use string name:

Allows us to use sqla.Enum as column type, which may do better validation of values (haven't checked this, but we're certainly not currently preventing the DB from storing an int with no meaning from the enum)
More obvious output if user directly works with DB (e.g., loading tasks table with a pandas data frame): meaningful string instead of meaningless int
(I think) it will allow us to immediately get the enum object back, instead of converting the int value into the enum object being our responsibility. This could simplify future code based on an existing task database (dashboards, consistency checks, etc.)

Why to use int value:

Possible performance improvements (space and speed) over storing CHAR/VARCHAR.
If sqla.Enum is internally using CHAR, there might be migration issues if a new status is added to the enum (different CHAR length might be required)
In the short term, I think we're more likely to change the name of a status than its numerical value. That would be breaking for existing DBs using different string names.
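To make the trade-off concrete, here is a small sketch of the two round trips using only Python's `enum` module. The member names beyond AVAILABLE, BLOCKED, and COMPLETED, and all of the integer values, are assumptions for illustration.

```python
from enum import Enum


class TaskStatus(Enum):
    # Hypothetical members and values; the real enum may differ.
    BLOCKED = 0
    AVAILABLE = 1
    IN_PROGRESS = 2
    COMPLETED = 3


def to_int(status):
    """What gets stored when the column holds the enum value."""
    return status.value


def from_int(value):
    """Lookup by value; raises ValueError for an int with no meaning."""
    return TaskStatus(value)


def to_name(status):
    """What gets stored when the column holds the enum name."""
    return status.name


def from_name(name):
    """Lookup by name; raises KeyError for an unknown name."""
    return TaskStatus[name]
```

Both lookups reject unknown inputs on the Python side. The difference is which change breaks existing databases: renaming a member invalidates stored names, while renumbering one invalidates stored ints.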

Worker: Method for selecting task to work on

The Worker needs a way to select which task it will run. There are a few options here; I think we should engineer things so that we can easily try alternatives, since I'm not sure what will best meet the needs of users. A couple of options:

Priority in the task status DB. Get first available sorted by priority (on the SQL end). Should be fast, but reading doesn't block other readers, so there are potential concurrency pileup issues.
Make the decision in Python; could include some randomness to avoid concurrency issues (e.g., select weighted by priority). Will be slower between read and claim, but we already have safety on the claim to ensure that we're actually the only one to stake our claim to a task.
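The second option might look something like the sketch below. Note that the tables described so far have no priority column, so the `(taskid, priority)` pairs and the function name `select_task` are hypothetical.

```python
import random


def select_task(candidates, rng=random):
    """Pick one taskid from (taskid, priority) pairs, weighted by priority.

    Priorities are assumed to be non-negative with at least one positive.
    Returns None when nothing is available.
    """
    if not candidates:
        return None
    taskids = [taskid for taskid, _ in candidates]
    weights = [priority for _, priority in candidates]
    # Workers reading the same candidate list will usually pick
    # different tasks, which reduces pileup on the top-priority one.
    return rng.choices(taskids, weights=weights, k=1)[0]
```

Because the selector is a plain function of the candidate list, swapping in a different strategy (strict priority order, uniform random) only means passing a different callable to the worker.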

Worker: Method to run tasks sequentially until out of queue time

TaskStatusDB: Methods to add a task/network of tasks to a database

Make it possible to add Tasks to the database.

add_task(self, task: Task, requirements: Iterable[Task]): add a single task to the database, along with edges to the tasks it depends on. A task with no dependencies should be added with status AVAILABLE; otherwise its status should be BLOCKED.
add_task_network(self, network: nx.DiGraph): add an entire graph (with Tasks as nodes) to the database. Initial status as with add_task.

These should probably use some internal methods, rather than having add_task_network call add_task for each task (which would require a separate transaction with the DB for each task). Adding tasks to the DB should be batched.
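A minimal sketch of the batched approach, using `sqlite3` and a plain dict in place of SQLAlchemy and `nx.DiGraph` to stay dependency-free. The integer status values are assumptions, as is the edge direction (`"to"` is the dependent task, `"from"` is the task it depends on).

```python
import sqlite3
from datetime import datetime, timezone

BLOCKED, AVAILABLE = 0, 1  # hypothetical integer values


def add_task_network(conn, requirements):
    """Add every task and edge in a single transaction.

    `requirements` maps each taskid to the taskids it depends on.
    """
    now = datetime.now(timezone.utc).isoformat()
    task_rows = [
        (taskid, BLOCKED if deps else AVAILABLE, now, 0)
        for taskid, deps in requirements.items()
    ]
    edge_rows = [
        (taskid, dep) for taskid, deps in requirements.items() for dep in deps
    ]
    with conn:  # one commit on success, rollback on any error
        conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", task_rows)
        conn.executemany(
            'INSERT INTO dependencies ("to", "from") VALUES (?, ?)', edge_rows
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (taskid TEXT PRIMARY KEY, status INTEGER,
                        last_modified TEXT, tries INTEGER);
    CREATE TABLE dependencies ("to" TEXT, "from" TEXT);
    """
)
add_task_network(conn, {"a": [], "b": ["a"], "c": ["a", "b"]})
statuses = dict(conn.execute("SELECT taskid, status FROM tasks"))
```

Under this design, `add_task` can be a thin wrapper that calls the same internal routine with a single-entry mapping.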

TaskStatusDB: Rebuild task network object

Go from the database to an nx.DiGraph of Tasks.

This isn't strictly necessary for minimal functionality, but has potential to be very useful for things like troubleshooting, debugging, and ensuring database consistency.

This should actually be done in 2 stages: going from the TaskStatusDB to a network of taskid strings, and then a second function that takes that taskid network and attaches TaskDetails from the TaskDetailsStore.
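The two stages could be sketched as follows. To avoid a networkx dependency, the graph here is a dict mapping each taskid to the set of taskids it depends on. Both function names and the edge direction (`"to"` depends on `"from"`) are assumptions.

```python
import sqlite3


def load_taskid_network(conn):
    """Stage 1: rebuild {taskid: set of taskids it depends on}."""
    network = {
        taskid: set() for (taskid,) in conn.execute("SELECT taskid FROM tasks")
    }
    for to, from_ in conn.execute('SELECT "to", "from" FROM dependencies'):
        network[to].add(from_)
    return network


def attach_details(network, load_task_details):
    """Stage 2: pair each taskid with its details from a details store."""
    return {
        taskid: (load_task_details(taskid), deps)
        for taskid, deps in network.items()
    }


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (taskid TEXT PRIMARY KEY, status INTEGER,
                        last_modified TEXT, tries INTEGER);
    CREATE TABLE dependencies ("to" TEXT, "from" TEXT);
    INSERT INTO tasks VALUES ('a', 1, NULL, 0), ('b', 0, NULL, 0);
    INSERT INTO dependencies VALUES ('b', 'a');
    """
)
network = load_taskid_network(conn)
detailed = attach_details(network, lambda taskid: {"id": taskid})
```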

TaskStatusDB: Method to update task status

update_task_status(self, taskid, new_status, old_status)

A couple concerns/questions:

Do we need to pass the DB connection in here? This seems like it could be part of a more complicated sequence that we'd like to commit all at once.
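One way to address that concern is to have the method accept an open connection and leave the commit to the caller. The sketch below uses `sqlite3` and hypothetical integer statuses. It only updates the row if the status is still `old_status`, so a lost race shows up as a `False` return.

```python
import sqlite3
from datetime import datetime, timezone

AVAILABLE, IN_PROGRESS = 1, 2  # hypothetical integer values


def update_task_status(conn, taskid, new_status, old_status):
    """Compare-and-swap on status; the caller decides when to commit."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE tasks SET status = ?, last_modified = ? "
        "WHERE taskid = ? AND status = ?",
        (new_status, now, taskid, old_status),
    )
    # rowcount is 0 if the task is missing or its status already changed.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (taskid TEXT PRIMARY KEY, status INTEGER, "
    "last_modified TEXT, tries INTEGER)"
)
conn.execute("INSERT INTO tasks VALUES ('a', ?, NULL, 0)", (AVAILABLE,))
with conn:  # both calls are committed together
    first = update_task_status(conn, "a", IN_PROGRESS, AVAILABLE)
    second = update_task_status(conn, "a", IN_PROGRESS, AVAILABLE)
```

Here `first` succeeds and `second` fails, because by then the status is no longer AVAILABLE.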

ResultStore

This object stores the final results, and is specific to the client application. The retry number is passed to it when storing, and it is up to the client application to ensure that each combination of result object and retry number maps to a unique location in its storage.

is_failure_result(result: ResultObject) -> bool
store_result(result: ResultObject, retry: int)
~~load_result(label: str, retry: int): this isn't strictly necessary for Exorcist, but may be useful (and will be needed in the client code anyway)~~
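As an illustration, a client might duck-type this API with a small file-backed store. Everything about the result object here (a plain dict with `label` and `success` keys) is an assumption made for the example, as is the class name.

```python
import json
import tempfile
from pathlib import Path


class FileResultStore:
    """Hypothetical duck-typed ResultStore writing one JSON file per result."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def is_failure_result(self, result):
        # Assumes results carry a boolean "success" flag.
        return not result.get("success", False)

    def store_result(self, result, retry):
        # Label plus retry number gives each attempt its own location.
        path = self.directory / f"{result['label']}_retry{retry}.json"
        with open(path, "x") as f:  # "x" mode refuses to overwrite
            json.dump(result, f)
        return path


store = FileResultStore(tempfile.mkdtemp())
failed = {"label": "task1", "success": False}
passed = {"label": "task1", "success": True}
path0 = store.store_result(failed, retry=0)
path1 = store.store_result(passed, retry=1)
```

Opening with mode `"x"` turns an accidental reuse of the same label and retry number into a `FileExistsError` rather than silently overwriting an earlier result.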

Worker: Method to run a single task

TaskStatusDB: Update DAG after task completion

mark_task_completed(self, taskid): This marks the task as having status COMPLETED, updates the dependencies table to mark tasks involving this one as completed, and finally marks any newly unblocked tasks as AVAILABLE.

This involves a decent bit of shuffling between SQL and Python, with a lot of writes. This is the area where we'll need to pay close attention to avoid inconsistency problems.
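One way to shrink the window for inconsistency is to do the whole update inside a single transaction. The sketch below, using `sqlite3`, is a simplification under stated assumptions: rather than marking edges in the dependencies table, it treats a task as unblocked once every task it depends on has status COMPLETED. The integer status values and the edge direction (`"to"` depends on `"from"`) are also assumptions.

```python
import sqlite3

BLOCKED, AVAILABLE, COMPLETED = 0, 1, 3  # hypothetical integer values

UNBLOCK_SQL = """
UPDATE tasks SET status = :available
WHERE status = :blocked
  AND taskid IN (SELECT "to" FROM dependencies WHERE "from" = :done)
  AND NOT EXISTS (
      SELECT 1 FROM dependencies AS d
      JOIN tasks AS t ON t.taskid = d."from"
      WHERE d."to" = tasks.taskid AND t.status != :completed
  )
"""


def mark_task_completed(conn, taskid):
    """Complete a task and release its dependents, atomically."""
    with conn:  # both updates commit together or not at all
        conn.execute(
            "UPDATE tasks SET status = ? WHERE taskid = ?", (COMPLETED, taskid)
        )
        conn.execute(
            UNBLOCK_SQL,
            {"available": AVAILABLE, "blocked": BLOCKED,
             "completed": COMPLETED, "done": taskid},
        )


def status_of(conn, taskid):
    row = conn.execute(
        "SELECT status FROM tasks WHERE taskid = ?", (taskid,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (taskid TEXT PRIMARY KEY, status INTEGER);
    CREATE TABLE dependencies ("to" TEXT, "from" TEXT);
    INSERT INTO tasks VALUES ('a', 1), ('b', 0), ('c', 0);
    INSERT INTO dependencies VALUES ('b', 'a'), ('c', 'a'), ('c', 'b');
    """
)
mark_task_completed(conn, "a")
after_a = {t: status_of(conn, t) for t in "abc"}
```

After completing `a`, task `b` becomes AVAILABLE while `c` stays BLOCKED, since it still waits on `b`.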

Migrate setup to pyproject

TaskDetailsStore

This object loads and saves details of how to run tasks. This is specific to the client application, but we define an API here that client implementations must at least duck-type.

In practice, our first usage will be as files on the filesystem.

~~load_task(self, taskid: str) -> Callable[[], Result]: Note that this returns a callable. All that the worker needs to do is call the function returned here.~~
store_task_details(self, taskid: str, task_details: TaskDetails): Store the task details. The nature of the TaskDetails object depends on the client application.
load_task_details(self, taskid: str) -> TaskDetails: ~~This isn't strictly needed for the primary functionality, but~~ will be useful for various tools for troubleshooting/introspection/debugging. (Is needed if we switch to the run_task model)
run_task(self, task_details: TaskDetails) -> Result: Run the actual task.
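A file-backed sketch of this API follows, in line with the note that the first usage will be files on the filesystem. The TaskDetails object is assumed to be a JSON-serializable dict describing a toy "add two numbers" task; that choice and the class name are illustrative only.

```python
import json
import tempfile
from pathlib import Path


class FileTaskDetailsStore:
    """Hypothetical duck-typed TaskDetailsStore: one JSON file per task."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, taskid):
        return self.directory / f"{taskid}.json"

    def store_task_details(self, taskid, task_details):
        self._path(taskid).write_text(json.dumps(task_details))

    def load_task_details(self, taskid):
        return json.loads(self._path(taskid).read_text())

    def run_task(self, task_details):
        # Toy task: the "simulation" is just adding two numbers.
        return {"value": task_details["x"] + task_details["y"]}


details_store = FileTaskDetailsStore(tempfile.mkdtemp())
details_store.store_task_details("task1", {"x": 2, "y": 3})
loaded = details_store.load_task_details("task1")
result = details_store.run_task(loaded)
```

Under the run_task model, a worker only needs the taskid: it loads the details, passes them to `run_task`, and hands the result to the result store.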

Example client code

Since TaskDetailsStore and ResultStore need to be subclassed (or duck-typed) by client code, we need to have a very simple example to show how to do this. This will also facilitate our testing, especially when doing integration tests between various units and moving toward end-to-end testing.
