openfreeenergy / exorcist
Daemonless campaign-scale simulation orchestration
License: MIT License
Empty database includes two tables:
tasks, with columns:
  taskid: str
  status: int
  last_modified: datetime
  tries: int
dependencies, with columns:
  to: str (FK on tasks.taskid)
  from: str (FK on tasks.taskid)
Main user-facing methods to implement:
__init__(self, engine: sqla.Engine)
from_filename(cls, filename: os.PathLike)
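A minimal sketch of the schema and the two constructors described above. It uses the standard-library sqlite3 module rather than SQLAlchemy so that it runs on its own, which means __init__ takes a connection instead of an sqla.Engine. The SQL column types and the DDL are assumptions, not the project's actual code.

```python
import sqlite3

# Hypothetical DDL mirroring the two tables listed above. "to" and "from" are
# SQL keywords, so they are quoted.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    taskid TEXT PRIMARY KEY,
    status INTEGER NOT NULL,
    last_modified TIMESTAMP,
    tries INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS dependencies (
    "to" TEXT NOT NULL REFERENCES tasks(taskid),
    "from" TEXT NOT NULL REFERENCES tasks(taskid)
);
"""


class TaskStatusDB:
    def __init__(self, conn: sqlite3.Connection):
        # Stand-in for __init__(self, engine: sqla.Engine).
        self.conn = conn
        self.conn.executescript(SCHEMA)

    @classmethod
    def from_filename(cls, filename):
        # sqlite3.connect creates the file if it does not exist yet.
        return cls(sqlite3.connect(filename))


db = TaskStatusDB.from_filename(":memory:")
tables = sorted(
    row[0]
    for row in db.conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```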
In original development, I stored the integer value of the TaskStatus enum member for a given status. The question here is whether to use the string name from the enum instead. At this point I don't have a strong preference for one over the other, although I'm leaning a bit toward using the string name. Here are the advantages I see to each choice:
Why to use string name:
sqla.Enum as column type, which may do better validation of values (haven't checked this, but we're certainly not currently preventing the DB from storing an int with no meaning from the enum)
Why to use int value:
CHAR/VARCHAR.
If sqla.Enum is internally using CHAR, there might be migration issues if a new status is added to the enum (different CHAR length might be required)
The Worker needs to have a way to select which task it will run. There are a few options here; I think we should engineer things such that we can easily try alternatives, since I'm not sure what will best meet the needs of users. A couple options:
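One way to keep task selection easy to swap is to pass the strategy to the worker as a plain callable. The sketch below is only an illustration of that idea: the TaskSelector type, the two example strategies, and the Worker.choose method are all hypothetical names, not part of Exorcist.

```python
import random
from typing import Callable, Optional, Sequence

# A selector takes the taskids that are currently AVAILABLE and picks one.
TaskSelector = Callable[[Sequence[str]], str]


def select_first(available: Sequence[str]) -> str:
    """Pick the first task in whatever order the caller supplied."""
    return available[0]


def make_random_selector(seed=None) -> TaskSelector:
    """Build a selector that picks uniformly at random."""
    rng = random.Random(seed)
    return lambda available: rng.choice(list(available))


class Worker:
    def __init__(self, selector: TaskSelector = select_first):
        self.selector = selector

    def choose(self, available: Sequence[str]) -> Optional[str]:
        # Return None when there is nothing to run.
        return self.selector(available) if available else None


available = ["task-a", "task-b", "task-c"]
first = Worker().choose(available)
picked = Worker(make_random_selector(seed=0)).choose(available)
```

Trying a different strategy then means passing a different function to Worker, with no change to the worker itself.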
Make it possible to add Tasks to the database.
add_task(self, task: Task, requirements: Iterable[Task]): add a single task to the database, along with edges to the tasks it depends on. A task with no dependencies should be added with status AVAILABLE; otherwise the status should be BLOCKED.
add_task_network(self, network: nx.DiGraph): add an entire graph (with Tasks as nodes) to the database. Initial status as with add_task.
These should probably use some internal methods, rather than having add_task_network call add_task for each task (which would require a separate transaction with the DB for each task). Adding tasks to the DB should be batched.
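A rough sketch of batched insertion, under several assumptions: it uses sqlite3 instead of SQLAlchemy, a plain dict (taskid mapped to the taskids it depends on) instead of an nx.DiGraph, hypothetical integer values for TaskStatus, and it assumes that "to" holds the dependent task and "from" the task it depends on.

```python
import sqlite3
from enum import Enum


class TaskStatus(Enum):
    # Hypothetical integer values; the real enum may differ.
    BLOCKED = 0
    AVAILABLE = 1


def add_task_network(conn, network):
    """Insert all tasks and edges in a single transaction.

    ``network`` maps each taskid to the taskids it depends on.
    """
    task_rows = [
        (taskid, (TaskStatus.BLOCKED if deps else TaskStatus.AVAILABLE).value)
        for taskid, deps in network.items()
    ]
    edge_rows = [(taskid, dep) for taskid, deps in network.items() for dep in deps]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO tasks (taskid, status, tries) VALUES (?, ?, 0)", task_rows
        )
        conn.executemany(
            'INSERT INTO dependencies ("to", "from") VALUES (?, ?)', edge_rows
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (taskid TEXT PRIMARY KEY, status INTEGER, tries INTEGER);
    CREATE TABLE dependencies ("to" TEXT, "from" TEXT);
    """
)
# "b" depends on "a"; "c" depends on both "a" and "b".
add_task_network(conn, {"a": [], "b": ["a"], "c": ["a", "b"]})
statuses = dict(conn.execute("SELECT taskid, status FROM tasks"))
n_edges = conn.execute("SELECT COUNT(*) FROM dependencies").fetchone()[0]
```

With this shape, add_task could be a thin wrapper that passes a one-task network to the same internal routine.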
Go from the database to an nx.DiGraph of Tasks.
This isn't strictly necessary for minimal functionality, but has potential to be very useful for things like troubleshooting, debugging, and ensuring database consistency.
This should actually be done in 2 stages: going from the TaskStatusDB to a network of taskid strings, and then a second function that takes that taskid network and attaches TaskDetails from the TaskDetailsStore.
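The two stages could look roughly like the sketch below. To stay self-contained it uses sqlite3 and plain dicts as a stand-in for nx.DiGraph; the function names, the DictDetailsStore class, and the "details"/"requires" keys are hypothetical.

```python
import sqlite3


def load_taskid_network(conn):
    """Stage 1: map each taskid to the set of taskids it depends on."""
    network = {row[0]: set() for row in conn.execute("SELECT taskid FROM tasks")}
    for to, from_ in conn.execute('SELECT "to", "from" FROM dependencies'):
        # A KeyError here would point at an edge whose task row is missing.
        network[to].add(from_)
    return network


def attach_task_details(network, store):
    """Stage 2: pair every taskid with its details from the store."""
    return {
        taskid: {"details": store.load_task_details(taskid), "requires": deps}
        for taskid, deps in network.items()
    }


class DictDetailsStore:
    """Tiny in-memory stand-in for a TaskDetailsStore."""

    def __init__(self, details):
        self._details = details

    def load_task_details(self, taskid):
        return self._details[taskid]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (taskid TEXT PRIMARY KEY);
    CREATE TABLE dependencies ("to" TEXT, "from" TEXT);
    INSERT INTO tasks VALUES ('a'), ('b');
    INSERT INTO dependencies VALUES ('b', 'a');
    """
)
network = load_taskid_network(conn)
graph = attach_task_details(network, DictDetailsStore({"a": "setup", "b": "run"}))
```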
update_task_status(self, taskid, new_status, old_status)
A couple concerns/questions:
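One possible reading of the signature is that old_status acts as a guard, so the update only happens if the task is still in the state the caller expects. That interpretation is an assumption, as is the use of sqlite3 and the integer status values in this sketch.

```python
import sqlite3


def update_task_status(conn, taskid, new_status, old_status):
    """Set the status only if it currently equals old_status.

    Returns True if exactly one row was changed.
    """
    with conn:  # one transaction per update
        cursor = conn.execute(
            "UPDATE tasks SET status = ?, last_modified = CURRENT_TIMESTAMP "
            "WHERE taskid = ? AND status = ?",
            (new_status, taskid, old_status),
        )
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (taskid TEXT PRIMARY KEY, status INTEGER, last_modified TIMESTAMP)"
)
conn.execute("INSERT INTO tasks (taskid, status) VALUES ('a', 1)")
# The first caller moves the task from status 1 to 2; a second caller that
# still expects status 1 changes nothing.
claimed = update_task_status(conn, "a", new_status=2, old_status=1)
claimed_again = update_task_status(conn, "a", new_status=2, old_status=1)
```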
This object stores the final results, and is specific to the client application. The retry number is passed to this when storing, and it is up to the client application to ensure that each combination of result object and retry number gets a unique location in its storage.
is_failure_result(result: ResultObject) -> bool
store_result(result: ResultObject, retry: int)
load_result(label: str, retry: int): this isn't strictly necessary for Exorcist, but may be useful (and will be needed in the client code anyway)
mark_task_completed(self, taskid): This marks the task as having status COMPLETED, updates the dependencies table to mark tasks involving this one as completed, and finally marks any newly unblocked tasks as AVAILABLE.
This involves a decent bit of shuffling between SQL and Python, with a lot of writes. This is the area where we'll need to pay close attention to avoid inconsistency problems.
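A sketch of how the three steps of mark_task_completed might be kept consistent by running them in one transaction. It assumes sqlite3, hypothetical integer status values, that "from" is the prerequisite and "to" the dependent task, and a hypothetical blocking column on dependencies (the schema described earlier lists only to and from).

```python
import sqlite3

# Hypothetical status values.
BLOCKED, AVAILABLE, COMPLETED = 0, 1, 2


def mark_task_completed(conn, taskid):
    with conn:  # all three writes commit together or not at all
        conn.execute(
            "UPDATE tasks SET status = ? WHERE taskid = ?", (COMPLETED, taskid)
        )
        # Edges out of the finished task no longer block anything.
        conn.execute(
            'UPDATE dependencies SET blocking = 0 WHERE "from" = ?', (taskid,)
        )
        # Unblock dependents of this task that have no blocking edges left.
        conn.execute(
            """
            UPDATE tasks SET status = ?
            WHERE status = ?
              AND taskid IN (SELECT "to" FROM dependencies WHERE "from" = ?)
              AND taskid NOT IN (SELECT "to" FROM dependencies WHERE blocking = 1)
            """,
            (AVAILABLE, BLOCKED, taskid),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (taskid TEXT PRIMARY KEY, status INTEGER);
    CREATE TABLE dependencies ("to" TEXT NOT NULL, "from" TEXT NOT NULL,
                               blocking INTEGER NOT NULL DEFAULT 1);
    INSERT INTO tasks VALUES ('a', 1), ('b', 0), ('c', 0);
    INSERT INTO dependencies ("to", "from") VALUES ('b', 'a'), ('c', 'a'), ('c', 'b');
    """
)
mark_task_completed(conn, "a")
after_a = dict(conn.execute("SELECT taskid, status FROM tasks"))
mark_task_completed(conn, "b")
after_b = dict(conn.execute("SELECT taskid, status FROM tasks"))
```

Doing the unblocking in SQL avoids a read-then-write round trip through Python, which is one way to narrow the window for inconsistencies; whether that fits the real design is an open question.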
This object loads and saves details of how to run tasks. This is specific to the client application, but we define an API here that client code must at least duck-type.
In practice, our first usage will be as files on the filesystem.
load_task(self, taskid: str) -> Callable[[], Result]: Note that this returns a callable. All that the worker needs to do is call the function returned here.
store_task_details(self, taskid: str, task_details: TaskDetails): Store the task details. The nature of the TaskDetails object depends on the client application.
load_task_details(self, taskid: str) -> TaskDetails: (... run_task model)
run_task(self, task_details: TaskDetails) -> Result: Run the actual task.
Since TaskDetailsStore and ResultStore need to be subclassed (or duck-typed) by client code, we need to have a very simple example to show how to do this. This will also facilitate our testing, especially when doing integration tests between various units and moving toward end-to-end testing.
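As a starting point, a very simple in-memory pair of stores might look like the sketch below. The method names follow the API listed above; the Result and TaskDetails dataclasses, the class names, and the choice to treat a raised exception as a failed result are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class Result:
    # Hypothetical result object: a label plus a value or a failure flag.
    label: str
    value: Any = None
    failed: bool = False


@dataclass
class TaskDetails:
    # Hypothetical task description: call func(*args) and label the result.
    label: str
    func: Callable[..., Any]
    args: Tuple[Any, ...] = ()


class DictTaskDetailsStore:
    def __init__(self):
        self._details = {}

    def store_task_details(self, taskid, task_details):
        self._details[taskid] = task_details

    def load_task_details(self, taskid):
        return self._details[taskid]

    def run_task(self, task_details):
        try:
            return Result(task_details.label, task_details.func(*task_details.args))
        except Exception:
            return Result(task_details.label, failed=True)

    def load_task(self, taskid):
        # Return a zero-argument callable; the worker only has to call it.
        details = self.load_task_details(taskid)
        return lambda: self.run_task(details)


class DictResultStore:
    def __init__(self):
        self._results = {}

    @staticmethod
    def is_failure_result(result):
        return result.failed

    def store_result(self, result, retry):
        # (label, retry) is the unique location for this result.
        self._results[(result.label, retry)] = result

    def load_result(self, label, retry):
        return self._results[(label, retry)]


tasks = DictTaskDetailsStore()
tasks.store_task_details("t1", TaskDetails("sum", sum, ([1, 2, 3],)))
tasks.store_task_details("t2", TaskDetails("bad", int, ("not a number",)))
results = DictResultStore()
good = tasks.load_task("t1")()
bad = tasks.load_task("t2")()
results.store_result(good, retry=0)
```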