specifysystems / complexify
Toolbox of compute resources for the Specify Network.
License: GNU General Public License v3.0
We decided to skip the controller container for the first version and just have daemon processes on the makeflow containers.
We will also need to store job configuration files somewhere, as well as job data such as user uploads and environmental data for modeling.
The first version is likely a simple sqlite container
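A minimal sketch of what that sqlite container's job store might look like, assuming a simple `jobs` table; the table and column names here are illustrative assumptions, not a settled schema.

```python
import sqlite3

def init_job_db(path=":memory:"):
    """Create a hypothetical job-tracking database (sketch only)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               config_path TEXT NOT NULL,              -- job configuration file
               status TEXT NOT NULL DEFAULT 'pending', -- pending / running / done
               submitted TIMESTAMP DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    conn.commit()
    return conn

# Usage: register a job and read back its initial status.
conn = init_job_db()
conn.execute("INSERT INTO jobs (config_path) VALUES (?)", ("jobs/heuchera.json",))
conn.commit()
status = conn.execute("SELECT status FROM jobs WHERE id = 1").fetchone()[0]
```

In a containerized setup the `path` would point at a mounted volume rather than `:memory:`.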
Our plan as of 4/12/2022 is to only expose database operations via the API and not connect to the database directly from any container other than the web services container. Update the documentation to reflect this change in direction.
This demo makeflow would be the first step when running things by hand and would produce occurrence files for each of the Heuchera species. It could start with Ryan's Heuchera CSV file, but it may be useful to throw in some data-aggregator files as well for a more robust test case.
We settled on the makeflow container (maybe rename) running a daemon process that gets jobs and runs one or more makeflows. Create a diagram showing these interactions and making it clear how the process runs a job.
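The daemon loop described above could be sketched roughly as follows. The `fetch_next_job`, `mark_done`, and `run_makeflow` hooks are hypothetical placeholders for whatever queue and execution mechanism is chosen; they are not existing complexify functions.

```python
import subprocess
import time

def run_job_daemon(fetch_next_job, mark_done, run_makeflow=None,
                   poll_seconds=5, max_polls=None):
    """Poll for jobs; each job may expand into one or more makeflow runs."""
    if run_makeflow is None:
        # Default: shell out to the makeflow binary (assumed on PATH).
        run_makeflow = lambda mf: subprocess.run(["makeflow", mf], check=True)
    polls = 0
    while max_polls is None or polls < max_polls:
        job = fetch_next_job()
        if job is None:
            time.sleep(poll_seconds)  # nothing queued; wait and re-poll
        else:
            for makeflow_file in job["makeflows"]:
                run_makeflow(makeflow_file)
            mark_done(job)
        polls += 1
```

A diagram would show this loop sitting between the job queue and the makeflow invocations.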
Needed services:
I am not sure whether the first version needs to accept uploads or whether we want to add that later. This may need to expand depending on how, or if, we want to handle users at the start, or whether this will just be for us to begin with.
Implement a test case for the complexify framework. The test case starts with a list of species names to be included and will need to incorporate data from a variety of occurrence sources. Models should use Maxent or rare-species modeling. Utilize masks and end with a set of SDMs grouped by taxonomic group.
A version zero should be able to run a makeflow by hand through the worker containers and produce a result.
Document how to start up the system and run a demo makeflow
Add documentation for all currently available tasks so we can start putting together test job configurations.
We may need to iterate on this a few times, but as of now we plan to produce a docker container, or set of containers, accessible via a web service so that someone could connect to them from an R or Python client library.
Some likely components to this diagram:
Document all of the data types we are exposing via our tasks. This includes primitive types and derived types, and should cover at least all of the input and output types exposed by our current tasks; it may also include some future planned tasks.
Current thoughts are at:
https://github.com/specifysystems/complexify/blob/main/docs/async_containers_design.md
I think this gets us pretty close to a version 1 plan. There may be some iterating on the makeflow and controller containers, but the worker, catalog server, database, and web server containers are all pretty well set, I believe.
This container should include all of our computational tools (lmpy, lmtools, biotaphy, syftr) in whatever form they are currently in (but easy to update). It should probably run a daemon process like the work queue worker factory
Note where containers are evolutions of old Lifemapper components (like MattDaemon). Update documentation and add new container types as the previous iteration is being split.
This container should be very simple and just include the catalog server to facilitate computations
The first demo makeflow we need is one that can run SDMs from known occurrence data. Use Heuchera as a test case and generate a demo makeflow that cleans the occurrence records before running them through maxent.
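One way to produce that demo makeflow is to generate its rules from a list of per-species occurrence files. This is only a sketch: the `clean_occurrences` and `run_maxent` commands are stand-in names for whatever lmpy/Maxent wrappers are actually used.

```python
def heuchera_makeflow(species_csvs):
    """Emit makeflow rules: clean each occurrence CSV, then model it.

    Command names are hypothetical placeholders for the real tools.
    """
    rules = []
    for csv_file in species_csvs:
        base = csv_file.rsplit(".", 1)[0]
        clean = f"{base}_clean.csv"
        model = f"{base}_model.asc"
        # Rule 1: clean the raw occurrence records.
        rules.append(f"{clean}: {csv_file}\n\tclean_occurrences {csv_file} {clean}")
        # Rule 2: run the cleaned records through Maxent.
        rules.append(f"{model}: {clean}\n\trun_maxent {clean} {model}")
    return "\n\n".join(rules) + "\n"

makeflow_text = heuchera_makeflow(["heuchera_richardsonii.csv"])
```

Each rule follows the `target: sources` plus tab-indented command shape that makeflow files use.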
There will most likely be at least two workflow types. The first workflow will process the input occurrences, altering them slightly, and create one or more output files of grouped (probably by species) occurrences, as well as a file or files indicating which species are present in those occurrence files. This is necessary so that we know what files will be generated by the various tasks.
The other workflow(s) will process the occurrence data in groups to assess the records, create SDMs, create species manifests and syftorium files for cataloging, and create an output package.
It is important to also figure out where we can include multi-species processing in the workflow.
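The first workflow stage described above (group occurrences and record which species are present) can be sketched as below; the `species` column name is an assumption about the input format.

```python
def group_by_species(rows):
    """Split occurrence rows into per-species groups plus a manifest.

    Returns (groups, manifest): groups maps species name -> rows, and
    the manifest lists the species seen, so downstream tasks know which
    per-species files will exist.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row["species"], []).append(row)
    manifest = sorted(groups)
    return groups, manifest

rows = [
    {"species": "Heuchera americana", "lat": "38.0", "lon": "-84.5"},
    {"species": "Heuchera richardsonii", "lat": "41.2", "lon": "-93.6"},
    {"species": "Heuchera americana", "lat": "37.5", "lon": "-82.1"},
]
groups, manifest = group_by_species(rows)
```

Multi-species processing would hook in after this point, once the manifest is known.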
Try to capture the general purpose of each of the containers and how they operate. Make sure that it is clear how the worker container works by running work_queue_factory which runs work_queue_workers and how they determine what to do.
This client should match the API as it currently stands. I don't know if this will ever be widely distributed but there should still be some protections and forethought in case we do decide to distribute for some reason (in entirety or just a portion).
This demo makeflow is more for testing complexify than for testing lmpy scripts. Aimee can help debug any problems found within lmpy or logic errors encountered. This establishes a workflow to build off of for multi-species operations and statistics.
Add whatever hooks we think we need for the complexify repository to ensure acceptable code quality and consistency, as well as automated testing and any appropriate CI/CD.
There may be a few questions left to answer before this can be done, but at its core, this container needs to run a makeflow or multiple makeflows.
It may need to do some job configuration processing and do some things outside of makeflow as well.
It also needs to be determined whether this is a "one shot" (single-run) container or whether a daemon process will keep it running continuously.
This tool should take a job configuration as an argument and process the job appropriately, including running one or more makeflows.
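The entry point for that tool might look like the sketch below. The configuration keys (`name`, `makeflows`) and the JSON format itself are assumptions, since the job configuration format has not been fixed yet.

```python
import argparse
import json

def parse_job_args(argv):
    """Parse the command line: a single job-configuration path argument."""
    parser = argparse.ArgumentParser(description="Run a complexify job")
    parser.add_argument("config", help="path to a job configuration file")
    return parser.parse_args(argv)

def load_job(config_text):
    """Extract the job name and its makeflow list from hypothetical JSON."""
    job = json.loads(config_text)
    return job.get("name", "unnamed"), job.get("makeflows", [])

# Usage: parse arguments, then hand each makeflow to the runner.
args = parse_job_args(["jobs/demo.json"])
name, flows = load_job('{"name": "heuchera_demo", "makeflows": ["sdm.mf"]}')
```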
The first version of the job daemon process can be pretty simple and be focused on running our established job workflows. Try to make it easy to extend to more generic workflows but getting something that we can use quickly is important.
Will run as the daemon process on makeflow container instances.
Nginx and Flask, most likely.
Test that the entire system (v1) can run from start to finish with the Heuchera dataset.