specifysystems / complexify
Toolbox of compute resources for the Specify Network.
License: GNU General Public License v3.0
We decided to skip the controller container for the first version and just have daemon processes on the makeflow containers.
We will also need to store job configuration files somewhere, as well as job data such as user uploads and environmental data for modeling.
The first version is likely a simple sqlite container
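A minimal sketch of what that sqlite container's job store might look like, assuming a simple `jobs` table; the table and column names here are illustrative assumptions, not a settled schema.

```python
import sqlite3

def init_job_db(path=":memory:"):
    """Create a hypothetical job-tracking database (sketch only)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               config_path TEXT NOT NULL,              -- job configuration file
               status TEXT NOT NULL DEFAULT 'pending', -- pending / running / done
               submitted TIMESTAMP DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    conn.commit()
    return conn

# Usage: register a job and read back its initial status.
conn = init_job_db()
conn.execute("INSERT INTO jobs (config_path) VALUES (?)", ("jobs/heuchera.json",))
conn.commit()
status = conn.execute("SELECT status FROM jobs WHERE id = 1").fetchone()[0]
```

In a containerized setup the `path` would point at a mounted volume rather than `:memory:`.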
Our plan as of 4/12/2022 is to only expose database operations via the API and not connect to the database directly from any container other than the web services container. Update the documentation to reflect this change in direction.
This demo makeflow would be the first step when running things by hand and would produce occurrence files for each of the Heuchera species. It could start with Ryan's Heuchera CSV file, but it may be useful to throw in some data-aggregator files as well for a more robust test case.
We settled on the makeflow container (maybe rename) running a daemon process that gets jobs and runs one or more makeflows. Create a diagram showing these interactions and making it clear how the process runs a job.
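The daemon loop described above could be sketched roughly as follows. The `fetch_next_job`, `mark_done`, and `run_makeflow` hooks are hypothetical placeholders for whatever queue and execution mechanism is chosen; they are not existing complexify functions.

```python
import subprocess
import time

def run_job_daemon(fetch_next_job, mark_done, run_makeflow=None,
                   poll_seconds=5, max_polls=None):
    """Poll for jobs; each job may expand into one or more makeflow runs."""
    if run_makeflow is None:
        # Default: shell out to the makeflow binary (assumed on PATH).
        run_makeflow = lambda mf: subprocess.run(["makeflow", mf], check=True)
    polls = 0
    while max_polls is None or polls < max_polls:
        job = fetch_next_job()
        if job is None:
            time.sleep(poll_seconds)  # nothing queued; wait and re-poll
        else:
            for makeflow_file in job["makeflows"]:
                run_makeflow(makeflow_file)
            mark_done(job)
        polls += 1
```

A diagram would show this loop sitting between the job queue and the makeflow invocations.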
Needed services:
I am not sure whether the first version needs to accept uploads or whether we want to add that later. This may need to expand depending on how, or if, we want to handle users at the start, or whether this will just be for us to begin with.
Implement a test case for the complexify framework. The test case starts with a list of species names to be included and will need to incorporate data from a variety of occurrence sources. Models should use Maxent or rare-species modeling. Utilize masks and end with a set of SDMs grouped by taxonomic group.
A version zero should be able to run a makeflow by hand through the worker containers and produce a result.
Document how to start up the system and run a demo makeflow
Add documentation for all currently available tasks so we can start putting together test job configurations.
We may need to iterate on this a few times, but as of now we plan to produce a docker container, or set of containers, accessible via a web service so that someone could connect to them from an R or Python client library.
Some likely components to this diagram:
Document all of the data types we are exposing via our tasks. This includes primitive types and derived types, and should cover at least all of the input and output types exposed by our current tasks; it may also include some future planned tasks.
Current thoughts are at:
https://github.com/specifysystems/complexify/blob/main/docs/async_containers_design.md
I think this gets us pretty close to a version 1 plan. There may be some iterating on the makeflow and controller containers, but the worker, catalog server, database, and web server containers are all pretty well set, I believe.
This container should include all of our computational tools (lmpy, lmtools, biotaphy, syftr) in whatever form they are currently in (but easy to update). It should probably run a daemon process like the work queue worker factory
Note where containers are evolutions of old Lifemapper components (like MattDaemon). Update documentation and add new container types as the previous iteration is being split.
This container should be very simple and just include the catalog server to facilitate computations
The first demo makeflow we need is one that can run SDMs from known occurrence data. Use Heuchera as a test case and generate a demo makeflow that cleans the occurrence records before running them through maxent.
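One way to produce that demo makeflow is to generate its rules from a list of per-species occurrence files. This is only a sketch: the `clean_occurrences` and `run_maxent` commands are stand-in names for whatever lmpy/Maxent wrappers are actually used.

```python
def heuchera_makeflow(species_csvs):
    """Emit makeflow rules: clean each occurrence CSV, then model it.

    Command names are hypothetical placeholders for the real tools.
    """
    rules = []
    for csv_file in species_csvs:
        base = csv_file.rsplit(".", 1)[0]
        clean = f"{base}_clean.csv"
        model = f"{base}_model.asc"
        # Rule 1: clean the raw occurrence records.
        rules.append(f"{clean}: {csv_file}\n\tclean_occurrences {csv_file} {clean}")
        # Rule 2: run the cleaned records through Maxent.
        rules.append(f"{model}: {clean}\n\trun_maxent {clean} {model}")
    return "\n\n".join(rules) + "\n"

makeflow_text = heuchera_makeflow(["heuchera_richardsonii.csv"])
```

Each rule follows the `target: sources` plus tab-indented command shape that makeflow files use.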
There will most likely be at least two workflow types. The first workflow will process the input occurrences, altering them slightly, and create one or more output files of grouped (probably by species) occurrences, as well as a file or files indicating which species are present in those occurrence files. This is necessary so that we know what files will be generated by the various tasks.
The other workflow(s) will process the occurrence data in groups to assess the records, create SDMs, create species manifests and syftorium files for cataloging, and create an output package.
It is important to also figure out where we can include multi-species processing in the workflow.
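The first workflow stage described above (group occurrences and record which species are present) can be sketched as below; the `species` column name is an assumption about the input format.

```python
def group_by_species(rows):
    """Split occurrence rows into per-species groups plus a manifest.

    Returns (groups, manifest): groups maps species name -> rows, and
    the manifest lists the species seen, so downstream tasks know which
    per-species files will exist.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row["species"], []).append(row)
    manifest = sorted(groups)
    return groups, manifest

rows = [
    {"species": "Heuchera americana", "lat": "38.0", "lon": "-84.5"},
    {"species": "Heuchera richardsonii", "lat": "41.2", "lon": "-93.6"},
    {"species": "Heuchera americana", "lat": "37.5", "lon": "-82.1"},
]
groups, manifest = group_by_species(rows)
```

Multi-species processing would hook in after this point, once the manifest is known.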
Try to capture the general purpose of each of the containers and how they operate. Make sure that it is clear how the worker container works by running work_queue_factory which runs work_queue_workers and how they determine what to do.
This client should match the API as it currently stands. I don't know if this will ever be widely distributed but there should still be some protections and forethought in case we do decide to distribute for some reason (in entirety or just a portion).
This demo makeflow is more for testing complexify than for testing lmpy scripts. Aimee can help debug any problems found within lmpy or logic errors encountered. This establishes a workflow to build off of for multi-species operations and statistics.
Add whatever hooks we think we need for the complexify repository to ensure acceptable code quality and consistency, as well as automated testing and any appropriate CI/CD.
There may be a few questions left to answer before this can be done, but at its core, this container needs to run a makeflow or multiple makeflows.
It may need to do some job configuration processing and do some things outside of makeflow as well.
It also needs to be determined whether this is a "one shot" (single-run) container or whether a daemon process will keep it running continuously.
This tool should take a job configuration as an argument and process the job appropriately, including running one or more makeflows.
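The entry point for that tool might look like the sketch below. The configuration keys (`name`, `makeflows`) and the JSON format itself are assumptions, since the job configuration format has not been fixed yet.

```python
import argparse
import json

def parse_job_args(argv):
    """Parse the command line: a single job-configuration path argument."""
    parser = argparse.ArgumentParser(description="Run a complexify job")
    parser.add_argument("config", help="path to a job configuration file")
    return parser.parse_args(argv)

def load_job(config_text):
    """Extract the job name and its makeflow list from hypothetical JSON."""
    job = json.loads(config_text)
    return job.get("name", "unnamed"), job.get("makeflows", [])

# Usage: parse arguments, then hand each makeflow to the runner.
args = parse_job_args(["jobs/demo.json"])
name, flows = load_job('{"name": "heuchera_demo", "makeflows": ["sdm.mf"]}')
```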
The first version of the job daemon process can be pretty simple and be focused on running our established job workflows. Try to make it easy to extend to more generic workflows but getting something that we can use quickly is important.
Will run as the daemon process on makeflow container instances.
Nginx and Flask, most likely.
Test that the entire system (v1) can run from start to finish with the Heuchera dataset.