wytamma / beastiary

Real time and remote MCMC trace monitoring with BEASTIARY.

Home Page: https://beastiary.wytamma.com

beast beast2 phylogenetics bayesian-inference revbayes mcmc monitoring

beastiary's Introduction

Beastiary is designed for visualising and analysing MCMC trace files generated from Bayesian phylogenetic analyses. Beastiary works in real-time and on remote servers (e.g. an HPC). The goal of Beastiary is to be a beautiful and simple yet powerful tool for Bayesian phylogenetic inference. A beastiary (from bestiarum vocabulum) is a compendium of beasts.

Paper: Wirth & Duchene (2022)

Documentation: https://beastiary.wytamma.com

Source Code: https://github.com/Wytamma/beastiary

Install

pip install beastiary

Use

To start beastiary use the beastiary command.

beastiary

For more information read the docs.

Cite

Wytamma Wirth, Sebastian Duchene, Real-Time and Remote MCMC Trace Inspection with Beastiary, Molecular Biology and Evolution, Volume 39, Issue 5, May 2022, msac095, https://doi.org/10.1093/molbev/msac095

beastiary's Issues

Mock out dashboard layout

Create the basic layout for the dashboard.

Add mypy

Move calculateStats to worker thread

calculateStats is the main blocker and should be moved to the worker. Also need better management of workers so you don't create 1 per calculation. E.g. create a worker pool.

Add drag and drop option for static files.

Add an option to drag and drop files into Beastiary. The files will not be able to be tracked, however, it would allow you to statically examine files. The file would just load into the browser so no data transfer is required.

WebSockets for Continuous Data Transmission

Replace polling with web sockets e.g.

import asyncio

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/data")
async def data_websocket(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await generate_data()  # Your function to get real-time data
        await websocket.send_json(data)
        await asyncio.sleep(1)  # Adjust based on desired update frequency

WRITE FRONTEND TESTS

Add --watch to cli

Add an option to watch a folder. This will add files to beastiary when they are added to the folder. Might also need to handle patterns e.g. *.log
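
One way the folder watching could work is simple polling with a filename pattern. The sketch below is only an illustration under that assumption: `find_new_logs`, its arguments, and the `seen` set are hypothetical names, not part of the Beastiary code base.

```python
import fnmatch
from pathlib import Path


def find_new_logs(folder, seen, pattern="*.log"):
    """Return files in `folder` that match `pattern` and are not yet in `seen`.

    Hypothetical helper: `seen` is a set of paths already added, and it is
    updated in place so repeated calls only report newly created files.
    """
    new_files = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and fnmatch.fnmatch(path.name, pattern) and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files
```

Calling this helper on a timer (for example every few seconds) would pick up files as they appear. An event-based file watcher would avoid the polling delay but adds a dependency.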

Auto update active trace

Use polling to automatically fetch new samples for the active trace.

It's very blue 🔵

I like the blue, but there is a lot of it...

Add --host to cli

Add a --host or --share cli arg that will proxy beastiary through a server so the analysis is publicly available.

Add trace when pressing enter

If you add a trace via the modal and press enter the page refreshes. Need to block the default form submission.

Token login fails on first try

The auto login fails the first time you open the webapp

Select multiple traces

Add the ability to select multiple traces

Parameter statistics

Add statistics for parameters (mean, 95% CI, etc)
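
As a rough sketch of what such a summary might compute, the function below returns the mean and an equal-tailed 95% percentile interval. Note the assumptions: `summarise` is a hypothetical name, and the interval is a plain percentile interval rather than a highest posterior density (HPD) interval, which may be what is ultimately wanted.

```python
from statistics import mean, quantiles


def summarise(samples):
    """Mean and equal-tailed 95% interval of a list of parameter samples.

    Hypothetical helper. Splitting into 40 quantile groups puts the first
    cut point at 2.5% and the last at 97.5%.
    """
    cuts = quantiles(samples, n=40, method="inclusive")
    return {"mean": mean(samples), "lower": cuts[0], "upper": cuts[-1]}
```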

Cumulative ESS plot

Add cumulative ESS plots and run time predictions based on target

joint-marginal plots

Add joint-marginal plots

SAWarning

SAWarning: Identity map already had an identity for (<class 'beastiary.models.sample.Sample'>, (116,), None), replacing it with newly flushed object. Are there load operations occurring inside of an event handler within the flush?
db.commit()

Fix docs website

The custom domain for the docs is reverting to my blog url...

Colour order issue with histogram plot

better multi chain support

Convergence diagnostics
Easier selection of multiple traces across log files

add tabs for different plots

Tree metrics

Mixing and convergence in tree space is an important requirement for effective Bayesian phylogenetic inference. The Beastiary backend should compute real-time convergence diagnostics (similar to RWTY) for assessing the adequacy with which the MCMC has sampled the phylogenetic tree topology space.

ADD TESTS FOR BACKEND

add file explorer

For security reasons browsers don't allow access to file paths. I think that means we can't have a drag and drop interface (although this wouldn't work on remote servers anyway). I think the best option would be to have a file explorer interface like the one in the vue ui.

Submit logfile at start up

Add the ability to pass beastiary a logfile path at start up, so that it is automatically added.

beastiary --logfile '/path/to/log'

Move byte caching to the trace model

Currently the byte offset of the last row read is used to keep track of the location in the log file. However, the offset could be tracked on the Trace model to avoid adding it to the last Sample read.
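
To illustrate the idea of resuming from a stored byte offset, here is a minimal sketch. It assumes a tab-delimited log with `#` comment lines, and `read_new_rows` is a hypothetical function, not the existing Beastiary reader. The returned offset is the value that would be saved on the Trace.

```python
def read_new_rows(path, offset=0):
    """Read complete lines appended after byte `offset`.

    Hypothetical helper. Returns (rows, new_offset), where new_offset is the
    position to store for the next call.
    """
    rows = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        for raw in handle:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            offset += len(raw)
            line = raw.decode().rstrip("\r\n")
            if line and not line.startswith("#"):
                rows.append(line.split("\t"))
    return rows, offset
```

Only advancing the offset past complete lines means a row that is half written when the file is polled is not lost or split.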

Add burnin

Add the ability to specify the amount of burnin
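
A minimal sketch of how a burnin setting could be applied, assuming it is given as a fraction of the chain. `discard_burnin` and the default of 10% are illustrative assumptions only.

```python
def discard_burnin(samples, fraction=0.1):
    """Drop the first `fraction` of the samples (hypothetical helper)."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    start = int(len(samples) * fraction)  # number of leading samples to skip
    return samples[start:]
```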

Sort files in file explorer

ESS

ESS calculation for each parameter
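
For reference, one common ESS estimator divides the chain length by the integrated autocorrelation time, truncating the sum at the first non-positive autocorrelation. The sketch below implements that estimator in plain Python. It is an assumption that this variant is suitable here: other tools use different truncation rules, and `effective_sample_size` is a hypothetical name.

```python
def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    Hypothetical helper. The sum stops at the first lag whose autocorrelation
    is not positive. This is O(n * lag), fine for a sketch but slow for very
    long chains.
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    variance = sum(c * c for c in centred) / n
    if variance == 0:
        return float("nan")  # ESS is undefined for a constant trace
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        autocov = sum(centred[t] * centred[t + lag] for t in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau
```

A strongly autocorrelated chain gives an ESS well below the number of samples, while a chain with no positive autocorrelation gives an ESS equal to the chain length.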

configurable options (reading log files)

Add CLI configurations to parse any type of delimited file.
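
As a sketch of what a configurable reader might look like, the function below takes the delimiter and comment prefix as parameters. The name `read_delimited` and both options are assumptions for illustration, and the real CLI flags are not defined here.

```python
import csv
import io


def read_delimited(text, delimiter="\t", comment="#"):
    """Parse delimited text into (header, rows), skipping comments and blanks.

    Hypothetical helper: `delimiter` and `comment` stand in for values that
    could be supplied from the command line.
    """
    lines = (
        line
        for line in io.StringIO(text)
        if line.strip() and not line.startswith(comment)
    )
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    return header, list(reader)
```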

Add --version

from importlib.metadata import version as get_version  # stdlib (Python 3.8+); pkg_resources is deprecated
version = get_version('beastiary')

There's no error if the log file doesn't exist

When you use the ADD button to add a log file if the file doesn't exist beastiary should display an error.

Add HPC guide

Optimisation for large datasets

Beastiary will freeze if the dataset being loaded is very large (e.g. 100,000 samples). Is there a way to optimise the loading time of large datasets?

use webgl for plotly trace

The trace plot is laggy with 100K points. Switch to WebGL rendering.

Efficient Data Serialization

Binary Serialization: binary formats can significantly reduce the payload size. Consider using formats such as Apache Arrow or custom binary payloads for trace data.
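
To make the custom binary payload option concrete, here is a small sketch that packs a trace as little-endian 64-bit floats using only the standard library. The function names are hypothetical and this is not an Apache Arrow example. Each value costs exactly 8 bytes, whereas a JSON number takes one byte per character, so the saving depends on how many digits the values carry.

```python
import struct


def pack_trace(values):
    """Encode floats as consecutive little-endian doubles (hypothetical helper)."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_trace(payload):
    """Decode a payload produced by pack_trace back into a list of floats."""
    count = len(payload) // 8  # 8 bytes per double
    return list(struct.unpack(f"<{count}d", payload))
```

Fixing the byte order in the format string keeps the payload independent of the server's architecture, which matters if the browser reads it into a typed array.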

Incorrect parsing of parameter names if space is present.

I found that if spaces are present in a parameter name, the name will be split and the additional words will be used to label the subsequent parameters.

In the attached log file and screenshot, the parameter "Current Tree" is split by Beastiary, with "Current" labelling the correct column and "Tree" labelling the adjacent column (which should be "Location.rates.California.NewYork"). All of the data in the log file is present, but the labels are incorrect from that point on.

sample.log
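
The behaviour described is what splitting the header on any whitespace would produce. A possible fix, assuming the log file is tab-delimited, is to split on tabs only. `parse_header` below is a hypothetical sketch of that approach, not the current parser.

```python
def parse_header(line):
    """Split a header line on tabs so names containing spaces stay intact.

    Hypothetical helper, assuming a tab-delimited log file.
    """
    return line.rstrip("\r\n").split("\t")


header = "Sample\tCurrent Tree\tLocation.rates.California.NewYork\n"
names = parse_header(header)   # keeps "Current Tree" as one name
naive = header.split()         # whitespace split breaks it into two names
```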