aplbrain / bossphorus

A BossDB-like volumetric datastore in Rust

License: Apache License 2.0

Rust 100.00%

rust volumetric-data 3d bossdb

bossphorus's Introduction

Bossphorus implementation in Rust

This is a partial reimplementation of the BossDB REST API in Rust.

Why?

bossphorus simplifies data-access patterns for data that do not fit into RAM. When you write a 100-gigabyte file, bossphorus automatically slices your dataset into bite-sized pieces (cuboids).

When you request small pieces of your data for analysis, bossphorus intelligently serves only the parts you need, leaving the rest on disk.
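As a rough illustration of how a small request maps onto stored pieces, the sketch below lists the cuboids that overlap a requested region. The cuboid shape and the function name are assumptions made for this example, not values taken from bossphorus.

```rust
use std::ops::Range;

/// Hypothetical cuboid edge lengths (x, y, z) in voxels; not taken from bossphorus.
pub const CUBOID_SHAPE: [u64; 3] = [512, 512, 16];

/// Returns the grid indices of every cuboid that overlaps the half-open
/// request `xs` x `ys` x `zs`, given in voxel coordinates.
pub fn cuboids_for_request(xs: Range<u64>, ys: Range<u64>, zs: Range<u64>) -> Vec<[u64; 3]> {
    let mut out = Vec::new();
    if xs.is_empty() || ys.is_empty() || zs.is_empty() {
        return out;
    }
    // First and last cuboid index touched along one axis.
    let span = |r: &Range<u64>, size: u64| (r.start / size)..=((r.end - 1) / size);
    for z in span(&zs, CUBOID_SHAPE[2]) {
        for y in span(&ys, CUBOID_SHAPE[1]) {
            for x in span(&xs, CUBOID_SHAPE[0]) {
                out.push([x, y, z]);
            }
        }
    }
    out
}
```

Only the cuboids in the returned list would need to be read; everything else stays on disk.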

Feature Parity

See Feature Parity for more information.

Disk Usage

Bossphorus caches cuboids in the uploads folder that's created in the current working directory. Currently, it will cache up to 1000 cuboids in this folder. The least recently used cuboids are removed when the cuboid limit is reached.
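The eviction behavior described above can be sketched with a small recency index. This is a minimal illustration, assuming cuboids are identified by string keys; `LruIndex` is a hypothetical name and is not the type bossphorus uses.

```rust
use std::collections::VecDeque;

/// Tracks cuboid keys in recency order: front = least recently used.
pub struct LruIndex {
    capacity: usize,
    order: VecDeque<String>,
}

impl LruIndex {
    pub fn new(capacity: usize) -> Self {
        LruIndex { capacity, order: VecDeque::new() }
    }

    /// Records a read or write of `key` and returns the keys evicted
    /// to stay within capacity (their files would then be deleted).
    pub fn touch(&mut self, key: &str) -> Vec<String> {
        // If the key is already tracked, drop its old position first.
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.to_string());
        let mut evicted = Vec::new();
        while self.order.len() > self.capacity {
            if let Some(old) = self.order.pop_front() {
                evicted.push(old);
            }
        }
        evicted
    }

    pub fn len(&self) -> usize {
        self.order.len()
    }
}
```

With a limit of 1000 entries, `LruIndex::new(1000)` would model the behavior described above; the linear scan in `touch` is acceptable at that size but would need an index for much larger caches.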

Configuration

Environment variables take precedence over the Rocket.toml config file.

Environment Variables

BOSSHOST: Sets the BossDB host
BOSSTOKEN: Token used for Boss auth

Rocket.toml File

bosshost: Sets the BossDB host
bosstoken: Token used for Boss auth

Defaults

If a setting is absent from both the environment and the Rocket.toml file, these defaults apply:

bosshost = "api.bossdb.io"
bosstoken = "public"
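The precedence order (environment variable, then Rocket.toml, then default) can be sketched as follows. The function names are hypothetical, and `file_value` stands in for whatever was parsed from Rocket.toml; how bossphorus actually reads that file is not shown here.

```rust
use std::env;

pub const DEFAULT_BOSSHOST: &str = "api.bossdb.io";
pub const DEFAULT_BOSSTOKEN: &str = "public";

/// Picks the first available value: environment, then config file, then default.
pub fn resolve(env_value: Option<String>, file_value: Option<String>, default: &str) -> String {
    env_value.or(file_value).unwrap_or_else(|| default.to_string())
}

/// Resolves the BossDB host. An environment variable that is set but empty
/// still counts as set in this sketch.
pub fn bosshost(file_value: Option<String>) -> String {
    resolve(env::var("BOSSHOST").ok(), file_value, DEFAULT_BOSSHOST)
}

/// Resolves the BossDB token the same way.
pub fn bosstoken(file_value: Option<String>) -> String {
    resolve(env::var("BOSSTOKEN").ok(), file_value, DEFAULT_BOSSTOKEN)
}
```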

Development

Blosc must be installed manually via a package manager before building. SQLite is also required, but it is included with macOS by default.

For macOS:

brew install c-blosc

For Debian-based Linux distros:

sudo apt-get install libblosc-dev sqlite3

For RPM-based Linux distros:

sudo yum install blosc sqlite

Due to use of the Rocket web server crate, the nightly Rust toolchain must be used. You can set this as your project default with:

rustup override set nightly

Releases

You can build an optimized release with:

cargo build --release

The binary will be at target/release/bossphorus.

bossphorus's Issues

[ChunkedBloscFileDataManager] File IO parallelism

Right now, each cuboid on disk is read in series, but we know both the shape as well as the filenames a priori. This could theoretically be parallelized, though it'll require some cleverness with the ndarray API, which I don't believe supports parallel reads/writes by default.
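One possible shape for the parallel read, using only the standard library, is sketched below. It reads raw bytes with one scoped thread per file and keeps the results in input order; decompression and assembly into an ndarray are left out, and the function name is an assumption.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::thread;

/// Reads every file in `paths` on its own thread and returns the results
/// in the same order as the input.
pub fn read_all_parallel(paths: &[PathBuf]) -> Vec<io::Result<Vec<u8>>> {
    thread::scope(|s| {
        // Spawn all readers first so the reads overlap.
        let handles: Vec<_> = paths
            .iter()
            .map(|p| s.spawn(move || fs::read(p)))
            .collect();
        // Join in input order so result i belongs to paths[i].
        handles
            .into_iter()
            .map(|h| h.join().expect("reader thread panicked"))
            .collect()
    })
}
```

Because each thread produces its own buffer, the copy into the shared output array can still happen serially afterwards, which sidesteps the question of parallel writes into a single ndarray. A real implementation would probably cap the number of threads rather than spawn one per cuboid.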

Allow user to configure the DataManager stack they want to run

I'm thinking we should make the binary configurable in a file, but with sane defaults. Something like:

config.yml

port: 8090
cache: "LRU"

usage_manager: "console"

data_managers:

    - ChunkedFileDataManager:
        upload_path: "uploads/"
    
    - BossDBRelayDataManager:
        host:        "bossdb.io"
        protocol:    "https"
        token:       "public"

And callable with bossphorus --config config.yml

In particular, I think being able to specify which data managers are used and in which order is something users may want to be able to do at runtime, which is perhaps a bit more complicated than the current env-variable technique allows.
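On the Rust side, the proposed file could map onto types like the ones below. This is only a sketch of the data model with the proposal's values as defaults; the type and field names are assumptions, and parsing the YAML file itself is not shown.

```rust
/// One layer of the stack, mirroring the entries in the proposed config.yml.
#[derive(Debug, Clone, PartialEq)]
pub enum DataManagerConfig {
    ChunkedFile { upload_path: String },
    BossDBRelay { host: String, protocol: String, token: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub port: u16,
    pub cache: String,
    pub usage_manager: String,
    /// Ordered: earlier managers are consulted first.
    pub data_managers: Vec<DataManagerConfig>,
}

impl Default for Config {
    /// Defaults copied from the example config in this proposal.
    fn default() -> Self {
        Config {
            port: 8090,
            cache: "LRU".to_string(),
            usage_manager: "console".to_string(),
            data_managers: vec![
                DataManagerConfig::ChunkedFile {
                    upload_path: "uploads/".to_string(),
                },
                DataManagerConfig::BossDBRelay {
                    host: "bossdb.io".to_string(),
                    protocol: "https".to_string(),
                    token: "public".to_string(),
                },
            ],
        }
    }
}
```

Keeping `data_managers` as an ordered list is what lets users choose both which managers run and in which order.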

@movestill thoughts?

Implement a cache cleanup and cache-maintenance strategy

Right now, if you create the following DataManager stack:

ChunkedBloscFileDataManager → BossDBRelayDataManager

...then as cache-misses in the ChunkedBloscFileDataManager are fulfilled by the BossDBRelayDataManager, they're saved to disk and returned to the client.

There currently exists no mechanism by which to clear the files from ChunkedBloscFileDataManager, which means that the cache will grow to infinity (or until your drive is full, whichever is sooner).
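The fill-on-miss behavior, and why it grows without limit, can be seen in a stripped-down model. The trait and types below are hypothetical stand-ins: an in-memory map plays the role of the on-disk cache, and `Remote` plays the role of BossDBRelayDataManager.

```rust
use std::collections::HashMap;

/// Hypothetical interface; the real bossphorus trait may differ.
pub trait DataManager {
    fn get(&mut self, key: &str) -> Option<Vec<u8>>;
}

/// Stand-in for the relay layer: a fixed store plus a request counter.
pub struct Remote {
    pub store: HashMap<String, Vec<u8>>,
    pub requests: usize,
}

impl DataManager for Remote {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        self.requests += 1;
        self.store.get(key).cloned()
    }
}

/// Stand-in for the file-backed layer: it caches whatever the next layer returns.
pub struct CachingLayer<N: DataManager> {
    pub cache: HashMap<String, Vec<u8>>,
    pub next: N,
}

impl<N: DataManager> DataManager for CachingLayer<N> {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        if let Some(hit) = self.cache.get(key) {
            return Some(hit.clone());
        }
        // Cache miss: ask the next layer, keep a copy, then return the data.
        let data = self.next.get(key)?;
        self.cache.insert(key.to_string(), data.clone());
        Some(data)
    }
}
```

Nothing in `CachingLayer` ever removes an entry, so every distinct key requested stays in `cache` for good.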

There should be a cache cleanup strategy, but I believe it makes sense for there to be several strategies which a user can choose between. Perhaps options like:

LRU
FIFO
Most distant (Euclidean) from LRU (??)

Even if we just implement one of these to start with, might be smart to leave space and an abstraction layer to allow for multiple in the future.
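One way to leave room for several strategies is a small trait that the cache calls on insert, on access, and when it needs a victim. The sketch below is an assumption about what that abstraction layer could look like, with FIFO and LRU as two implementations; none of these names exist in bossphorus.

```rust
use std::collections::VecDeque;

/// Hypothetical abstraction over cache-cleanup strategies.
pub trait EvictionPolicy {
    /// Called when a cuboid is first written to the cache.
    fn on_insert(&mut self, key: &str);
    /// Called when a cached cuboid is read.
    fn on_access(&mut self, key: &str);
    /// Picks the next cuboid to delete, if any are tracked.
    fn evict(&mut self) -> Option<String>;
}

#[derive(Default)]
pub struct Fifo {
    queue: VecDeque<String>,
}

impl EvictionPolicy for Fifo {
    fn on_insert(&mut self, key: &str) {
        self.queue.push_back(key.to_string());
    }
    // Reads do not change insertion order.
    fn on_access(&mut self, _key: &str) {}
    fn evict(&mut self) -> Option<String> {
        self.queue.pop_front()
    }
}

#[derive(Default)]
pub struct Lru {
    queue: VecDeque<String>,
}

impl EvictionPolicy for Lru {
    fn on_insert(&mut self, key: &str) {
        self.queue.push_back(key.to_string());
    }
    fn on_access(&mut self, key: &str) {
        // Move the key to the back so it is evicted last.
        if let Some(pos) = self.queue.iter().position(|k| k.as_str() == key) {
            if let Some(k) = self.queue.remove(pos) {
                self.queue.push_back(k);
            }
        }
    }
    fn evict(&mut self) -> Option<String> {
        self.queue.pop_front()
    }
}

/// Replays the same short history against any policy and returns the first victim.
pub fn first_victim(policy: &mut dyn EvictionPolicy) -> Option<String> {
    policy.on_insert("a");
    policy.on_insert("b");
    policy.on_access("a");
    policy.evict()
}
```

Given the same history, FIFO evicts "a" (the oldest insert) while LRU evicts "b" (the least recently read), which is the kind of difference a user-selectable strategy would expose.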

support jpg filmstrip interface for neuroglancer

I vaguely remember there being a particularly nuanced reason why old bossphorus wasn't compatible with neuroglancer...

Support for annotation u64, and image u16 channels

Right now we support all imagery (u8) channels. I think all that's really required here is Generic-izing the u8 code to take an arbitrary dtype, though there may be a few libraries that don't support other datatypes.

Going to use this Issue to document these conflicts as I encounter them so that we can address them with all the info we need.
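As a starting point for the generic version, the element type can be abstracted behind a small trait. The sketch below decodes a raw byte buffer into u8, u16, or u64 voxels; the trait name, the function name, and the little-endian byte order are all assumptions made for illustration.

```rust
/// Hypothetical element trait covering the channel types named in this issue.
pub trait Voxel: Copy + Sized {
    /// Size of one voxel in bytes.
    const BYTES: usize;
    /// Builds a voxel from exactly `BYTES` little-endian bytes.
    fn from_le(bytes: &[u8]) -> Self;
}

macro_rules! impl_voxel {
    ($($t:ty),*) => {$(
        impl Voxel for $t {
            const BYTES: usize = std::mem::size_of::<$t>();
            fn from_le(bytes: &[u8]) -> Self {
                let mut buf = [0u8; std::mem::size_of::<$t>()];
                buf.copy_from_slice(bytes);
                <$t>::from_le_bytes(buf)
            }
        }
    )*};
}

impl_voxel!(u8, u16, u64);

/// Decodes a little-endian byte buffer into voxels of type `T`.
/// Trailing bytes that do not fill a whole voxel are ignored.
pub fn decode<T: Voxel>(raw: &[u8]) -> Vec<T> {
    raw.chunks_exact(T::BYTES).map(T::from_le).collect()
}
```

Code written against `T: Voxel` then works for image (u8, u16) and annotation (u64) channels alike; any library that only accepts u8 would show up as a missing trait bound at compile time.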

[ChunkedBloscFileDataManager] Save data to disk in a SEPARATE thread from the one returning data to the user

Right now we perform the following data flow upon a ChunkedBloscFileDataManager cache miss:

Ask the next layer for data
Save the retrieved data to disk
Return a copy of that retrieved data

To improve performance and save round-trip time on that initial request, we should perform step 2 (saving the data to disk) in parallel with returning it to the user. Or, rather, spawn a routine that saves the data to disk in parallel (and is allowed to finish even after the HTTP request is closed).
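A minimal sketch of that idea with standard-library threads is shown below. The function name is hypothetical, and the data is shared through an `Arc` so the writer thread does not need its own copy; how this would plug into a Rocket request handler is not addressed.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hands the data back immediately while a background thread writes
/// the same bytes to `path`.
pub fn return_and_save(
    data: Vec<u8>,
    path: PathBuf,
) -> (Arc<Vec<u8>>, JoinHandle<io::Result<()>>) {
    let shared = Arc::new(data);
    let for_disk = Arc::clone(&shared);
    // The writer owns its Arc clone, so it can outlive the caller's use of the data.
    let writer = thread::spawn(move || fs::write(&path, for_disk.as_slice()));
    (shared, writer)
}
```

Dropping the returned `JoinHandle` detaches the writer, so the save keeps running after the response is sent as long as the process stays alive. Keeping the handle allows write errors to be logged.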