The Brain Predictability toolbox app (BPt_app) is designed to offer a GUI experience on top of the base BPt Python library.


bpt_app's Introduction

Welcome to the Brain Predictability toolbox (BPt) Web Interface.

Intro

This project is designed to be an easy-to-use interface for performing neuroimaging-based machine learning (ML) experiments.

This is an early beta release, so please be mindful that there will likely be some rough edges. Please open an issue with any errors that come up!

The main python library (that serves as a backend for this application) can be found at: https://github.com/sahahn/BPt.

Installation


As it currently stands, BPt_app is designed to be built and run in a docker container. Please follow the instructions below:

  1. Make sure you have docker installed on your device. See: https://docs.docker.com/get-docker/

  2. Secondly, we make use of docker-compose to make the installation process more painless. On some systems it will already be installed with docker, but on others you may need to perform additional steps to download it; see: https://docs.docker.com/compose/install/

  3. Next, clone this repository to your local device. On Unix-based systems, the command is as follows:

    git clone https://github.com/sahahn/BPt_app.git
  4. An essential step to using the application is giving it access to your datasets of interest. Importantly, adding datasets can be done either before installation or after.

    1. Datasets are saved within BPt_app in the folder 'BPt_app/data/sources'

    2. Datasets must be compatible with BPt, which requires the user to format the dataset accordingly before adding it to the sources directory. Specifically, a dataset is composed of a folder (where the name of the folder is the name of the dataset), and within that folder one or more csv files with the dataset's data. For example:

    
     BPt_app/data/sources/my_dataset/
     BPt_app/data/sources/my_dataset/data1.csv
     BPt_app/data/sources/my_dataset/data2.csv
     BPt_app/data/sources/my_dataset/data3.csv
     
    3. Each file with data (data1.csv, data2.csv, data3.csv above) must also be formatted in a specific way. Specifically, all data files must be comma-separated and contain only one header row with the name of each feature (or an index name / eventname, described in the next steps). For example (note that the \n character is usually hidden in most text editors):
    
     subject_id,feat1,feat2,feat3\n
     a,1.4,9,1.22\n
     b,1.3,9,0.8\n
     c,2,10,1.9\n
     
    4. Each file must have a column with a stored subject id. Valid names for this subject id column are currently: ['subject_id', 'participant_id', 'src_subject_id', 'subject', 'id', 'sbj', 'sbj_id', 'subjectkey', 'ID', 'UID', 'GUID']. As long as a column is included and saved under one of those names, that column will be used internally as the subject id. In the example above, 'subject_id' is used as the subject id column.

    5. Next, each data file can optionally be stored with a valid 'event name' column. This column should be stored in the same way as the subject id column, and is used in cases where the underlying dataset is, for example, longitudinal, or in any case where a feature contains multiple values for the same subject. Valid column names for this are currently: ['eventname', 'event', 'events', 'session_id', 'session', 'time_point', 'event_name', 'event name']. Within BPt_app, this column lets you filter data by a specific eventname value. Note that eventnames cannot contain the reserved string ' - '. (A short pandas sketch of writing a compliant file is shown after this list.)

    6. A few general notes about adding data to BPt:

      • You may add multiple datasets, just with different folder names
      • Data will be processed by BPt upon launch of the web application. This means that if you add a new dataset once the application has already been launched initially, that dataset will be processed upon the next launch of the application. Re-loading the web page can also trigger the app to look for changes to the backend data.
      • If a feature / column overlaps across different data sources (e.g., data1.csv and data2.csv), then that feature will be merged across all data files and saved in a new file. The merge behavior is as follows: if new values are found (as indexed by subject id and eventname), they are simply added; if overlapping values are found, the newer value for that subject_id / eventname pair will be used.
      • You can change or delete data files or datasets at will; this will just prompt BPt to re-index that dataset, and changes will be made accordingly.
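
    As referenced above, here is a minimal sketch of writing a compliant data file programmatically (assuming pandas is installed; the dataset name 'my_dataset' and file name 'data1.csv' are placeholders):

     # Minimal sketch: write a BPt-compatible data file into the sources folder.
     # 'my_dataset' and 'data1.csv' are placeholder names.
     import pandas as pd

     df = pd.DataFrame({
         'subject_id': ['a', 'b', 'c'],  # must use one of the valid subject id names
         'eventname': ['baseline'] * 3,  # optional event name column
         'feat1': [1.4, 1.3, 2.0],
         'feat2': [9, 9, 10],
         'feat3': [1.22, 0.8, 1.9],
     })

     # Comma-separated, with a single header row and no extra index column
     df.to_csv('BPt_app/data/sources/my_dataset/data1.csv', index=False)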
  5. Now, to install the application, navigate within the main BPt_app folder/repository and run the docker compose command:

    docker-compose up

    This will take care of building the docker image and application. There are a number of different tweaks you can make here as desired; some of these are listed below:

    • You may pass the flag "-d" (so "docker-compose up -d"), which will run the docker container in the background; otherwise the docker instance will be tied to your current terminal (and therefore shut down if you close that terminal). See https://docs.docker.com/compose/reference/up/ for other similar options.
    • Before running docker-compose up, you can optionally modify the docker-compose.yml file. One perhaps useful modification is to change the value of restart: no to restart: always. What this will do is restart BPt_app whenever it goes down, e.g., when you restart your computer. Otherwise, you must start the container manually every time you wish to use BPt_app after a restart.
    • You can use the command 'docker-compose start' from the BPt_app directory to restart the container
    • Likewise, you can use the command 'docker-compose stop' to stop the web app
  6. After the container is running, navigate to http://localhost:8008/BPt_app/WebApp/index.php. This is the web address of the app, and should bring you to the home page!

Once up and running


The most useful commands to know once up and running are those used to start and stop the container (as mentioned above) with docker-compose, as well as those used to update. There are two main ways to update. The first is a faster, temporary update: it will persist across stopping and starting the docker container (e.g., docker-compose start and stop), but will be deleted if docker-compose down is ever called. To run this faster temporary update, navigate to the BPt_app folder and run the following command (note: the container must be running, e.g., via docker-compose start):

bash update.sh

If instead you would like to do a full and lasting update, this involves re-building the whole container. It will also call git pull on your main directory, looking for changes in the docker files. To run this full update, run within BPt_app:

bash full_update.sh

bpt_app's People

Contributors

sahahn

Watchers

James Cloos, Pierre

Forkers

harel-coffee

bpt_app's Issues

Add control for merge behavior

Current merge behavior is fixed as computing the inner overlap of subjects across the different loaded files. Alternatively, the user could be given control to set any non-overlapping subjects' data to NaN and still keep those subjects, i.e., an outer merge.
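
As a rough illustration of the difference (sketched with pandas rather than BPt's internal merge code; the column and variable names are hypothetical):

    # Inner vs. outer merge on the subject id column, illustrated with pandas.
    import pandas as pd

    d1 = pd.DataFrame({'subject_id': ['a', 'b'], 'feat1': [1.4, 1.3]})
    d2 = pd.DataFrame({'subject_id': ['b', 'c'], 'feat2': [9, 10]})

    # Current behavior: keep only overlapping subjects ('b')
    inner = d1.merge(d2, on='subject_id', how='inner')

    # Proposed option: keep all subjects, with non-overlapping data set to NaN
    outer = d1.merge(d2, on='subject_id', how='outer')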

Develop more in-depth help pages

An interesting idea would be to add in a separate set of pages which could be filled in with more advanced descriptions of certain things. E.g., descriptions on considerations for cross-validation w/ pictures or whatever.

Better ensemble support

Better integration / support / controls + options for integrating Ensembles. E.g., DES split, stacking regressor, etc...

Should have automatic detection of base model type, and show relevant options. Should be able to specify the model responsible for stacking for example.

Quick copy pipeline

It would be nice to have a feature where you could select an existing pipeline and make a copy of it, e.g., similar to how param dists are set up. This could be helpful when only trying to change a few small things.

Automatically detect if parameter search is needed

Change the behavior such that parameter search starts as None, but when any set of params requiring a search (or a Select) is specified, the search automatically changes to RandomSearch.

Could alternatively change it to be a warning, i.e., cause a visual change, and have that pipeline not appear as valid.

Support for submitting jobs to remote clusters

Add in support for integrating remote clusters, i.e., ideally, you could set it up to submit jobs to a cluster. Implementation would likely be similar to VACC_EXT setup, but could involve something different (e.g., maybe running the full docker setup + app on an interactive slurm job??? Can you submit jobs from within interactive slurm jobs?)

Only show imputer if any NaN

Along the lines of automating things that people shouldn't need to think about, add in the automatic hiding of imputers if no NaN data is loaded.

Logs on start screen

Add an explicit log for data loading / proc changes that can be viewed instead of the waiting screen. Also, change the start screen to look a bit better, especially since it will pop up first every time. So maybe have it start with something like: "Checking for changes to the underlying dataset!"

Caching settings

Add the different caching options to the settings page. Make more transparent plus add optional storage limits.

GUI upload datasets

Create a nice GUI interface for uploading custom datasets - i.e., with more controls and flexibility for comma-separated vs tab-separated, options for which column is the subject id, and which is event name.

Relatedly, there could be a GUI screen to see which datasets are available, and maybe delete some?

Better logs for data loading

Have the logs tell you where the latest ML object from a save is stored. Or rather, there should be an option somewhere to download the pickled ML object, so one could, for example, just use the GUI for data loading.

Multiple logins

Investigate how to handle multiple tabs/windows open from the same user. Right now, this behavior will likely just break things in unexpected ways. Not sure of the best way to fix it; it seems like it would require more frequent communication with the server / re-writing a lot of how things are currently stored.

Saving / importing / sharing pipelines

Add in support for saving pipelines, both in versions shared across multiple users and just between projects. E.g., a user should be able to "import" a pipeline from a different project into their current one.

Build in explicitly a single / multiuser mode

A multi-user mode will eventually be put on the DEAP. This means that a number of settings need to be quickly enabled/disabled based on which mode is being run.

This includes:

  • Linking to a different Sets page
  • Removing the choice of dataset
  • Changing how variables are selected
  • Changing the helper text in some cases
  • Changing how data is loaded
  • Removing the load database check + associated pieces
  • Likely more...

Loading set variable display issues

Figure out a fix for what to display when loading a single set variable when the whole set is loaded. E.g., if a filter is set on the whole set, the log will be flooded with before + after values for every variable in the set.

Comparing two (or more) completed jobs

Interface/option to compare the results from two different runs. Maybe some requirement like: they both need to be on the same target, and either both Evaluate or both Test?

Would involve mostly generating tables? Or some plotting? Not sure.

Re-visit param caching

In some cases, the current implementation might not be working 100% correctly. It should also be changed so that param caching is dataset-specific rather than global.

Deleting temp jobs

Temp jobs should be deleted somewhat regularly, to avoid taking up space and to handle cases where silent errors occur. One way of doing this is how validation jobs work: if the output already exists at the start of the job, it is deleted. This won't work for jobs with names, though.

Caching w/ data loading

Changing the event shortname still seems to break data loading caching; i.e., it will still load an incorrectly cached previous copy.

HTML entrypoints

Right now, most of the logic is handled in JavaScript. It could be helpful in the future to add meaningful entry points, e.g., /project_name/page.

First time loading bug

On first-time loading, the app will try to get the ML_options before setup_info.py has been run. The order of operations needs to change so that ML_options are not loaded until after setup has been called.

Add support for Feature Importances

This constitutes a fairly large effort and might involve, to some extent, changes on the BPt side of things. The current thinking is that the place to specify which feature importances to calculate should be the Evaluate tab. In whatever ways possible, the options should be "smartly" generated, i.e., not displaying irrelevant params. Also involved is adding a section to each job's results for viewing the feature importances in different ways.

More info on jobs in results

There should potentially be more entries in the main table, maybe Elapsed? But also, when opening a job, the user should be able to see more detailed information on how that job was run, e.g., what pipeline was used, etc.

Show 'X' not preserved on new set search

When searching for a new set, if the user had previously changed the DataTable to display more than 5 entries, this choice will be refreshed back to 5 upon a new search. Should look into propagating the user choice to a new search.

Filter by percent and outlier

Right now the UI makes it seem like you can filter by both outlier + std. Make sure that the UI reflects the actual behavior, or decide on what the best behavior might be. Also make sure it is well documented in the help string.

Visual feedback on pressing save projects

When clicking save projects, it would be nice to have the button change for a second, just to provide some visual feedback that clicking it actually did something.

Add settings dataset name

On the settings page, for adding short event names, include an indicator for which dataset has that event name (how to handle cases where multiple datasets have the same one?)

NaN threshold for loading sets

For loading sets, add in support for a NaN threshold like the one currently implemented in base BPt's Load_Data. This would also involve improved support for printing information about patterns of NaN.
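
A minimal sketch of what such a threshold could do, written with pandas as a stand-in (not BPt's actual Load_Data API; the function name and default value are hypothetical):

    # Sketch of a NaN-threshold filter: drop any column whose fraction of
    # missing values exceeds the threshold. A pandas stand-in, not BPt's API.
    import pandas as pd

    def drop_high_nan(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
        nan_frac = df.isna().mean()  # per-column fraction of NaN values
        return df[nan_frac[nan_frac <= threshold].index]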

Improve look of Sets page

Improve the look and feel of the Sets page; right now variable names can easily be too long, the top dataset selector is a little clunky, etc.

Datatables sometimes disappearing

When switching between project tabs, if both tabs have a datatable loaded, going back to the first tab (i.e., a loaded set) will sometimes cause that table to appear empty until it is re-drawn (by switching pages or searching).
