
An interactive Shiny web application for genomic interval enrichment analysis using LOLA.

Home Page: http://lolaweb.databio.org/

R 72.37% CSS 1.63% Shell 0.49% HTML 24.63% Dockerfile 0.88%
shiny-server r docker docker-swarm

lolaweb's Introduction

LOLAweb


LOLAweb is a web server and interactive results viewer for enrichment of overlap between a query region set (a bed file) and a database of region sets. It provides an interactive result explorer to visualize the highest ranked enrichments from the database. You can access the web server at http://lolaweb.databio.org.

This repository contains the Shiny app source code and the Docker implementation for LOLAweb.

Shiny app

LOLAweb is implemented as an interactive Shiny app. You can run this app locally by following the instructions in the app folder.

Docker

The ghcr.io/databio/lolaweb container is based on the ghcr.io/databio/shinybase container, which you can find in its GitHub repository or in the GitHub Container Registry.

Build the container image yourself

  1. Clone this repository
  2. Build locally using Docker. Run this command from the same directory as the Dockerfile.

docker build --no-cache -t lolaweb .

Or pull the container image:

docker pull ghcr.io/databio/lolaweb

The container image itself is hosted in the GitHub Container Registry: https://github.com/databio/LOLAweb/pkgs/container/lolaweb

Container volumes and reference data

LOLAweb needs access to a few folders where it can store results or logs, or access necessary files like the database. To handle this, we've set up the app to look for two shell environment variables:

  • $LWREF, for LOLAweb reference data, which may be read-only
  • $LWLOCAL, where local results can be written.

To run the LOLAweb container (locally or on a server), you need to set these environment variables (for example, in your .bashrc or .profile file). These variables will be injected into the container when it is run.

You can set these variables to two different paths if you like. Or, if you keep all five subfolders together in the same path, set both variables to the same value.

# Example locations. Set to match your environment
export LWREF='/home/subdir/lola/'
export LWLOCAL='/var/things/loladata/'

LOLAweb will look at the value in $LWREF for the reference data. This folder should have subfolders called databases, universes, and examples. Each of these subfolders contains another layer of subfolders for genome assemblies.

LOLAweb expects $LWLOCAL to have two subfolders: cache and shinylog. This is where the app will write results and log files. If running LOLAweb on a server, be sure these directories are writable by the Docker process.
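As a sanity check before launching the container, you can verify that the expected layout exists. The helper below is a hypothetical sketch (not part of the repo), assuming both variables end with a trailing slash as in the examples:

```shell
# check_lola_dirs: hypothetical helper that warns about missing
# subdirectories in the two LOLAweb data locations described above.
check_lola_dirs() {
  ref="$1"; localdir="$2"; missing=0
  # reference data: databases, universes, examples
  for d in databases universes examples; do
    if [ ! -d "${ref}${d}" ]; then
      echo "missing reference dir: ${ref}${d}"; missing=1
    fi
  done
  # local writable data: cache, shinylog
  for d in cache shinylog; do
    if [ ! -d "${localdir}${d}" ]; then
      echo "missing local dir: ${localdir}${d}"; missing=1
    fi
  done
  return $missing
}
```

Usage: `check_lola_dirs "$LWREF" "$LWLOCAL"` (note the trailing slashes on both variables, as in the examples above).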

The following instructions demonstrate how to download and configure the LOLAweb data directories for a minimal example using hg19 reference data:

## assign env vars for data path
## NOTE: must include trailing /
LWLOCAL="/path/to/local/data/"
LWREF="/path/to/reference/data/"

## change to reference data dir
cd $LWREF

## create dir for databases
mkdir -p databases
## create examples and universe dir
## NOTE: these must include subdirs named to match the appropriate ref genome
mkdir -p examples/hg19
mkdir -p universes/hg19

## download example universe and user set
curl http://cloud.databio.org.s3.amazonaws.com/vignettes/lola_vignette_data_150505.tgz | tar xvz

## move example universe and user set files to hg19 dir
mv lola_vignette_data/activeDHS_universe.bed universes/hg19/.
mv lola_vignette_data/setB_100.bed examples/hg19/.

## clean up
rm -rf lola_vignette_data

## download databases
curl http://cloud.databio.org.s3.amazonaws.com/regiondb/LOLACoreCaches_170206.tgz | tar xvz
curl http://cloud.databio.org.s3.amazonaws.com/regiondb/LOLAExtCaches_170206.tgz | tar xvz

## move databases to appropriate spots
mv scratch/ns5bc/resources/regions/LOLACore databases/Core
mv scratch/ns5bc/resources/regions/LOLAExt databases/Extended

## clean up
rm -rf scratch

## change to local data dir
cd $LWLOCAL

## create placeholder dirs for cache and shinylog
mkdir -p cache
mkdir -p shinylog

Run the LOLAweb container locally with reference data:

## run the docker image
## NOTE: this run command uses image pulled from ghcr.io/databio/lolaweb
docker run -d \
  -p 80:80 \
  -e LWREF=$LWREF \
  -e LWLOCAL=$LWLOCAL \
  --volume ${LWLOCAL}:${LWLOCAL} \
  --volume ${LWREF}:${LWREF} \
  --volume ${LWLOCAL}/shinylog:/var/log/shiny-server \
  ghcr.io/databio/lolaweb

Open a browser to:

http://localhost

Running a dev container

You could also run the dev version of the container by pulling ghcr.io/databio/lolaweb:dev. This will retrieve the dev tagged image from the GitHub Container Registry. Just add :dev to the container name at the end of the docker run command above.

Running multiple LOLAweb containers simultaneously with Docker Swarm

For the typical use case of an individual user, a single running container will suffice. But if you need to set up an enterprise-level LOLAweb server that can handle concurrent users, we've also made that easy by using Docker Swarm. This is how we run the main LOLAweb servers, and you could do the same thing if you want your own local implementation. Docker Swarm is a technique for running multiple instances of the same container. Read more about how to set up your own swarm.

Troubleshooting

The LOLAweb Docker implementation includes a mechanism to write Shiny Server logs to $LWLOCAL/shinylog. These log files may be useful when troubleshooting problems with running LOLAweb via Docker. They include errors with R processing as well as information as to whether the Shiny Server process was killed due to resource limitations (i.e., not enough RAM allocated to Docker daemon).
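For instance, a quick way to surface relevant lines from those logs is to grep for error and kill messages. The helper below is a hypothetical sketch, not something shipped with LOLAweb:

```shell
# scan_shiny_logs: hypothetical helper that surfaces the most recent
# error- or kill-related lines from a Shiny Server log directory.
scan_shiny_logs() {
  # -r: recurse into the log dir; -i: case-insensitive match
  grep -ri -e "error" -e "killed" "$1" 2>/dev/null | tail -n 20
}
```

Usage: `scan_shiny_logs "${LWLOCAL}/shinylog"`.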

For additional support with the LOLAweb Docker implementation, please file a GitHub issue.

lolaweb's People

Contributors

nmagee, nsheff, vpnagraj


lolaweb's Issues

decouple processing

moving forward LOLAweb could be decoupled so that there is a "run" view and a "results" view

in principle the app will behave the same (after execution of run, the user is forwarded to results) ... but (among other things) this will make managing the display of ui elements much more intuitive

docker base

instead of starting from the R base image in the Dockerfile, we should consider a few alternatives:

  • using a bioconductor container?
  • creating our own image and using that as the base image?

As it is, the way I'm seeing this, it's a 25-minute Travis build for each deploy because it re-installs lots of stuff from scratch every time. We could avoid this by having the LOLA base image start from something a little more refined.

incorporate `plotFeatureDist`

the app should include plotFeatureDist() from GenomicDistributions

this will require:

  1. adding a new object to result list that's cached
  2. figuring out parallelization of the runLOLA and featureDistribution computation ... see SO post for example

Email results

I think it might be nice to have an option (as in "optional"!) to enter your email address when you run a new analysis so that you can post a message to that address with a link to the results. Either that, or have user accounts with saved results.

This helps to keep track of multiple results and gives some robustness to interface errors happening after submission (which you will never be able to avoid completely in requests that may run for an extended period of time).

sample results link on front page

with the change to tabs the 'sample results' link got moved to the "results" tab and is therefore no longer on the first page... that's fine, but I think we really need a link to sample results on the first page for first-time users. otherwise, they won't realize they can get there without running a complete analysis. nobody will think to click the 'run' tab.

we should add that link back in so it's in both places.

staging and dev

@nmagee and @vpnagraj

When we last talked, we thought about having master and dev branches with separate docker deploys. I now think your staging comment makes more sense; so we have dev for development, staging pushes that into a test docker environment, and then master is the real deploy.

that makes a lot of sense to me now, even for this.

container bloat

Not mandatory for the article submission, but if we can get our container to lose a little bit of weight in the near future, that would be great.

We shot from 1.4GB to nearly 5GB yesterday, so it would make good sense to pull out any reference data/DB's that we can, and put them into the NFS share in one centralized reference location like the other reference data.

container issues

I'm getting a 'Welcome to nginx' page at our URLs right now. The servers are running but there must be something wrong with traefik, I'm guessing.

favicon

Tiny user-interface suggestion: can we make the shiny server display the lola logo (just the 'button') as a favicon?

Run summary stats

Along with the UI decouple (#30) we should have the results page show some summary stats on the run:

  • start time?
  • end time?
  • elapsed time?
  • results URL
  • filename uploaded
  • genome selected
  • universe selected
  • database selected

anything else?

add 'thinking' icon to visualization page

for the interactive viewer, it takes about 5-7 seconds to load (even when clicking on "sample results" for example).

I think we should use the gear icon here as well. After the "LOLA Results" heading but before the plots are displayed, I mean.

single-user multiple sessions

it appears that a single user can only have multiple sessions in different browsers. can we relax this constraint?

bug in uploads

there is another bug in uploads; it is requiring that there be exactly 6 columns in the upload bed file.
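Until the bug is fixed, one possible workaround is to trim uploads to the three required columns before submitting. `trim_bed` below is a hypothetical helper, not part of the app:

```shell
# trim_bed: hypothetical workaround that keeps only the first three
# BED columns (chr, start, end) from a tab-delimited file, sidestepping
# the six-column requirement described in this issue.
trim_bed() {
  cut -f1-3 "$1"
}
```

Usage: `trim_bed regions.bed > regions.3col.bed`.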

reference universes

universes are actually relative to a reference assembly, like databases.

hence, we should move the selection of assembly to the left, with the input, and this will need to populate the options of the select boxes for both universe and database.

in other words, activeDHS universe will not work with an mm9 database.

old sample results get broken

you probably realize this, but the sample results will stop working if they are old when you update to the new way of wrapping the results into a list; hence, right now on dev, the link to sample results is failing.

I think this won't be a problem in the future, but once we're settled we need to make sure we preserve backwards compatibility so that old results can still load, even if we add a new figure or something. as long as they're in a list it should be fine; just don't display the image if the cache doesn't contain a result for that image (in other words, if the cache was generated before the computation for that image was included in the code).

for now we just have to make sure the "Sample Results" link points to a new-style result when that is ready.

processing time streamlined

If we can load all the libraries in a global area, then what stops us from also doing this with the database caches, so they don't need to be re-loaded in each container?

links

let's add links to my lab page and to the source code. maybe let's change Powered By SOMRC to:

Powered By SOMRC. A project from the Sheffield lab at the University of Virginia. Source code at GitHub.

or some variant thereof

text status updates

The new text updates are working great, but there are a few issues:

  • "Calculating Fisher scores..." remains after job is complete.
  • notices now appear in 2 locations: top right ("calculating region set enrichments...") and under the green button; let's consolidate
  • notices are too long and go off the screen if on narrow screens
  • should we move the runLOLA button? it seems strange hanging out over on the right side away from everything else.

data mount point

Right now the data (e.g. universes, userSets, etc.) are housed as subfolders right in the app folder. I propose we make this a config variable somewhere, or something.

Also we might make this a single external folder so we don't have to mount 5 volumes independently on the docker container (which are all just subfolders within 1 folder). Maybe?

result table not rendering

currently the JS datatable at the bottom of the results view renders locally but not on dev:

http://dev.lolaweb.databio.org/?key=VWQN3ZC5HFD92EK

Chrome developer tools is reporting the following error for datatables.js:

Failed to load resource: the server responded with a status of 404 (Not Found)

can't remember if i saw this before the navbarpage change or not ... but i'm wondering if this is an issue with websockets being disabled?

dockerfile duplication

Right now we have a different Dockerfile for each branch (master, staging, dev). This caused me some confusion; is this the way we want it?

Shouldn't there be only a single Dockerfile, with the branch it's on dictating which container it builds?

hg38 -- disconnected from server

I keep getting "Disconnected from the server" errors with the hg38 reference.

This happens when I try to run LOLA: I switch the reference to hg38 and upload my BED file, keeping all other options at default, then click "Run". I'm told it's loading the data then switches to "timing stopped" immediately and displays the "disconnected from server" error.

It works with the same file and hg19, so I suppose something's off with the reference caches.

Overall, more informative error messages might help from a usability perspective.

Errors for missing pieces should suggest rerunning the cache.

What happens if someone who ran LOLAweb with an old version tries to load that cache with an updated LOLAweb?

Well, the old version will lack a few components from the new one. If you run a cache in an older version of LOLAweb, then try to load it in a new version, missing components give you an error like this:

Error: An error has occurred. Check your logs or contact the app author for clarification.

I think this should be a more informative message, if the requested object is simply not found (it could suggest re-running the cache for example).

If there truly was an error then it's fine to say that, otherwise, I think that's not the right message to send.

invert sorting for rank columns

The sorting for any Rank-based column should be inverted, so better hits show up on top, instead of on bottom

(try sorting by support, then by maxRank)

server disconnects

After leaving a connection idle for several minutes, the connection to the server times out. This is not a major problem but there's probably some config option we can change to make these timeouts a bit longer.
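If the disconnect originates in Shiny Server itself (rather than the reverse proxy in front of it), one candidate knob is Shiny Server's `app_idle_timeout` directive. The fragment below is purely illustrative; both the placement and the value should be verified against the Shiny Server documentation, and if the timeout actually comes from the proxy layer, the fix belongs there instead.

```
# /etc/shiny-server/shiny-server.conf (fragment, illustrative values)
server {
  listen 80;
  location / {
    site_dir /srv/shiny-server;
    # keep idle app processes alive longer before shutdown
    app_idle_timeout 300;
  }
}
```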

Attach SOMDEV1 / DEV2 to Qumulo

Reach out to RCI and discuss options around mounting specific Qumulo shares to DEV1/DEV2. This will help us consolidate pools of reference / universe data.

default cutoffs

Right now the default cutoffs are set such that there are no hits present in any plots.

Can we work out a way to get reasonable cutoffs such that there will be a few hits present by default? Maybe 25 hits or something?

Suggestion: take the .25 quantile as the default for both odds ratio and support, and then set the p-value cutoff to the score of the 25th-ranked hit; then we're guaranteed to show exactly 25 by default. In other words, two of the cutoffs would be loose, hard-coded defaults, and the other would be dynamic based on the number of items.

Tooltips to help with interpretation

In the interactive results interface, I think we should also add some tooltips.

"Select Collection":

LOLA databases are made up of one or more sub-collections of region sets. Using this drop-down, you can filter your plots and tables to show only the results from one of these collections at a time.

"Max Rank Cutoff"

These sliders can be used

Let's group all 3 plots under a single heading called "LOLA Results", and make the individual plot headings smaller; then we can add a tooltip question mark that says:

These barplots show the highest-ranking region sets from the database. The higher scores indicate more overlap with your query set. The results are scored using 3 statistics: Support is the raw number of regions that overlapped between your query set and the database set. LogPVal and LogOdds are the results of a Fisher's exact test scoring the significance of that overlap.

We rank each region set from the database for each of these 3 scores, and you can see the raw scores and the ranks in the table below. You can also see the maximum and mean ranks across all 3 categories.

longer-term archival of results

Feature suggestion by @fhalbritter

I think it would be good to have an option for long(er)-term archival of selected results so that people can directly link to dynamic results pages from publications and websites.

Help text for app interface

Make the question marks lead to a popup or something that displays text, instead of going to the R package docs. These are meant to be like little tooltips or something. Here's the text for each:

The User Set is your set of genomic regions that you want to test for overlap with the database. Upload a file in BED format (really, it just needs the first 3 columns: chr, start, and end). You can also drag and drop multiple files and they will be analyzed simultaneously!

Universe:

The universe is your background set of regions. You should think of the universe as the set of regions you tested for possible inclusion in your user sets; or, in other words, it is the restricted background set of regions that were tested, including everything in your regions of interest as well as those that did not get included. We recommend you upload a universe that is specific to the query set, but we also provide a few basic region sets (like tiling regions, or the set of all active DNaseI hypersensitive elements from the ENCODE project). The choice of universe can have a drastic effect on the results of the analysis, so it may also be worth running LOLA a few times with different universe sets. For further information, there are more details in the LOLA documentation.

DB:

We have provided a few different general-purpose databases. We recommend starting with the Core database, but there are also a few other more specific options if you want to extend your analysis. Further details about what is contained in each database can be found in the documentation on LOLA region databases.

Finally, we need a link to example results and some documentation on interpretation. So, I suggest putting a link to "Sample results" (which could just take you here: http://lolaweb.databio.org/?key=O1K24SVTQZL5WPJ)

intro clarifications

Let's add a blurb at the top of the page saying what this is.

LOLAweb tests for enrichment of overlap between a query region set (a bed file) and a database of region sets. It provides an interactive result explorer to visualize the highest ranked enrichments from the database. LOLAweb is a web interface to the LOLA R package.

And how about a graphical overview image for the input page?
lolaweb-abstract

display commit hash for version

Right now it's hard to tell when an update has propagated to production containers. I propose the web page display the commit hash of LOLAweb.

We would just have to use git to grab the latest commit hash of whatever we pulled down in the LOLAweb container, using for example something like this:

https://github.com/databio/pypiper/blob/f88d2a68fef1014e4e8a1294c8d57a5ceeb5a747/pypiper/manager.py#L411

So the LOLAweb R code would just, upon running, run a system command git rev-parse to grab the hash, and display this in the footnotes.
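A minimal sketch of that idea in shell (the footer wiring itself would live in the R code):

```shell
# Sketch: grab the short commit hash of the checked-out LOLAweb code so
# it can be shown in the page footer; "unknown" covers deployments
# where the .git directory (or git itself) is unavailable.
commit=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")
echo "LOLAweb build: ${commit}"
```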

Flash dependency

The new "Excel" and "PDF" buttons are nice, but they introduce a dependency on Flash.

The PDF looks strange anyway, and Excel can be done with CSV, so I think I would just take those two buttons out...

Submit button becomes active while processing

The submit button ("Run LOLA") is deactivated upon submission (as it should be), but then seems to become activated again while the processing is still ongoing.

I've experienced this using my Firefox browser on Linux Mint with a fairly small input dataset. Seemed to happen about half a minute into processing, not sure whether at a specific stage.

Unable to use own BED file

I've first run LOLAweb using the example data provided, which worked fine. I then tried to upload my own BED file (which the web interface tells me worked) and submitted that. The results I get, however, look the same as for the example data, and the table at the bottom of the results page tells me that the userset "setB_100.bed" was used.

Upload/Download results package

We've discussed this before, and also suggested by @fhalbritter

Eventually, though, I would still need to load the results into R

I think we should have a link under Run Summary to download an archive (.Rdata file) of the results.

LOLA should have functions to understand these archives.

Then, this could be uploadable as well?

example data

We need an example user dataset so someone can run it without any input to test, if they want. Just use the setC_100 or whatever.

modularize plotting functions

there's quite a bit of redundancy in the "*_plot_input" functions (especially now that we have logic for inverted sorting by rank columns) ... these should be modularized so there's a single function that simply takes a parameter (i.e. oddsRatio, pValueLog, support) for the feature to be plotted

more informative error messages

currently errors during runLOLA result in a "disconnected from server" but do not provide any more meaningful error message or output ... and since we're capturing the run time the green run message bar just shows the "timing stopped" information

Bad Gateway

the dev server is returning a bad gateway at the moment.
Actually this just happened about 3 minutes ago; I had an interactive results tab open and it just choked.

databases should not be hard-coded

  
output$loladbs <- renderUI({
  if (input$refgenome == "mm10") {
    selectInput("loladb", label = "", choices = c("Core"))
  } else {
    selectInput("loladb", label = "", choices = c("Core", "LOLAJaspar", "LOLARoadmap"))
  }
})

like universes, these should be dynamic based on disk folders.
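A sketch of that idea in shell (the actual fix would live in the app's R code, e.g. via list.dirs): the hypothetical `list_dbs` enumerates collections from the on-disk layout assumed above, `${LWREF}databases/<collection>/<genome>/`:

```shell
# list_dbs: hypothetical sketch of the proposed fix -- enumerate the
# database collections available for a given genome from the on-disk
# layout ${LWREF}databases/<collection>/<genome>/ instead of
# hard-coding them per genome.
list_dbs() {
  genome="$1"
  for c in "${LWREF}databases/"*/; do
    # only report collections that actually have this genome on disk
    if [ -d "${c}${genome}" ]; then
      basename "$c"
    fi
  done
}
```

Usage: `list_dbs hg19` prints the collections to offer in the select box.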
