databio / LOLAweb
An interactive Shiny web application for genomic interval enrichment analysis using LOLA.
Home Page: http://lolaweb.databio.org/
I'm getting a 'Welcome to nginx' page at our URLs right now. The servers are running but there must be something wrong with traefik, I'm guessing.
Right now it's hard to tell when an update has propagated to production containers. I propose the web page display the commit hash of LOLAweb.
We would just have to use git to grab the latest commit hash of whatever we pulled down in the LOLAweb container.
So the LOLAweb R code would, upon startup, run a system command (`git rev-parse`) to grab the hash, and display it in the footnotes.
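A minimal sketch of that startup step, assuming the app runs from inside the LOLAweb git checkout (the footer tag is illustrative):

```r
# Grab the short commit hash at startup; fall back if git is unavailable.
commit <- tryCatch(
  system("git rev-parse --short HEAD", intern = TRUE),
  warning = function(w) "unknown",
  error   = function(e) "unknown"
)

# Then display it in the footer, e.g. in ui.R:
# tags$footer(paste("LOLAweb commit:", commit))
```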
you probably realize this but just in case you don't :) I'm getting a Bad Gateway
error right now on http://lolaweb.databio.org
I think it might be nice to have an option (as in "optional"!) to enter your email address when you run a new analysis so that you can post a message to that address with a link to the results. Either that, or have user accounts with saved results.
This helps to keep track of multiple results and gives some robustness to interface errors happening after submission (which you will never be able to avoid completely in requests that may run for an extended period of time).
We've discussed this before, and it was also suggested by @fhalbritter.
Eventually, though, I would still need to load the results into R
I think we should have a link under Run Summary to download an archive (an `.Rdata` file) of the results.
LOLA should have functions to understand these archives.
Then, this could be uploadable as well?
Let's add a blurb at the top of the page saying what this is.
LOLAweb tests for enrichment of overlap between a query region set (a BED file) and a database of region sets. It provides an interactive result explorer to visualize the highest-ranked enrichments from the database. LOLAweb is a web interface to the LOLA R package.
And how about a graphical overview image for the input page?
When we last talked, we thought about having master and dev branches with separate docker deploys. I now think your staging comment makes more sense; so we have dev for development, staging pushes that into a test docker environment, and then master is the real deploy.
that makes a lot of sense to me now, even for this.
The submit button ("Run LOLA") is deactivated upon submission (as it should be), but then seems to become activated again while the processing is still ongoing.
I've experienced this using my Firefox browser on Linux Mint with a fairly small input dataset. Seemed to happen about half a minute into processing, not sure whether at a specific stage.
Right now we have a different Dockerfile for each branch (master, staging, dev). This caused me some confusion; is this the way we want it?
Shouldn't there be only a single Dockerfile, with the branch it's on dictating which container it builds?
Since the universes and regionDBs have outgrown what is in the LOLA vignette data, I think we should consider where and how to make all the reference data (and sample caches) available in one place for users who want to run LOLAweb locally.
This should be documented either in the root README or the app's README.
Reach out to RCI and discuss options around mounting specific Qumulo shares to DEV1/DEV2. This will help us consolidate pools of reference / universe data.
Along with the UI decouple (#30) we should have the results page show some summary stats on the run:
anything else?
Moving forward, LOLAweb could be decoupled so that there is a "run" view and a "results" view.
In principle the app will behave the same (after execution of a run, the user is forwarded to results) ... but (among other things) this will make managing the display of UI elements much more intuitive.
After leaving a connection idle for several minutes, the connection to the server times out. This is not a major problem but there's probably some config option we can change to make these timeouts a bit longer.
I keep getting "Disconnected from the server" errors with the hg38 reference.
This happens when I try to run LOLA: I switch the reference to hg38 and upload my BED file, keeping all other options at default, then click "Run". I'm told it's loading the data then switches to "timing stopped" immediately and displays the "disconnected from server" error.
It works with the same file and hg19, so I suppose something's off with the reference caches.
Overall, more informative error messages might help from a usability perspective.
Right now the data (e.g. universes, userSets, etc.) are housed as subfolders right in the app folder. I propose we make this a config variable somewhere, or something.
Also we might make this a single external folder so we don't have to mount 5 volumes independently on the docker container (which are all just subfolders within 1 folder). Maybe?
Line 79 in 32a7c0a
A very minor suggestion: I'm not sure if Shiny accepts relative URLs, but if this is changed to a relative path, it will keep you on the respective container (dev, staging, latest).
how should we manage cache retention / do we need to include any disclaimer text about cache availability?
Would it make sense on a results page to have a button to start over, or run another analysis? I just realized, after getting to http://dev.lolaweb.uvasomrc.io/?key=D1N9IEPLGS5MJOC with sample data, that I didn't have an easy way to run LOLA again.
The new text updates are working great, but there are a few issues:
Not mandatory for the article submission, but if we can get our container to lose a little bit of weight in the near future, that would be great.
We shot from 1.4GB to nearly 5GB yesterday, so it would make good sense to pull out any reference data/DB's that we can, and put them into the NFS share in one centralized reference location like the other reference data.
with the change to tabs the 'sample results' link got moved to the "results" tab and is therefore no longer on the first page... that's fine, but I think we really need a link to sample results on the first page for first-time users. otherwise, they won't realize they can get there without running a complete analysis. nobody will think to click the 'run' tab.
we should add that link back in so it's in both places.
# Populate the LOLA database choices based on the selected reference genome
output$loladbs <- renderUI({
  if (input$refgenome == "mm10") {
    selectInput("loladb", label = "", choices = c("Core"))
  } else {
    selectInput("loladb", label = "", choices = c("Core", "LOLAJaspar", "LOLARoadmap"))
  }
})
Like universes, these should be dynamic, based on the folders on disk.
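A sketch of how that could look, assuming a `regionDB/<genome>/<collection>` folder layout (the helper name and layout are assumptions):

```r
# List available database collections for a genome from the filesystem.
available_dbs <- function(genome, root = "regionDB") {
  basename(list.dirs(file.path(root, genome), recursive = FALSE))
}

# In server.R, the hardcoded choices then become:
# output$loladbs <- renderUI({
#   selectInput("loladb", label = "", choices = available_dbs(input$refgenome))
# })
```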
let's add links to my lab page and to the source code. maybe let's change "Powered By SOMRC" to:
Powered By SOMRC. A project from the Sheffield lab at the University of Virginia. Source code at GitHub.
or some variant thereof.
right now if there's a `NaN` the sliders will break ... need to strip these (but retain them in the DT)
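For example, non-finite values could be dropped only when computing the slider ranges (a minimal sketch; the score values are made up):

```r
# Hypothetical score column, as it might come back from a run.
scores <- c(1.2, NaN, 3.4, NA, 2.0)

# Use only finite values for the slider range; the full vector
# (including NaN rows) is still what the DT displays.
finite_scores <- scores[is.finite(scores)]
slider_max <- max(finite_scores)
```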
instead of starting from the R base image in the docker file, we should consider a few alternatives:
As it is, the way I'm seeing this, it's a 25-minute Travis build for deploy because it's re-installing lots of stuff from scratch every time. We could avoid this by having the LOLA base image start from something a little more refined.
If we can load all the libraries in a global area, then what stops us from also doing this with the database caches, so they don't need to be re-loaded in each container?
currently the JS datatable at the bottom of the results view renders locally but not on dev:
http://dev.lolaweb.databio.org/?key=VWQN3ZC5HFD92EK
Chrome developer tools is reporting the following error for `datatables.js`:
Failed to load resource: the server responded with a status of 404 (Not Found)
can't remember if i saw this before the navbarpage change or not ... but i'm wondering if this is an issue with websockets being disabled?
I've first run LOLAweb using the example data provided, which worked fine. I then tried to upload my own BED file (which the web interface tells me worked) and submitted that. The results I get, however, look the same as for the example data, and the table at the bottom of the results page tells me that the userset "setB_100.bed" was used.
there's quite a bit of redundancy in the "*_plot_input" functions (especially now that we have logic for inverted sorting by rank columns) ... these should be modularized so there's a single function that simply takes a parameter (i.e. oddsRatio, pValueLog, support) for the feature to be plotted
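Something like the following could replace the per-feature functions (a sketch; it assumes the results table has the usual runLOLA columns, and the row count is illustrative):

```r
# One plot-input builder for all features; rank columns sort inverted
# (lower rank = better hit), while score columns sort descending.
plot_input <- function(res, feature, invert_rank = FALSE) {
  ord <- order(res[[feature]], decreasing = !invert_rank)
  head(res[ord, , drop = FALSE], 20)
}

# plot_input(res, "oddsRatio")
# plot_input(res, "maxRnk", invert_rank = TRUE)
```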
Tiny user-interface suggestion: can we make the shiny server display the lola logo (just the 'button') as a favicon?
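In Shiny this can be done with one line in the UI head, assuming the logo image is placed in the app's www/ folder (the filename here is made up):

```r
# In ui.R, inside the page definition:
tags$head(
  tags$link(rel = "icon", type = "image/png", href = "lola_button.png")
)
```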
the app should include `plotFeatureDist()` from GenomicDistributions. this will require both the `runLOLA` and the `featureDistribution` computation ... see the SO post for an example.
The sorting for any rank-based column should be inverted, so better hits show up on top instead of on bottom (try sorting by support, then by maxRank).
Universes are actually relative to a reference assembly, like databases.
Hence, we should move the selection of assembly to the left, with the input, and this will need to populate the options in the select boxes for both universe and database.
In other words, the activeDHS universe will not work with an mm9 database.
There is another bug in uploads; it is requiring that there be exactly 6 columns in the uploaded BED file.
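A sketch of a more permissive reader that accepts any BED with at least three columns (the function name and error text are illustrative):

```r
# Keep only chrom/start/end; tolerate extra columns instead of
# requiring exactly 6.
read_user_bed <- function(path) {
  bed <- read.table(path, sep = "\t", stringsAsFactors = FALSE)
  if (ncol(bed) < 3) stop("BED file needs at least 3 columns")
  bed <- bed[, 1:3]
  names(bed) <- c("chr", "start", "end")
  bed
}
```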
the dev server is returning a bad gateway at the moment.
Actually this just happened about 3 minutes ago; I had an interactive results tab open and it just choked.
need to add google analytics tracking to shiny server config
probably you realize this, but the sample results will stop working if they are old, when you update to the new way of wrapping the results into a list; hence, right now on dev, the link to sample results is failing.
I think this won't be a problem in the future, but once we're settled we need to make sure we preserve backwards compatibility so that old results can still load, even if we add a new figure or something. as long as they're in a list it should be fine; just don't display the image if the cache doesn't contain a result for that image (in other words, only display an image if the cache was generated by code that included the compute for that image).
for now we just have to make sure the "Sample Results" link points to a new-style result when that is ready.
Right now the default cutoffs are set such that there are no hits present in any plots.
Can we work out a way to get a reasonable cutoffs such that there will be a few hits present by default? Maybe, 25 hits or something?
Suggestion: take the 0.25 quantile as the default for both odds ratio and support; then rank by p-value and set the p-value cutoff to the score of hit number 25. Then we're guaranteed to show exactly 25 by default. In other words, two of the cutoffs should be loose, hard-coded defaults, and the third should be dynamic, based on the number of items.
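That suggestion sketched in code (assumes the usual runLOLA result columns; the function name is made up):

```r
# Loose, hard-coded defaults for odds ratio and support; a dynamic
# p-value cutoff that guarantees ~25 hits shown by default.
default_cutoffs <- function(res, n_show = 25) {
  p_sorted <- sort(res$pValueLog, decreasing = TRUE)
  list(
    oddsRatio = quantile(res$oddsRatio, 0.25, na.rm = TRUE),
    support   = quantile(res$support,   0.25, na.rm = TRUE),
    pValueLog = p_sorted[min(n_show, length(p_sorted))]
  )
}
```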
We need an example user dataset so someone can run it without any input to test, if they want. Just use the setC_100 or whatever.
for the interactive viewer, it takes about 5-7 seconds to load (even when clicking on "sample results" for example).
I think we should use the gear icon here as well. After the "LOLA Results" heading but before the plots are displayed, I mean.
What happens if someone who ran LOLAweb with an old version tries to load that cache with an updated LOLAweb?
Well, the old version will lack a few components from the new. If you run a cache in an older version of LOLAweb, then try to load it in a new version, missing components give you an error like this:
Error: An error has occurred. Check your logs or contact the app author for clarification.
I think this should be a more informative message, if the requested object is simply not found (it could suggest re-running the cache for example).
If there truly was an error then it's fine to say that, otherwise, I think that's not the right message to send.
Feature suggestion by @fhalbritter
I think it would be good to have an option for long(er)-term archival of selected results so that people can directly link to dynamic results pages from publications and websites.
right now the `cache/`, `universes/`, etc. locations are hardcoded ... some users developing on the app may not have a local copy of these under their app code structure.
We should allow the locations of these to be passed as environment variables at docker run time.
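A sketch of how that could work at startup (the variable names here are invented, not an existing convention):

```r
# Read data locations from the environment, falling back to the
# current hardcoded relative paths.
cache_dir    <- Sys.getenv("LOLAWEB_CACHE_DIR",    unset = "cache")
universe_dir <- Sys.getenv("LOLAWEB_UNIVERSE_DIR", unset = "universes")
```

Then something like `docker run -e LOLAWEB_CACHE_DIR=/data/cache ...` would point a container at a mounted volume without code changes.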
Make the question marks lead to a popup or something that displays text, instead of going to the R package docs. These are meant to be like little tooltips or something. Here's the text for each:
User Set:
The User Set is your set of genomic regions that you want to test for overlap with the database. Upload a file in BED format (really, it just needs the first 3 columns: chr, start, and end). You can also drag and drop multiple files and they will be analyzed simultaneously!
Universe:
The universe is your background set of regions. You should think of the universe as the set of regions you tested for possible inclusion in your user sets; or, in other words, it is the restricted background set of regions that were tested, including everything in your regions of interest as well as those that did not get included. We recommend you upload a universe that is specific to the query set, but we also provide a few basic region sets (like tiling regions, or the set of all active DNaseI hypersensitive elements from the ENCODE project). The choice of universe can have a drastic effect on the results of the analysis, so it may also be worth running LOLA a few times with different universe sets. For further information, there are more details in the LOLA documentation.
DB:
We have provided a few different general-purpose databases. We recommend starting with the Core database, but there are also a few other more specific options if you want to extend your analysis. Further details about what is contained in each database can be found in the documentation on LOLA region databases.
Finally, we need a link to example results and some documentation on interpretation. So, I suggest putting a link to "Sample results" (which could just take you here: http://lolaweb.databio.org/?key=O1K24SVTQZL5WPJ)
In the interactive results interface, I think we should also add some tooltips.
"Select Collection":
LOLA databases are made up of one or more sub-collections of region set. Using this drop-down, you can filter your plots and tables to show only the results from one of these collections at a time.
"Max Rank Cutoff"
These sliders can be used
Let's group all 3 plots under a single heading called "LOLA Results", and make the individual plot headings smaller; then we can add a tooltip question mark that says:
These barplots show the highest-ranking region sets from the database. Higher scores indicate more overlap with your query set. The results are scored using 3 statistics: Support is the raw number of regions that overlapped between your query set and the database set. LogPVal and LogOdds are the results of a Fisher's exact test scoring the significance of that overlap.
We rank each region set from the database for each of these 3 scores, and you can see the raw scores and the ranks in the table below. You can also see the maximum and mean ranks across all 3 categories.
it appears that a single user can only have multiple sessions in different browsers. can we relax this constraint?
plot downloads are PNG; they should be PDF.
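In Shiny, switching to vector output is a small change in the download handler (a sketch; `make_lola_plot` stands in for whichever function builds the plot):

```r
# In server.R: write the plot to a PDF device instead of PNG.
output$download_plot <- downloadHandler(
  filename = function() "lola_plot.pdf",
  content = function(file) {
    pdf(file, width = 8, height = 6)
    print(make_lola_plot())
    dev.off()
  }
)
```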
The new "Excel" and "PDF" buttons are nice, but they introduce a dependency on Flash.
The PDF looks strange anyway, and Excel can be done with CSV, so I think I would just take those two buttons out...
currently errors during `runLOLA` result in a "disconnected from server" but do not provide any more meaningful error message or output ... and since we're capturing the run time, the green run message bar just shows the "timing stopped" information
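One way to surface these, as a sketch: wrap the call so an error becomes a notification instead of a dead session (argument names follow runLOLA's documented signature; the surrounding objects are assumed from the app).

```r
# Catch runLOLA failures and report them in the UI.
res <- tryCatch(
  LOLA::runLOLA(userSets, userUniverse, regionDB),
  error = function(e) {
    showNotification(paste("LOLA run failed:", conditionMessage(e)),
                     type = "error", duration = NULL)
    NULL
  }
)
```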