databio / LOLAweb
An interactive Shiny web application for genomic interval enrichment analysis using LOLA.
Home Page: http://lolaweb.databio.org/
I'm getting a 'Welcome to nginx' page at our URLs right now. The servers are running but there must be something wrong with traefik, I'm guessing.
Right now it's hard to tell when an update has propagated to production containers. I propose the web page display the commit hash of LOLAweb.
We would just have to use git to grab the latest commit hash of whatever we pulled down in the LOLAweb container.
So the LOLAweb R code would, upon startup, run a system command (`git rev-parse`) to grab the hash, and display it in the footnotes.
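A minimal sketch of that startup step, assuming the app runs from inside the LOLAweb git checkout (the footer tag is illustrative):

```r
# Grab the short commit hash at startup; fall back if git is unavailable.
commit <- tryCatch(
  system("git rev-parse --short HEAD", intern = TRUE),
  warning = function(w) "unknown",
  error   = function(e) "unknown"
)

# Then display it in the footer, e.g. in ui.R:
# tags$footer(paste("LOLAweb commit:", commit))
```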
you probably realize this but just in case you don't :) I'm getting a Bad Gateway
error right now on http://lolaweb.databio.org
I think it might be nice to have an option (as in "optional"!) to enter your email address when you run a new analysis so that you can post a message to that address with a link to the results. Either that, or have user accounts with saved results.
This helps to keep track of multiple results and gives some robustness to interface errors happening after submission (which you will never be able to avoid completely in requests that may run for an extended period of time).
We've discussed this before, and it was also suggested by @fhalbritter.
Eventually, though, I would still need to load the results into R
I think we should have a link under Run Summary to download an archive (an `.Rdata` file) of the results.
LOLA should have functions to understand these archives.
Then, this could be uploadable as well?
Let's add a blurb at the top of the page saying what this is.
LOLAweb tests for enrichment of overlap between a query region set (a BED file) and a database of region sets. It provides an interactive result explorer to visualize the highest-ranked enrichments from the database. LOLAweb is a web interface to the LOLA R package.
And how about a graphical overview image for the input page?
When we last talked, we thought about having master and dev branches with separate docker deploys. I now think your staging comment makes more sense; so we have dev for development, staging pushes that into a test docker environment, and then master is the real deploy.
that makes a lot of sense to me now, even for this.
The submit button ("Run LOLA") is deactivated upon submission (as it should be), but then seems to become activated again while the processing is still ongoing.
I've experienced this using my Firefox browser on Linux Mint with a fairly small input dataset. Seemed to happen about half a minute into processing, not sure whether at a specific stage.
Right now we have a different Dockerfile for each branch (master, staging, dev). This caused me some confusion; is this the way we want it?
Shouldn't there be only a single Dockerfile, with the branch it's on dictating which container it builds?
Since the universes and regionDBs have outgrown what is in the LOLA vignette data, I think we should consider where and how to make all the reference data (and sample caches) available in one place for users who want to run LOLAweb locally.
This should be documented either in the root README or the app's README.
Reach out to RCI and discuss options around mounting specific Qumulo shares to DEV1/DEV2. This will help us consolidate pools of reference / universe data.
Along with the UI decouple (#30) we should have the results page show some summary stats on the run:
anything else?
Moving forward, LOLAweb could be decoupled so that there is a "run" view and a "results" view.
In principle the app will behave the same (after execution of a run, the user is forwarded to results) ... but (among other things) this will make managing the display of UI elements much more intuitive.
After leaving a connection idle for several minutes, the connection to the server times out. This is not a major problem but there's probably some config option we can change to make these timeouts a bit longer.
I keep getting "Disconnected from the server" errors with the hg38 reference.
This happens when I try to run LOLA: I switch the reference to hg38 and upload my BED file, keeping all other options at default, then click "Run". I'm told it's loading the data then switches to "timing stopped" immediately and displays the "disconnected from server" error.
It works with the same file and hg19, so I suppose something's off with the reference caches.
Overall, more informative error messages might help from a usability perspective.
Right now the data (e.g. universes, userSets, etc.) are housed as subfolders right in the app folder. I propose we make this a config variable somewhere, or something.
Also we might make this a single external folder so we don't have to mount 5 volumes independently on the docker container (which are all just subfolders within 1 folder). Maybe?
Line 79 in 32a7c0a
A very minor suggestion: I'm not sure if Shiny accepts relative URLs, but if this is changed to a relative path, it will keep you on the respective container (dev, staging, latest).
how should we manage cache retention / do we need to include any disclaimer text about cache availability?
Would it make sense on a results page to have a button to start over, or run another analysis? I just realized, after getting to http://dev.lolaweb.uvasomrc.io/?key=D1N9IEPLGS5MJOC with sample data, that I didn't have an easy way to run LOLA again.
The new text updates are working great, but there are a few issues:
Not mandatory for the article submission, but if we can get our container to lose a little bit of weight in the near future, that would be great.
We shot from 1.4GB to nearly 5GB yesterday, so it would make good sense to pull out any reference data/DB's that we can, and put them into the NFS share in one centralized reference location like the other reference data.
with the change to tabs the 'sample results' link got moved to the "results" tab and is therefore no longer on the first page... that's fine, but I think we really need a link to sample results on the first page for first-time users. otherwise, they won't realize they can get there without running a complete analysis. nobody will think to click the 'run' tab.
we should add that link back in so it's in both places.
# Populate the LOLA database choices based on the selected reference genome
output$loladbs <- renderUI({
  if (input$refgenome == "mm10") {
    selectInput("loladb", label = "", choices = c("Core"))
  } else {
    selectInput("loladb", label = "", choices = c("Core", "LOLAJaspar", "LOLARoadmap"))
  }
})
Like universes, these should be dynamic, based on the folders on disk.
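A sketch of how that could look, assuming a `regionDB/<genome>/<collection>` folder layout (the helper name and layout are assumptions):

```r
# List available database collections for a genome from the filesystem.
available_dbs <- function(genome, root = "regionDB") {
  basename(list.dirs(file.path(root, genome), recursive = FALSE))
}

# In server.R, the hardcoded choices then become:
# output$loladbs <- renderUI({
#   selectInput("loladb", label = "", choices = available_dbs(input$refgenome))
# })
```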
let's add links to my lab page and to the source code. maybe let's change "Powered By SOMRC" to:
Powered By SOMRC. A project from the Sheffield lab at the University of Virginia. Source code at GitHub.
or some variant thereof.
right now if there's a `NaN` the sliders will break ... need to strip these (but retain them in the DT)
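For example, non-finite values could be dropped only when computing the slider ranges (a minimal sketch; the score values are made up):

```r
# Hypothetical score column, as it might come back from a run.
scores <- c(1.2, NaN, 3.4, NA, 2.0)

# Use only finite values for the slider range; the full vector
# (including NaN rows) is still what the DT displays.
finite_scores <- scores[is.finite(scores)]
slider_max <- max(finite_scores)
```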
instead of starting from the R base image in the docker file, we should consider a few alternatives:
As it is, the way I'm seeing this, it's a 25-minute Travis build for deploy because it's re-installing lots of stuff from scratch every time. We could avoid this by having the LOLA base image start from something a little more refined.
If we can load all the libraries in a global area, then what stops us from also doing this with the database caches, so they don't need to be re-loaded in each container?
currently the JS datatable at the bottom of the results view renders locally but not on dev:
http://dev.lolaweb.databio.org/?key=VWQN3ZC5HFD92EK
Chrome developer tools is reporting the following error for `datatables.js`:
Failed to load resource: the server responded with a status of 404 (Not Found)
can't remember if i saw this before the navbarpage change or not ... but i'm wondering if this is an issue with websockets being disabled?
I've first run LOLAweb using the example data provided, which worked fine. I then tried to upload my own BED file (which the web interface tells me worked) and submitted that. The results I get, however, look the same as for the example data, and the table at the bottom of the results page tells me that the userset "setB_100.bed" was used.
there's quite a bit of redundancy in the "*_plot_input" functions (especially now that we have logic for inverted sorting by rank columns) ... these should be modularized so there's a single function that simply takes a parameter (i.e. oddsRatio, pValueLog, support) for the feature to be plotted
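Something like the following could replace the per-feature functions (a sketch; it assumes the results table has the usual runLOLA columns, and the row count is illustrative):

```r
# One plot-input builder for all features; rank columns sort inverted
# (lower rank = better hit), while score columns sort descending.
plot_input <- function(res, feature, invert_rank = FALSE) {
  ord <- order(res[[feature]], decreasing = !invert_rank)
  head(res[ord, , drop = FALSE], 20)
}

# plot_input(res, "oddsRatio")
# plot_input(res, "maxRnk", invert_rank = TRUE)
```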
Tiny user-interface suggestion: can we make the shiny server display the lola logo (just the 'button') as a favicon?
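In Shiny this can be done with one line in the UI head, assuming the logo image is placed in the app's www/ folder (the filename here is made up):

```r
# In ui.R, inside the page definition:
tags$head(
  tags$link(rel = "icon", type = "image/png", href = "lola_button.png")
)
```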
the app should include `plotFeatureDist()` from GenomicDistributions. this will require both the `runLOLA` and the `featureDistribution` computation ... see the SO post for an example.
The sorting for any rank-based column should be inverted, so better hits show up on top instead of on bottom (try sorting by support, then by maxRank).
Universes are actually relative to a reference assembly, like databases.
Hence, we should move the selection of assembly to the left, with the input, and this will need to populate the options in the select boxes for both universe and database.
In other words, the activeDHS universe will not work with an mm9 database.
There is another bug in uploads; it is requiring that there be exactly 6 columns in the uploaded BED file.
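A sketch of a more permissive reader that accepts any BED with at least three columns (the function name and error text are illustrative):

```r
# Keep only chrom/start/end; tolerate extra columns instead of
# requiring exactly 6.
read_user_bed <- function(path) {
  bed <- read.table(path, sep = "\t", stringsAsFactors = FALSE)
  if (ncol(bed) < 3) stop("BED file needs at least 3 columns")
  bed <- bed[, 1:3]
  names(bed) <- c("chr", "start", "end")
  bed
}
```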
the dev server is returning a bad gateway at the moment.
Actually this just happened about 3 minutes ago; I had an interactive results tab open and it just choked.
need to add google analytics tracking to shiny server config
probably you realize this, but the sample results will stop working if they are old, when you update to the new way of wrapping the results into a list; hence, right now on dev, the link to sample results is failing.
I think this won't be a problem in the future, but once we're settled we need to make sure we preserve backwards compatibility so that old results can still load, even if we add a new figure or something. as long as they're in a list it should be fine; just don't display the image if the cache doesn't contain a result for that image (in other words, only display an image if the cache was generated by code that included the compute for that image).
for now we just have to make sure the "Sample Results" link points to a new-style result when that is ready.
Right now the default cutoffs are set such that there are no hits present in any plots.
Can we work out a way to get a reasonable cutoffs such that there will be a few hits present by default? Maybe, 25 hits or something?
Suggestion: take the 0.25 quantile as the default for both odds ratio and support; then rank by p-value and set the p-value cutoff to the score of hit number 25. Then we're guaranteed to show exactly 25 by default. In other words, two of the cutoffs should be loose, hard-coded defaults, and the third should be dynamic, based on the number of items.
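That suggestion sketched in code (assumes the usual runLOLA result columns; the function name is made up):

```r
# Loose, hard-coded defaults for odds ratio and support; a dynamic
# p-value cutoff that guarantees ~25 hits shown by default.
default_cutoffs <- function(res, n_show = 25) {
  p_sorted <- sort(res$pValueLog, decreasing = TRUE)
  list(
    oddsRatio = quantile(res$oddsRatio, 0.25, na.rm = TRUE),
    support   = quantile(res$support,   0.25, na.rm = TRUE),
    pValueLog = p_sorted[min(n_show, length(p_sorted))]
  )
}
```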
We need an example user dataset so someone can run it without any input to test, if they want. Just use the setC_100 or whatever.
for the interactive viewer, it takes about 5-7 seconds to load (even when clicking on "sample results" for example).
I think we should use the gear icon here as well. After the "LOLA Results" heading but before the plots are displayed, I mean.
What happens if someone who ran LOLAweb with an old version tries to load that cache with an updated LOLAweb?
Well, the old version will lack a few components from the new. If you run a cache in an older version of LOLAweb, then try to load it in a new version, missing components give you an error like this:
Error: An error has occurred. Check your logs or contact the app author for clarification.
I think this should be a more informative message, if the requested object is simply not found (it could suggest re-running the cache for example).
If there truly was an error then it's fine to say that, otherwise, I think that's not the right message to send.
Feature suggestion by @fhalbritter
I think it would be good to have an option for long(er)-term archival of selected results so that people can directly link to dynamic results pages from publications and websites.
right now the `cache/`, `universes/`, etc. locations are hardcoded ... some users developing on the app may not have a local copy of these under their app code structure.
We should allow the locations of these to be passed as environment variables at docker run time.
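A sketch of how that could work at startup (the variable names here are invented, not an existing convention):

```r
# Read data locations from the environment, falling back to the
# current hardcoded relative paths.
cache_dir    <- Sys.getenv("LOLAWEB_CACHE_DIR",    unset = "cache")
universe_dir <- Sys.getenv("LOLAWEB_UNIVERSE_DIR", unset = "universes")
```

Then something like `docker run -e LOLAWEB_CACHE_DIR=/data/cache ...` would point a container at a mounted volume without code changes.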
Make the question marks lead to a popup or something that displays text, instead of going to the R package docs. These are meant to be like little tooltips or something. Here's the text for each:
User Set:
The User Set is your set of genomic regions that you want to test for overlap with the database. Upload a file in BED format (really, it just needs the first 3 columns: chr, start, and end). You can also drag and drop multiple files and they will be analyzed simultaneously!
Universe:
The universe is your background set of regions. You should think of the universe as the set of regions you tested for possible inclusion in your user sets; or, in other words, it is the restricted background set of regions that were tested, including everything in your regions of interest as well as those that did not get included. We recommend you upload a universe that is specific to the query set, but we also provide a few basic region sets (like tiling regions, or the set of all active DNaseI hypersensitive elements from the ENCODE project). The choice of universe can have a drastic effect on the results of the analysis, so it may also be worth running LOLA a few times with different universe sets. For further information, there are more details in the LOLA documentation.
DB:
We have provided a few different general-purpose databases. We recommend starting with the Core database, but there are also a few other more specific options if you want to extend your analysis. Further details about what is contained in each database can be found in the documentation on LOLA region databases.
Finally, we need a link to example results and some documentation on interpretation. So, I suggest putting a link to "Sample results" (which could just take you here: http://lolaweb.databio.org/?key=O1K24SVTQZL5WPJ)
In the interactive results interface, I think we should also add some tooltips.
"Select Collection":
LOLA databases are made up of one or more sub-collections of region set. Using this drop-down, you can filter your plots and tables to show only the results from one of these collections at a time.
"Max Rank Cutoff"
These sliders can be used
Let's group all 3 plots under a single heading called "LOLA Results", and make the individual plot headings smaller; then we can add a tooltip question mark that says:
These barplots show the highest-ranking region sets from the database. Higher scores indicate more overlap with your query set. The results are scored using 3 statistics: Support is the raw number of regions that overlapped between your query set and the database set. LogPVal and LogOdds are the results of a Fisher's exact test scoring the significance of that overlap.
We rank each region set from the database for each of these 3 scores, and you can see the raw scores and the ranks in the table below. You can also see the maximum and mean ranks across all 3 categories.
it appears that a single user can only have multiple sessions in different browsers. can we relax this constraint?
plot downloads are PNG; they should be PDF.
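In Shiny, switching to vector output is a small change in the download handler (a sketch; `make_lola_plot` stands in for whichever function builds the plot):

```r
# In server.R: write the plot to a PDF device instead of PNG.
output$download_plot <- downloadHandler(
  filename = function() "lola_plot.pdf",
  content = function(file) {
    pdf(file, width = 8, height = 6)
    print(make_lola_plot())
    dev.off()
  }
)
```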
The new "Excel" and "PDF" buttons are nice, but they introduce a dependency on Flash.
The PDF looks strange anyway, and Excel can be done with CSV, so I think I would just take those two buttons out...
currently errors during `runLOLA` result in a "disconnected from server" but do not provide any more meaningful error message or output ... and since we're capturing the run time, the green run message bar just shows the "timing stopped" information
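One way to surface these, as a sketch: wrap the call so an error becomes a notification instead of a dead session (argument names follow runLOLA's documented signature; the surrounding objects are assumed from the app).

```r
# Catch runLOLA failures and report them in the UI.
res <- tryCatch(
  LOLA::runLOLA(userSets, userUniverse, regionDB),
  error = function(e) {
    showNotification(paste("LOLA run failed:", conditionMessage(e)),
                     type = "error", duration = NULL)
    NULL
  }
)
```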