trelliscope's Introduction

Trelliscope: Detailed Visualization of Large Complex Data in R

Join the chat at https://gitter.im/delta-rho/users

Trelliscope is an R package to be used in conjunction with datadr and RHIPE to provide a framework for detailed visualization of large complex data.

Installation

# from CRAN:
install.packages("trelliscope")

# from github:
devtools::install_github("delta-rho/datadr")
devtools::install_github("delta-rho/trelliscope")

Tutorial

To get started, see the package documentation and function reference located here.

License

This software is currently under the BSD license.

Acknowledgement

Trelliscope development is sponsored by the DARPA XDATA program.

trelliscope's People

Contributors

hafen, jcheng5, renkun-ken, schloerke

trelliscope's Issues

Add option for defaults in view()

When launching the display with view() (or when making the display), there should be an option to set an initial viewing state, including the rows and columns in the display and initial cognostics and filters.

removeDisplay

removeDisplay is defined twice: once in /R/makeDisplay.R and once in /R/displayObj.R.

I don't know which one you want to keep.

preRender missing image

If my panel function doesn't render anything for one subset, can we put in a filler image?

State handling feature

This covers a few of the outstanding issues. The idea is to have an overall notion of state in Trelliscope displays. By state, we are talking about being able to specify how a display is being shown with respect to sorting, filtering, panel layout, panel labels, etc.

Places where we would like to specify the state include:

  • The default state for a display, specified as a state argument to makeDisplay()
  • The state with which to view a display when calling view()
  • In the URL itself

The approach I have been working on uses a URL hash to store the state as interactions occur. Any time a state variable is changed (sorting, etc.), the URL will update. Thus at any point in the viewing process, the URL can be copied and shared with others to preserve the state (not everything is preserved, such as the current page you are on and related displays).

State is specified from the R console by a named list. Consider this example:

library(trelliscope)
vdbConn(file.path(tempdir(), "testVDB"), name = "test", autoYes = TRUE)

set.seed(1234)
iris$alpha <- sample(letters[1:5], 150, replace = TRUE)

bySpeciesAlpha <- divide(iris, by = c("Species", "alpha"))

makeDisplay(bySpeciesAlpha,
   name = "testBySpeciesAlpha",
   panelFn = function(x)
      plot(x$Sepal.Length, x$Sepal.Width)
)

view("testBySpeciesAlpha", state = list(
   layout = list(nrow = 2, ncol = 2),
   sort = list(Species = "asc", alpha = "desc"),
   filter = list(
      alpha = list(select = c("a", "b", "c")),
      Species = list(regex = "a$")),
   labels = c("Species", "alpha")
))

Here, we add a new categorical variable to the iris data, split on both species and this new variable, and then create a dummy plot. Then, when we call view, we tell the viewer to launch with that display showing, reflecting the specified state. The panels will be laid out in 2 rows and 2 columns. Note that we can also specify arrange="row" (default) or arrange="col" in the layout. The panels will be sorted by Species ascending and alpha descending. The panels will be filtered on alpha so that panels only for the letters a-c are showing and filter on Species so that only species ending with the letter a are showing. For numeric variables, we can specify, for example with a variable called var, the following: filter = list(var = list(from = 0, to = 1)). Finally, we can specify labels by simply providing a character vector of cognostic variable names to display beneath each panel. By default, all splitting variables will be shown (if labels is not specified).
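The filter semantics described above can be sketched in base R against a data frame of cognostics. This is only an illustration: applyFilterSketch() is a hypothetical helper, not a trelliscope function, and treating from and to as inclusive bounds is an assumption.

```r
# Hypothetical helper (not part of trelliscope): apply a state-style filter
# list to a data frame with one row per panel.
applyFilterSketch <- function(cogDF, filter) {
  keep <- rep(TRUE, nrow(cogDF))
  for (v in names(filter)) {
    f <- filter[[v]]
    x <- cogDF[[v]]
    if (!is.null(f$select)) keep <- keep & x %in% f$select    # categorical: keep listed values
    if (!is.null(f$regex))  keep <- keep & grepl(f$regex, x)  # categorical: keep regex matches
    if (!is.null(f$from))   keep <- keep & x >= f$from        # numeric lower bound (assumed inclusive)
    if (!is.null(f$to))     keep <- keep & x <= f$to          # numeric upper bound (assumed inclusive)
  }
  cogDF[keep, , drop = FALSE]
}

# One row per (Species, alpha) panel, mirroring the example division
cogDF <- expand.grid(
  Species = c("setosa", "versicolor", "virginica"),
  alpha = letters[1:5],
  stringsAsFactors = FALSE
)

res <- applyFilterSketch(cogDF, list(
  alpha = list(select = c("a", "b", "c")),
  Species = list(regex = "a$")
))
```

With this filter, only panels for setosa and virginica with alpha in a, b, or c remain.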

A function, validateState(), should be available to ensure the state is specified correctly and optionally check that it matches the variables of a given display (identified by the display name).
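As a rough sketch of what such a check might look like, the function below validates the state elements used in the example. The name validateStateSketch(), its cogNames argument, and the set of allowed elements are assumptions for illustration; the actual validateState() may differ.

```r
# Hypothetical sketch, not the package's validateState().
# Returns a character vector of problems; an empty vector means the state looks valid.
validateStateSketch <- function(state, cogNames = NULL) {
  if (!is.list(state) || is.null(names(state)) || any(names(state) == ""))
    return("state must be a named list")

  errs <- character(0)

  # Only the elements used in the example are accepted here (an assumption)
  allowed <- c("layout", "sort", "filter", "labels")
  unknown <- setdiff(names(state), allowed)
  if (length(unknown))
    errs <- c(errs, paste("unknown state element:", unknown))

  # Sort directions must be "asc" or "desc"
  badDir <- setdiff(unlist(state$sort), c("asc", "desc"))
  if (length(badDir))
    errs <- c(errs, paste("invalid sort direction:", badDir))

  # Optionally check that every referenced variable exists in the display
  if (!is.null(cogNames)) {
    used <- unique(c(names(state$sort), names(state$filter), state$labels))
    absent <- setdiff(used, cogNames)
    if (length(absent))
      errs <- c(errs, paste("variable not in display:", absent))
  }
  errs
}
```

Returning the list of problems rather than stopping at the first one lets the caller report everything that is wrong with a state in one pass.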

This display will launch and should have the following URL:

http://127.0.0.1:8100/#name=testBySpeciesAlpha&group=common&layout=nrow:2,ncol:2&sort=Species:asc,alpha:desc&filter=alpha:(select:a;b;c)&labels=Species,alpha

As you interact with the display, the URL should reflect what you have done, and if you paste this URL in a new window, the display should be shown in the state specified by the URL.
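To make the hash format concrete, here is a minimal sketch that serializes a state list into a hash shaped like the example URL above. stateToHashSketch() is a hypothetical helper; the from/to encoding for numeric filters is an assumption, and no URL escaping is applied, so values such as regular expressions would need encoding in practice.

```r
# Hypothetical helper (not part of trelliscope): build a URL hash from a state list.
stateToHashSketch <- function(name, state, group = "common") {
  # "key:value" pairs joined by commas, e.g. "nrow:2,ncol:2"
  kv <- function(x) paste0(names(x), ":", unlist(x), collapse = ",")

  parts <- c(name = name, group = group)
  if (!is.null(state$layout)) parts["layout"] <- kv(state$layout)
  if (!is.null(state$sort))   parts["sort"] <- kv(state$sort)
  if (!is.null(state$filter)) {
    flt <- vapply(names(state$filter), function(v) {
      f <- state$filter[[v]]
      # multiple values for one filter type are joined with ";"
      vals <- vapply(f, paste, character(1), collapse = ";")
      paste0(v, ":(", paste0(names(f), ":", vals, collapse = ","), ")")
    }, character(1))
    parts["filter"] <- paste(flt, collapse = ",")
  }
  if (!is.null(state$labels)) parts["labels"] <- paste(state$labels, collapse = ",")

  paste0("#", paste0(names(parts), "=", parts, collapse = "&"))
}

hash <- stateToHashSketch("testBySpeciesAlpha", list(
  layout = list(nrow = 2, ncol = 2),
  sort = list(Species = "asc", alpha = "desc"),
  filter = list(alpha = list(select = c("a", "b", "c"))),
  labels = c("Species", "alpha")
))
```

For this input, hash reproduces the fragment after the host and port in the example URL.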

Another thing we would like to be able to do is make it easy to reference other displays through cognostics. For example, suppose that I also have a division of the iris data by species. For each panel in the display by species, I might want a link the user can click to open our display by both species and alpha, showing only the panels for the corresponding species. This should be done through a cognostics helper function, cogDisplayHref(). Here's an example:

bySpecies <- divide(iris, by = "Species")

makeDisplay(bySpecies,
   name = "testBySpecies",
   panelFn = function(x)
      plot(x$Sepal.Length, x$Sepal.Width),
   cogFn = function(x) {
      list(alphaLink = cogDisplayHref(
         desc = "species broken down by alpha",
         displayName = "testBySpeciesAlpha",
         displayGroup = "common",
         state = list(
            filter = list(Species = list(select = getSplitVar(x, "Species"))),
            layout = list(nrow = 1, ncol = 5)
         )
      ))
   }
)

view(name = "testBySpecies", state = list(labels = c("Species", "alphaLink"), layout = list(ncol = 3)))

Here, we create the display with a cognostic that builds a link to our previous display, filtered to the species of the current panel. After calling view(), we should see the three panels, each with a link we can click on (shown because we specified it with labels =).

dir.create recursive

Behavior: using commands such as vdbInit with a directory nested into a directory not yet created will fail.

Suggestion:
grep all uses of dir.create and use recursive = TRUE

e.g.:

vdbDir <- "/tmp/jrounds/vdbtest"

vdbInit(vdbDir, name="testVDB", autoYes=TRUE)
Error in vdbInit(vdbDir, name = "testVDB", autoYes = TRUE) :
Could not create directory.
In addition: Warning message:
In dir.create(path) :
cannot create dir '/tmp/jrounds/vdbtest', reason 'No such file or directory'
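The difference can be reproduced with base R alone. The directory names below are made up for illustration and live under tempdir().

```r
# Illustrative paths only; the parent directory does not exist yet
base <- file.path(tempdir(), "jrounds_sketch")
unlink(base, recursive = TRUE)  # start from a clean slate
nested <- file.path(base, "vdbtest")

# Without recursive = TRUE, dir.create() warns and returns FALSE
# because the parent directory is missing
flat <- suppressWarnings(dir.create(nested))

# With recursive = TRUE, missing parent directories are created as well
rec <- dir.create(nested, recursive = TRUE)
```

Here flat is FALSE and rec is TRUE, which is the behavior the suggestion relies on.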

Add "CogMap" functionality

The collapsible sidebar on the right of the viewer is a placeholder for a "CogMap". This is basically a scatterplot of all of the cognostics for the specified variables, with the cognostics of the currently-viewed panels being highlighted on the plot. This helps get bearings as to where what you are looking at falls with respect to specified cognostics.

Currently this sidebar covers up panels. It should not do this; the main panel area should resize to accommodate it.

set display state (create "view") from R console

Add a "state" argument to makeDisplay() that specifies state info to create both the base display and a view of the display with the specified state. Also make a makeView() function that takes an existing display and adds a state to it for a new view of the display. state would be a list, something like this:

state = list(
   panelLayout = list(nrow = 2, ncol = 3), 
   sort = list(var1 = "asc"),
   filter = list(var2 = list(from = 100, to = 200)),
   skip = 3,
   desc = "…"
)

Code checking for version numbering

Is there an easy way we can automatically check the following conditions:

  • The version number is the same in NEWS.md, DESCRIPTION, and R/trelliscope-package.R
  • The date is the same in DESCRIPTION and R/trelliscope-package.R

I was looking around at Travis CI, but it wasn't clear to me how that might work, since we'd be checking the code itself, not its execution.
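One possible approach is a small R script that a CI job runs and that fails when the files disagree. The sketch below is an assumption-laden illustration: checkVersionSync() is a hypothetical name, the first version-like token in NEWS.md is assumed to be the latest release, and the version and date are assumed to appear verbatim somewhere in R/trelliscope-package.R.

```r
# Hypothetical check, not part of the package.
# Returns a named logical vector; all TRUE means the files agree.
checkVersionSync <- function(pkgDir = ".") {
  desc <- read.dcf(file.path(pkgDir, "DESCRIPTION"), fields = c("Version", "Date"))
  version <- unname(desc[1, "Version"])
  date <- unname(desc[1, "Date"])

  news <- readLines(file.path(pkgDir, "NEWS.md"), warn = FALSE)
  pkgDoc <- readLines(file.path(pkgDir, "R", "trelliscope-package.R"), warn = FALSE)

  # Version-like tokens in NEWS.md, in order of appearance
  # (assumption: the first one is the current release)
  hits <- regmatches(news, regexpr("[0-9]+(\\.[0-9]+)+", news))

  c(
    news_version = length(hits) > 0 && hits[1] == version,
    doc_version = any(grepl(version, pkgDoc, fixed = TRUE)),
    doc_date = !is.na(date) && any(grepl(date, pkgDoc, fixed = TRUE))
  )
}
```

A CI step could then call something like stopifnot(all(checkVersionSync())) from Rscript, so that a mismatch makes the build fail.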

Saving plots / output

Is there a way to save the output displayed on the screen? For example, it would help to be able to save a plot as a .png or .pdf, so that when something is visually interesting it can be saved quickly to a file for future reference, rather than relying on a rough screenshot.

add ability for users to comment on panels

Comments would be stored, which would require thought on a portable database and how to keep it in sync with the web server, etc. This would also probably require a user login facility.

handle conditional / marginal distributions

A user may want to look at visual filters of cognostics given the current filtered state. I think it would be useful in the scatterplots and quantile plots to show filtered points in a different color (gray?). But then if the "conditional" button is clicked, all filtered points should be removed from the interactive plot.

It has also been suggested to change the naming of "conditional" and "marginal" as someone not familiar with statistical distribution terminology might not get it. Any ideas here are welcome. e.g. marginal=all, conditional=after filtering...

Include 'logical' as a type in cog()

For greater user flexibility, I'd recommend adding 'logical' as one of the acceptable 'types' in cog(), and then just converting the TRUE or FALSE to a character string.
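The conversion itself is a one-liner in base R. The helper name below is hypothetical and only illustrates the suggested behavior; it says nothing about how cog() is implemented.

```r
# Hypothetical helper: convert logical cognostic values to character,
# leaving every other type untouched
cogValueAsChar <- function(val) {
  if (is.logical(val)) as.character(val) else val
}

flag <- cogValueAsChar(TRUE)   # "TRUE"
num <- cogValueAsChar(3)       # unchanged
```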

Hexbin problem in bivariate filter

I'm getting errors when I try to use the hexbin view in the bivariate filter. The scatter view works fine, but when I switch to the hexbin view I see "Error: NA/NaN/Inf in foreign function call (arg 1)" in Trelliscope. In my R session, it put out

Error in hexbin:::hexbin(cogDF[, xVar], cogDF[, yVar], shape = shape,  : 
  NA/NaN/Inf in foreign function call (arg 1)

The error persisted when I loaded the hexbin package in the global environment.

Here's my sessionInfo():

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fastICA_1.2-0        hexbin_1.26.3        ggvis_0.3.0.9001     microbenchmark_1.3-0 mapproj_1.2-2       
 [6] maps_2.3-7           XML_3.98-1.1         ggmap_2.3            data.table_1.9.2     scagnostics_0.2-4   
[11] rJava_0.9-6          base64enc_0.1-1      digest_0.6.4         jsonlite_0.9.11      shiny_0.10.0        
[16] dplyr_0.2.0.99       lubridate_1.3.3      trelliscope_0.7.9.2  ggplot2_1.0.0        lattice_0.20-29     
[21] datadr_0.7.3         Rcpp_0.11.2          devtools_1.5        

loaded via a namespace (and not attached):
 [1] assertthat_0.1      bitops_1.0-6        caTools_1.17        codetools_0.2-8     colorspace_1.2-4   
 [6] DBI_0.3.0           evaluate_0.5.5      formatR_0.10        gtable_0.1.2        htmltools_0.2.4    
[11] httpuv_1.3.0        httr_0.3            knitr_1.6           labeling_0.2        magrittr_1.0.1     
[16] markdown_0.7        MASS_7.3-33         memoise_0.2.1       munsell_0.4.2       plyr_1.8.1         
[21] png_0.1-7           proto_0.3-10        R6_2.0              RCurl_1.95-4.1      reshape2_1.4       
[26] RgoogleMaps_1.2.0.6 rjson_0.2.14        RJSONIO_1.2-0.2     scales_0.2.4        stringr_0.6.2      
[31] testthat_0.8.1      tools_3.1.1         whisker_0.3-2       xtable_1.7-3      

cogDisplayHref displayGroup parameter is not optional

In the cogDisplayHref function, the parameter displayGroup has a default value of NULL, but that leads to URLs that are invalid. The displayGroup parameter must be specified to get a valid URL. Recommend changing the default value to "common" instead, as this is the default group used by makeDisplay.

list2env error

The code below produces the error

Error in list2env(cdo$relatedData) : first argument must be a named list

when trying to view individual displays.

library(trelliscope)

iris_small = iris[c(1:2, 51:52, 101:102),]

vdbConn("iris_small_test", name="iris_small_test", autoYes=TRUE)
by_species = divide(iris_small, by="Species", update=TRUE)

makeDisplay(by_species,
            name="iris",
            panelFn=function(x) qplot(Sepal.Length, data=x, geom="histogram"),
            cogFn=function(x) list(num_plants=cog(nrow(x), "Plants")))

License

* checking DESCRIPTION meta-information ... NOTE
Deprecated license: BSD

Any different name to use?

Error on view()

On the last step in the housing data tutorial:

vdbPrefix is C:\Users\Brian\Documents\housingjunk\vdb
Warning in file(con, "rb") :
cannot open file 'C:\Users\Brian\Documents\housingjunk\vdb/displays/common/list_sold_vs_time/thumb_small.png': No such file or directory
Error in file(con, "rb") : cannot open the connection

The "list_sold_vs_time" directory is there, but no files are there. Then there is the error...

Multicore rendering of plots

If users have a parallel backend registered, Trelliscope should use it to render plots in parallel when they're called up in the interactive Shiny session. This would be especially nice for ggplot graphics since they take so much longer to render than trellis.

dealing with key hashes with preRender

Can you fix this?
Error in [.kvLocalDisk(cdo$panelDataSource, cogDF$panelKey) :
It appears you are trying to retrive a subset of the data using a hash of the key. Key hashes have not been computed for this data. Please call updateAttributes() on this data.

save state capability from trelliscope viewer

Add the ability to save the sort/filter/sample/etc. state of a display, essentially creating a view of the display, with the ability to annotate what the state signifies. Views would show up in the display list as sub-items under the display they were created from. Selecting the view would show the display in the state it was saved in.

Add "Add Related Display" functionality

This will allow the user to specify an additional set of displays created against the same division of data to be shown alongside the currently-selected display in the Trelliscope viewer. The trick is to figure out the best way to position and size multiple displays of varying aspect ratio.

Missing variables

* checking R code for possible problems ... NOTE

Do not know what to do:

  • getCogInfo: no visible binding for global variable 'type'
  • syncLocalData: no visible binding for global variable 'dataClass'

Can't find variable

  • displayListNames
    • removeDisplay: no visible binding for global variable 'displayListNames'
  • displayList
    • nbDisplay: no visible binding for global variable 'displayList'
    • nbDisplayList : <anonymous>: no visible binding for global variable 'displayList'
  • displayListDF
    • findDisplay: no visible binding for global variable 'displayListDF'
    • syncLocalData: no visible binding for global variable 'displayListDF'
    • listDisplays: no visible binding for global variable 'displayListDF'
  • logMsg
    • oldGetCurCogDat.data.frame: no visible global function definition for 'logMsg'
    • oldGetCurCogDat.mongoCogDatConn: no visible global function definition for 'logMsg'

Fixed

  • getCognostics : <anonymous>: no visible global function definition for 'getCognosticsSub'
  • mongoSaveCognostics: no visible global function definition for 'vdbMongoInit'
  • myRunApp: no visible binding for global variable '.shinyServerMinVersion'

Any ideas on displayListNames, displayList, displayListDF, or logMsg? I don't know where they're coming from.

Option for no visible cognostics

In the "Visible Cognostics" menu, users should have the option of selecting no cognostics to display. It currently allows deselecting all of them, but then shows the default label again in the display.

panel caching

Instead of rendering page by page, for a fixed state of filtering, sorting, sampling, and panel layout, start to build a cache of hidden divs containing each page's rendered content.

Thus, when the user clicks next, the next page will load quickly if it has been pre-cached; otherwise, it will trigger the output to be rendered the usual way. Users typically stop to study the current page for a short time, so it would be good to remove the page-changing latency by rendering the next page while they are viewing the current one.

I think the best way to do this is by a specially-numbered div for each page and a special shiny output that keeps triggering more data to be sent as it keeps rendering.
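Independently of how the divs and Shiny outputs are wired up, the caching logic can be sketched in plain R. makePageCache() and its renderPage argument are hypothetical names; the sketch only shows a per-page cache that is valid for one fixed state and must be cleared when the state changes.

```r
# Hypothetical sketch: cache rendered pages for a fixed display state.
# renderPage is any function that takes a page number and returns its content.
makePageCache <- function(renderPage) {
  cache <- new.env(parent = emptyenv())

  isCached <- function(page) exists(as.character(page), envir = cache, inherits = FALSE)

  fetch <- function(page) {
    key <- as.character(page)
    # Render only on a cache miss
    if (!isCached(page)) assign(key, renderPage(page), envir = cache)
    get(key, envir = cache, inherits = FALSE)
  }

  list(
    get = fetch,
    # Render upcoming pages ahead of time, e.g. while the user studies the current one
    prefetch = function(pages) invisible(lapply(pages, fetch)),
    isCached = isCached,
    # Call when filtering, sorting, sampling, or layout changes
    clear = function() rm(list = ls(cache), envir = cache)
  )
}
```

A viewer would call get() for the current page, prefetch() for the next few pages, and clear() whenever the state changes.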

collect function missing

Is the collect function supposed to come from the memisc package? It currently cannot be found.

makeDisplay() lims don't seem to work for a panel function written in base R

See this example. It runs without error or warning--but the xlims and ylims are not fixed.

library(trelliscope)
vdbConn("iris_test", name = "iris_test", autoYes = TRUE)
by_species <- divide(iris, by = "Species", update = TRUE)
makeDisplay(by_species,
            name ="iris",
            panelFn = function(x) {plot(x$Sepal.Length, x$Petal.Width); return(NULL)},
            cogFn = function(x) list(Sepal.Width.Mean = cogMean(x$Sepal.Width)),
            lims = list(x = "same", y = "same",
                        prepanelFn = function(x) list(xlim = range(x$Sepal.Length),
                                                      ylim = range(x$Petal.Width))))
view()

complete multivariate filter

The multivariate filter currently does not update the selected panels after being highlighted and "apply" is clicked. Fix this. This is a quick fix, just needs to be done.

For reference, the multivariate filter computes an independent components projection of two or more selected quantitative cognostics, in the hope of finding an interesting projection of these variables, with the idea that interesting subsets of panels can be found through this mechanism.

Group panels hierarchically

The group parameter does a good job of helping to organize panels but I have an exploration that has grown to almost 20 individual panels and will continue to grow. As a suggestion for an enhancement, it might be nice if I could organize panels hierarchically.

This could probably be done with the existing makeDisplay interface letting / indicate a sub group. The display pop-up could then show a collapsible group hierarchy.

Rmd?

How are R Markdown documents supported in trelliscope? I remember that they are but I don't remember how.

make panel function generic

The panel function essentially needs to take data for a given subset and turn that into html to put inside a div. Right now that content is simply a png file.

To make the panel function more generic, we can implement different types. For example, if the panel function wants to render a d3 plot, it should probably return a set of instructions for how to render the data (such as a link to d3 javascript code) as well as a json object of the data for the subset that is ready to be rendered. A useful one to consider would be generic support of RCharts.

However, one important consideration is for each potential rendering type, we would need to think about providing methods that allow us to rescale each panel. Scales are extremely important in trelliscope and if the user has specified "sliced" or "same", we want to modify the plots to have the appropriate axis limits.

Debugging error messages

Is there a good way to debug error messages that don't show up until interacting with the viewer? I'm getting the following error message only when opening the Table Sort/Filter tab.

Error: argument is of length zero.

The message in the terminal is below.

Error in if (!is.na(curInfo$type)) { : argument is of length zero

I suspect it's a problem with the cognostics I've defined but I'm not positive (the univariate and bivariate filters work fine).

MathJax

Add a minimized file that can be sourced.

Main issue was long file path for R CMD check

NA's in cognostics turn into errors in Table Sort/Filter and Univariate plots

NA's in any cognostic cause the entire Table Sort/Filter view to throw an error, and they also cause the Univariate Filter to fail for that particular cognostic. It would be nice to add the following functionality:

When the cognostic function is applied in makeDisplay(), run some quick check over the resulting cognostics list, and if there are NA's, remove them from the list before passing them on to the display (so they never appear in the display), and issue a warning or note to the user explaining why the cognostic is not present in the viewer.
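A minimal sketch of such a check is shown below. It assumes the cognostics are available as a list with one element per panel, each a named list of scalar values with the same names; dropNACogs() is a hypothetical helper, and the structure trelliscope uses internally may differ.

```r
# Hypothetical helper: drop any cognostic that is NA for at least one panel
# and warn the user about what was removed.
dropNACogs <- function(cogList) {
  cogNames <- names(cogList[[1]])

  # TRUE for each cognostic that contains an NA in any panel
  hasNA <- vapply(cogNames, function(nm) {
    any(vapply(cogList, function(cg) anyNA(cg[[nm]]), logical(1)))
  }, logical(1))

  if (any(hasNA))
    warning("Dropping cognostics containing NA: ",
            paste(cogNames[hasNA], collapse = ", "))

  # Keep only the NA-free cognostics for every panel
  lapply(cogList, function(cg) cg[cogNames[!hasNA]])
}
```

Dropping the whole cognostic, rather than individual panels, matches the suggestion above that the cognostic should simply not appear in the viewer.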
