o2r-project / containerit
Package an R workspace and all dependencies as a Docker container
Home Page: https://o2r.info/containerit/
License: GNU General Public License v3.0
Expands on an idea mentioned in #6 (comment)
After a session (or workspace or something else) has been containerized, a user should be able to test-build an image in order to verify that
(1) The build is successful
(2) The local session matches the dockerized session
(3) More ideas?
(1) and (2) can be achieved analogously to tests/testthat/test_sessioninfo_reproduce.R, by turning the test into a feature.
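As a rough illustration of check (2), the comparison could boil down to matching package versions from both sessions. The sketch below uses base R only; `compare_package_versions` and `session_versions` are hypothetical helper names and not part of containerit, and it assumes the versions from the container have already been retrieved as a named character vector.

```r
# Compare two named character vectors of package versions, e.g. as extracted
# from sessionInfo() locally and inside the container.
# Hypothetical helper, not part of containerit.
compare_package_versions <- function(local, docker) {
  all_pkgs <- sort(union(names(local), names(docker)))
  result <- data.frame(
    package = all_pkgs,
    local = unname(local[all_pkgs]),
    docker = unname(docker[all_pkgs]),
    stringsAsFactors = FALSE
  )
  # A package matches only if it is present on both sides with equal versions
  result$match <- !is.na(result$local) & !is.na(result$docker) &
    result$local == result$docker
  result
}

# Extract versions of attached (non-base) packages from a sessionInfo object
session_versions <- function(info = sessionInfo()) {
  vapply(info$otherPkgs, function(p) p$Version, character(1))
}
```

A test build would pass check (2) when `all(compare_package_versions(local, docker)$match)` is `TRUE`.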
https://github.com/codemeta/codemeta/pull/97/files
Should be quite straightforward thanks to https://github.com/ropensci/jsonld
Extends #37
Simple R images like rocker/r-ver do not come with a GUI, therefore, the use of R is restricted to the console.
With such a configuration (try for instance docker run -it --rm rocker/r-ver) users are not able to view R plots or any file or data that cannot be printed directly to the console.
Therefore, it would be beneficial to leverage RStudio images (see rocker/rstudio) and thus restore sessions directly in an RStudio Server session.
See https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image
Write a function that pulls a Dockerized linter (projectatomic/dockerfile-lint looks good) container and executes it on a given path and shows the output on the R console.
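A minimal sketch of such a function, assuming the linter image expects the project mounted at /root/ and ships a `dockerfile_lint` executable (both should be checked against the image's README). `lint_args` and `lint_dockerfile` are hypothetical names.

```r
# Build the `docker run` arguments for linting a Dockerfile in `path`.
# Assumption: the image expects the project at /root/ and provides a
# `dockerfile_lint` executable.
lint_args <- function(path, dockerfile = "Dockerfile",
                      image = "projectatomic/dockerfile-lint") {
  c("run", "--rm",
    "-v", paste0(normalizePath(path, mustWork = FALSE), ":/root/"),
    image, "dockerfile_lint", "-f", dockerfile)
}

# Hypothetical wrapper: run the linter and show its output on the R console.
# Requires a local Docker installation; `docker run` pulls the image if needed.
lint_dockerfile <- function(path, ...) {
  output <- system2("docker", lint_args(path, ...), stdout = TRUE, stderr = TRUE)
  cat(output, sep = "\n")
  invisible(output)
}
```

Keeping the argument construction separate from the `system2()` call makes the command testable without Docker being available.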
As becomes clear in the discussion on geospatial libraries in Rocker, the versions of linked external libraries matter.
Can we support packaging explicit versions of linked libraries?
> extSoftVersion()
                     zlib                     bzlib                        xz
                  "1.2.8"      "1.0.6, 6-Sept-2010"              "5.1.0alpha"
                     PCRE                       ICU                       TRE
        "8.38 2015-11-23"                        "" "TRE 0.8.0 R_fixes (BSD)"
                    iconv                  readline
             "glibc 2.23"                     "6.3"
> library(sf)
Linking to GEOS 3.5.0, GDAL 2.1.2, proj.4 4.9.2
> sf::sf_extSoftVersion()
   GEOS    GDAL  proj.4
"3.5.1" "2.1.2" "4.9.2"
This information could be accessed by a function <pkgname_extSoftVersion>, see extSoftVersion() and sf_extSoftVersion() (https://github.com/edzer/sfr/blob/5c3dfea395af81bf352b4007d16c6a7d419883c2/R/init.R#L59)
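One simple way to carry this information along would be to record it as comments in the generated Dockerfile. The sketch below only documents the versions and does not pin them; `versions_as_comments` is a hypothetical helper.

```r
# Turn a named character vector of library versions into Dockerfile comment
# lines. Hypothetical helper; it records versions but does not install them.
versions_as_comments <- function(versions, source = "extSoftVersion()") {
  # Drop libraries that report an empty version string (e.g. ICU above)
  versions <- versions[nzchar(versions)]
  c(paste0("# Linked library versions reported by ", source, ":"),
    paste0("#   ", names(versions), " ", versions))
}

writeLines(versions_as_comments(extSoftVersion()))
```

The same function could be fed with the result of a package-specific function such as `sf::sf_extSoftVersion()`.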
Allow users to manually select a specific R version.
Now there are versioned Rocker images: https://github.com/rocker-org/rocker-versioned/
https://github.com/metacran/rversions might become handy.
paste0(version$major, '.', version$minor)
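Building on the expression above, a version string could be mapped to a versioned Rocker base image. This sketch assumes that rocker/r-ver image tags match the "major.minor.patch" version string; `from_instruction` is a hypothetical name.

```r
# Build a FROM instruction for a versioned Rocker image.
# Assumption: rocker/r-ver tags follow the "major.minor.patch" R version string.
from_instruction <- function(r_version = paste0(version$major, ".", version$minor)) {
  stopifnot(grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", r_version))
  paste0("FROM rocker/r-ver:", r_version)
}

from_instruction("3.3.2")
```

Calling `from_instruction()` without arguments uses the version of the running R session, while passing a string selects a version manually.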
How will/does/should containerit leverage other packages?
We must add user config files etc. to the container and make sure they are actually used.
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html
The startup script should ideally also log the environment, i.e.
sessionInfo()
extends #6
The session loads rgdal and proj packages and adds the required libraries in the Dockerfile.
Use two approaches and then compare:
rsysreqs
In case information is missing, we can retrieve it from the user via a command-line interface directly from R using utils::menu, cf. https://github.com/hadley/devtools/blob/aaa4b61ca7c44515418d485cc64c84475e998ac7/R/utils.r#L69
When we can write Dockerfiles, we might as well parse them. uncontainer_it = create a session on a host machine that resembles the one that is inside a container.
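A first step towards parsing could be to split a Dockerfile into instructions and arguments. The sketch below is deliberately simplistic: it drops comments and blank lines and does not join line continuations. `parse_dockerfile` is a hypothetical name.

```r
# Split Dockerfile lines into instruction and argument columns.
# Simplification: comments and blank lines are dropped, and line
# continuations (trailing backslash) are not joined.
parse_dockerfile <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  data.frame(
    instruction = toupper(sub("\\s.*$", "", lines)),
    arguments = trimws(sub("^\\S+", "", lines)),
    stringsAsFactors = FALSE
  )
}

parse_dockerfile(c("FROM rocker/r-ver:3.3.2", "CMD [\"R\"]"))
```

From such a table, the FROM row hints at the R version and the RUN rows at packages to install on the host.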
Use cases needed!
- [ ] https://github.com/o2r-project/o2r-muncher/tree/master/test/bagtainers/markdowntainer-sfr/data
system.file("doc/sf3.Rmd", package = "sf")
system.file("examples", "knitr-minimal.Rnw", package = "knitr")
Suggestion/discussion
We could wrap the Docker CLI in R functions.
MRAN supports file style locations now: RevolutionAnalytics/checkpoint#218
Also, this issue RevolutionAnalytics/checkpoint#216 (comment) reports on how to make checkpoint work for an R Markdown document.
When reproducing an R session also match the locales.
As shown in #33 (comment) locales are not reproduced yet
In Linux, the locale first has to be generated, if missing, and then configured as default or current locale. That seems not to be trivial, especially in non-interactive mode.
The R functions Sys.getlocale() and Sys.setlocale() may be helpful.
tests/testthat/test_sessioninfo_reproduce.R runs with the following test (uncomment corresponding lines):
test_that("the locales are the same ", {
  expect_equal(local_sessionInfo$locale, docker_sessionInfo$locale)
})
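To see which locale categories differ, the composite string returned by `Sys.getlocale()` can be split into its parts. This is a sketch for the Linux-style "CATEGORY=value;..." format; when all categories are equal, R may return a single value such as "C", in which case the result is empty. `parse_locale` and `locale_diff` are hypothetical names.

```r
# Parse the composite string returned by Sys.getlocale() on Linux, e.g.
# "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;...", into a named character vector.
parse_locale <- function(x = Sys.getlocale()) {
  parts <- strsplit(x, ";", fixed = TRUE)[[1]]
  parts <- parts[grepl("=", parts, fixed = TRUE)]
  values <- sub("^[^=]*=", "", parts)
  names(values) <- sub("=.*$", "", parts)
  values
}

# Hypothetical helper: categories whose values differ between two sessions
locale_diff <- function(local, docker) {
  cats <- union(names(local), names(docker))
  differs <- vapply(cats, function(k) {
    !identical(unname(local[k]), unname(docker[k]))
  }, logical(1))
  cats[differs]
}
```

The categories reported by `locale_diff` are the ones that would need to be generated and configured in the image.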
Following the suggestions in Label Schema we can add some meta-information to the images
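As a sketch of what this could look like, the following helper assembles a LABEL instruction from a few Label Schema keys. The key names are taken from the Label Schema draft (org.label-schema.*); which metadata containerit should actually set is open, and `label_instruction` is a hypothetical name. Values are not escaped, so they must not contain double quotes.

```r
# Build a LABEL instruction from Label Schema keys.
# Key names follow the Label Schema draft; the selection is an assumption.
label_instruction <- function(name, description, vcs_url,
                              build_date = format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ",
                                                  tz = "UTC")) {
  labels <- c(
    "org.label-schema.schema-version" = "1.0",
    "org.label-schema.build-date" = build_date,
    "org.label-schema.name" = name,
    "org.label-schema.description" = description,
    "org.label-schema.vcs-url" = vcs_url
  )
  pairs <- paste0(names(labels), "=\"", labels, "\"")
  # One key-value pair per line, joined with Dockerfile line continuations
  paste("LABEL", paste(pairs, collapse = " \\\n  "))
}

cat(label_instruction("containerit", "Package an R workspace",
                      "https://github.com/o2r-project/containerit"), "\n")
```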
Create a small session (load few packages, including one from CRAN but not in base packages) and create a Dockerfile that is as close as possible to recreate that session.
Based on sessionInfo()$running we have a mapping from running string to base image. In our case all running strings map to rocker.
Also check devtools::session_info(), which could be useful to determine the installation source.
Open questions:
MAINTAINER field?
Consider how the function works_with_R could be useful for this package: https://github.com/tdhock/dotfiles/blob/master/.Rprofile
NixOS is an interesting Linux distro that is all about declarative configuration and "devops", see e.g. https://www.domenkozar.com/2014/03/11/why-puppet-chef-ansible-arent-good-enough-and-we-can-do-better/
Maybe containerit could create NixOS installation instructions instead of Dockerfiles?
See https://github.com/jimhester/lintr#testthat
They seem mostly reasonable :-)
We need a CLI (command line interface) wrapper around the library to integrate it into workflows in other programming languages (e.g. as part of a node.js-based webapp)
Alternatively, if docopt does not work at all, evaluate package https://cran.r-project.org/web/packages/optparse/
Example usage:
container_it.R [-f <path to (markdown, R-script)file>]
container_it.R [-s] # package new R session
What options do we need exposed? How easy is that with docopt?
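To get a feeling for the two options in the example usage, here is a sketch that parses them with base R only, instead of docopt or optparse. `parse_cli` is a hypothetical name, and the option set is limited to `-f` and `-s` from the usage lines above.

```r
# Parse arguments for a hypothetical container_it.R script using base R only.
# In a real script, `args` would come from commandArgs(trailingOnly = TRUE).
parse_cli <- function(args) {
  opts <- list(file = NULL, session = FALSE)
  i <- 1
  while (i <= length(args)) {
    if (args[i] == "-f") {
      if (i == length(args)) stop("-f requires a file path")
      opts$file <- args[i + 1]
      i <- i + 1  # skip the value that belongs to -f
    } else if (args[i] == "-s") {
      opts$session <- TRUE
    } else {
      stop("unknown option: ", args[i])
    }
    i <- i + 1
  }
  opts
}

str(parse_cli(c("-f", "analysis.Rmd")))
```

A dedicated package would add help texts and validation for free, which matters once more options are exposed.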
When packaging research into higher level containers, e.g. ERC, we most probably need some meta information. While the user can be asked for this, see #13, it would be better to extract this automagically from the session.
For this, we would need a feature that appends a script to the "main script file" of the container which has access to the R session "after" the analysis is completed.
Some ideas for information that could be extracted here:
This feature is complementary to the file analysis conducted by @7048730 in https://github.com/o2r-project/o2r-meta
Extends #6
Install a package that is only available from GitHub. Probably need devtools::session_info().
Packages for testing:
This should (optionally, or even by default) also see if a specific version is tagged on GitHub and install that explicitly, e.g. devtools::install_github("Appsilon/shiny.collections", ref = "0.1.0")
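A sketch of how the corresponding Dockerfile instruction could be generated, with and without a pinned tag. It assumes devtools is already installed in the image; `github_install_instruction` is a hypothetical name.

```r
# Build a RUN instruction that installs a package from GitHub, pinning a tag
# when one is known. Hypothetical helper; assumes devtools is in the image.
github_install_instruction <- function(repo, ref = NULL) {
  call <- if (is.null(ref)) {
    sprintf("devtools::install_github(\"%s\")", repo)
  } else {
    sprintf("devtools::install_github(\"%s\", ref = \"%s\")", repo, ref)
  }
  # Shell form with single quotes keeps the double quotes of the R call intact
  sprintf("RUN Rscript -e '%s'", call)
}

github_install_instruction("Appsilon/shiny.collections", ref = "0.1.0")
```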
https://rstudio.github.io/packrat/
Goal: add a snapshot of a packrat private repository to the container
https://cran.r-project.org/web/packages/datapack/vignettes/datapack-overview.html seems to have a good working process
build ("setup works")
see also #10
It could be useful to extend the Dockerfile so that the captured session is fully replicated directly after container start. This would save the user from calling require/library on those packages manually.
The only way to restore an interactive session with required libraries seems to be defining an Rprofile.site file and setting the R_PROFILE environment variable to its location (using the ENV instruction).
The R_PROFILE file must contain a .First function which attaches required packages using require(...) (or library)?
Load namespaces via requireNamespace() --> create instruction CMD ["R"] at the end of the Dockerfile
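The approach described above can be sketched as two small generators: one for the profile file and one for the Dockerfile instructions that put it in place. The function names and the target path /Rprofile.site are hypothetical.

```r
# Generate the content of an Rprofile.site that attaches packages in .First.
# Hypothetical helper; uses library() so that a missing package fails loudly.
rprofile_lines <- function(packages) {
  c(".First <- function() {",
    sprintf("  library(\"%s\")", packages),
    "}")
}

# Dockerfile instructions that copy the profile and point R_PROFILE at it.
# The target path is an assumption.
dockerfile_lines <- function(profile = "/Rprofile.site") {
  c(paste("COPY Rprofile.site", profile),
    paste("ENV R_PROFILE", profile),
    "CMD [\"R\"]")
}

writeLines(rprofile_lines(c("sf", "knitr")))
writeLines(dockerfile_lines())
```

Using `require()` instead of `library()` in the generated `.First` would let the session start even if a package is missing, at the cost of hiding the problem.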
May extend #37
Users may include R objects from their R workspace in a restored session.
Therefore, the dockerfile()-method has a parameter 'objects' that is not yet implemented.
The objects-parameter takes a character vector containing the names of the objects, as returned by the function ls()
The objects are saved to an RData-file that is copied to the image at the location of the R working directory. If the file has no name (".RData"), R by default loads it automatically into the session on startup.
Alternatively, users may "load" the file manually from the working directory.
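A minimal sketch of the saving step, assuming the .RData file is written into the Docker build context and copied to the image's working directory afterwards. `save_objects` is a hypothetical helper that mirrors the planned 'objects' parameter.

```r
# Save the named objects to an .RData file inside a build context directory.
# Hypothetical helper mirroring the planned 'objects' parameter.
save_objects <- function(objects, context_dir, envir = globalenv()) {
  found <- vapply(objects, exists, logical(1), envir = envir)
  if (!all(found)) {
    stop("objects not found: ", paste(objects[!found], collapse = ", "))
  }
  target <- file.path(context_dir, ".RData")
  save(list = objects, file = target, envir = envir)
  target
}
```

The returned path can then be used in a COPY instruction; with a different file name, the user would have to call `load()` inside the container.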
Append all Dockerfiles used in FROM statements for a complete one-document Dockerfile. Could be interesting to evaluate for reproducibility - what is really installed?
"Unchain a Dockerfile"... how useful can this be?
See e.g. Debian testing Dockerfile: https://github.com/tianon/docker-brew-debian/blob/9b1dd4b1594b8df02f7caa739e84b187edaab404/testing/Dockerfile
- copy = script (default) copies the supplied script to the image,
- copy = script_dir also copies the script and all files / directories of the same folder
- copy takes a list of files and directories to be copied to the folder
- cmd parameter can be set with Cmd_Rscript("path/to/script") resulting in CMD ["Rscript", "--save", "/path/to/Rscript"]
- [ ] 1. Default: execute a script locally and reproduce the session that results by the end of the script
- [ ] 2. copy_script = TRUE also copies the script
- [ ] 3. copy_parent = TRUE also copies the script and all files / directories of the same folder
- [ ] 4. batch_exec = TRUE also copies script and sets CMD instruction to CMD ["Rscript", "--save", "/path/to/Rscript"]
- [ ] 5. copy_files takes a list of files and directories to be copied to the folder
- [ ] 6. Test 1-5 with test scripts
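The options in the checklist could translate into Dockerfile instructions roughly as follows. This is a sketch under assumptions: `script_instructions` and the target directory /payload are hypothetical, and paths are taken relative to the build context.

```r
# Sketch of how the numbered options could map to Dockerfile instructions.
# Function name and target directory "/payload" are hypothetical.
script_instructions <- function(script, copy_script = FALSE, copy_parent = FALSE,
                                batch_exec = FALSE, copy_files = character(0),
                                target = "/payload") {
  instructions <- character(0)
  if (copy_parent) {
    # Copy the whole folder containing the script
    instructions <- c(instructions,
                      sprintf("COPY %s/ %s/", dirname(script), target))
  } else if (copy_script || batch_exec) {
    instructions <- c(instructions, sprintf("COPY %s %s/", script, target))
  }
  if (length(copy_files) > 0) {
    instructions <- c(instructions, sprintf("COPY %s %s/", copy_files, target))
  }
  if (batch_exec) {
    # Exec form, as in option 4 above
    instructions <- c(instructions,
                      sprintf("CMD [\"Rscript\", \"--save\", \"%s/%s\"]",
                              target, basename(script)))
  }
  instructions
}

writeLines(script_instructions("analysis/run.R", batch_exec = TRUE))
```

With the defaults (option 1), no instruction is added, because only the session resulting from the script is reproduced.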
Issues (to be discussed later)
Provide a path to a vignette, then it gets packaged as the main document in a container (built at start time of the container, container finishes when the vignette is built).
Base on https://docs.docker.com/engine/reference/builder/
See if stuff from traitecoevo/dockertest (https://github.com/traitecoevo/dockertest/search?utf8=%E2%9C%93&q=dockerfile) can just be reused?
It might be quite hard for long running system calls, but maybe we find a way to add a progress bar.
Use https://github.com/ropensci/datapack (see also https://cran.rstudio.com/web/packages/datapack/index.html)
Could be useful to install packages via packrat: https://github.com/rstudio/packrat
It is completely correct that right now there is an error message when a session is packaged, because containeRit itself is not published online: "Failed to identify source for package containeRit. Therefore the package cannot be installed in the docker image."
However, it should not be common to "package the packaging lib", so we should add an option add_self = FALSE to the dockerfile(..) function so that, by default, it does not try to add the containeRit package itself to the image.
Simply copy over packages from lib directory / search path.
Throw errors if OS of host and container are not compatible.
Package a script and add, outside of the script, some result testing, i.e. a validation that the script has succeeded.
Probably use testthat for it.
RStudio has some nice ways to create a GUI (when package is used in RStudio presumably), so when needing user input this could be handy
See RStudio Add-ins: https://rstudio.github.io/rstudioaddins/
Issue #33 suggests determining system dependencies based on https://github.com/rstudio/shinyapps-package-dependencies
However, as previous discussion in #33 shows, system dependencies are not explicit for all packages, for instance rgdal. That is because those dependencies that are listed in the basic Dockerfile seem to be pre-assumed, while the scripts from the 'packages' folder apply to packages that rely on additional dependencies. Hence, in order to rely on the shinyapps-package-dependencies, we would need to install all dependencies from the basic Dockerfile (or use the shinyapps image as the base image), which can result in unnecessary overhead.
Moreover, the shell scripts are made for Ubuntu/Linux only and therefore may not be applicable to all potential base images.