Documents to be used in our FAKIN project (in German)
kwb-r / fakin.doc
Best Practices in Research Data Management
Home Page: https://kwb-r.github.io/fakin.doc
https://github.com/ropensci/hydrology, also lists kwb.hantush :-)
Minimum metadata requirements for R scripts:
After talking with @daniel-wicke today about publicly publishing two R packages used in the project-ogre (see KWB-R/kwb.ogre#2 and KWB-R/kwb.ogre.model#1), it became obvious that we currently lack a company-wide strategy for publishing code.
For this, a workflow should be developed within FAKIN and implemented in the QMS. This certainly requires that the KWB management and the department leaders
I would propose the following:
100% publicly funded projects (e.g. BMBF, EU): source code will always be published on https://github.com/kwb-r as a public repository (i.e. it will be accessible to everyone), provided that the code does not contain security-critical paths (e.g. to our company server) or confidential data. Code should ideally be developed so that it contains neither. Making the code openly available will reduce the effort needed to install it (e.g. students do not each need an "access" token to install private repositories, as required for "contract" projects, see below).
Contract projects (BWB, Veolia): will be published as private repositories by default on https://github.com/kwb-r, unless the funder prescribes a specific workflow.
Could this topic also be addressed in one of the next management meetings, @chsprenger?
Use case with code: https://github.com/kwb-r/kwb.qmra/binder
Note:
My-binder support could be integrated as a function in kwb.pkgbuild
I don't know why, but no "index.html" is currently created anymore, so http://kwb-r.github.io/fakin.doc/ does not display the document by default.
However, this works for a similar setting, e.g. here:
https://github.com/rstudio/bookdown/tree/master/inst/examples
Both chapters are very similar (one copied from the "Best-practices" workshop and the second developed by @hsonne) and should be merged into one:
https://github.com/KWB-R/fakin.doc/blob/master/02_data_storage.Rmd#data-workflow
pandoc: Cannot decode byte '\xd8': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
Error: pandoc document conversion failed with error 1
More info: https://travis-ci.org/KWB-R/fakin.doc/builds/391017660
Since: 7d97f9b
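One way to narrow down an error like this is to scan the source files for bytes that are not valid UTF-8 before pandoc runs. The following is a minimal sketch in Python, not part of the Travis build; the function name `find_invalid_utf8` and the default list of file suffixes are assumptions made for illustration.

```python
from pathlib import Path


def find_invalid_utf8(root, suffixes=(".Rmd", ".md", ".yml", ".bib")):
    """Return (path, byte_offset) for every file under root that is not valid UTF-8.

    Hypothetical helper: the name and the default suffix list are assumptions.
    """
    problems = []
    for path in sorted(Path(root).rglob("*")):
        # Only check regular files with one of the selected suffixes
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte
            problems.append((str(path), err.start))
    return problems
```

Calling `find_invalid_utf8(".")` in the repository root would list each affected file together with the position of the first offending byte, which should make it easier to spot a file saved in a different encoding.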
Hi Michael, we've upgraded your quota for the time being to 10 hours/50 GB and we'll reset the official plan designation later today. But you have all of the privileges of the pro account as of now and anyone else who signs up via the www.kompetenz-wasser.de domain should be automatically upgraded as well.
Best,
Seth Green
Developer Advocate
From FAQ: https://codeocean.com/plans
What is included in the Researcher plan?
The researcher plan includes everything you need to get started, explore and run code, download code and data, unlimited compute capsule publishing, privately modify published code, collaborate with peers, embed code onto your personal site. Everyone is allotted 5GB of storage and 1 hour of compute time per month. Use your academic email to get 20GB of storage and 10 hours of compute time per month.
To be discussed by department leaders in next management meetings, due to high demand by students and also scientists who are eager to improve their R skills.
Within the new KWB project UFOPLAN BaSaR, R scripts already used in OgRe should be reused.
According to Daniel, now is a good time to optimise the data workflow, as the project is just beginning and regular sampling starts next week.
He hopes to get some recommendations on how to improve the current folder structure so that it can easily be integrated into the workflow proposed by FAKIN later.
However, it must be ensured that the R scripts also work on field laptops without a connection to the KWB intranet (i.e. folder paths can be adapted with minimal effort).
Links to files on the intranet are not opened in Firefox due to its security policies.
See here for a description of the problem: http://kb.mozillazine.org/Links_to_local_pages_do_not_work
Workaround with Firefox add-on: LocalFileSystemLinks
Solved with: https://addons.mozilla.org/de/firefox/addon/local-filesystem-links
Could be placed in the FAQ part of the document
A research compendium accompanies, enhances, or is a scientific publication providing data, code, and documentation for reproducing a scientific workflow. It can be published on different platforms using the label (or tag, community, ...) research-compendium (applied on GitHub, Zenodo, OSF) or as a fallback the term "research compendium" in the description (used on GitLab). The Zenodo community even has a curation policy for the accepted records.
Reminder to myself to add the following references in the relevant chapters of the report, e.g.:
Why share code?
Baker, M. Why scientists must share their research code. Nature News
http://dx.doi.org/10.1038/nature.2016.20504 (2016).
Barnes, N. Publish your computer code: it is good enough. Nature 467, 753 (2010).
http://dx.doi.org/10.1038/467753a
Uncategorized
McKiernan, E. C. et al. How open science helps researchers succeed. eLife 5, e16800 (2016).
https://elifesciences.org/articles/16800#
Baker, M. Scientific computing: Code alert. Nature 541, 563–565 (2017).
https://dx.doi.org/10.1038/nj7638-563a
Broman, K. Initial steps toward reproducible research.
http://kbroman.org/steps2rr/ (2016).
Martinez, C. et al. Reproducibility in Science: A Guide to Enhancing Reproducibility in Scientific Results and Writing
https://ropensci.github.io/reproducibility-guide/ (See also ropensci-archive/reproducibility-guide#86)
Michener, W. K. Ten simple rules for creating a good data management plan. PLoS Comput. Biol. 11, e1004525 (2015).
https://doi.org/10.1371/journal.pcbi.1004525
Goodman, A. et al. Ten simple rules for the care and feeding of scientific data. PLoS Comput. Biol. 10, e1003542 (2014).
https://doi.org/10.1371/journal.pcbi.1003542
Best practices for scientific computing
Sandve, G. K., Nekrutenko, A., Taylor, J. & Hovig, E. Ten simple rules for reproducible computational research. PLoS Comput. Biol. 9, e1003285 (2013).
https://doi.org/10.1371/journal.pcbi.1003285
Wilson, G. et al. Best practices for scientific computing. PLoS Biol. 12, e1001745 (2014).
https://doi.org/10.1371/journal.pbio.1001745
Why GitHub?
Perkel, J. Democratic databases: Science on GitHub. Nature 538, 127–128 (2016).
All above references found in:
Lowndes, Julia S Stewart, Benjamin D Best, Courtney Scarborough, Jamie C Afflerbach, Melanie R Frazier, Casey C O’Hara, Ning Jiang, and Benjamin S Halpern. 2017. “Our Path to Better Science in Less Time Using Open Data Science Tools.” Nature Ecology & Evolution 1 (6). Nature Publishing Group
https://www.nature.com/articles/s41559-017-0160
In addition, add the new R Markdown book (https://bookdown.org/yihui/rmarkdown/) and a link to the R package rticles, which provides templates for writing journal articles in R Markdown.
Practical Data Science for Stats - a PeerJ Collection 2018
https://peerj.com/collections/50-practicaldatascistats/
Broman KW, Woo KH. (2017) Data organization in spreadsheets. PeerJ Preprints 5:e3183v1
https://doi.org/10.7287/peerj.preprints.3183v1
Nüst et al 2018 Reproducible research and GIScience: an evaluation using AGILE conference papers
https://peerj.com/articles/5072/
Reproducible research: Strategies, tools, and workflows
http://www.helsinki.fi/varieng/series/volumes/19/flanagan/#sect1
Leek and Peng 2015: Opinion: Reproducible research can still be wrong: Adopting a prevention approach
https://doi.org/10.1073/pnas.1421412111
Sackler Colloquium on Improving the Reproducibility of Scientific Research (free online) http://www.pnas.org/content/115/11, e.g.:
An empirical analysis of journal policy effectiveness for computational reproducibility
https://doi.org/10.1073/pnas.1708290115
Scientific progress despite irreproducibility: A seeming paradox
https://doi.org/10.1073/pnas.1711786114
All researchers at KWB should have an ORCID and add at least their work published at KWB.
On the official KWB website, this link should be added for each researcher so that more information can be found.
If R packages are published, use of an ORCID is mandatory and is included in the KWB-style package template (see https://kwb-r.github.io/kwb.pkgbuild/articles/tutorial.HTML)
In fakin.doc: add info table with people at KWB who already have one
codemetar (https://ropensci.github.io/codemetar)
"We recommend you to use the codemetar (https://ropensci.github.io/codemetar) package for creating and updating a JSON CodeMeta metadata file (https://codemeta.github.io/) for your package via codemetar::write_codemeta(). It will automatically include all useful information, including GitHub topics. CodeMeta uses schema.org terms so as it gains popularity the JSON metadata of your package might be used by third-party services, maybe even search engines. " (https://ropensci.github.io/dev_guide/building.html#creating-metadata-for-your-package)
dataspice (https://github.com/ropenscilabs/dataspice)
The goal of dataspice is to make it easier for researchers to create basic, lightweight and concise metadata files for their datasets. These basic files can then be used to:
Metadata fields are based on schema.org and other metadata standards.
R package: dirdf - Extracts Metadata from Directory and File Names https://github.com/ropenscilabs/dirdf
Create tidy data frames of file metadata from directory and file names.
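The idea of deriving metadata from directory and file names can be illustrated with a short Python sketch. It does not reproduce the dirdf API (dirdf is an R package); the naming convention `<project>/<year>/<site>_<parameter>.<ext>` and all field names below are made up for this example.

```python
import re

# Hypothetical naming convention: <project>/<year>/<site>_<parameter>.<ext>
PATH_PATTERN = re.compile(
    r"^(?P<project>[^/]+)/(?P<year>\d{4})/"
    r"(?P<site>[^/_]+)_(?P<parameter>[^/.]+)\.(?P<ext>\w+)$"
)

FIELDS = ["project", "year", "site", "parameter", "ext"]


def paths_to_records(paths):
    """Turn relative file paths into a list of dicts, one record per file.

    Paths that do not follow the convention get None in every field
    except "path", so that violations of the convention stay visible.
    """
    records = []
    for path in paths:
        match = PATH_PATTERN.match(path)
        record = {"path": path}
        for field in FIELDS:
            record[field] = match.group(field) if match else None
        records.append(record)
    return records
```

A list of such records maps directly onto one row per file and one column per metadata field, which is the tidy layout described above. Keeping non-matching paths in the result is a deliberate choice here, since it doubles as a check of how consistently a naming convention is followed.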
Improved folder structure analysis and documentation using function kwb.fakin::plot_path_tree()
(to be added in kwb.geosalz documentation)
Workflow documentation
for process check https://kwb-r.github.io/kwb.geosalz/dev/articles/workfow.html
Add short documentation in fakin.doc
for progress check https://kwb-r.github.io/fakin.doc/case-studies.html#geogenic-salination
Develop a batch script that tracks all file/folder changes on the KWB servers
/projekte$
/processing
/rawdata
To be used for the Brownbag and later as a general tool to identify when (un)intended changes in the folder structure occurred.
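A possible starting point for such a tracking script is to take a snapshot of a folder tree and compare it with an earlier one. The sketch below is written in Python under several assumptions: the function names `snapshot` and `diff_snapshots` are hypothetical, changes are detected only via file size and modification time, and how snapshots are stored between runs is left open.

```python
import os


def snapshot(root):
    """Map each relative path under root to (size_in_bytes, modification_time).

    Directories are recorded as (None, None), so only their creation
    and removal is reported, not changes to their contents.
    """
    state = {}
    for dir_path, dir_names, file_names in os.walk(root):
        for name in dir_names:
            full = os.path.join(dir_path, name)
            state[os.path.relpath(full, root)] = (None, None)
        for name in file_names:
            full = os.path.join(dir_path, name)
            info = os.stat(full)
            state[os.path.relpath(full, root)] = (info.st_size, info.st_mtime)
    return state


def diff_snapshots(old, new):
    """Compare two snapshots and report added, removed and modified paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}
```

Run once per server share on a schedule, the difference between consecutive snapshots would show when files or folders appeared, disappeared or changed. Comparing size and modification time is cheap but can miss edits that preserve both; hashing file contents would be more reliable at the cost of reading every file.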
Another example is to enable comments or discussions on your HTML pages. There are several possibilities, such as Disqus (https://disqus.com) or Hypothesis (https://hypothes.is). These services can be easily embedded in your HTML book via the includes option (see Section 5.5 for details).
--- Source: bookdown.org
Good example for integration can be found here:
https://benmarwick.github.io/bookdown-ort/mods.html
needs to be updated with correct server paths!!!!
The content here is partly copied from the "Best-practices" workshop and partly developed by @hsonne.
This should be merged into one!
FDM building blocks
https://bausteine-fdm.de/index
Introduction to YAML
https://learn.getgrav.org/15/advanced/yaml
Folder Structure Analysis with Python
https://janakiev.com/blog/python-filesystem-analysis/
Store QGIS cache on a fast disk (and not on our server!)
https://courses.neteler.org/gaining-wms-speed-enabling-qgis-cache-directory/
see: http://www.loc.gov/marc/relators/relaterm.html
E.g. in R packages (http://r-pkgs.had.co.nz/description.html#author), but also in general at KWB
As used by Zenodo https://www.crossref.org/services/funder-registry/
For example BMBF: http://dx.doi.org/10.13039/501100002347
https://ropenscilabs.github.io/drake-manual
Start with https://github.com/KWB-R/GeoSalz (formerly, before renaming the repo and transferring it to KWB-R: https://github.com/mrustl/Testprojekt_01)
Improved folder structure analysis and documentation using function kwb.fakin::plot_path_tree()
to be added in kwb.geosalz documentation
Add feature in R package kwb.umberto
for process check KWB-R/kwb.umberto#2
Workflow documentation
Add short documentation in fakin.doc
for progress check https://kwb-r.github.io/fakin.doc/case-studies.html#lca-modelling
https://github.com/KWB-R/kwb.orcid
Supports: #22
kwb.logger (OPTIWELLS, MIA-CSO, Ogre)
kwb.monitoring (Ogre, Flusshygiene)
This page lets you test out regular expressions. Do we already have a chapter about the importance of regular expressions? We should.
I found this link in the DataCamp Course "Intermediate R - Practice"
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
email.yml
then revdepcheck::revdep_email_maintainers()
devtools::check_win_devel() (again!)
devtools::submit_cran()
pkgdown::build_site()
Template from r-lib/usethis#338
Too advanced for us, as the focus is on a CRAN release (as for ggplot2 3.0, tidyverse/ggplot2#2568), but we can use it as a starting point. For R packages, Andi's "kwb.resilience" could be our first use case
How to create a GitHub-styled to-do list in R Markdown
(for details see issue #23)
GitLab Gold Features:
https://about.gitlab.com/pricing/gitlab-com/feature-comparison/
Conditions:
https://gitlab.com/gitlab-com/gitlab-oss
According to pandoc tricks, you may first download the task-list.lua file and save it in $DATADIR/pandoc/filters/, so it will be visible to pandoc system-wide, then run pandoc --lua-filter=task-list.lua -o filename.html filename.md
Source: https://stackoverflow.com/questions/28628903/to-do-list-in-rmarkdown
This is linked to issue #18