traitecoevo/data_versioning
An approach for practical and simple data versioning in R
wcornwell/taxonlookup, traitecoevo/baad.data
plant_lookup.csv, baad_data.zip
read_csv, baad's bespoke functions
Basically defining the types of problems and the proposed solutions
Does one exist already?
Throughout we use dataset and database somewhat interchangeably. Can I standardise?
My preference is to label any single product delivered via datastorr a dataset, and to reserve database for things like GenBank.
Thoughts, anyone?
For the sake of marketing and communications, I think we should pick a consistent name for the concept we are discussing.
Lightweight versioned data (or LVD in the ms) could work, but of course there are many alternatives.
Assuming we write a paper, where to submit:
Do we comment on licenses? Currently using MIT (taxonlookup), BSD_2 (baad.data)
Data archiving for papers is more or less solved. See 384b57b.
Maybe we need to pull out the CKAN, Fli, and Dat material and put it at the end, so that we separate what is possible now from where things are going in the future?
What should go here (to replace 'REFS')? https://github.com/traitecoevo/data_versioning/blob/master/ms.tex#L123
options:
Why We Need Versioned Data and an Easy Workflow for Setting It Up
Versioned data: why it's needed and how it can be achieved easily, cheaply, and now
There's a lot of prior work in this space, and @cboettig tells me that people have tried this approach and ended up in a mess. Do we solve this problem? Or are we working with data that will always be simple enough to not get in a quagmire? How do people tell when they need to move to something more heavyweight?
Cut?
I really think that Daniel should go first (I thought I added comments on this in my edits but obviously didn't). Daniel and Will have clearly done the most work and are still actively engaged in careers where authorship order matters!
As pointed out by @cboettig, think about what happens with forked data. Do forks get numbers? What happens on a merge? (technically, can gh forks handle releases?)
Seems like a good time to assign an icon to datastorr. Or even a hex sticker.
In #11 I put in a placeholder.
From memory, storr is named after the Old Man of Storr, so the icon could be an outline of a mountain range?
Apparently there's an R package for creating hex stickers, suitably named hexSticker.
@richfitz, what are your thoughts on this?
Do we want to include examples that are not yet public, like fungaltraits and austraits?
@richfitz: what other examples of people using datastorr do you know of?
I'm thinking that this whole paragraph could be cut without much loss as it seems largely repetitive with earlier text.
Overall, I think we should keep the discussion/future directions section as short as possible, since most of our good points are made throughout the other sections.