data_versioning's People

Contributors

dfalster, richfitz, wcornwell

Forkers

smwindecker

data_versioning's Issues

Minimal set of information required

  • repo name: wcornwell/taxonlookup, traitecoevo/baad.data
  • single file name: plant_lookup.csv, baad_data.zip
  • hook to load into R: read_csv, baad's bespoke functions
  • additional version hook? (not clear, used in baad)
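The list above can be sketched as a small record plus a URL builder. This is only an illustration in Python, not datastorr's actual R interface: the names `DatasetInfo`, `release_url`, and `read_text`, the version tag `v1.0.0`, and the idea that the version hook maps a version string to a release tag are all assumptions made for the example. The URL layout is the standard GitHub pattern for files attached to a release.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical record holding the minimal information listed above."""
    repo: str                      # e.g. "wcornwell/taxonlookup"
    filename: str                  # e.g. "plant_lookup.csv"
    read: Callable[[str], object]  # hook that loads the downloaded file
    # Optional hook; its role is still unclear, so treating it as a
    # version-to-tag mapping is an assumption of this sketch.
    version_hook: Optional[Callable[[str], str]] = None


def release_url(info: DatasetInfo, version: str) -> str:
    """Build the download URL for the single file attached to a release tag."""
    tag = info.version_hook(version) if info.version_hook else version
    return f"https://github.com/{info.repo}/releases/download/{tag}/{info.filename}"


def read_text(path: str) -> str:
    """Stand-in read hook: return the file contents as text."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# "v1.0.0" is a made-up tag used only to show the shape of the URL.
taxonlookup = DatasetInfo("wcornwell/taxonlookup", "plant_lookup.csv", read_text)
print(release_url(taxonlookup, "v1.0.0"))
```

Keeping the read hook as a plain function means a simple case (one CSV) and a bespoke case (baad's own loaders) fit the same record.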

Dataset or database

Throughout, we use dataset and database somewhat interchangeably. Can I standardise?
My preference is to call any single product delivered via datastorr a dataset, and to reserve database for things like GenBank.

Thoughts anyone?

concept needs a proper name

For the sake of marketing and communications, I think we should pick a consistent name for the concept we are discussing.

Lightweight versioned data (or LVD in the ms) could work, but of course there are many alternatives.

Possible journals

Assuming we write a paper, where to submit:

  • Scientific Data has an Article section: The ‘Article’ format can be used to present original reports on systems or techniques that clearly advance data sharing and reuse to support reproducible research. This includes research on sharing, managing and processing scientific research data. Articles describing data repositories, standards and ontologies are welcome when they include compelling demonstrations of data exchange, enrichment or knowledge generation made possible by the system or standard.
  • PLoS Computational Biology: Research articles must be declared as belonging to one of the following categories: General, Methods or Software. Software articles form a specific sub-category. ...Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities. Methods articles and Software articles require presubmission inquiries.
  • Methods in Ecology & Evolution: As an application note. We have a good track record here.

Licenses

Do we comment on licenses? Currently using MIT (taxonlookup), BSD_2 (baad.data)

Minor things needed for submission

  • Abstract (WC)
  • Keywords
  • Import Figures #10 (WC)
  • Try new figure #11 (DF)
  • Discussion (limited): general problem, an example of a solution, key features (WC, DF)
  • MP comments: needs a clearer exposition of how this would work. Currently we have lists of benefits, but no overview (MP)
  • Concise description of what datastorr does (MP)
  • Title for paper #13
  • Better name for concept #12
  • More citations of prior work
  • One more review of tables (DF)

a better title

options:

  • Why We Need Versioned Data and an Easy Workflow for Setting It Up

  • Versioned data: why it's needed and how it can be achieved easily, cheaply, and now

Prior work

There's a lot of prior work in this space, and @cboettig tells me that people have tried this approach and ended up in a mess. Do we solve this problem? Or are we working with data that will always be simple enough to not get in a quagmire? How do people tell when they need to move to something more heavyweight?

authorship reorder

I really think that Daniel should go first (I thought I added comments on this in my edits but obviously didn't). Daniel and Will have clearly done the most work and are still actively engaged in careers where authorship order matters!

Forks

As pointed out by @cboettig, think about what happens with forked data. Do forks get numbers? What happens on a merge? (technically, can gh forks handle releases?)

Datastorr needs an icon or hex sticker

Seems like a good time to assign an icon to datastorr. Or even a hex sticker.

In #11 I put in a placeholder.

From memory, storr is named after the Old Man of Storr, so the icon could be an outline of a mountain range?

Apparently there's an R package for creating hex stickers, suitably named hexSticker.

@richfitz, what are your thoughts on this?

note on discussion

I'm thinking that this whole paragraph could be cut without much loss, as it largely repeats earlier text.

Overall, I think we should keep the discussion/future directions section as short as possible, since most of our good points are made throughout the other sections.
