Comments (17)

timclark commented on August 30, 2024

I think it is unlikely that GitHub or BitBucket meet the criteria today, as-is - same is likely true for various data repositories. This is a new set of requirements and you would certainly expect that every player in the ecosystem will have to adapt and change to make it work. So there needs to be a discussion.

Best

Tim

Timothy Clark, Ph.D.
Assistant Professor of Neurology, Harvard Medical School
Director of Informatics, MassGeneral Institute for Neurodegenerative Disease
Computer Scientist, Massachusetts General Hospital
co-Director, Data and Statistics Core, Massachusetts Alzheimer Disease Research Center
website: http://mindinformatics.org mobile: +1 617-947-7098 fax: +1 617-213-5418
ORCID ID: 0000-0003-4060-7360

On Jun 9, 2016, at 4:03 PM, Arfon Smith wrote:

Landing page will have metadata that describes the data set or sets, provides its machine readable standard citation, tells you about its size, data license and access restrictions/conditions/terms of use if any, embargo status if any, and if dataset has been de-accessioned the landing page remains in place retaining the metadata after the data goes away & indicates de-accessioning status.

👍 thanks @timclark. So in theory, if a repository host such as GitHub offered some guarantee of archiving/longevity then it would potentially meet the rest of the criteria?

BTW, I'm not asking these questions to be difficult :-) - I'm trying to figure out how far away GitHub/BitBucket/GitLab are from meeting the principles as-is.



from force11-scwg.

matthewturk commented on August 30, 2024

One possibility is to use the content negotiation that CrossRef provides to supply RDF/DOAP/etc information that includes all the appropriate metadata. It could in principle even include additional formats if the DOI is registered through DataCite, since the DataCite schema 3.1 allows for formats to include MIME types, extensions, etc. Having the landing page include the metadata seems necessary, but additionally providing the SCWG metadata in an additional format supplied and/or embedded in additional content negotiable formats seems quite nice as well.
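A minimal sketch of the content negotiation idea, assuming the public DOI resolver at https://doi.org/ honours the `Accept` header for Crossref- and DataCite-registered DOIs. The helper name `build_metadata_request` and the short format keys are hypothetical; the DOI is the Zenodo example mentioned elsewhere in this thread. The function only constructs the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import urllib.request

# Assumption: the public DOI resolver, which forwards content-negotiated
# requests to the registration agency (Crossref or DataCite).
DOI_RESOLVER = "https://doi.org/"

# Media types commonly requested through DOI content negotiation.
# The short keys on the left are made up for this sketch.
MEDIA_TYPES = {
    "rdf": "application/rdf+xml",
    "bibtex": "application/x-bibtex",
    "csl": "application/vnd.citationstyles.csl+json",
    "datacite": "application/vnd.datacite.datacite+xml",
}


def build_metadata_request(doi: str, fmt: str = "rdf") -> urllib.request.Request:
    """Build (but do not send) a request for DOI metadata in the given format."""
    return urllib.request.Request(
        DOI_RESOLVER + doi,
        headers={"Accept": MEDIA_TYPES[fmt]},
    )


if __name__ == "__main__":
    request = build_metadata_request("10.5281/zenodo.49771", "bibtex")
    print(request.full_url, request.get_header("Accept"))
```

The same identifier would then serve both audiences: a browser gets the human-readable landing page, while a tool asking for RDF or BibTeX gets the metadata directly.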

arfon commented on August 30, 2024

Errr, I don't think so? Is http://dx.doi.org/10.5281/zenodo.49771 an example of a 'landing page with metadata and a link to software'?

danielskatz commented on August 30, 2024

Yes. My thought, based on a comment at the Software and Data Citation workshop yesterday, is that the principles could currently be read to say that the identifiers should point to the software directly, such as a URL to a github repo. I don't think this is what we want.

arfon commented on August 30, 2024

Yes. My thought, based on a comment at the Software and Data Citation workshop yesterday, is that the principles could currently be read to say that the identifiers should point to the software directly, such as a URL to a github repo. I don't think this is what we want.

I'm not sure that I understand why pointing to a GitHub repository directly would be a bad idea (aside from the concern about archiving/longevity). Was there discussion why we wouldn't want this?

timclark commented on August 30, 2024

Hi Arfon,

What we say in data citation is that you should not point to a dataset directly from the citation; you should point to a landing page.

Landing page will have metadata that describes the data set or sets, provides its machine readable standard citation, tells you about its size, data license and access restrictions/conditions/terms of use if any, embargo status if any, and if dataset has been de-accessioned the landing page remains in place retaining the metadata after the data goes away & indicates de-accessioning status.

Tim
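As an illustration of the landing-page description above, here is a rough sketch of the record such a page might hold. The class name `LandingPage`, its field names, and the `deaccession` helper are all hypothetical and not taken from any data citation specification; the point is only that de-accessioning removes the data link while the metadata stays in place.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LandingPage:
    """Hypothetical metadata record behind a dataset landing page."""
    identifier: str                      # persistent identifier, e.g. a DOI
    citation: str                        # machine-readable standard citation
    size_bytes: int                      # size of the dataset
    license: str                         # data license
    access_terms: Optional[str] = None   # restrictions/conditions/terms of use, if any
    embargo_until: Optional[str] = None  # embargo status, if any
    data_url: Optional[str] = None       # link to the data itself
    deaccessioned: bool = False          # True once the data has gone away


def deaccession(page: LandingPage) -> LandingPage:
    """Return a copy whose data link is gone but whose metadata remains."""
    return replace(page, data_url=None, deaccessioned=True)
```

A citation that points at this record keeps resolving after `deaccession` is applied, which is the property a direct link to the dataset cannot offer.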


danielskatz commented on August 30, 2024

This was exactly my thought. The concern is archiving/longevity.

We say:

(4) Persistence: Unique identifiers and metadata describing the software and its disposition should persist – even beyond the lifespan of the software they describe.
(5) Accessibility: Software citations should permit and facilitate access to the software itself and to its associated metadata, documentation, data, and other materials necessary for both humans and machines to make informed use of the referenced software.
(6) Specificity: Software citations should facilitate identification of, and access to, the specific version of software that was used. Software identification should be as specific as necessary, such as using version numbers, revision numbers, or variants such as platforms.

Given this, I would like to add something to the discussion that roughly corresponds with the data language Tim mentioned.

@kyleniemeyer, can you add this to the discussion document that contains the community feedback so we can track this?

kyleniemeyer commented on August 30, 2024

@danielskatz will do! I have all the feedback collected in a document, just need to clean it up a bit.

arfon commented on August 30, 2024

Landing page will have metadata that describes the data set or sets, provides its machine readable standard citation, tells you about its size, data license and access restrictions/conditions/terms of use if any, embargo status if any, and if dataset has been de-accessioned the landing page remains in place retaining the metadata after the data goes away & indicates de-accessioning status.

👍 thanks @timclark. So in theory, if a repository host such as GitHub offered some guarantee of archiving/longevity then it would potentially meet the rest of the criteria?

BTW, I'm not asking these questions to be difficult :-) - I'm trying to figure out how far away GitHub/BitBucket/GitLab are from meeting the principles as-is.

danielskatz commented on August 30, 2024

I don't think GitHub is set up (or really ever will be) to host sufficient metadata.

Just thinking about authors as one type of metadata, where would GitHub list them, and how would they be tied to specific releases (the things we are asking people to cite)?

arfon commented on August 30, 2024

I don't think GitHub is set up (or really ever will be) to host sufficient metadata.

I would argue that's what the API is for?

danielskatz commented on August 30, 2024

I don't know which API you mean.

I think we say in the principles document that we want people to be able to cite software, and we know that there is some metadata that we need to have included in the citation. Once a person goes to the location specified by the unique identifier in the citation, that metadata should be shown.

To me, this is what a landing page does.

arfon commented on August 30, 2024

I don't know which API you mean.

I mean things like this - here are the changes between two releases of a package via the GitHub API: https://api.github.com/repos/github/linguist/compare/v4.8.5...v4.8.6
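For reference, a small sketch of how that compare URL could be assembled for any pair of releases. The URL pattern is the one in the link above; the function names `compare_url` and `build_compare_request` are hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def compare_url(owner: str, repo: str, base: str, head: str) -> str:
    """Build the compare endpoint URL for two tags, branches, or commits."""
    return f"{GITHUB_API}/repos/{owner}/{repo}/compare/{base}...{head}"


def build_compare_request(owner: str, repo: str, base: str, head: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the API for a JSON response."""
    return urllib.request.Request(
        compare_url(owner, repo, base, head),
        headers={"Accept": "application/vnd.github+json"},
    )


if __name__ == "__main__":
    # Reproduces the link quoted in the comment above.
    print(compare_url("github", "linguist", "v4.8.5", "v4.8.6"))
```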

Once a person goes to the location specified by the unique identifier in the citation, that metadata should be shown.

To me, this is what a landing page does.

👍 ok fair enough.

timclark commented on August 30, 2024

Right

arfon commented on August 30, 2024

I think it is unlikely that GitHub or BitBucket meet the criteria today, as-is - same is likely true for various data repositories.

👍 that's a useful clarification. Thanks @timclark

danielskatz commented on August 30, 2024

I think we need to add some discussion about this and, in particular, mention some examples of how software can be published, such as via Zenodo. This has come up in my current Dagstuhl meeting.

kyleniemeyer commented on August 30, 2024

After discussion with @arfon and @danielskatz, I drafted a new subsection addressing this issue via 2a647eb.
