carpentries-incubator / fair-bio-practice

FAIR in (biological) practice

Home Page: https://carpentries-incubator.github.io/fair-bio-practice/

License: Other

Ruby 0.15% Makefile 1.42% R 1.42% Shell 0.15% Python 13.31% Jupyter Notebook 83.57%

beta carpentries-incubator english fair lesson life-sciences

fair-bio-practice's People

Contributors

boehmin ben-thomas-ed robertn01 smcclatchy ewallace seifip embl-bio-it adyork foxdatamanager biordm hhamazaki

fair-bio-practice's Issues

Enrich public omero record to use it for metadata example/show case

The image description text in the lessons keeps getting reformatted by editors, and it is also too large for PowerPoint.

I checked, and the OMERO font can actually be enlarged, so we could easily show the metadata in the live public OMERO.
It will add the WOW effect, plus FAIR in practice, and a first glimpse of OMERO.

To explain the administrative/descriptive metadata, a few details are missing, which hopefully you could
add as key-value pairs to the dataset (only)

https://publicomero.bio.ed.ac.uk/webclient/?show=dataset-263

So the key values could be
creator: Euque
creator ORCID: xxx
curator: Andrew
curator ORCID: xxx
funder: BBSRC
funding: BB/P001335/1
funding: BB/R50614X/1

(there are more funders there but that is enough).
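A minimal sketch of how the proposed key-value pairs could be held before they are entered, assuming they are kept as plain Python data (no OMERO API call is shown or implied). Because `funding` appears twice, a list of pairs is used rather than a dict, which would silently keep only the last grant. The helper name `group_pairs` is hypothetical and the `xxx` placeholders are left exactly as in the issue.

```python
from collections import defaultdict

# Key-value pairs as proposed above; ORCID values are still placeholders.
pairs = [
    ("creator", "Euque"),
    ("creator ORCID", "xxx"),
    ("curator", "Andrew"),
    ("curator ORCID", "xxx"),
    ("funder", "BBSRC"),
    ("funding", "BB/P001335/1"),
    ("funding", "BB/R50614X/1"),
]


def group_pairs(pairs):
    """Group values by key so that repeated keys keep all their values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


grouped = group_pairs(pairs)
print(grouped["funding"])   # both grant numbers survive
print(len(dict(pairs)))     # a plain dict collapses the repeated key
```

Keeping the pairs as a list also preserves the order in which they would be shown to learners.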

You are the owner of the dataset, so it is easier for you, sorry.

Add text on patents and licences

After the barriers description. Need to clarify date for patent and copyright. The info that data cannot be copyrighted etc.

Describe the barriers and risks of OS movement:

OS content

Lots of lesson content missing. Marked with TODOs and ideas for the text.

Missing OS parts

Scientific communities, citizen science, and educational resources are missing from the description

Episode 02 - OS Introduction

Hi All,

I just pushed Ben's changes to the ep-open-science branch with a couple of edits from myself.

Could you check if you're happy with it, if you're missing anything major/minor?
Tomasz, it might be easier if you format it the way you had it in mind in regard to headers etc... I added a couple for better flow of the lesson and as visual aid to break up the big chunks of text as it looked a little grim without. But you might have wanted the text in boxes?

If there's anything you'd like us to add/change, let us know.

Cheers,
Ines

Impossible number (fair chapter)

Example of a figure for which the numerical data are not attached and would be useful.

AM was suggesting ideas for possible modeling papers at one of the meetings.

Or I believe Benedict's paper has leaf heatmaps but not the individual time series.
Anything that shows the importance of the underlying numerical data.

Using parts of 11-version-control

Thank you for providing this course. I am trying to update our episode about versioning and I intend to use parts of your episode. If successful the new episode will be a part of https://github.com/NBISweden/workshop-dm-practices.

Best regards,
Erik

add where to links in OS

Links to some additional materials/publications about OS.

Check rightfield is valid candidate for templating

https://rightfield.org.uk/

Find the RightField paper and check who cites it.
There may be some newer tools for making metadata templates; if there are, they should cite this one, so that may be the easiest way to find them.
Download it and test if it still works on current Office/Win10/Mac.
Discuss whether we want to teach it.

Repositories examples

We need 4 or 5 domain-specific repositories.

Selection of the domains should be based on which are "most likely being used" by participants.
Or because it is a "perfect repository", i.e. having all the cool features like rich metadata plus some domain-specific perks (I don't believe it exists; something like BioDare but with good metadata).

An imaging repo should definitely be there (the cool feature is that you can see the images), a big contrast with a Zenodo dataset.
There is:
https://idr.openmicroscopy.org/

It should be enough, but maybe Ines knows a neuroscience imaging repo; maybe they are better and we would have medical examples.

Something for NGS or genomics in general. I believe those data are good for re-use.
Other omics? There is the MetaboLights repo; if we go with ISA-Tab templates then this repo is a good complement.
??I don't know?? Microarrays ... but are microarrays still a "cool" technique commonly used in studies, or have they been replaced by NGS-like methods?
There is SynBioHub, which we know. The cool things are the functional glyphs and the annotated sequences, but also the FAIR links to definitions of roles etc. OK, the data is just the one SBOL file that defines the design, but my guess is that sharing plasmid structures is quite a common task, even for non-synthetic-biology labs.

BioDare is always an alternative as it has a cool factor in visualization and period analysis. But it is not "data type" specific, more discipline specific. Also, it is not a super FAIR repo at the moment.

For the selected repositories, we need a showcase record == a well-described dataset which people can look at and be wowed.
Finding an example by browsing in domain-specific repos should be "relatively" easy; we should not need to make fake deposits.

Interoperable - Intelligible (FAIR and you exercise)

What is the personal benefit of interoperable - intelligible?

Impossible software

Ask EW about software on demand. His papers are actually very FAIR with shared data and scripts.
Only in https://rnajournal.cshlp.org/content/23/5/601.full are R scripts on demand, but those are just for plotting.

If not EW, then a paper referencing brass is OKish

good file names and sorting

So my example:

2020-07-14_s12_phyB_on_SD_t04.raw.xlsx
2020-07-14_s1_phyA_on_LD_t05.raw.xlsx
2020-07-14_s2_phyB_on_SD_t11.raw.xlsx
2020-08-12_s03_phyA_on_LD_t03.raw.xlsx
2020-08-12_s12_phyB_on_LD_t01.raw.xlsx
2020-08-13_s01_phyB_on_SD_t02.raw.xlsx
2020-7-12_s2_phyB_on_SD_t01.raw.xlsx
AUG-13_phyB_on_LD_s1_t11.raw.xlsx
JUL-31_phyB_on_LD_s1_t03.raw.xlsx
LD_phyA_off_t04_2020-08-12.norm.xlsx
LD_phyA_on_t04_2020-07-14.norm.xlsx
LD_phyB_off_t04_2020-08-12.norm.xlsx
LD_phyB_on_t04_2020-07-14.norm.xlsx
SD_phyB_off_t04_2020-08-13.norm.xlsx
SD_phyB_on_t04_2020-07-12.norm.xlsx
SD_phya_off_t04_2020-08-13.norm.xlsx
SD_phya_ons_t04_2020-07-12.norm.xlsx
ld_phyA_ons_t04_2020-08-12.norm.xlsx

Shows how dates up front make it difficult to find files by genotype/conditions (though dates in front may have value if, for example, the content has multiple variables).
Ordering by date obscures the pattern in conditions/samples.
s12 comes before s1 and s2 if leading zeros are not used.
You need to be numeric in dates (Aug sorts before Jul).
You need to be consistent: 2020-7 sorts after 2020-08-13.
You should think about how you are going to search or look at the data. We have clear LD vs SD conditions, and then files are organized by genotype.
Be careful with case: ld sorts after SD, and phya sorts after phyB.
Keeping the parts the same length makes names easier to read: at the bottom, ons and off (on sucrose, off sucrose) are nicely aligned, whereas above, on and off make the columns jumpy.
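The sorting effects described above can be reproduced with a short sketch. It uses a subset of the file names from the example and Python's built-in `sorted`, which compares strings character by character; the helper `pad_sample` is hypothetical and only illustrates zero-padding the sample number.

```python
import re

# A subset of the file names listed above.
names = [
    "2020-08-12_s03_phyA_on_LD_t03.raw.xlsx",
    "2020-07-14_s12_phyB_on_SD_t04.raw.xlsx",
    "2020-07-14_s1_phyA_on_LD_t05.raw.xlsx",
    "2020-07-14_s2_phyB_on_SD_t11.raw.xlsx",
    "2020-7-12_s2_phyB_on_SD_t01.raw.xlsx",
    "AUG-13_phyB_on_LD_s1_t11.raw.xlsx",
    "JUL-31_phyB_on_LD_s1_t03.raw.xlsx",
    "SD_phyB_on_t04_2020-07-12.norm.xlsx",
    "ld_phyA_ons_t04_2020-08-12.norm.xlsx",
]

# Plain string sorting: digits < uppercase letters < "_" < lowercase letters.
ordered = sorted(names)


def pad_sample(name, width=2):
    """Zero-pad the sample number, e.g. "_s1_" becomes "_s01_"."""
    return re.sub(r"_s(\d+)_", lambda m: f"_s{int(m.group(1)):0{width}d}_", name)


same_day = [n for n in names if n.startswith("2020-07-14")]
padded = sorted(pad_sample(n) for n in same_day)

for n in ordered:
    print(n)
print(padded)
```

With padding applied, the three files from 2020-07-14 sort as s01, s02, s12, which is the order a reader would expect.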

Proofing FAIR content

Episode intro.
Transition from impossible examples to FAIR. Re-stating what the issues were in the examples.

New draft for episode 04 - 'Introduction to metadata'

The draft episode is ready for scrutiny!

Impossible format

Discussed. Don't know if you have anything better than Benedict.

'Episode 07 - Working with files' needs a good exercise

Hi people!
I just merged the draft episode 07 to gh-pages. You can see the rendered version here:
https://carpentries-incubator.github.io/fair-bio-practice/07-files-organization/index.html

I am trying to think about a good example for the 'folder structure' challenge (should be doable in 5 minutes): right now it asks the students to look at two different folder structures used to organise the same data and decide which is the better option. However, there are different ways to go about this:

We could just present images of 2 different folder structures, one messy and one good.
We can change the exercise and ask them to create an appropriate folder structure for a project they are currently working on (or a future one) on a GitHub account (this can be reused at a later episode and was an idea we had previously entertained with @tzielins).
We could do something completely different.

Any thoughts?

Best,
A

Metadata notebook - is it complete?

Hello! I am just checking the correspondence between each chapter's slides and notebooks, and I realised chapter 5 (Metadata) includes a final "quiz" in the slides that I can't find in the notebook. In addition, the notebook doesn't include any feedback section... is that intended?
Thank you!

example of FAIR data

Example of a FAIR data record for the exercise (penultimate in the chapter).
Students are asked to look at a link and say why a record or dataset is FAIR.

Andrew, check if UniProt has a nice record to use.
According to fair paper:

UniProt26: UniProt is a comprehensive resource for protein sequence and annotation data. All entries are uniquely identified by a stable URL, that provides access to the record in a variety of formats including a web page, plain-text, and RDF (‘F’ and ‘A’). The record contains rich metadata (‘F’) that is both human-readable (HTML) and machine-readable (text and RDF), where the RDF formatted response utilizes shared vocabularies and ontologies such as UniProt Core, FALDO, and ECO (‘I’). Interlinking with more than 150 different databases, every UniProt record has extensive links into, for example, PubMed, enabling rich citation. These links are machine-actionable in the RDF representation (‘R’). Finally, in the RDF representation, the UniProt Core Ontology explicitly types all records, leaving no ambiguity—neither for humans nor machines—about what the data represents (‘R’), enabling fully-automated retrieval of records and cross-referencing information.

So all letters could be there.
If not maybe Dataverse, again

Dataverse makes the Digital Object Identifier (DOI), or other persistent identifiers (Handles), public when the dataset is published (‘F’). This resolves to a landing page, providing access to metadata, data files, dataset terms, waivers or licenses, and version information, all of which is indexed and searchable (‘F’, ‘A’, and ‘R’). Deposits include metadata, data files, and any complementary files (such as documentation or code) needed to understand the data and analysis (‘R’). Metadata is always public, even if the data are restricted or removed for privacy issues (‘F’, ‘A’). This metadata is offered at three levels, extensively supporting the ‘I’ and ‘R’ FAIR principles: 1) data citation metadata, which maps to DataCite schema or Dublin Core Terms, 2) domain-specific metadata, which when possible maps to metadata standards used within a scientific domain, and 3) file-level metadata, which can be deep and extensive for tabular data files (including column-level metadata). Finally, Dataverse provides public machine-accessible interfaces to search the data, access the metadata and download the data files, using a token to grant access when data files are restricted (‘A’).

including Ed-DaSH logo

Hi @tzielins!
To include the logo in all pages I created https://github.com/carpentries-incubator/fair-bio-practice/blob/gh-pages/_includes/logo.md and then referenced it in https://github.com/carpentries-incubator/fair-bio-practice/blob/gh-pages/_includes/links.md. However, something is not right. It works on some pages but not on others. It has to do with how the base address is set. Could you take a look?

Best,
A

Suggested corrections

I'm a member of The Carpentries Core Team and I'm submitting this issue on behalf of another member of the community. In most cases, I won't be able to follow up or provide more details other than what I'm providing below.

What goals do you have for the "follwing" days? > What goals do you have for the following days?
All the practices that enable others to access and use your outcomes directly benefit you and your group" " > All the practices that enable others to access and use your outcomes directly benefit you and your group.
We will start with explaining Open Science principles "and what the benefits are of being open for you and society". > We will start with explaining Open Science principles and how they stand to benefit you and society.
There should be no space between this final bullet and the prior bullet. > "Day 4 We will talk about Version Control. We will consolidate our knowledge of FAIR ready data management and what other tools can help you during your research."
"using" pad, answering questions in pad > Using pad, answering questions in pad > In addition, all the bullets in this section should end with full stops for the sake of consistency, and "Etherpad" may be better than "pad".

Fix FAIR quiz questions wording

There were some issues / doubts

F in FAIR stands for free. FFFFFFFFF

Only figures presenting results of statistical analysis need underlying numerical data FFFF?FFFF

Sharing numerical data as a .pdf in repository as Zenodo is FAIR. TFFFFFTF(It's the lowest end of FAIR)+1+1+1

Sharing numerical data as an Excel file via Github is not FAIR. FFFFFFFFF

Metadata standards (for example MIAME MIQE) assure the “IR” in FAIR. TTTTTTT? T?

Group websites are one of the best places to share your data. FFFFFFFFF

Data from failed experiments are not re-usable. FFFFFFFFF

Data should always be converted to Excel or .cvs files in order to be FAIR. FFFFFFFFF (csv not cvs)

A DOI of a dataset helps in getting credit. TTTTTTTTT

FAIR data are peer reviewed. FFFFFFFFF

FAIR data accompany a publication. FF?ideallyFF it's complicated...+1+1+1
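To spot which questions need rewording, the answer strings can be tallied. This is a rough sketch that assumes each character is one respondent's answer (T = true, F = false, ? = unsure); that reading of the strings is an assumption, and the helper names `tally` and `is_contested` are hypothetical. Only answer strings copied from the list above are used.

```python
from collections import Counter

# Answer strings copied from the list above, one character per respondent (assumed).
responses = {
    "F in FAIR stands for free.": "FFFFFFFFF",
    "Only figures presenting results of statistical analysis need underlying numerical data": "FFFF?FFFF",
    "Sharing numerical data as a .pdf in repository as Zenodo is FAIR.": "TFFFFFTF",
    "A DOI of a dataset helps in getting credit.": "TTTTTTTTT",
}


def tally(answers):
    """Count T, F and ? characters, ignoring anything else."""
    return Counter(ch for ch in answers if ch in "TF?")


def is_contested(answers):
    """A question is contested when respondents did not all give the same answer."""
    return len(tally(answers)) > 1


contested = [q for q, a in responses.items() if is_contested(a)]
for question in contested:
    print(question, dict(tally(responses[question])))
```

Questions that come out as contested are the candidates for clearer wording.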

Impossible resource (fair chapter)

Suggestion from AM
argonomics DB, so a paper that refers to data in that db (by author swiss group in timet project), maybe with link to db,
db does not exist.
Or argonomics paper if there was one.

Solution to goals of OS

Missing solution, needed as it may be self taught course.

Using this course

Thanks for putting this course together. I am putting together training on FAIR data management for long-term agricultural experiments and parts of this course have been very helpful for organising my own training offering. I will in due course make this available as a carpentry lesson here.

best wishes

Richard

add icons / graphics to OS power point

OS definitions slides are no longer points, need some graphical elements on the slides

Zenodo (or Figshare) showcase

We need a good example dataset in Zenodo (or Figshare, I think I prefer Zenodo).

A good dataset:

a set, so multiple files inside
preferably of various types (for example data and results)
with a readme file inside or some metadata info
with a substantial description in Zenodo, so not "Figure 2 from Paper X" but a paragraph of text
preferably of some obvious importance

Maybe Andrew knows some from his Covid curator activity.

Finding one by browsing in Zenodo looks tricky; some search by topic might work, or maybe it will be easier to start from some paper (how to find papers that deposited to Zenodo, though?).

It may be easier, faster and more educational to make our own "perfect" deposit, which will then be used as a showcase.

Assemble data from your published papers (faking data for a fake deposit is too much); let's base it on what you have and have published.

For example, based on the full leaf imaging of Benedict (which Andrew recently released):
Take some of the images, combine them with the extracted time series, the MATLAB data file and the MATLAB code.
Could drop in a file with the imaging protocol description.
Then a file_organization.txt that explains the layout and naming conventions.
Then a readme.

And probably a compacted readme as the Zenodo description, plus some tags (if Zenodo supports those).
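As a starting point for discussion, the deposit described above could be laid out as a skeleton on disk before any real files are copied in. This is only a sketch: apart from file_organization.txt, every folder and file name below is a hypothetical choice, and nothing here talks to Zenodo.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: a value of None means "folder", a string means
# "text file with this placeholder content".
LAYOUT = {
    "README.txt": "What the dataset is, who made it, how to cite it.\n",
    "file_organization.txt": "Explains the folder layout and naming conventions.\n",
    "protocol/imaging_protocol.txt": "Description of the imaging protocol.\n",
    "images": None,
    "timeseries": None,
    "matlab/data": None,
    "matlab/code": None,
}


def make_skeleton(root):
    """Create the folders and placeholder files under root; return what was made."""
    created = []
    for rel, text in LAYOUT.items():
        target = Path(root) / rel
        if text is None:
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")
        created.append(rel)
    return sorted(created)


with tempfile.TemporaryDirectory() as tmp:
    for entry in make_skeleton(tmp):
        print(entry)
```

The skeleton is written to a temporary directory here so that running the sketch leaves nothing behind; pointing `make_skeleton` at a real folder would give a layout to fill with the actual images, time series and code.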

I know it is a lot of work. But the perfect example could then also be used in the working with files episode and potentially in the writing readmes episode.

Copy-pasting from your own papers and lab notes may actually be faster than searching in Zenodo.
It will all depend on whether your own data are complex enough to justify some inner folder structure, naming or an interesting readme. I would say any paper that uses more than one experimental technique is a good candidate.

You know your work, so we can have a call and you could show what you could wrap as a deposit.

Folders examples for project organization

Rather than looking at the GOF example, we could show that "no one size fits all" and let people decide what the pros and cons of different
folder layouts are while being shown the "typical" approaches.

For computational projects we have

Left is like the GOF structure. We cannot use it directly as it is Figure1 and we say not to name things like this.
I, for example, prefer the second one (right), as the results sit next to the code that generates them.

Similarly we have two options for wet projects:

Ben's neuroscience example, by patients (vs)
different conditions/treatments and then samples (so patients)

So it needs a drawing or files for screenshots.

Then we can have two kinds of groups: one compares two computational projects, the other two wet projects.

Explain difference between FAIR data/science vs open data/science

About https://carpentries-incubator.github.io/fair-bio-practice/01-wellcome/index.html

From my experience people often confuse FAIR data and science with open data and science. It would be good to clarify this from the start in the introduction and go into the details after that in the dedicated sections. Also, an exercise to clarify this difference would help. We can use some inspiration and materials from the lesson on "FAIR data for climate sciences" that I helped develop:
https://escience-academy.github.io/Lesson-FAIR-Data-Climate/introduction/index.html

I think that lesson would be a good source of material, also for other episodes.

Episode 6 - Record Keeping ready for review

Hi All,

We've restructured the record keeping lesson and I've hopefully managed to do most changes the way you pictured them @zajawka - since you asked about what happened in the example with Novartis, I added that to the intro paragraph and explicitly stated how FAIR record keeping can help avoid things like that in the future if we implement it.

Please have a read through, and if you could, try the Benchling exercise to see if all looks fine and goes well. It is simpler than before and focused on provenance, as discussed.

Let me know what changes you'd like us to make.

Cheers
Ines

episode 05 - The Research Data Life Cycle draft is ready for initial review

Hi everyone,
I have just pushed episode 05 on a branch I created:
https://github.com/carpentries-incubator/fair-bio-practice/blob/update-episode-04-and-05/_episodes/05-the-research-data-life-cycle.md

Could you please have a look and tell me what you think about it? Any ideas or suggestions to improve it?

Total episode time: 30 minutes max.

All the best,
Andrés

Naming and sorting needs screenshot or files be moved out of challenge

On the rendered page, the on vs ons benefits are not clear as the font is not fixed-width, so the file names do not move around so much.

Deal with edit issues recorded in andrews fork

I believe you already recorded some edit issues in your fork of the biordm repo

Metadata intro is not an intro

Me not like it :)

Producing metadata is not an intro... if we start producing metadata now, what are we going to do in the other part?
Or why do we have a "silly exercise" now if we have a better one later?

Let's remember we are going to have a proper episode for producing metadata.

At the same time, there are important concepts such as annotating with PermId and the MIAME standards.

Obj1: Know what is metadata
Obj2: Know how to provide metadata
Obj3: How FAIR applies also to metadata

Obj1 & Obj2. We are actually missing an example of metadata. Remember that at a workshop students do not look at the text,
so we need examples to look at while the instructor explains the concept of metadata and its types.

I propose to have a readme-like text (half a page) that describes a data file,
and a second example of table data with some embedded metadata.

But let's have them beefy: a half-page readme so it has details. Same for the data table. It should be obvious it takes time to produce.

Exercises:

Identify the 3 types of metadata in the examples.
Think which parts of the metadata in the examples could be treated as data, or the reverse (e.g. if there are two strain names, strain is probably data, not metadata).

Obj3.
The doppelganger is funny, but making a record with just ORCID ....

What about showing a publication record that uses ORCID,
for example https://wellcomeopenresearch.org/articles/5-96/v2
and asking people to click on an author's ORCID, which takes them to the author's ORCID page and their own work, not a doppelganger's.

If we have time we could change this example to another Wellcome paper with some common name like John Smith, or an Asian one, as they have a real problem of having a lot of doppelgangers (a quick search did not give me a nice example).

Metadata standards need more attention.
I previously used https://fairsharing.org/standards/ to find some; in the linked DCC I could not even find MIAME.

Maybe an exercise to find two specific standards? Or to identify what issues the standards help to address.

We are not going to cover standards in the real follow-up episodes as we are "type" agnostic; also, those standards are a pain in reality.
So this is the only episode in which we are going to talk about standards.

Add flowrepository to examples?

Add http://flowrepository.org to specialised repository examples in lesson exercise?

Impossible protocol

Complete the finding protocol task.

Andrew, contact the guy from https://twitter.com/dgonzales1990/status/953737802205794304
(sorry, you are the Twitter person) and ask if he can tell us what the paper was.

If not, use your previous 3 loop example, unless you have one with a paywall or a paper one.