carpentries-incubator / fair-bio-practice
FAIR in (biological) practice
Home Page: https://carpentries-incubator.github.io/fair-bio-practice/
License: Other
The image description text in the lessons keeps getting reformatted by editors, and it is too large for PowerPoint.
I checked, and the OMERO font can actually be enlarged, so we could easily show the metadata in the live public OMERO.
It will add the WOW effect, plus FAIR in practice, and a first glimpse of OMERO.
To explain the administrative/descriptive metadata, a few details are missing, which hopefully you could add as key-value pairs to the dataset (only)
https://publicomero.bio.ed.ac.uk/webclient/?show=dataset-263
So the key values could be
creator: Euque
creator ORCID: xxx
curator: Andrew
curator ORCID: xxx
funder: BBSRC
funding: BB/P001335/1
funding: BB/R50614X/1
(there are more funders there but that is enough).
You are the owner of the dataset, so it is easier for you, sorry.
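The key-value pairs proposed above can be sketched as plain data. This is only an illustration and does not use any OMERO API. The `group_by_key` helper is hypothetical, and the `xxx` placeholders are kept as they appear in the issue. Because `funding` appears twice, a list of pairs is used rather than a dictionary, which would silently keep only the last grant number.

```python
from collections import defaultdict

# Key-value pairs as proposed in the issue; "xxx" placeholders are left unfilled.
pairs = [
    ("creator", "Euque"),
    ("creator ORCID", "xxx"),
    ("curator", "Andrew"),
    ("curator ORCID", "xxx"),
    ("funder", "BBSRC"),
    ("funding", "BB/P001335/1"),
    ("funding", "BB/R50614X/1"),
]


def group_by_key(pairs):
    """Hypothetical helper: collect all values recorded under the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


grouped = group_by_key(pairs)
print(grouped["funding"])  # both grant numbers are preserved
print(len(dict(pairs)))    # a plain dict would drop one of the funding entries
```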
After the barriers description. Need to clarify date for patent and copyright. The info that data cannot be copyrighted etc.
Lots of lesson content missing. Marked with TODOs and ideas for the text.
Check the reasons, think of better ones
Scientific communities, citizen science, and educational resources missing from description
Hi All,
I just pushed Ben's changes to the ep-open-science branch with a couple of edits from myself.
Could you check if you're happy with it, if you're missing anything major/minor?
Tomasz, it might be easier if you format it the way you had it in mind in regard to headers etc... I added a couple for better flow of the lesson and as visual aid to break up the big chunks of text as it looked a little grim without. But you might have wanted the text in boxes?
If there's anything you'd like us to add/change, let us know.
Cheers,
Ines
Example of a figure for which numerical data are not attached and could be useful.
AM was suggesting ideas for possible modeling papers at one of the meetings.
Or I believe Benedict's paper has leaf heatmaps but not individual time series.
Anything that shows the importance of underlying numerical data
Thank you for providing this course. I am trying to update our episode about versioning and I intend to use parts of your episode. If successful the new episode will be a part of https://github.com/NBISweden/workshop-dm-practices.
Best regards,
Erik
Links to some additional materials/publications about OS.
Find the RightField paper and check who cites it.
There may be some newer tools for making metadata templates; if there are, they should cite this one, so that may be the easiest way to find them.
Download and test if it still works on current Office/Win10/Mac
Discuss whether we want to teach it
We need 4 or 5 domain specific repositories.
Selection of the domains should be based on "most likely being used" by participants.
Or because it is a "perfect repository", i.e. having all the cool features like rich metadata plus some domain-specific perks (I don't believe it exists; something like BioDare but with good metadata).
It should be enough, but maybe Ines knows a neuroscience imaging repo; maybe they are better and we would have medical examples.
Something for NGS or genomics in general. I believe those data are good for re-use
Other omics? There is the MetaboLights repo; if we go with ISA-Tab templates then this repo is a good complement.
I don't know about microarrays: are they still a "cool" technique in common use, or have they been replaced by NGS-like methods?
There is SynBioHub, which we know. The cool things are the functional glyphs and the annotated sequences, but also the FAIR links to definitions of roles etc. OK, the data is just the one SBOL file that defines the design, but my guess is that sharing plasmid structures is quite a common task, even for non-synthetic-biology labs.
BioDare is always an alternative as it has the cool factor in visualization and period analysis. But it is not "data type" specific, rather discipline specific. Also, it is not a super FAIR repo at the moment.
For the selected repositories, we need a showcase record, i.e. a well-described dataset which people can look at and be wowed by.
Finding an example by browsing in domain-specific repos should be "relatively" easy; we should not need to make fake deposits.
What is the personal benefit of interoperable - intelligible.
Ask EW about software on demand. His papers are actually very FAIR with shared data and scripts.
Only in https://rnajournal.cshlp.org/content/23/5/601.full are R scripts on demand, but those are just for plotting.
If not EW then a paper referencing brass is OKish
So my example:
2020-07-14_s12_phyB_on_SD_t04.raw.xlsx
2020-07-14_s1_phyA_on_LD_t05.raw.xlsx
2020-07-14_s2_phyB_on_SD_t11.raw.xlsx
2020-08-12_s03_phyA_on_LD_t03.raw.xlsx
2020-08-12_s12_phyB_on_LD_t01.raw.xlsx
2020-08-13_s01_phyB_on_SD_t02.raw.xlsx
2020-7-12_s2_phyB_on_SD_t01.raw.xlsx
AUG-13_phyB_on_LD_s1_t11.raw.xlsx
JUL-31_phyB_on_LD_s1_t03.raw.xlsx
LD_phyA_off_t04_2020-08-12.norm.xlsx
LD_phyA_on_t04_2020-07-14.norm.xlsx
LD_phyB_off_t04_2020-08-12.norm.xlsx
LD_phyB_on_t04_2020-07-14.norm.xlsx
SD_phyB_off_t04_2020-08-13.norm.xlsx
SD_phyB_on_t04_2020-07-12.norm.xlsx
SD_phya_off_t04_2020-08-13.norm.xlsx
SD_phya_ons_t04_2020-07-12.norm.xlsx
ld_phyA_ons_t04_2020-08-12.norm.xlsx
Shows how dates up front make it difficult to find files by genotype/conditions (though dates in front may have value if, for example, the content has multiple variables)
1a Ordering by date obscures the pattern in conditions/samples
s12 sorts before s1 and s2 if zero padding is not used
You need to use numeric dates (Aug sorts before Jul)
You need to be consistent: 2020-7 sorts after 2020-08-13
You should think about how you are going to search or look at the data. We have clear LD vs SD conditions, and then files are organized by genotype
Be careful with case: ld sorts after SD, and phya sorts after phyB
Keeping the parts the same length makes names easier to read: here ons and off (on sucrose, off sucrose) are nicely ordered, whereas on and off above make the listing jumpy
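The ordering problems listed above can be reproduced with a short Python sketch. It assumes a file browser that sorts names character by character, which is what Python's built-in `sorted` does for strings; real file managers may sort differently. The shortened names are taken from the example list, and `make_name` is a hypothetical helper showing one possible consistent scheme, not a recommended standard.

```python
# Plain sorted() compares strings character by character (by code point),
# so digits come before uppercase letters, and uppercase before lowercase.

# Without zero padding, s12 lands before s1 and s2.
unpadded = sorted(["2020-07-14_s1_phyA", "2020-07-14_s2_phyB", "2020-07-14_s12_phyB"])
print(unpadded)

# Month names sort alphabetically, not chronologically (Aug before Jul).
months = sorted(["JUL-31_phyB", "AUG-13_phyB"])
print(months)

# A date missing its zero (2020-7) sorts after every 2020-0x date.
dates = sorted(["2020-7-12_s2", "2020-08-13_s01"])
print(dates)

# Mixed case splits one group: ld lands after SD, phya after phyB.
cases = sorted(["ld_phyA", "SD_phya", "SD_phyB", "LD_phyA"])
print(cases)


def make_name(condition, genotype, sucrose, sample, date):
    """Hypothetical helper: fixed part order, upper-case condition, zero-padded sample."""
    return f"{condition.upper()}_{genotype}_{sucrose}_s{sample:02d}_{date}.raw.xlsx"


# With one scheme applied everywhere, sorting groups by condition, then sample.
consistent = sorted([
    make_name("ld", "phyA", "ons", 12, "2020-08-12"),
    make_name("LD", "phyA", "ons", 2, "2020-07-14"),
    make_name("SD", "phyB", "off", 1, "2020-08-13"),
])
print(consistent)
```

Putting the condition first is a choice that suits searching by LD vs SD; a project that is mostly browsed by date would put the ISO date first instead.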
Episode intro.
Transition from impossible examples to FAIR. Re-stating what the issues were in the examples.
The draft episode is ready for scrutiny!
Discussed. Don't know if you have anything better than Benedict
Hi people!
I just merged the draft episode 07 to gh-pages. You can see the rendered version here:
https://carpentries-incubator.github.io/fair-bio-practice/07-files-organization/index.html
I am trying to think about a good example for the 'folder structure' challenge (should be doable in 5 minutes): right now it asks the students to look at two different folder structures used to organise the same data and decide which is the better option. However, there are different ways to go about this:
Any thoughts?
Best,
A
Hello! I am just checking the correspondence between each chapter's slides and notebooks, and I realised chapter 5 (Metadata) includes a final "quiz" in the slides that I can't find in the notebook. In addition, the notebook doesn't include any feedback section... is that intended?
Thank you!
Example of a FAIR data record for the exercise (second to last in the chapter).
Students are asked to look at a link and say why a record or dataset is FAIR.
Andrew, check if UniProt has a nice record to use.
According to the FAIR paper:
UniProt26: UniProt is a comprehensive resource for protein sequence and annotation data. All entries are uniquely identified by a stable URL, that provides access to the record in a variety of formats including a web page, plain-text, and RDF (‘F’ and ‘A’). The record contains rich metadata (‘F’) that is both human-readable (HTML) and machine-readable (text and RDF), where the RDF formatted response utilizes shared vocabularies and ontologies such as UniProt Core, FALDO, and ECO (‘I’). Interlinking with more than 150 different databases, every UniProt record has extensive links into, for example, PubMed, enabling rich citation. These links are machine-actionable in the RDF representation (‘R’). Finally, in the RDF representation, the UniProt Core Ontology explicitly types all records, leaving no ambiguity—neither for humans nor machines—about what the data represents (‘R’), enabling fully-automated retrieval of records and cross-referencing information.
So all letters could be there.
If not maybe Dataverse, again
Dataverse makes the Digital Object Identifier (DOI), or other persistent identifiers (Handles), public when the dataset is published (‘F’). This resolves to a landing page, providing access to metadata, data files, dataset terms, waivers or licenses, and version information, all of which is indexed and searchable (‘F’, ‘A’, and ‘R’). Deposits include metadata, data files, and any complementary files (such as documentation or code) needed to understand the data and analysis (‘R’). Metadata is always public, even if the data are restricted or removed for privacy issues (‘F’, ‘A’). This metadata is offered at three levels, extensively supporting the ‘I’ and ‘R’ FAIR principles: 1) data citation metadata, which maps to DataCite schema or Dublin Core Terms, 2) domain-specific metadata, which when possible maps to metadata standards used within a scientific domain, and 3) file-level metadata, which can be deep and extensive for tabular data files (including column-level metadata). Finally, Dataverse provides public machine-accessible interfaces to search the data, access the metadata and download the data files, using a token to grant access when data files are restricted (‘A’).
Hi @tzielins!
To include the logo in all pages I created https://github.com/carpentries-incubator/fair-bio-practice/blob/gh-pages/_includes/logo.md and then referenced it in https://github.com/carpentries-incubator/fair-bio-practice/blob/gh-pages/_includes/links.md. However, something is not right. It works on some pages but not on others. It has to do with how the base address is set. Could you take a look?
Best,
A
I'm a member of The Carpentries Core Team and I'm submitting this issue on behalf of another member of the community. In most cases, I won't be able to follow up or provide more details other than what I'm providing below.
There were some issues / doubts
F in FAIR stands for free. FFFFFFFFF
Only figures presenting results of statistical analysis need underlying numerical data FFFF?FFFF
Sharing numerical data as a .pdf in repository as Zenodo is FAIR. TFFFFFTF(It's the lowest end of FAIR)+1+1+1
Sharing numerical data as an Excel file via Github is not FAIR. FFFFFFFFF
Metadata standards (for example MIAME MIQE) assure the “IR” in FAIR. TTTTTTT? T?
Group websites are one of the best places to share your data. FFFFFFFFF
Data from failed experiments are not re-usable. FFFFFFFFF
Data should always be converted to Excel or .cvs files in order to be FAIR. FFFFFFFFF (csv not cvs)
A DOI of a dataset helps in getting credit. TTTTTTTTT
FAIR data are peer reviewed. FFFFFFFFF
FAIR data accompany a publication. FF?ideallyFF it's complicated...+1+1+1
Suggestion from AM
argonomics DB, so a paper that refers to data in that db (by author swiss group in timet project), maybe with link to db,
db does not exist.
Or argonomics paper if there was one.
Missing solution, needed as it may be self taught course.
Thanks for putting this course together. I am putting together training on FAIR data management for long-term agricultural experiments and parts of this course have been very helpful for organising my own training offering. I will in due course make this available as a carpentry lesson here.
best wishes
Richard
OS definitions slides are not longer points, need some graphical elements on the slides
We need a good example dataset in Zenodo (or Figshare, I think I prefer zenodo)
Good data set:
Maybe Andrew knows some from his Covid curator activity.
Finding one by browsing in Zenodo looks tricky; some search by topic might work, or maybe it will be easier starting from a paper (how to find papers that deposited to Zenodo, though?).
It may be easier, faster and more educational to make our own "perfect" deposit, which will then be used as a showcase.
Assemble data from your published papers (faking data for a fake deposit is too much); let's base it on what you have and have published.
For example, based on the full leaf imaging of Benedict (which Andrew recently released):
Take some of the images, combine them with the extracted time series, the MATLAB data file, and the MATLAB code.
Could drop a file with imaging protocol description.
Then a file_organization.txt that explains layout and naming conventions.
Then a readme
And probably a compacted readme as the Zenodo description, plus some tags (if Zenodo supports those).
I know it is a lot of work. But the perfect example could then also be used in the working with files episode and potentially in the writing readmes episode.
Copy-pasting from your own papers and lab notes may actually be faster than searching in Zenodo.
It all depends on whether your own data are complex enough to justify some inner folder structure, naming conventions or an interesting readme. I would say any paper that uses more than one experimental technique is a good candidate.
You know your work, so we can have a call and you could show what you could wrap as a deposit.
Rather than looking at the GOF example, we could show that "No size fits all" and let people decide what the pros and cons of different folder layouts are while being shown the "typical" approaches.
For computational projects we have
Left is like the GOF structure. We cannot use it directly as it uses Figure1 and we say not to name things like this.
I, for example, prefer the second one (right), as the results sit next to the code that generates them.
Similarly we have two options for wet projects:
So it needs a drawing or files for screenshots.
Then we can have two kinds of groups: one compares the two computational projects, the other the two wet projects.
About https://carpentries-incubator.github.io/fair-bio-practice/01-wellcome/index.html
From my experience people often confuse FAIR data and science with open data and science. It would be good to clarify this from the start in the introduction and go into the details after that in the dedicated sections. An exercise to clarify this difference would also help. We can use some inspiration and materials from the lesson on "FAIR data for climate sciences" that I helped develop:
https://escience-academy.github.io/Lesson-FAIR-Data-Climate/introduction/index.html
I think that lesson would be a good source of material, also for other episodes.
Hi All,
We've restructured the record keeping lesson and I've hopefully managed to do most changes the way you pictured them @zajawka - since you asked about what happened in the example with Novartis, I added that to the intro paragraph and explicitly stated how FAIR record keeping can help avoid things like that in the future if we implement it.
Please have a read through and, if you could, try the Benchling exercise to see if all looks fine and goes well. It is simpler than before and focused on provenance, as discussed.
Let me know what changes you'd like us to make.
Cheers
Ines
Hi everyone,
I have just pushed episode 05 on a branch I created:
https://github.com/carpentries-incubator/fair-bio-practice/blob/update-episode-04-and-05/_episodes/05-the-research-data-life-cycle.md
Could you please have a look and tell me what you think about it? Any ideas or suggestions to improve it?
Total episode time: 30 minutes max.
All the best,
Andrés
On the rendered page, the benefits of on vs ons are not clear: the font is not fixed width, so the file names do not move around so much.
I believe you already recorded some edit issues in your fork of the biordm repo
Me not like it :)
Producing metadata is not an intro... if we start producing metadata now, what are we going to do in the other part?
Or why do we have a "silly exercise" now if we have a better one later?
Let's remember we are going to have a proper episode for producing metadata.
At the same time, there are important concepts, such as annotating with PermId and the MIAME standards.
Obj1: Know what is metadata
Obj2: Know how to provide metadata
Obj3: How FAIR applies also to metadata
Obj1 & Obj2. We are actually missing an example of metadata. Remember that at a workshop students do not look at the text, so we need examples to look at while the instructor explains the concept of metadata and its types.
I propose to have a readme-like text (half a page) that describes a data file
and a second example of table data with some embedded metadata
But let's have them beefy: a half-page readme, so it has details. Same for the data table. It should be obvious that it takes time to produce.
Exercises:
Obj3.
The doppelganger is funny, but making a record with just an ORCID ....
What about showing a publication record that uses ORCID,
for example https://wellcomeopenresearch.org/articles/5-96/v2
and asking students to click on an author's ORCID, which takes them to the author's ORCID page and their own work, not their doppelgangers'.
If we have time we could change this example to another Wellcome paper with some common name like John Smith or an Asian one, as they have a real problem of having a lot of doppelgangers (a quick search did not give me a nice example).
Metadata standards need more attention.
I used https://fairsharing.org/standards/ before to find some; in the linked DCC I could not find even MIAME.
Maybe an exercise to find two specific standards? Or what issues the standards help to address.
We are not going to cover standards in the real follow-up episodes as we are "type" agnostic; also, those standards are a pain in reality.
So that is the only episode we are going to talk about standards.
Add http://flowrepository.org to specialised repository examples in lesson exercise?
Complete the finding protocol task.
Andrew, contact the guy from https://twitter.com/dgonzales1990/status/953737802205794304 (sorry, you are the Twitter person) and ask if he can tell us which paper it was.
If not use your previous 3 loop example unless you have one with a paywall or paper one.