cidgoh / pathogen-genomics-package
This is the DataHarmonizer spreadsheet web application bundled with pathogen genomics data entry and validation templates.
License: MIT License
In the CanCOGeN DH template, the sequencing instrument information is supposed to export to the PH_INSTRUMENT_CGN field, whereas the MPX template exports sequencing instrument information to PH_INSTRUMENT.
For some reason both are showing up in the MPX export. Can we turn off the "PH_INSTRUMENT_CGN" field in the MPX export?
I'm not even sure how that field is showing up since it's not in the MPX template's LIMS export column at all...
Currently in the Canadian MPX template a user can have a dataset in which the age value is a null value, but they have entered an age bin instead (so that the exact age is obfuscated but the general age range is shared). There is code in the DH that says "if there is a null value in the age field, put the same null value in the age bin field", which was created in order to automate populating associated fields and reduce data entry. If the user saves the dataset and re-opens it later, then the entered/saved age bin data gets overwritten with the same null value as in the age field.
This is erasing information that the user has entered. And we had the same issue in the CanCOGeN template.
You put a fix in place in the CanCOGeN template so that, upon opening a file, whatever is in the age bin field remains untouched. BUT if a user is entering fresh data and enters a null value in the age field, the age bin field still autofills with the same null value (which they can edit themselves if they want).
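For illustration only, here is a minimal Python sketch of the intended behaviour. It is not the actual DataHarmonizer code, which is JavaScript. It assumes a hypothetical `on_age_changed` hook that is told whether a change comes from interactive editing or from loading a saved file. The field names `host age` / `host age bin` and the set of null values are also assumptions.

```python
# Hypothetical sketch; field names and null values are assumptions for illustration.
NULL_VALUES = {"Not Applicable", "Missing", "Not Collected", "Not Provided", "Restricted Access"}

def on_age_changed(row: dict, source: str) -> dict:
    """Propagate a null age value to the age bin during data entry only.

    source is "edit" when the user types into the grid and "load" when a
    saved file is being opened.
    """
    age = row.get("host age")
    if source == "edit" and age in NULL_VALUES:
        # Fresh entry: autofill the bin; the user can still overwrite it.
        row["host age bin"] = age
    # On "load", whatever was saved in the age bin is left untouched.
    return row
```

The key design point is that the autofill is keyed on *how* the change happened, not just on the age value. A reopened file therefore never clobbers a saved age bin.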
Can you put the same fix into the MPX template that you put in for the CanCOGeN template to address this issue?
Thanks!
I believe that data entered more than once in the vaccination fields that are concatenated into the PH_VACCINATION_HISTORY field is being unintentionally omitted from the NML LIMS DataHarmonizer export from the CanCOGeN template.
When, for example, "Astrazeneca (Vaxzevria)" is entered as the name for more than one vaccination dose, the result is:
<Host Vaccination Status>;Astrazeneca (Vaxzevria);2022-11-01;2023-01-01;2023-03-01;2023-06-01;<Vaccination History>
rather than the expected:
<Host Vaccination Status>;Astrazeneca (Vaxzevria);2022-11-01;Astrazeneca (Vaxzevria);2023-01-01;Astrazeneca (Vaxzevria);2023-03-01;Astrazeneca (Vaxzevria);2023-06-01;<Vaccination History>
The same thing occurs when the same date is used for more than one vaccination dose.
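The reported output looks like what a de-duplicating join would produce, where each value is kept only the first time it appears. That is a guess about the cause, not a confirmed diagnosis. The hypothetical Python sketch below contrasts that behaviour with a join that keeps every value, using the values from the example above.

```python
# Hypothetical sketch of building PH_VACCINATION_HISTORY; not the actual DataHarmonizer code.
DOSES = [
    "<Host Vaccination Status>",
    "Astrazeneca (Vaxzevria)", "2022-11-01",
    "Astrazeneca (Vaxzevria)", "2023-01-01",
    "Astrazeneca (Vaxzevria)", "2023-03-01",
    "Astrazeneca (Vaxzevria)", "2023-06-01",
    "<Vaccination History>",
]

def join_unique(values, sep=";"):
    """Suspected behaviour: repeated names or dates are silently dropped."""
    kept = []
    for value in values:
        if value and value not in kept:
            kept.append(value)
    return sep.join(kept)

def join_all(values, sep=";"):
    """Expected behaviour: every non-empty value, in field order, duplicates included."""
    return sep.join(value for value in values if value)

print(join_unique(DOSES))  # matches the reported (wrong) output
print(join_all(DOSES))     # matches the expected output
```

Skipping only empty values, rather than repeated ones, keeps each dose name paired with its date.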
Thank you!
The isolated_by field currently takes a string. However, the same field in the Excel template over at the GRDI_AMR_One_Health repo is limited to its own isolated_by menu. This presumably should be incorporated here as well.
There appears to be an isolated_by menu in the corresponding YAML. I might suggest that, instead of having its own menu, this field take the same menu as sample_collected_by.
We need to provide a summary page that consolidates template resources for each spec (e.g. add in CanCOGeN, MonkeyPox, AMBR, etc.), kind of like:
https://github.com/cidgoh/DataHarmonizer/wiki/DataHarmonizer-Template-SOPs
Update in both the DH wiki and the pathogen-genomics-package.
Wiki page title: Pathogen Genomics Template Resources
Some fields have been removed, others added. New picklists have been added. Ontology IDs have been added. Guidance and examples have been updated.
Can we do a new release of the AMBR template pretty please?
I tracked the changes in the version tracker and bumped the proposed version number to 2.1.1 (in red) as there were changes to fields (x), terms (y) and guidance/defs/IDs (z).
If you open your instance of the PGP and click on "Get latest release" under the Help button, it takes you to the latest DH release (in the DH repo), not the latest release in the PGP repo.
e.g.
I went into pathogen-genomics-package-PGPv1.3.7 DataHarmonizer
and clicked "Get latest release",
and it took me here: https://github.com/cidgoh/DataHarmonizer/releases
but shouldn't it be going here: https://github.com/cidgoh/pathogen-genomics-package/releases
Can we update please?
and:
Poultry litter is the correct one: http://purl.obolibrary.org/obo/AGRO_00000080
I added a new field called "travel history availability" and values.
Can the values from this field be added to those that are concatenated in the NML LIMS field "PH_TRAVEL" in the NML LIMS export, pretty please?
And then can we do a new release (maybe at the same time as the AMBR release)? I bumped the template version to 2.1.2 (in red) because there were changes to fields, terms and guidance/examples.
I suggested bumping the PGP version to 2.0.1 (in red) as there were no new templates added (x), no new schemas (y), but there were changes to existing templates (z).
Can we make it so this readme includes the Stand-Alone DataHarmonizer Functionality section from the main DataHarmonizer readme? Also the information from the old "stand-alone" installation instructions?
A sample can contain multiple organisms, multiple kinds of the same organism (i.e. multiple isolates), and isolates may be sequenced multiple times using different protocols or instruments. This creates a 1-to-many issue, where one sample may need to be linked to multiple organisms, isolates, library IDs, associated tests (AMR drug panels from different companies) etc.
Currently, the contextual data for organisms, isolates, etc. from the same sample have to be entered over and over again, which creates a data entry burden for data providers.
Ideally, modularity could be created so that sample information could be entered once and linked to different isolates.
Similarly, isolate information could be entered once and linked to different libraries with different processing details/instruments.
Also similarly, libraries could be linked to multiple sequencing runs and/or associated tests.
To submit the data to LIMS or public repositories, every library, isolate or organism would need the metadata from the sample, so ideally, upon export, the DH would populate that info and present each item as a separate line in a spreadsheet.
e.g. the above situation would appear like:
sample 1 --> organism 1 --> isolate A --> library 1 --> sequence 1
sample 1 --> organism 2 --> isolate B --> library 2 --> sequence 2
sample 1 --> organism 2 --> isolate C --> library 3 --> sequence 3
sample 1 --> organism 2 --> isolate C --> library 4 --> sequence 4
sample 1 --> organism 2 --> isolate C --> library 4 --> sequence 5
*But the data provider wouldn't have to enter the different metadata multiple times.
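Here is a minimal Python sketch of the export-side flattening. It assumes a hypothetical nested record in which each level (sample, organism, isolate, library, sequence) is entered once and holds its children. The level keys and field names are illustrative assumptions, not an existing DataHarmonizer data model. Each sequence becomes one export row that inherits every field from its ancestors.

```python
from typing import Any, Dict, List, Optional

# Hypothetical child-list keys, one per level below the sample.
LEVELS = ["organisms", "isolates", "libraries", "sequences"]

def flatten(node: Dict[str, Any], depth: int = 0,
            inherited: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Return one export row per sequence, copying parent metadata down to each row."""
    row = dict(inherited or {})
    # Add this level's own fields; the child list itself is not exported.
    row.update({k: v for k, v in node.items() if k not in LEVELS})
    if depth == len(LEVELS):
        return [row]
    rows: List[Dict[str, Any]] = []
    # In this sketch, a branch with no children produces no row.
    for child in node.get(LEVELS[depth], []):
        rows.extend(flatten(child, depth + 1, row))
    return rows

# The example from this issue, entered once per level.
SAMPLE = {
    "sample_id": "sample 1",
    "organisms": [
        {"organism": "organism 1", "isolates": [
            {"isolate_id": "isolate A", "libraries": [
                {"library_id": "library 1", "sequences": [{"sequence_id": "sequence 1"}]},
            ]},
        ]},
        {"organism": "organism 2", "isolates": [
            {"isolate_id": "isolate B", "libraries": [
                {"library_id": "library 2", "sequences": [{"sequence_id": "sequence 2"}]},
            ]},
            {"isolate_id": "isolate C", "libraries": [
                {"library_id": "library 3", "sequences": [{"sequence_id": "sequence 3"}]},
                {"library_id": "library 4", "sequences": [
                    {"sequence_id": "sequence 4"}, {"sequence_id": "sequence 5"},
                ]},
            ]},
        ]},
    ],
}

if __name__ == "__main__":
    for r in flatten(SAMPLE):
        print(" --> ".join(r.values()))
```

Running it prints the five "sample 1 --> ..." lines listed above. The sample metadata is stored once but appears on every row.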
Can we make the DH do this modular/1:N data capture and transformation (pretty please)?
Hi all!
Following some conversation with our data partners, we'd like to request that the option for:
"Throat"
be added to the picklist for "Anatomical part".
Thank you!
The images referenced in this readme don't appear to get bundled in this repo's package:
https://github.com/cidgoh/pathogen-genomics-package/tree/main/templates/canada_covid19/exampleInput
Found another CanCOGeN artifact in the NML LIMS export from the MPX template.
Can we please replace "PH_CANCOGEN_AUTHORS" with "PH_SEQUENCING_AUTHORS" after the "SUBMITTED_RESLT - Gene Target #5 CT Value" field in the NML export, pretty please?
In the DH template it's supposed to export as "PH_SEQUENCING_AUTHORS" so I'm not sure where "PH_CANCOGEN_AUTHORS" is coming from...
There are 2 places in the NML LIMS export that the field "host (scientific name)" outputs to: the field that goes into NML LIMS called PH_SPECIMEN_SOURCE, and the DH field called "host (scientific name)" that also appears in the export file but doesn't get uploaded to LIMS.
In the recent changes to the PGP, we lost the rule that says IF host (scientific name) is Homo sapiens THEN PH_SPECIMEN_SOURCE is Human. The NML uses "Human" instead of "Homo sapiens".
The issue we had before was that the DH was outputting the Human rule in the host (scientific name) field as well as in PH_SPECIMEN_SOURCE.
i.e. IF host (scientific name) is Homo sapiens THEN host (scientific name) should be Homo sapiens and NOT Human.
In other words, we want the entered data (Homo sapiens) to be in the DH output fields (lower case after the Provenance field), but the transformed value (Human) in the NML LIMS field (PH_SPECIMEN_SOURCE, before the Provenance field).
Can we do this?
The fix is needed for the NML LIMS from both the CanCOGeN and Monkeypox templates.
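Here is a minimal Python sketch of the requested behaviour. It assumes a hypothetical export step that builds the output columns from the entered record. The mapping table and the pass-through of unmapped hosts are assumptions; only the Homo sapiens-to-Human rule comes from this issue. The point is that the transformation is applied only to the LIMS column and never written back into the DataHarmonizer column.

```python
# Hypothetical sketch; only the Homo sapiens -> Human rule comes from this issue.
LIMS_SPECIMEN_SOURCE = {"Homo sapiens": "Human"}

def export_host_fields(entered: dict) -> dict:
    """Build the two export columns without modifying the entered record."""
    host = entered.get("host (scientific name)", "")
    return {
        # NML LIMS field (before the Provenance field): transformed value.
        "PH_SPECIMEN_SOURCE": LIMS_SPECIMEN_SOURCE.get(host, host),
        # DataHarmonizer field (after the Provenance field): value as entered.
        "host (scientific name)": host,
    }

print(export_host_fields({"host (scientific name)": "Homo sapiens"}))
```

Because the mapping is only read, never applied in place to the record, the same function could serve both the CanCOGeN and Monkeypox exports without the "Human" value leaking into the DH column.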