monarch-initiative / monarch-phenote
stub for monarch phenote
see examples/*.owl for two ultra-simple ontologies. One for diseases
These will be used for loading
After this add a barista middleware server, and finally the noctua client layer. The last part won't make much sense with its default configuration and the two loaded ontologies, but we'll try for this anyway to demonstrate proof of principle
Not sure whether new app vs yaml vs fork is the best long term/ short term fix, this is just to give a rough idea
pages such as http://bioontology.stanford.edu/phenote
reference phenote.org
phenote.org (54.83.197.87), should lead somewhere useful
whois phenote.org
Domain Name: PHENOTE.ORG
Domain ID: D138191835-LROR
WHOIS Server:
Referral URL: http://www.networksolutions.com
Updated Date: 2015-07-23T01:14:18Z
Creation Date: 2007-01-25T19:33:49Z
Registry Expiry Date: 2017-01-25T19:33:49Z
Sponsoring Registrar: Network Solutions, LLC
Sponsoring Registrar IANA ID: 2
Domain Status: clientTransferProhibited https://www.icann.org/epp#clientTransferProhibited
Registrant ID: 15032216-NSI
Registrant Name: Jeanne Gerstle
Registrant Organization: Lawrence Berkeley National Lab
Registrant Street: One Cyclotron Road
Registrant City: Berkeley
Registrant State/Province: CA
Registrant Postal Code: 94720
Registrant Country: US
Registrant Phone: +1.5104865095
Registrant Phone Ext:
Registrant Fax: +1.5104867000
Registrant Fax Ext:
Registrant Email: [email protected]
Admin ID: 15032216-NSI
Admin Name: Jeanne Gerstle
Admin Organization: Lawrence Berkeley National Lab
Admin Street: One Cyclotron Road
Admin City: Berkeley
Admin State/Province: CA
Admin Postal Code: 94720
Admin Country: US
Admin Phone: +1.5104865095
Admin Phone Ext:
Admin Fax: +1.5104867000
Admin Fax Ext:
Admin Email: [email protected]
Tech ID: 15048445-NSI
Tech Name: Craig A Leres
Tech Organization: 1 Cyclotron Road
Tech Street: MAIL STOP 50A-3111
Tech City: Berkeley
Tech State/Province: CA
Tech Postal Code: 94720
Tech Country: US
Tech Phone: +1.510486757
Tech Phone Ext:
Tech Fax: +1.510845201
Tech Fax Ext:
Tech Email: [email protected]
Name Server: NSX.LBL.GOV
Name Server: NSD.LBL.GOV
Name Server: ADNS1.ES.NET
Name Server: ADNS2.ES.NET
DNSSEC: unsigned
Last update of WHOIS database: 2016-02-09T17:43:44Z
Example diagram from @pnrobinson
Will work with @mellybelly on what this means
This is the summary of the desired behavior @pnrobinson needs to proceed with using WebPhenote effectively for curation.
The current plan is to enable Peter to import an existing TSV, edit and extend it within WebPhenote, and to then export the new model to a TSV. Some columns need to be preserved round-trip, even though they may not be visible or editable in WebPhenote.
These are the columns:
'Disease ID', 'Disease Name', 'Gene ID', 'Gene Name', 'Genotype', 'Gene Symbol(s)', 'Phenotype ID', 'Phenotype Name', 'Age of Onset ID', 'Age of Onset Name', 'Evidence ID', 'Evidence Name', 'Frequency', 'Sex ID', 'Sex Name', 'Negation ID', 'Negation Name', 'Description', 'Pub', 'Assigned by', 'Date Created'
All of the ‘XXX Name’ fields do not need to be ‘preserved’ round-trip (through import/export), as long as I can ensure that the ‘XXX Name’ field is populated with the name derived from the corresponding ‘XXX ID’ field. So the actual fields to be preserved round-trip are:
'Disease ID',
'Gene ID',
'Genotype',
'Gene Symbol(s)',
'Phenotype ID',
'Phenotype Name',
'Age of Onset ID',
'Evidence ID',
'Frequency',
'Sex ID',
'Negation ID',
'Description',
'Pub',
'Assigned by',
'Date Created',
I’m not sure what to do with Genotype and Gene Symbols. Presumably, Genotype will be something like 'MGI:3711884’ (https://monarchinitiative.org/genotype/MGI:3711884) and the ‘Gene Symbols’ for that would be 'Gas1/Gas1; Shh/Shh<+> [involves: 129S1/Sv * 129X1/SvJ * C57BL/6J]’, which is derivable from Monarch and therefore I don’t need to store it.
The Gene and Genotype columns are NOT going to be visible or editable in WebPhenote, but they must be preserved round-trip.
The ‘Assigned By’ field will not be visible or editable, but must be preserved.
Negation, Frequency and Sex columns need to be added to WebPhenote and made visible and editable.
Summarizing (assuming anything VISIBLE/EDITABLE is preserved):
VISIBLE/EDITABLE 'Disease ID',
VISIBLE/DERIVED 'Disease Name',
HIDDEN/PRESERVED 'Gene ID',
DERIVED 'Gene Name',
HIDDEN/PRESERVED 'Genotype',
DERIVED 'Gene Symbol(s)',
VISIBLE/EDITABLE 'Phenotype ID',
VISIBLE/DERIVED 'Phenotype Name',
VISIBLE/EDITABLE 'Age of Onset ID',
VISIBLE/DERIVED 'Age of Onset Name',
VISIBLE/EDITABLE 'Evidence ID',
VISIBLE/DERIVED 'Evidence Name',
VISIBLE/EDITABLE 'Frequency',
VISIBLE/EDITABLE 'Sex ID',
VISIBLE/DERIVED 'Sex Name',
VISIBLE/EDITABLE 'Negation ID',
VISIBLE/DERIVED 'Negation Name',
VISIBLE/EDITABLE 'Description',
VISIBLE/EDITABLE 'Pub',
HIDDEN/PRESERVED 'Assigned by',
HIDDEN/PRESERVED 'Date Created',
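The column policy above can be sketched as a small import/export round trip. This is only an illustration, not WebPhenote's implementation: `import_rows`, `export_rows`, the `DERIVED_FROM` mapping and the `lookup_label` callable are hypothetical names, and the assumption that 'Gene Symbol(s)' is derived from 'Genotype' follows the discussion above rather than any settled decision.

```python
import csv
import io

# Column policy copied from the summary above, in TSV column order.
COLUMN_POLICY = {
    "Disease ID": "VISIBLE/EDITABLE",
    "Disease Name": "VISIBLE/DERIVED",
    "Gene ID": "HIDDEN/PRESERVED",
    "Gene Name": "DERIVED",
    "Genotype": "HIDDEN/PRESERVED",
    "Gene Symbol(s)": "DERIVED",
    "Phenotype ID": "VISIBLE/EDITABLE",
    "Phenotype Name": "VISIBLE/DERIVED",
    "Age of Onset ID": "VISIBLE/EDITABLE",
    "Age of Onset Name": "VISIBLE/DERIVED",
    "Evidence ID": "VISIBLE/EDITABLE",
    "Evidence Name": "VISIBLE/DERIVED",
    "Frequency": "VISIBLE/EDITABLE",
    "Sex ID": "VISIBLE/EDITABLE",
    "Sex Name": "VISIBLE/DERIVED",
    "Negation ID": "VISIBLE/EDITABLE",
    "Negation Name": "VISIBLE/DERIVED",
    "Description": "VISIBLE/EDITABLE",
    "Pub": "VISIBLE/EDITABLE",
    "Assigned by": "HIDDEN/PRESERVED",
    "Date Created": "HIDDEN/PRESERVED",
}

# Assumption: each derived column is recomputed from this source column.
DERIVED_FROM = {
    "Disease Name": "Disease ID",
    "Gene Name": "Gene ID",
    "Gene Symbol(s)": "Genotype",
    "Phenotype Name": "Phenotype ID",
    "Age of Onset Name": "Age of Onset ID",
    "Evidence Name": "Evidence ID",
    "Sex Name": "Sex ID",
    "Negation Name": "Negation ID",
}


def import_rows(tsv_text):
    """Read a TSV and keep only the columns that must survive the round trip."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    stored = [c for c in COLUMN_POLICY if c not in DERIVED_FROM]
    return [{c: (row.get(c) or "") for c in stored} for row in reader]


def export_rows(rows, lookup_label):
    """Write all columns, recomputing derived ones via lookup_label(id)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(COLUMN_POLICY),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        full = {c: row.get(c, "") for c in COLUMN_POLICY}
        for derived, source in DERIVED_FROM.items():
            source_id = row.get(source, "")
            full[derived] = lookup_label(source_id) if source_id else ""
        writer.writerow(full)
    return out.getvalue()
```

Hidden columns such as 'Assigned by' pass through untouched, while any stale 'XXX Name' value in the imported file is discarded and regenerated on export.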
See for example the Smoking node in #22
This is more of an ontology group action item but added here so we don't lose it, and so we can discuss it particularly in the context of causal roles in disease progression.
These should be modeled as occurrents (processes), as this fits best into our causal model.
The following is a draft of a guide for transitioning from Phenote to
WebPhenote. We should tidy this, make a wiki page for this, after
'testing' with HPOA.
With Phenote, knowledge is organized as tabular files (TSVs). The semantics of the
files (i.e. column-column relations) come from outside the
application. The TSVs are typically managed in GitHub.
With WebPhenote, the underlying model is RDF/OWL, with explicit
semantics. Currently knowledge is organized as models, with each model
stored as a single file in GitHub, but in future this will change to a
triplestore; knowledge will still be organized by models (==~ Named Graphs) but
not directly file-based.
In either case, what counts as a file/model/named graph is a matter of
convention. For D2P, this is one per disease.
Here we describe 3 phases in the transition:
As the name suggests, WebPhenote is used for testing; any model
created may be discarded.
During the dual phase, all new models will be created using
WebPhenote. The sum total of knowledge is obtained by merging the
Phenote-generated TSVs plus the WebPhenote OWL store.
This will require converters: down-converters to go from OWL to TSV,
and up-converters to go from TSV to OWL. This may go via an
intermediate form, e.g. PhenoPackets.
It is likely there will be consumers for both forms, so both
converters will need to be maintained.
While new-only is encouraged in the dual use phase, an upconverter can
be used on an as-needed basis to place an OWL model in the OWL GitHub
repo. It should be simultaneously removed from the TSV store, to avoid
stale duplicates.
In emergencies, it is possible to do the reverse: take a model created
in WebPhenote, downconvert to TSV, and edit in Phenote, if some vital
functionality is missing and there is a blocker on implementing it
(obviously we should guard against this scenario).
Note that WebPhenote can be in a beta release during the dual phase;
problems are expected to be easier to recover from.
In this phase, the TSV repository is transitioned en-masse to
the OWL repository and retired.
Note there will likely be pipelines consuming the TSVs, so a pipeline
should be set up that downconverts the OWL repo, until the consuming
software can be upgraded.
Note that a number of checks must first be done before switching to
this phase:
If the first fails, then fixes have to be made to the converters.
If the second is a problem, then other strategies can be pursued short
and long term, e.g. consolidating models. It is expected that these
would be resolved fully without workarounds when we switch to a triplestore.
Latest looks good. Comments
We would like the monarch phenotype noctua interface to have its own distinct web presence. This could simply be the instance of the tool itself (as http://noctua.berkeleybop.org is the instance for GO); if so we need a way to configure the front page a bit more to provide context specific to the instance. We would then coordinate larry to deploy the instance (presumably something like phenotua.monarchinitiative.org).
Or in the interim just an about page that lives somewhere.
We will also need a link to this page from the old phenote website: http://phenote.org/ saying "we are currently developing a web-based version of phenote, head on over to URL to find out more". @nlwashington / @kltm do we remember how to edit that page?
This all presupposes we have figured out the name: phenotua vs webphenote vs...
what is on production is really beta, but we should make it clearer and have a phased release plan
Use the disease as title, and the state should be 'production' not 'development'
would be great to have on the overview of models page, as well as on each individual model, for convenience.
Note that we'll want to agree with @pnrobinson and @drseb on the export format, should be similar/identical to the current HPO annotation format here: https://github.com/monarch-initiative/hpo-annotation-data
cc @cmungall
It would be great if there was a mechanism to request new terms and use a placeholder ID.
We'd want to use some term broker mechanism, and then also be able to report on terms that had recently been implemented within one's annotation set.
Might also be good to be able to search for terms that have been requested but not yet implemented, so as to not duplicate requests, and/or to comment on existing ones?
Can we have some kind of TermGenie tooling inside PhenoTua? or more like PhenoDiscuss, or both?
@cmungall @tudorgroza @jmcmurry
This would be a distinct microservice; the core phenote uses a native RDF/OWL model, this is what will be stored in the repo. Conversion will be via a distinct library/service (I will produce new architecture docs clarifying this).
This service should be considered distinct and can be any language.
One option is to implement in java and use this lib: https://github.com/phenopackets/phenopacket-reference-implementation
A subset of graphs that are generated via phenote forms will turn out to be near-equivalent to the JSON-LD representation of a phenopacket, see phenopackets/phenopacket-format#40 -- unfortunately exact equivalence may be unlikely due to awkwardness of reification (we use OWL2 axiom annotation in phenote/noctua, which looks awful in JSON-LD). But the mapping should be trivial.
For curating association between mutation, regulatory region, gene and phenotype
Underlying model: will follow FALDO for the genomic variant
First pass should have 3 fields:
(To make things simpler, as a very first pass, omit 3.)
This should map to a graph in the following way:
di
pi
Check with @hdietze for modeling of Pub
See monarch-initiative/monarch-disease-ontology-RETIRED#66
Note that as all phenopackets have a defined structural translation to JSON-LD, then the import function may be equivalent to OWL import (but with additional things to make things conform, such as turning blank nodes into IRIs).
The export function may require tweaking, but it's possible that a standard JSON-LD export plus 'frame' capabilities will work.
Add some kind of integration into these
Whilst annotating the user should be able to see sufficiency score and similar entities
This could be a link out or some kind of widget that can be added into either the form or network display
The algorithm is simply to collect all direct classes from the current model and send that. This may send things that are not phenotypes (e.g. genes, age of onset) but it should be up to the server to deal with this.
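The collection step described above could look roughly like the following. This is a minimal sketch under an assumed model shape (a dict with an `individuals` list, where each individual carries a `type` list of `{"type": "class", "id": ...}` entries); the function name `collect_direct_classes` is hypothetical, and no filtering of non-phenotypes is attempted because that is left to the server.

```python
def collect_direct_classes(model):
    """Return the distinct class IDs directly asserted on individuals.

    Assumed input shape (not confirmed by the source):
    {"individuals": [{"id": ..., "type": [{"type": "class", "id": ...}]}]}
    """
    class_ids = []
    for individual in model.get("individuals", []):
        for type_expr in individual.get("type", []):
            # Only named classes are collected; complex expressions are skipped.
            if type_expr.get("type") != "class":
                continue
            class_id = type_expr.get("id")
            if class_id and class_id not in class_ids:
                class_ids.append(class_id)
    return class_ids
```

The resulting list may contain genes or age-of-onset terms as well as phenotypes, matching the note above that the server decides what to do with them.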
Could not successfully handle batch request. Exception: org.geneontology.minerva.MolecularModelManager$UnknownIdentifierException. Could not validate the id: MESH:C537176
Is it the case that only HP terms are accepted? Perhaps at present, we could at least change the tooltip accordingly.
We want to be able to easily create proper attribution for annotations or Phenopacket creation. The easiest way to ensure this might be to enable login via ORCID.
IF NOT, other mechanisms for a diversity of person identifiers should be supported, such as a foaf page or an OCLC id.
ALSO, more than one person ID needs to be allowed for each record or phenopacket
In the simplest case this will link a pre-existing feature (pulled from SciGraph) with a phenotype, analogous to #2
We may also allow some part of genotype post-composition
I have this issue with WebPhenote, Minerva, GOLR that I don't understand how to address, but I suspect that my GOLR query can be improved. I think it may have to do with a clique containing DOID:1932 and OMIM:105830, where OMIM:105830 is the leader, and the metadata is only associated with the leader, OMIM:105830, so DOID:1932 doesn't return useful label/description info.
What I consider a bug (from the WebPhenote user's experience), is that the label for an entry like DOID:1932 is 'Angelman Syndrome', but that WebPhenote cannot later retrieve this label. Does the autocomplete query include information I should add to Minerva?
Here's the problem in detail as manifested in WebPhenote:
Here is Minerva's request to GOLR:
curl 'https://solr-dev.monarchinitiative.org/solr/ontology/select?defType=edismax&qt=standard&wt=json&indent=on&fl=document_category,annotation_class,annotation_class_label,description,source,is_obsolete,alternate_id,replaced_by,synonym,subset,definition_xref,database_xref,isa_partof_closure,regulates_closure,only_in_taxon,only_in_taxon_closure&facet=false&json.nl=arrarr&q=*:*&rows=100&start=0&fq=document_category:"ontology_class"&fq=annotation_class:"DOID:1932"'
and here is the response, which does not contain an annotation class label or description:
"response":{"numFound":1,"start":0,"docs":[
{
"document_category":"ontology_class",
"annotation_class":"DOID:1932",
"is_obsolete":false,
"regulates_closure":["Orphanet:377788","DOID:0080014","Orphanet:182222","DOID:4","Orphanet:377794","DOID:1932","DOID:630","MESH:D035583","Orphanet:98053","Orphanet:68335","MESH:D025063","MESH:C"],
"isa_partof_closure":["Orphanet:377788","DOID:0080014","Orphanet:182222","DOID:4","Orphanet:377794","DOID:1932","DOID:630","MESH:D035583","Orphanet:98053","Orphanet:68335","MESH:D025063","MESH:C"]}]
}}
Here is Minerva's request for OMIM:105830:
curl 'https://solr-dev.monarchinitiative.org/solr/ontology/select?defType=edismax&qt=standard&wt=json&indent=on&fl=document_category,annotation_class,annotation_class_label,description,source,is_obsolete,alternate_id,replaced_by,synonym,subset,definition_xref,database_xref,isa_partof_closure,regulates_closure,only_in_taxon,only_in_taxon_closure&facet=false&json.nl=arrarr&q=*:*&rows=100&start=0&fq=document_category:"ontology_class"&fq=annotation_class:"OMIM:105830"'
and a subset of the response, showing the label and description:
"response":{"numFound":1,"start":0,"docs":[
{
"document_category":"ontology_class",
"annotation_class":"OMIM:105830",
"annotation_class_label":"Angelman syndrome",
"description":"Angelman syndrome is a neurodevelopmental disorder characterized by mental retardation, movement or balance disorder, typical abnormal behaviors, and severe limitations in speech and language. Most cases are caused by absence of a maternal contribution to the imprinted region on chromosome 15q11-q13. Prader-Willi syndrome (PWS; OMIM:176270) is a clinically di
The code I'm concerned with is in:
although the actual Golr request is in OwlTools I suspect.
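To make the two requests above easier to reproduce and compare, here is a small sketch that rebuilds the same query string for any class ID and flags documents that come back without a label. The helper names `build_class_query` and `ids_missing_label` are hypothetical; the endpoint and parameters are copied from the curl commands above, and nothing here reflects how Minerva or OwlTools actually build the request.

```python
from urllib.parse import urlencode

GOLR_SELECT = "https://solr-dev.monarchinitiative.org/solr/ontology/select"

# Field list copied from the fl= parameter of the requests above.
FIELDS = [
    "document_category", "annotation_class", "annotation_class_label",
    "description", "source", "is_obsolete", "alternate_id", "replaced_by",
    "synonym", "subset", "definition_xref", "database_xref",
    "isa_partof_closure", "regulates_closure", "only_in_taxon",
    "only_in_taxon_closure",
]


def build_class_query(class_id):
    """Build the select URL for one ontology class (no request is sent)."""
    params = [
        ("defType", "edismax"),
        ("qt", "standard"),
        ("wt", "json"),
        ("indent", "on"),
        ("fl", ",".join(FIELDS)),
        ("facet", "false"),
        ("json.nl", "arrarr"),
        ("q", "*:*"),
        ("rows", "100"),
        ("start", "0"),
        # fq appears twice, so a list of pairs is used instead of a dict.
        ("fq", 'document_category:"ontology_class"'),
        ("fq", f'annotation_class:"{class_id}"'),
    ]
    return GOLR_SELECT + "?" + urlencode(params)


def ids_missing_label(golr_json):
    """Return annotation_class IDs whose document lacks a label."""
    docs = golr_json.get("response", {}).get("docs", [])
    return [d.get("annotation_class") for d in docs
            if not d.get("annotation_class_label")]
```

Running `ids_missing_label` over the DOID:1932 response shown above would report that ID, while the OMIM:105830 response would yield an empty list.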
At least for human level-1, WebPhenote needs to be able to read/write phenopacket format
At a meeting with @DoctorBud @drseb @kltm
We want to be able to delete assertions, but we never want to lose them. They should be obsoleted/deprecated, using the same annotation properties we use for classes, etc
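A minimal sketch of "delete means deprecate", assuming assertions are held as plain dicts with an `annotations` mapping and that `owl:deprecated` is the annotation property meant by "the same annotation properties we use for classes". The function names are hypothetical, and this is not Minerva's implementation.

```python
import copy

# Assumption: owl:deprecated is the property used to mark obsoletion.
OWL_DEPRECATED = "http://www.w3.org/2002/07/owl#deprecated"


def deprecate(assertion):
    """Return a copy of the assertion marked as deprecated."""
    updated = copy.deepcopy(assertion)
    updated.setdefault("annotations", {})[OWL_DEPRECATED] = "true"
    return updated


def is_deprecated(assertion):
    return assertion.get("annotations", {}).get(OWL_DEPRECATED) == "true"


def delete_assertion(store, index):
    """'Delete' by replacing the stored assertion with a deprecated copy."""
    store[index] = deprecate(store[index])


def active_assertions(store):
    """Assertions a user should still see; deprecated ones stay in the store."""
    return [a for a in store if not is_deprecated(a)]
```

With this shape the store never shrinks: a deleted assertion disappears from `active_assertions` but remains available for audit or recovery.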
This will probably be implemented entirely in minerva, cc @hdietze
It is not immediately obvious what to put in "age of onset".
Typing "birth" for instance, does nothing because it expects "congenital".
Not sure how many age terms there are, but loading all of them into a combination autocomplete and dropdown might be the best compromise.
Additionally, having autocomplete start with one typed letter (rather than 3) may in this instance be preferred.
Existing toy instance that we had was driven by a set of toy ontologies. We should now set up the real deal.
For the golr part, we could piggy back off the monarch golr instance (though this may not yet be stable enough). We're not currently loading ontologies in yet but this should be easy.
For example,
het vs. homo vs. hemizygous for a single gene,
multi-gene combinations +/- any given variant
GoF vs. LoF
Most annotations are made in pairs or sets comparing against some control, but the control is not always WT background. We need to be able to capture these "sets" of comparisons.
We should be able to get most of these from the MODs by using the pub or figure as a way to identify the comparisons.
@nlwashington has ideas for UI
See:
http://create.monarchinitiative.org/editor/graph/gomodel:5797c67200000016
The underlying edge is
[a ParkinsonDisease] --has_part--> [a Tremor]
When folding a relationship, the directionality must be preserved. ie this should be shown as:
PD
has_part(Tremor)
however, it is inverting it:
Tremor
has_part(PD)
it is never semantically correct to do this, unless R is replaced with InverseOf(R). This is the same as translating the base graph to the equivalent:
[a Tremor] --part_of--> [a ParkinsonDisease]
based on the inverse properties declaration
(although a generic graph editor issue this is only affecting monarch so far)
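The folding rule can be sketched as follows. The `INVERSES` table and both function names are hypothetical and only cover the has_part/part_of pair from the example; the point is that an edge may be folded under its object only if the relation is swapped for its declared inverse.

```python
# Assumption: inverse property declarations are available as a simple lookup.
INVERSES = {"has_part": "part_of", "part_of": "has_part"}


def fold_under_subject(edge):
    """Fold (subject, relation, object) under the subject; direction is unchanged."""
    subject, relation, obj = edge
    return subject, f"{relation}({obj})"


def fold_under_object(edge):
    """Fold under the object, which is only valid using the inverse relation."""
    subject, relation, obj = edge
    if relation not in INVERSES:
        raise ValueError(f"no declared inverse for {relation!r}; cannot fold under object")
    return obj, f"{INVERSES[relation]}({subject})"
```

For the edge above, `fold_under_subject(("PD", "has_part", "Tremor"))` gives `("PD", "has_part(Tremor)")`, and folding under the object gives `("Tremor", "part_of(PD)")` rather than the incorrect `has_part(PD)`.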
We should have options to export JSON-LD and TSV
For JSON-LD, this should be possible via a standard RDF to JSON-LD converter (ie apache-jena). However, we should tweak this with a standard context file that will compact the JSON to something that conforms to a TBD PhenoJSON schema. It might be simpler to implement custom server side code first of all.
The TSV may be lossy; it may be isomorphic to the forms mapping. TBD.
This ticket is a stub. More details forthcoming. cc @mellybelly @nlwashington
The front page should have a link to an exemplar model. I will curate the model along with @mellybelly
This interface:
http://monarchinitiative.org/annotate/text
Allows marking up text with ontology classes (including genes), e.g
http://monarchinitiative.org/annotate/text/?q=Tyr+gene+implicated+in+ocular+albinism&longestOnly=true
(Aside: yes there are too many genes, see https://github.com/monarch-initiative/monarch-app/issues/925)
We have an option at the bottom to do an owlsim search (this could be made more visible)
We would like another option that would send the list of genes and ontology classes into noctua to 'seed' a model. Formally, we would create a new set of individuals ClassAssertion(C I)
where I is a uuid (generated using normal minerva procedures) and C is the marked up class. There are various smart ways to improve this. The table of results could have checkboxes allowing the user not to flood a model. The NLP could be enhanced to create links between individuals (e.g. an age of onset followed by a phenotype could be linked).
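As a rough illustration of the seeding step, the sketch below turns a list of marked-up classes into `ClassAssertion(C I)` tuples, with an optional selection standing in for the checkboxes. `seed_individuals` is a hypothetical name, and `uuid.uuid4` is only a stand-in for the normal minerva ID procedure mentioned above.

```python
import uuid


def seed_individuals(marked_up_classes, selected=None):
    """Create one ClassAssertion(C I) per chosen class.

    marked_up_classes: class IDs found by the annotator, in document order.
    selected: optional set of class IDs the user ticked; None means all.
    """
    if selected is None:
        chosen = list(marked_up_classes)
    else:
        chosen = [c for c in marked_up_classes if c in selected]
    # Each individual gets a fresh identifier (stand-in for minerva's own IDs).
    return [("ClassAssertion", class_id, f"urn:uuid:{uuid.uuid4()}")
            for class_id in chosen]
```

Linking the new individuals to one another (for example an age of onset to a phenotype) is deliberately left out; that would sit on top of this step.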
It may be easiest to throw people into the network view rather than the form/table view.
Spoke to @kltm last week and one way to do this would be
This is exactly how galaxy talks to external tools. We are thinking this might be a nice lightweight lightly coupled way for noctua to talk to a variety of external tools. (I thought we had docs on this in the GO wiki but can't find it)
We may want to separate out the annotator into its own standalone component before proceeding with this
See: http://create.monarchinitiative.org/download/gomodel:5797c67200000003/owl
Individual: <http://model.geneontology.org/5797c67200000003/5797c67200000005>
Annotations:
****** <http://geneontology.org/lego/evidence> <http://model.geneontology.org/5797c67200000003/5797c67200000007>,
<http://purl.org/dc/elements/1.1/date> "2016-07-26"^^xsd:string,
<http://purl.org/dc/elements/1.1/contributor> "http://orcid.org/0000-0002-6601-2165"^^xsd:string
Types:
<http://purl.obolibrary.org/obo/HP_0001337>
Facts:
Annotations: <http://purl.org/dc/elements/1.1/contributor> "http://orcid.org/0000-0002-6601-2165"^^xsd:string,
<http://purl.org/dc/elements/1.1/date> "2016-07-26"^^xsd:string
<http://purl.obolibrary.org/obo/RO_0002488> <http://model.geneontology.org/5797c67200000003/5797c67200000008>
Note that 5797c67200000007 is the ECO instance.
Two issues:
Use WebPhenote and Tudor's text mining tool to extract phenotypes for a comprehensive collection of publications about genotype-phenotype correlations. For example, FBN1 and LMNA are genes in which distinct mutations can lead to completely different diseases.
Project phases
This conforms to the simple pattern here. geneontology/noctua#150
Source will be a variant entity, Object will be phenotype (as per disease-phenotype form)
This will need a small amount of upstream work to load the right entities into the OWL. We will use the clinvar ttl, but pun it such that we have variant classes that can be instantiated in the model
So that data which users submit to a journal or repo can simultaneously be in ClinVar submission format, or so that this can even be automated in some way.
@mellybelly @cmungall please advise.
In order for people to use webphenote for authoring phenopackets, it must be possible to have the disease field be empty to account for:
In light of upcoming meetings / papers, we must do one of the following in the short term: