monarch-initiative / monarch-phenote

stub for monarch phenote

Java 50.91% Shell 49.09%

monarchinitiative phenotypes

monarch-phenote's Issues

Load a minerva and golr instance using example files

See examples/*.owl for two ultra-simple ontologies: one for diseases and one for phenotypes.

These will be used for loading:

A minerva instance (@hdietze can help)
A golr instance (@kltm can help)

After this, add a barista middleware server, and finally the noctua client layer. The last part won't make much sense with its default configuration and the two loaded ontologies, but we'll try anyway to demonstrate proof of principle.

Surface customization hooks required

Title
- Phenote
All
- "Noctua Preview" in top-left
- Monarch colors nice but lower priority
Landing
- Note that the front page in generic Noctua is now a selection table, so we have a data-driven way of customizing the landing page. We just need content! Not a developer issue
- "Learn more" - we will have a custom md file here, I can specify, use a skeleton for now
Browse
- AmiGO Labs link should be suppressed; the analog for monarch here is simply the monarch site
- This is kind of an odd tab anyway, and may end up being shuffled in core
Create
- 'model from process and taxon' doesn't quite make sense. We will have an analogous operation for phenote in the future, but for now suppression is best
- Need some way of getting to form interface here, but this is less a surface issue and more a deeper one of how we handle the form vs causal model interfaces
About
- Needs customizing, but this tab may go away in core; @kltm to comment
Danger Zone
- Unfortunately the monarch instance can probably get away with less cuteness and geekiness than the GO instance. "Danger Zone" is not good advertising here...

Not sure whether new app vs yaml vs fork is the best long-term/short-term fix; this is just to give a rough idea

phenote.org

Pages such as http://bioontology.stanford.edu/phenote
reference phenote.org.

phenote.org (54.83.197.87) should lead somewhere useful.

whois phenote.org
Domain Name: PHENOTE.ORG
Domain ID: D138191835-LROR
WHOIS Server:
Referral URL: http://www.networksolutions.com
Updated Date: 2015-07-23T01:14:18Z
Creation Date: 2007-01-25T19:33:49Z
Registry Expiry Date: 2017-01-25T19:33:49Z
Sponsoring Registrar: Network Solutions, LLC
Sponsoring Registrar IANA ID: 2
Domain Status: clientTransferProhibited https://www.icann.org/epp#clientTransferProhibited
Registrant ID: 15032216-NSI
Registrant Name: Jeanne Gerstle
Registrant Organization: Lawrence Berkeley National Lab
Registrant Street: One Cyclotron Road
Registrant City: Berkeley
Registrant State/Province: CA
Registrant Postal Code: 94720
Registrant Country: US
Registrant Phone: +1.5104865095
Registrant Phone Ext:
Registrant Fax: +1.5104867000
Registrant Fax Ext:
Registrant Email: [email protected]
Admin ID: 15032216-NSI
Admin Name: Jeanne Gerstle
Admin Organization: Lawrence Berkeley National Lab
Admin Street: One Cyclotron Road
Admin City: Berkeley
Admin State/Province: CA
Admin Postal Code: 94720
Admin Country: US
Admin Phone: +1.5104865095
Admin Phone Ext:
Admin Fax: +1.5104867000
Admin Fax Ext:
Admin Email: [email protected]
Tech ID: 15048445-NSI
Tech Name: Craig A Leres
Tech Organization: 1 Cyclotron Road
Tech Street: MAIL STOP 50A-3111
Tech City: Berkeley
Tech State/Province: CA
Tech Postal Code: 94720
Tech Country: US
Tech Phone: +1.510486757
Tech Phone Ext:
Tech Fax: +1.510845201
Tech Fax Ext:
Tech Email: [email protected]
Name Server: NSX.LBL.GOV
Name Server: NSD.LBL.GOV
Name Server: ADNS1.ES.NET
Name Server: ADNS2.ES.NET
DNSSEC: unsigned

Last update of WHOIS database: 2016-02-09T17:43:44Z <<<

Harmonize model with https://www.w3.org/annotation/

Support common disease causal model

Example diagram from @pnrobinson

Roll out to MODs

Will work with @mellybelly on what this means

HPO-mode WebPhenote support - TSV Import/Export, additional columns

This is the summary of the desired behavior @pnrobinson needs to proceed with using WebPhenote effectively for curation.

The current plan is to enable Peter to import an existing TSV, edit and extend it within WebPhenote, and to then export the new model to a TSV. Some columns need to be preserved round-trip, even though they may not be visible or editable in WebPhenote.

These are the columns:
'Disease ID', 'Disease Name', 'Gene ID', 'Gene Name', 'Genotype', 'Gene Symbol(s)', 'Phenotype ID', 'Phenotype Name', 'Age of Onset ID', 'Age of Onset Name', 'Evidence ID', 'Evidence Name', 'Frequency', 'Sex ID', 'Sex Name', 'Negation ID', 'Negation Name', 'Description', 'Pub', 'Assigned by', 'Date Created'
None of the 'XXX Name' fields need to be preserved round-trip (through import/export), as long as I can ensure that each 'XXX Name' field is populated with the name derived from the corresponding 'XXX ID' field. So the fields that actually need to be preserved round-trip are:

'Disease ID',
'Gene ID',
'Genotype',
'Gene Symbol(s)',
'Phenotype ID',
'Phenotype Name',
'Age of Onset ID',
'Evidence ID',
'Frequency',
'Sex ID',
'Negation ID',
'Description',
'Pub',
'Assigned by',
'Date Created',

I'm not sure what to do with Genotype and Gene Symbols. Presumably, Genotype will be something like 'MGI:3711884' (https://monarchinitiative.org/genotype/MGI:3711884) and the 'Gene Symbols' for that would be 'Gas1/Gas1; Shh/Shh<+> [involves: 129S1/Sv * 129X1/SvJ * C57BL/6J]', which is derivable from Monarch, so I don't need to store it.
The Gene and Genotype columns are NOT going to be visible or editable in WebPhenote, but they must be preserved round-trip.
The 'Assigned by' field will not be visible or editable, but must be preserved.
Negation, Frequency and Sex columns need to be added to WebPhenote and made visible and editable.
Summarizing (assuming anything VISIBLE/EDITABLE is preserved):

VISIBLE/EDITABLE 'Disease ID',
VISIBLE/DERIVED 'Disease Name',
HIDDEN/PRESERVED 'Gene ID',
DERIVED 'Gene Name',
HIDDEN/PRESERVED 'Genotype',
DERIVED 'Gene Symbol(s)',
VISIBLE/EDITABLE 'Phenotype ID',
VISIBLE/DERIVED 'Phenotype Name',
VISIBLE/EDITABLE 'Age of Onset ID',
VISIBLE/DERIVED 'Age of Onset Name',
VISIBLE/EDITABLE 'Evidence ID',
VISIBLE/DERIVED 'Evidence Name',
VISIBLE/EDITABLE 'Frequency',
VISIBLE/EDITABLE 'Sex ID',
VISIBLE/DERIVED 'Sex Name',
VISIBLE/EDITABLE 'Negation ID',
VISIBLE/DERIVED 'Negation Name',
VISIBLE/EDITABLE 'Description',
VISIBLE/EDITABLE 'Pub',
HIDDEN/PRESERVED 'Assigned by',
HIDDEN/PRESERVED 'Date Created',
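To make the round-trip rules concrete, here is a minimal Java sketch that encodes the summary above and derives which columns an import/export must carry through unchanged. The names ColumnMode and HpoaColumns are hypothetical, not part of WebPhenote; the modes simply mirror the VISIBLE/EDITABLE, VISIBLE/DERIVED, HIDDEN/PRESERVED and DERIVED labels, and 'Phenotype Name' is treated as derived, following the summary rather than the earlier list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// How each HPO annotation TSV column is treated, mirroring the summary above.
enum ColumnMode { VISIBLE_EDITABLE, VISIBLE_DERIVED, HIDDEN_PRESERVED, DERIVED }

class HpoaColumns {
    // Insertion-ordered so an exported TSV keeps the original column order.
    static final Map<String, ColumnMode> COLUMNS = new LinkedHashMap<>();
    static {
        COLUMNS.put("Disease ID", ColumnMode.VISIBLE_EDITABLE);
        COLUMNS.put("Disease Name", ColumnMode.VISIBLE_DERIVED);
        COLUMNS.put("Gene ID", ColumnMode.HIDDEN_PRESERVED);
        COLUMNS.put("Gene Name", ColumnMode.DERIVED);
        COLUMNS.put("Genotype", ColumnMode.HIDDEN_PRESERVED);
        COLUMNS.put("Gene Symbol(s)", ColumnMode.DERIVED);
        COLUMNS.put("Phenotype ID", ColumnMode.VISIBLE_EDITABLE);
        COLUMNS.put("Phenotype Name", ColumnMode.VISIBLE_DERIVED);
        COLUMNS.put("Age of Onset ID", ColumnMode.VISIBLE_EDITABLE);
        COLUMNS.put("Age of Onset Name", ColumnMode.VISIBLE_DERIVED);
        COLUMNS.put("Evidence ID", ColumnMode.VISIBLE_EDITABLE);
        COLUMNS.put("Evidence Name", ColumnMode.VISIBLE_DERIVED);
        COLUMNS.put("Frequency", ColumnMode.VISIBLE_EDITABLE);
        COLUMNS.put("Sex ID", ColumnMode.VISIBLE_EDITABLE);
        COLUMNS.put("Sex Name", ColumnMode.VISIBLE_DERIVED);
        COLUMNS.put("Negation ID", ColumnMode.VISIBLE_EDITABLE);
        COLUMNS.put("Negation Name", ColumnMode.VISIBLE_DERIVED);
        COLUMNS.put("Description", ColumnMode.VISIBLE_EDITABLE);
        COLUMNS.put("Pub", ColumnMode.VISIBLE_EDITABLE);
        COLUMNS.put("Assigned by", ColumnMode.HIDDEN_PRESERVED);
        COLUMNS.put("Date Created", ColumnMode.HIDDEN_PRESERVED);
    }

    // Columns whose values must survive import/export unchanged:
    // everything editable plus the hidden-but-preserved columns.
    static List<String> preservedRoundTrip() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, ColumnMode> e : COLUMNS.entrySet()) {
            if (e.getValue() == ColumnMode.VISIBLE_EDITABLE
                    || e.getValue() == ColumnMode.HIDDEN_PRESERVED) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Columns shown in the WebPhenote form (editable or derived-but-visible).
    static boolean isVisible(String column) {
        ColumnMode m = COLUMNS.get(column);
        return m == ColumnMode.VISIBLE_EDITABLE || m == ColumnMode.VISIBLE_DERIVED;
    }
}
```

With this table, preservedRoundTrip() yields the 13 editable or hidden-but-preserved columns; the remaining 8 'Name'/'Symbol' columns would be regenerated from their IDs (or the genotype) on export.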

Add exposure and external conditions ontology

See for example the Smoking node in #22

This is more of an ontology group action item but added here so we don't lose it, and so we can discuss it particularly in the context of causal roles in disease progression.

These should be modeled as occurrents (processes), as this fits best into our causal model.

smoking, drinking
drug history (prescribed or otherwise), food ingestion
...

Document process for transitioning from Phenote to WebPhenote

The following is a draft of a guide for transitioning from Phenote to
WebPhenote. We should tidy it up and make a wiki page for it after
'testing' with HPOA.

Transitioning from Phenote to WebPhenote

With Phenote, knowledge is organized as tabular files (TSVs). The semantics of the
files (i.e. column-column relations) come from outside the
application. The TSVs are typically managed in GitHub.

With WebPhenote, the underlying model is RDF/OWL, with explicit
semantics. Knowledge is currently organized as models, with each model
stored as a single file in GitHub. In future this will change to a
triplestore; knowledge will still be organized by models (roughly
equivalent to Named Graphs), but storage will not be directly file-based.

In either case, what counts as a file/model/named graph is a matter of
convention. For D2P, this is one per disease.

Here we describe 3 phases in the transition:

Test - getting the configuration and modeling right
Dual - use WebPhenote for new models only
Transitioned (full WebPhenote) - port the entire knowledge base

Test Phase

As the name suggests, WebPhenote is used only for testing; any model
created may be discarded.

Dual Phase

During the dual phase, all new models will be created using
WebPhenote. The sum total of knowledge is obtained by merging the
Phenote-generated TSVs plus the WebPhenote OWL store.

This will require converters: down-converters to go from OWL to TSV,
and up-converters to go from TSV to OWL. This may go via an
intermediate form, e.g. PhenoPackets.

It is likely there will be consumers for both forms, so both
converters will need to be maintained.

While new-only is encouraged in the dual phase, an upconverter can
be used on an as-needed basis to place an OWL model in the OWL GitHub
repo. The corresponding TSV content should be removed from the TSV
store at the same time, to avoid stale duplicates.
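A rough Java sketch of the dual-phase merge, assuming (hypothetically) that both stores can be read into maps keyed by disease ID, following the one-model-per-disease convention, with lists of phenotype IDs as values. A disease that has been upconverted takes its content from the OWL store, and any disease still present in both stores is reported as a stale duplicate.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class DualPhaseMerge {
    // Hypothetical representation: disease ID -> phenotype IDs, one entry per model.
    static Map<String, List<String>> merge(Map<String, List<String>> tsvStore,
                                           Map<String, List<String>> owlStore) {
        Map<String, List<String>> merged = new TreeMap<>(tsvStore);
        // A disease already migrated to WebPhenote supersedes its TSV rows.
        merged.putAll(owlStore);
        return merged;
    }

    // Diseases present in both stores: stale TSV duplicates that should have
    // been removed from the TSV store when the model was upconverted.
    static Set<String> staleDuplicates(Map<String, List<String>> tsvStore,
                                       Map<String, List<String>> owlStore) {
        Set<String> dupes = new TreeSet<>(tsvStore.keySet());
        dupes.retainAll(owlStore.keySet());
        return dupes;
    }
}
```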

In emergencies, it is possible to do the reverse: take a model created
in WebPhenote, downconvert it to TSV, and edit it in Phenote, if some vital
functionality is missing and there is a blocker on implementing it
(obviously we should guard against this scenario).

Note that WebPhenote can still be a beta release during the dual phase,
since problems are expected to be easier to recover from at that stage.

Full WebPhenote phase

In this phase, the TSV repository is transitioned en masse to
the OWL repository and retired.

Note there will likely be pipelines consuming the TSVs, so a pipeline
should be set up that downconverts the OWL repo, until the consuming
software can be upgraded.

Note that a number of checks must first be done before switching to
this phase:

does the entire TSV repo convert and round-trip without issues?
does conversion of the entire repo cause any performance issues for WebPhenote?
- E.g. is the front page selector still performant?
- E.g. do we hit the limit on the maximum number of files in GitHub?

If the first fails, then fixes have to be made to the converters.

If the second is a problem, then other strategies can be pursued short
and long term, e.g. consolidating models. It is expected that these
would be resolved fully without workarounds when we switch to a triplestore.
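The first check (does the entire TSV repo round-trip?) can be expressed generically: upconvert each row, downconvert it again, and compare with the original. In this Java sketch the up and down functions are placeholders for the real converters, which may go through an intermediate such as PhenoPackets; nothing here reflects an existing converter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class RoundTripCheck {
    // Returns the TSV lines that do not survive upconvert -> downconvert unchanged.
    // M is whatever intermediate model the (hypothetical) converters use.
    static <M> List<String> failures(List<String> tsvLines,
                                     Function<String, M> up,
                                     Function<M, String> down) {
        List<String> bad = new ArrayList<>();
        for (String line : tsvLines) {
            String back = down.apply(up.apply(line));
            if (!line.equals(back)) {
                bad.add(line);
            }
        }
        return bad;
    }
}
```

For instance, a toy converter pair built on String.split("\t") and String.join("\t", ...) would flag every row that ends in empty columns, because split without a limit drops trailing empty strings; split("\t", -1) keeps them.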

Minor stylistic/usability comments on WebPhenote form

Latest looks good. Comments

'Annotate' button confusingly labeled. Perhaps better as 'add annotation' (and I prefer 'association'). See also the original phenote: a big plus button is fairly unambiguous
Autocomplete on evidence doesn't recognize common ECO synonyms ('TAS' etc). Should also default to something sensible, and perhaps we want to hardcode a slim here, or at least use a slim.
Clone button is cool functionality but not at all obvious what it does by the icon. I can't remember what was in the original desktop phenote but it was fairly clear
Relabel 'Description' as 'Description of phenotype'
Onset should only autocomplete on onsets
It's very easy to forget to click 'Save'. Can we have autosave?
OK, so I have to click the save icon for every row, to save every row, and then save the whole form too?
It looks like I can't move away from or save a row until I have filled in all the fields. There is a minimum number of fields required for integrity (the data model requires a triple, i.e. D->P), but beyond that things should be optional
It feels a bit button heavy for what at the end of the day is mostly entering a list of phenotypes, optionally qualified

Create a web page and link from old Phenote website

We would like the monarch phenotype noctua interface to have its own distinct web presence. This could simply be the instance of the tool itself (as http://noctua.berkeleybop.org is the instance for GO); if so, we need a way to configure the front page a bit more to provide context specific to the instance. We would then coordinate with Larry to deploy the instance (presumably something like phenotua.monarchinitiative.org).

Or in the interim just an about page that lives somewhere.

We will also need a link to this page from the old phenote website: http://phenote.org/ saying "we are currently developing a web-based version of phenote, head on over to URL to find out more". @nlwashington / @kltm do we remember how to edit that page?

This all presupposes we have figured out the name: phenotua vs webphenote vs...

need a beta instance of webphenote

What is on production is really a beta, but we should make that clearer and have a phased release plan.

Give each model a meaningful title and set state to production

Use the disease as title, and the state should be 'production' not 'development'

Add tab or csv export feature

would be great to have on the overview of models page, as well as on each individual model, for convenience.

Note that we'll want to agree with @pnrobinson and @drseb on the export format, should be similar/identical to the current HPO annotation format here: https://github.com/monarch-initiative/hpo-annotation-data

cc @cmungall

add term request feature

It would be great if there was a mechanism to request new terms and use a placeholder ID.

We'd want to use some term broker mechanism, and then also be able to report on terms that had recently been implemented within one's annotation set.

It might also be good to be able to search for terms that have been requested but not yet implemented, so as not to duplicate requests, and/or to comment on existing ones.

Can we have some kind of TermGenie tooling inside PhenoTua? or more like PhenoDiscuss, or both?
@cmungall @tudorgroza @jmcmurry

Add service for generating and importing phenopackets

This would be a distinct microservice; the core phenote uses a native RDF/OWL model, this is what will be stored in the repo. Conversion will be via a distinct library/service (I will produce new architecture docs clarifying this).

This service should be considered distinct and can be any language.

One option is to implement in java and use this lib: https://github.com/phenopackets/phenopacket-reference-implementation

A subset of the graphs generated via phenote forms will turn out to be near-equivalent to the JSON-LD representation of a phenopacket (see phenopackets/phenopacket-format#40). Unfortunately, exact equivalence may be unlikely due to the awkwardness of reification (we use OWL2 axiom annotation in phenote/noctua, which looks awful in JSON-LD), but the mapping should be trivial.

Add regulatory mutation form

For curating association between mutation, regulatory region, gene and phenotype

Underlying model: will follow FALDO for the genomic variant

Add basic form-based disease-phenotype entry

First pass should have 3 fields:

1. Disease
2. Phenotype
3. Publication

(To make things simpler, as a very first pass, omit 3.)

The value of 1 should be anything from the disease ontology (D)
The value of 2 should be anything from the phenotype ontology (Ph)
The value of 3 will, at first pass, be a string (Pub)

This should map to a graph in the following way:

generate IRI for disease instance, di
generate IRI for phenotype instance, pi
create 3 triples
- di rdf:type D
- pi rdf:type Ph
- di RO_0002200 pi
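A minimal Java sketch of this mapping, with triples represented as plain {subject, predicate, object} string arrays. The random-UUID IRIs under http://example.org/model/ are a hypothetical stand-in for however Minerva actually mints individual IRIs, and Pub is omitted, as in the very first pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class DiseasePhenotypeForm {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    // RO_0002200 as used in the mapping above.
    static final String RO_0002200 = "http://purl.obolibrary.org/obo/RO_0002200";
    // Hypothetical base for minted individuals; not Minerva's real scheme.
    static final String MODEL_BASE = "http://example.org/model/";

    static String mintIri() {
        return MODEL_BASE + UUID.randomUUID();
    }

    // Maps one form row (disease class D, phenotype class Ph) to the 3 triples.
    static List<String[]> toTriples(String diseaseClass, String phenotypeClass) {
        String di = mintIri(); // disease instance
        String pi = mintIri(); // phenotype instance
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {di, RDF_TYPE, diseaseClass});
        triples.add(new String[] {pi, RDF_TYPE, phenotypeClass});
        triples.add(new String[] {di, RO_0002200, pi});
        return triples;
    }
}
```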

Check with @hdietze for modeling of Pub

Add field for pathognomonicity

cc @drseb @DoctorBud @pnrobinson

Add ability to export and import phenopackets

See monarch-initiative/monarch-disease-ontology-RETIRED#66

Note that as all phenopackets have a defined structural translation to JSON-LD, the import function may be equivalent to OWL import (but with additional steps to make things conform, such as turning blank nodes into IRIs).

The export function may require tweaking, but it's possible that a standard JSON-LD export plus 'frame' capabilities will work.
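Turning blank nodes into IRIs is essentially skolemization. A minimal Java sketch, assuming triples arrive as strings with blank nodes written in the usual _:label form; the base IRI passed in is hypothetical (RDF 1.1 suggests skolem IRIs under a /.well-known/genid/ path).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class BlankNodeSkolemizer {
    // Hypothetical IRI prefix for minted individuals.
    private final String base;
    // Within one document, the same blank-node label must map to the same IRI.
    private final Map<String, String> minted = new HashMap<>();

    BlankNodeSkolemizer(String base) {
        this.base = base;
    }

    String term(String node) {
        if (!node.startsWith("_:")) {
            return node; // already an IRI or a literal
        }
        return minted.computeIfAbsent(node, k -> base + UUID.randomUUID());
    }

    String[] triple(String s, String p, String o) {
        return new String[] {term(s), p, term(o)};
    }
}
```

Blank-node labels are only meaningful within one document, so a fresh skolemizer would be used per imported phenopacket.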

Hook into phenogrid and annotation sufficiency

Add some kind of integration into these

While annotating, the user should be able to see the sufficiency score and similar entities

This could be a link out or some kind of widget that can be added into either the form or network display

The algorithm is simply to collect all direct classes from the current model and send that. This may send things that are not phenotypes (e.g. genes, age of onset) but it should be up to the server to deal with this.
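In Java, the "collect all direct classes" step could look like the sketch below, assuming the model is available as a list of {subject, predicate, object} triples (a hypothetical representation). It keeps every rdf:type object except OWL vocabulary such as owl:NamedIndividual, and does no phenotype filtering, leaving that to the server as described.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SufficiencyInput {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_NS = "http://www.w3.org/2002/07/owl#";

    // Every class some individual in the model is directly typed as.
    // Genes, onsets, etc. are included on purpose; the server decides what to score.
    static Set<String> directClasses(List<String[]> triples) {
        Set<String> classes = new LinkedHashSet<>();
        for (String[] t : triples) {
            // Skip OWL bookkeeping such as owl:NamedIndividual declarations.
            if (RDF_TYPE.equals(t[1]) && !t[2].startsWith(OWL_NS)) {
                classes.add(t[2]);
            }
        }
        return classes;
    }
}
```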

could not handle batch request

Could not successfully handle batch request. Exception: org.geneontology.minerva.MolecularModelManager$UnknownIdentifierException. Could not validate the id: MESH:C537176

Is it the case that only HP terms are accepted? Perhaps at present, we could at least change the tooltip accordingly.

Identification of the people who create records

We want to make it as easy as possible to create proper attribution for annotations or Phenopacket creation. The easiest way to ensure this might be to enable login via ORCID.

IF NOT, other mechanisms for a diversity of person identifiers should be supported, such as a FOAF page or an OCLC ID.

ALSO, more than one person ID needs to be allowed for each record or phenopacket

Add model organism genotype/allele to disease curation form

In the simplest case this will link a pre-existing feature (pulled from SciGraph) with a phenotype, analogous to #2

We may also allow some part of genotype post-composition

Minerva requests for DOID:1932 and OMIM:105830 do not return same metadata

I have this issue with WebPhenote, Minerva, GOLR that I don't understand how to address, but I suspect that my GOLR query can be improved. I think it may have to do with a clique containing DOID:1932 and OMIM:105830, where OMIM:105830 is the leader, and the metadata is only associated with the leader, OMIM:105830, so DOID:1932 doesn't return useful label/description info.

What I consider a bug (from the WebPhenote user's experience), is that the label for an entry like DOID:1932 is 'Angelman Syndrome', but that WebPhenote cannot later retrieve this label. Does the autocomplete query include information I should add to Minerva?

Here's the problem in detail as manifested in WebPhenote:

User types in 'Angelman' and gets an autocomplete list of many entries, but let's focus on two:

DOID:1932 Angelman Syndrome
OMIM:105830 Angelman Syndrome

If user chooses 'DOID:1932', then that is allowed, but Minerva isn't able to get the label ('Angelman Syndrome'), because Minerva issues the following query to GOLR and doesn't get back a useful label, although it does return a result.

Here is Minerva's request to GOLR:

curl 'https://solr-dev.monarchinitiative.org/solr/ontology/select?defType=edismax&qt=standard&wt=json&indent=on&fl=document_category,annotation_class,annotation_class_label,description,source,is_obsolete,alternate_id,replaced_by,synonym,subset,definition_xref,database_xref,isa_partof_closure,regulates_closure,only_in_taxon,only_in_taxon_closure&facet=false&json.nl=arrarr&q=*:*&rows=100&start=0&fq=document_category:"ontology_class"&fq=annotation_class:"DOID:1932"'

and here is the response, which does not contain an annotation class label or description:

  "response":{"numFound":1,"start":0,"docs":[
      {
        "document_category":"ontology_class",
        "annotation_class":"DOID:1932",
        "is_obsolete":false,
        "regulates_closure":["Orphanet:377788","DOID:0080014","Orphanet:182222","DOID:4","Orphanet:377794","DOID:1932","DOID:630","MESH:D035583","Orphanet:98053","Orphanet:68335","MESH:D025063","MESH:C"],
        "isa_partof_closure":["Orphanet:377788","DOID:0080014","Orphanet:182222","DOID:4","Orphanet:377794","DOID:1932","DOID:630","MESH:D035583","Orphanet:98053","Orphanet:68335","MESH:D025063","MESH:C"]}]
  }}

If instead, the user chooses 'OMIM:105830', then that also succeeds, but in this case, Minerva's request to GOLR returns a full response with a Description and a Label field.

Here is Minerva's request for OMIM:105830:

curl 'https://solr-dev.monarchinitiative.org/solr/ontology/select?defType=edismax&qt=standard&wt=json&indent=on&fl=document_category,annotation_class,annotation_class_label,description,source,is_obsolete,alternate_id,replaced_by,synonym,subset,definition_xref,database_xref,isa_partof_closure,regulates_closure,only_in_taxon,only_in_taxon_closure&facet=false&json.nl=arrarr&q=*:*&rows=100&start=0&fq=document_category:"ontology_class"&fq=annotation_class:"OMIM:105830"'

and a subset of the response, showing the label and description:

  "response":{"numFound":1,"start":0,"docs":[
      {
        "document_category":"ontology_class",
        "annotation_class":"OMIM:105830",
        "annotation_class_label":"Angelman syndrome",
        "description":"Angelman syndrome is a neurodevelopmental disorder characterized by mental retardation, movement or balance disorder, typical abnormal behaviors, and severe limitations in speech and language. Most cases are caused by absence of a maternal contribution to the imprinted region on chromosome 15q11-q13. Prader-Willi syndrome (PWS; OMIM:176270) is a clinically di

The code I'm concerned with is in:

https://github.com/geneontology/minerva/blob/master/minerva-lookup/src/main/java/org/geneontology/minerva/lookup/MonarchExternalLookupService.java#L57

although the actual Golr request is in OwlTools I suspect.
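To compare the two lookups with identical parameters, the request can be rebuilt for any class ID. This Java sketch only reproduces the query string from the curl commands above (URL-encoded with java.net.URLEncoder, which Solr decodes back to the same parameters); it is not the Minerva or OWLTools code that actually issues the request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class GolrOntologyQuery {
    static final String SELECT = "https://solr-dev.monarchinitiative.org/solr/ontology/select";
    // Same field list as the fl parameter in the curl commands above.
    static final String FIELDS = "document_category,annotation_class,annotation_class_label,"
            + "description,source,is_obsolete,alternate_id,replaced_by,synonym,subset,"
            + "definition_xref,database_xref,isa_partof_closure,regulates_closure,"
            + "only_in_taxon,only_in_taxon_closure";

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Only the final fq differs between the DOID:1932 and OMIM:105830 lookups.
    static String lookupUrl(String classId) {
        return SELECT + "?defType=edismax&qt=standard&wt=json&indent=on"
                + "&fl=" + enc(FIELDS)
                + "&facet=false&json.nl=arrarr&q=" + enc("*:*")
                + "&rows=100&start=0"
                + "&fq=" + enc("document_category:\"ontology_class\"")
                + "&fq=" + enc("annotation_class:\"" + classId + "\"");
    }
}
```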

Read/Write Phenopacket format

At least for human level-1, WebPhenote needs to be able to read/write phenopacket format

https://github.com/monarch-initiative/phenopacket-format/blob/master/schema/phenopacket-level-1-schema.yaml

@cmungall @drseb

WebPhenote must archive assertions

At a meeting with @DoctorBud @drseb @kltm

We want to be able to delete assertions, but we never want to lose them. They should be obsoleted/deprecated, using the same annotation properties we use for classes, etc

This will probably be implemented entirely in minerva, cc @hdietze

pick list for age of onset

It is not immediately obvious what to put in "age of onset".
Typing "birth" for instance, does nothing because it expects "congenital".
Not sure how many age terms there are, but loading all of them into a combination autocomplete and dropdown might be the best compromise.
Additionally, having autocomplete start with one typed letter (rather than 3) may in this instance be preferred.
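One way to combine autocomplete and dropdown behaviour is sketched below in Java: an empty query returns the whole (short) list, and otherwise a term matches when its label or any synonym has a word beginning with the typed text, with the minimum query length configurable. Term and OnsetPickList are hypothetical, and whether the onset terms carry a synonym such as 'present at birth' is an assumption to check against HPO. The same synonym matching would also let evidence autocomplete recognize codes like 'TAS'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class OnsetPickList {
    // Hypothetical pick-list entry: an ID, a label, and synonyms to match on.
    static final class Term {
        final String id;
        final String label;
        final List<String> synonyms;

        Term(String id, String label, List<String> synonyms) {
            this.id = id;
            this.label = label;
            this.synonyms = synonyms;
        }
    }

    // Empty query: whole list (dropdown). Otherwise prefix-of-word matching
    // on label and synonyms once at least minChars characters are typed.
    static List<Term> suggest(List<Term> terms, String typed, int minChars) {
        String q = typed.trim().toLowerCase(Locale.ROOT);
        if (q.isEmpty()) {
            return new ArrayList<>(terms);
        }
        if (q.length() < minChars) {
            return new ArrayList<>();
        }
        List<Term> out = new ArrayList<>();
        for (Term t : terms) {
            if (matches(t.label, q) || t.synonyms.stream().anyMatch(s -> matches(s, q))) {
                out.add(t);
            }
        }
        return out;
    }

    // True if the text starts with q or contains a word starting with q.
    private static boolean matches(String text, String q) {
        String lower = text.toLowerCase(Locale.ROOT);
        return lower.startsWith(q) || lower.contains(" " + q);
    }
}
```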

Create phenotua instance driven by monarch.owl

Existing toy instance that we had was driven by a set of toy ontologies. We should now set up the real deal.

For the golr part, we could piggyback off the monarch golr instance (though this may not yet be stable enough). We aren't loading ontologies into it yet, but this should be easy.

Roll out to clinicians

Add ability to record comparative phenotype annotations

For example,
het vs. homo vs. hemizygous for a single gene,
multi-gene combinations +/- any given variant
GoF vs. LoF

Most annotations are made in pairs or sets comparing against some control, but the control is not always WT background. We need to be able to capture these "sets" of comparisons.

We should be able to get most of these from the MODs by using the pub or figure as a way to identify the comparisons.

@nlwashington has ideas for UI

Oddities in folding in graph display

See:
http://create.monarchinitiative.org/editor/graph/gomodel:5797c67200000016

The underlying edge is

[a ParkinsonDisease] --has_part--> [a Tremor]

When folding a relationship, the directionality must be preserved, i.e. this should be shown as:

PD 
has_part(Tremor)

however, it is inverting it:

Tremor
has_part(PD)

It is never semantically correct to do this unless the relation R (here has_part) is replaced with InverseOf(R). This is the same as translating the base graph to the equivalent:

[a Tremor] --part_of--> [a ParkinsonDisease]

based on the inverse properties declaration

(although a generic graph editor issue this is only affecting monarch so far)
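The folding rule can be stated as a small function: an edge can always be folded into its subject unchanged, but it may only be folded into its object if an inverse property is declared, in which case the inverse is displayed. The Java sketch below is illustrative only, not the noctua graph editor's code; relations and nodes are plain strings.

```java
import java.util.HashMap;
import java.util.Map;

class EdgeFolding {
    // Inverse-property declarations, e.g. has_part <-> part_of.
    private final Map<String, String> inverses = new HashMap<>();

    void declareInverse(String p, String q) {
        inverses.put(p, q);
        inverses.put(q, p);
    }

    // Fold the edge (subject --p--> object) into the box of 'host' and
    // return the label to display inside host, e.g. "has_part(Tremor)".
    String fold(String subject, String p, String object, String host) {
        if (host.equals(subject)) {
            return p + "(" + object + ")"; // direction preserved as-is
        }
        if (host.equals(object)) {
            String inv = inverses.get(p);
            if (inv == null) {
                // Showing p inside the object box would silently invert the meaning.
                throw new IllegalArgumentException("no inverse declared for " + p);
            }
            return inv + "(" + subject + ")";
        }
        throw new IllegalArgumentException(host + " is not an endpoint of the edge");
    }
}
```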

Add export JSON-LD and TSV options

We should have options to export JSON-LD and TSV

For JSON-LD, this should be possible via a standard RDF to JSON-LD converter (e.g. Apache Jena). However, we should tweak this with a standard context file that will compact the JSON to something that conforms to a TBD PhenoJSON schema. It might be simpler to implement custom server-side code first.

The TSV may be lossy; it may be isomorphic to the forms mapping. TBD.

This ticket is a stub. More details forthcoming. cc @mellybelly @nlwashington

Add link to exemplar model on front page

The front page should have a link to an exemplar model. I will curate the model along with @mellybelly

Hook /annotate/text into web phenote

This interface:
http://monarchinitiative.org/annotate/text

Allows marking up text with ontology classes (including genes), e.g.:
http://monarchinitiative.org/annotate/text/?q=Tyr+gene+implicated+in+ocular+albinism&longestOnly=true

(Aside: yes there are too many genes, see https://github.com/monarch-initiative/monarch-app/issues/925)

We have an option at the bottom to do an owlsim search (this could be made more visible)

We would like another option that would send the list of genes and ontology classes into noctua to 'seed' a model. Formally, we would create a new set of individuals ClassAssertion(C I) where I is a uuid (generated using normal minerva procedures) and C is the marked up class. There are various smart ways to improve this. The table of results could have checkboxes allowing the user not to flood a model. The NLP could be enhanced to create links between individuals (e.g. an age of onset followed by a phenotype could be linked).
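A sketch of the seeding step in Java, producing one ClassAssertion(C I) per checked row of the annotator's results, written out in OWL functional syntax for readability. java.util.UUID and the http://example.org/seed/ base are hypothetical stand-ins for Minerva's normal ID minting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class SeedModel {
    // Hypothetical base; real individual IRIs would come from Minerva.
    static final String BASE = "http://example.org/seed/";

    // One ClassAssertion(C I) per selected class C, each with a fresh individual I.
    // Only classes the user left checked should be passed in, to avoid flooding the model.
    static List<String> classAssertions(List<String> selectedClassIris) {
        List<String> axioms = new ArrayList<>();
        for (String c : selectedClassIris) {
            String i = BASE + UUID.randomUUID();
            axioms.add("ClassAssertion(<" + c + "> <" + i + ">)");
        }
        return axioms;
    }
}
```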

It may be easiest to throw people into the network view rather than the form/table view.

I spoke to @kltm last week, and one way to do this would be:

user starts in Noctua and logs in
User gets sent to /annotate/text, but with their barista token
The annotator detects this and says "welcome noctua user"
after entering text and getting list, the user also sees a link back to noctua (the app provides this because it knows the user came from here)
the user follows this, which takes them into a new model seeded with their selected classes

This is exactly how galaxy talks to external tools. We are thinking this might be a nice lightweight lightly coupled way for noctua to talk to a variety of external tools. (I thought we had docs on this in the GO wiki but can't find it)

We may want to separate out the annotator into its own standalone component before proceeding with this

Fix OWL model of evidence to conform to standard

See: http://create.monarchinitiative.org/download/gomodel:5797c67200000003/owl

Individual: <http://model.geneontology.org/5797c67200000003/5797c67200000005>

    Annotations: 
******        <http://geneontology.org/lego/evidence> <http://model.geneontology.org/5797c67200000003/5797c67200000007>,
        <http://purl.org/dc/elements/1.1/date> "2016-07-26"^^xsd:string,
        <http://purl.org/dc/elements/1.1/contributor> "http://orcid.org/0000-0002-6601-2165"^^xsd:string

    Types: 
        <http://purl.obolibrary.org/obo/HP_0001337>

    Facts:  

     Annotations: <http://purl.org/dc/elements/1.1/contributor> "http://orcid.org/0000-0002-6601-2165"^^xsd:string, 
                 <http://purl.org/dc/elements/1.1/date> "2016-07-26"^^xsd:string

                 <http://purl.obolibrary.org/obo/RO_0002488>  <http://model.geneontology.org/5797c67200000003/5797c67200000008>

Note that 5797c67200000007 is the ECO instance.

Two issues:

it's using the old vocabulary for evidence (not a big deal, we haven't switched in GO yet)
structurally it's not correct. It's placing the evidence instance as an annotation on the phenotype instance. It should be putting it as an axiom annotation on the doid->hp OPA

https://github.com/geneontology/minerva/blob/master/specs/owl-model.md#axiom-annotations-and-evidence

cc @kltm @balhoff

Consider biocuration project

Use WebPhenote and Tudor's text mining tool to extract phenotypes for a comprehensive collection of publications about genotype-phenotype correlations. For example, FBN1 and LMNA are genes in which distinct mutations can lead to completely different diseases.
Project phases

Set up database to accept phenopackets and HGVS mutations
Biocurate all/most papers about mutations in the gene
Look for genotype-phenotype correlations (e.g., compare mutations in certain exons, compare nonsense vs missense mutations, etc.) and perform a chi-squared test for each HPO term for which we have annotations
It might be interesting to do this for one gene like LMNA where correlations are well known and also one for genes without described genotype phenotype correlations
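For a single HPO term and two mutation groups (say nonsense vs missense), the test reduces to a 2x2 contingency table. Below is a minimal Java sketch of the Pearson chi-squared statistic without continuity correction; the table layout is an assumption. With one degree of freedom the 5% critical value is about 3.841; for small counts Fisher's exact test is often preferred, and testing many HPO terms calls for a multiple-testing correction.

```java
class GenotypePhenotypeTest {
    // Pearson chi-squared statistic (no continuity correction) for:
    //                 term annotated   term not annotated
    //   group A             a                  b
    //   group B             c                  d
    static double chiSquared2x2(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double diff = (double) a * d - (double) b * c;
        double denominator = (double) (a + b) * (c + d) * (a + c) * (b + d);
        if (denominator == 0) {
            throw new IllegalArgumentException("a row or column total is zero");
        }
        return n * diff * diff / denominator;
    }
}
```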

Add a variant-phenotype entry form

This conforms to the simple pattern here: geneontology/noctua#150

Source will be a variant entity, Object will be phenotype (as per disease-phenotype form)

This will need a small amount of upstream work to load the right entities into the OWL. We will use the clinvar ttl, but pun it such that we have variant classes that can be instantiated in the model