monarch-initiative / sepio-ontology

Ontology for representing scientific evidence and provenance information

Makefile 69.42% Shell 13.64% Batchfile 0.28% Scala 9.60% Ruby 7.06%

monarchinitiative evidence provenance obofoundry

sepio-ontology's Introduction

WARNING: The ontology here is a pre-release draft version that is still evolving, and subject to change. A first official release of a stable core will likely happen in late 2023. If you are making use of the ontology and would like to be updated about major changes and releases, please register in the USERS.md file.

The Scientific Evidence and Provenance Information Ontology (SEPIO) is an OWL ontology developed to support rich, computable representations of the evidence and provenance behind scientific assertions. The core ontology defines a flexible and generic model that can be applied in any domain and extended with domain-specific features. The ontological model is the foundation of a larger SEPIO Framework that provides mechanisms to create custom schemas for specific applications that leverage modern semantic web standards. The framework comprises four main components:

SEPIO Core Ontology: a computable, 'open-world' domain model encoded in the OWL description logic language.
SEPIO Information Model: provides an informal specification for how terms and design patterns defined in the ontology can be applied as a 'closed-world' model for structuring data.
SEPIO Profiles: subsets of the maximal information model that can be customized and extended to support a particular use case, and implemented in a formal schema language (e.g. JSON schema, ShEx).
SEPIO Value Sets: re-usable collections of terms bound to a particular attribute that can constrain values it can take in a particular Profile.

Data sources or developers interested in using SEPIO should begin by reading the Wiki pages recommended below, and browsing the current version of the SEPIO ontology located here. Comments or questions can be sent to Matthew Brush at [email protected], or posted as tickets in the SEPIO issue tracker.

Resources

The Wiki pages in this repository are currently being updated to reflect the current state of the SEPIO Model and Framework. The pages below are recommended starting points for exploration.

SEPIO Home: Summary level view of most important features and considerations for using SEPIO, with links to deeper dives into specific topics.
SEPIO Framework: Overview of the components of the framework and interactions between them.
SEPIO Ontology: Deeper dive into the foundational SEPIO model as implemented in its core ontology.
ClinGen-ACMG SEPIO Profile: Detailed look at a Profile created to represent rich evidence and provenance for clinical variant pathogenicity interpretations.

License

The SEPIO Ontology and Framework is an open source project, free to re-use and re-mix under a Creative Commons 3.0 BY license.

Development Context / Details:

The ontology is one manifestation of the underlying SEPIO Information Model. This information model is being developed as part of the GA4GH Variant Annotation (VA) workstream, to support data exchange across GA4GH systems, and has drifted from what is in the current SEPIO ontology found here. The ontology will be updated to reflect these changes in the near future (likely fall 2023). A link to documentation for the GA4GH VA Information Model that will be the basis for these updates will be provided here with the first release of this model (Summer/Fall 2023).

sepio-ontology's Issues

Contribution entity

Will there be a primary Contribution entity added to the model (in the diagrams on the home page of the wiki, for instance)?

In the ClinGen model we discussed using the Contribution concept to capture the provenance information for Statements.

Can you clarify the plans for including this as a first class concept in the SEPIO model and if not, why?

fix clingen datamodel broken links in wiki & ontobee pages please

@mbrush FYI - the links you are using in the wiki and ontobee entries (geno for example) to reference the clingen models are old and not working (sorry - we weren't able to reroute them on our side).

Anyway please change all datamodel.clinicalgenome.org references to dataexchange.clinicalgenome.org. Here's an example of a broken link, the first reference in the top paragraph of this page in the SEPIO wiki
https://github.com/monarch-initiative/SEPIO-ontology/wiki/ClinGen-ACMG-Variant-Interpretation

and here's an example of an old/broken link in geno
http://www.ontobee.org/ontology/GENO?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FGENO_0000890
(several throughout the annotation section).

Add obofoundry topic to repo metadata

The OBO Foundry has recently created a topics page at https://github.com/topics/obofoundry. It would be great if you could add the obofoundry topic to your repository, since it's currently listed as active in the OBO Foundry. If you're not sure how to do that, follow the instructions here. The main issue for this task is at OBOFoundry/OBOFoundry.github.io#1538 in case you want some more context or to join the larger discussion.

ClinVar review status

Could we add ClinVar's review status terms to SEPIO? https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/ cc @justaddcoffee @mbrush

Please clarify Readme

Hello!

Could you please describe the "SEPIO" abbreviation and add a description of the meaning and possible applications of your ontology? Maybe provide some references.

Use of 'Activities' or 'Contributions' to describe and organize Entity provenance

Many efforts like ClinGen will have use cases that would benefit from explicitly describing the activities through which Entities in our model (Assertions, Supporting Information, Evidence Lines) are created.
ClinGen's main use case here is to organize provenance information into discrete, traceable objects for a variety of reasons related to the applications they must support, as detailed here.

I imagine other, similar use cases coming from other groups wanting to model evidence and provenance information in fine detail. With this in mind, we should consider possible approaches for allowing for richer description of provenance of Entities in a way that is compatible with and easily harmonized with more compact representations that will support most use cases (where Entities are directly linked to agents who contributed to them and dates of these contributions, using PAV-like relations).

We are currently exploring the creation and use of 'Contribution' objects - essentially representing reified contribution relationships between an agent and an Entity. This is similar in principle to the PROV notion of an 'Attribution' - but extended to allow time stamps and roles to be added in this context.

Alternatively, we could allow for a minimal representation of Activities, whose use is limited to describing provenance of entities, but which would be used to create activity-based paths through the data in the style of PROV (which would describe a VariantInterpretation as a series of Activities with inputs and outputs and agents).

Diagrams of proposed patterns based on these approaches can be found in the cmap here.
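
To make the 'Contribution' idea concrete, here is a minimal Python sketch of a reified contribution (agent, entity, role, date) and of how such objects could be collapsed into the compact, directly linked form mentioned above. The class name, field names, and the contributed_by / contributed_on keys are illustrative assumptions, not terms taken from SEPIO, PAV, or the ClinGen model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contribution:
    """Reified link between an agent and an entity (hypothetical field names)."""
    agent: str       # who contributed
    entity_id: str   # the Assertion, Evidence Line, etc. that was contributed to
    role: str        # the role the agent played, e.g. "assessor"
    on_date: date    # when the contribution was made


def to_compact(contributions, entity_id):
    """Collapse Contribution objects into direct agent/date links on one entity.

    Roles are dropped here, which is exactly the detail the compact form loses.
    """
    compact = {"contributed_by": [], "contributed_on": []}
    for c in contributions:
        if c.entity_id != entity_id:
            continue
        if c.agent not in compact["contributed_by"]:
            compact["contributed_by"].append(c.agent)
        compact["contributed_on"].append(c.on_date.isoformat())
    return compact


contributions = [
    Contribution("agent:A", "assertion:1", "assessor", date(2017, 3, 1)),
    Contribution("agent:B", "assertion:1", "approver", date(2017, 3, 15)),
    Contribution("agent:A", "evidence_line:7", "assessor", date(2017, 2, 20)),
]
compact = to_compact(contributions, "assertion:1")
print(compact)
```

The conversion only works in one direction: the compact form cannot be expanded back into Contribution objects without guessing which date belongs to which agent and role, which is one argument for treating the richer form as the source of truth.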

Defining ontology classes for typing of EvidenceLines

In adopting SEPIO, ClinGen now creates EvidenceLine objects in their model. I propose we develop a type system for classifying these objects. This will provide an important level of interoperability with other data sets with minimal E/P metadata that use only ECO codes to define a 'type' of evidence used.

Requirements here include:

Evidence classes need to be interoperable with the widespread use of evidence types from efforts like the Evidence and Conclusion Ontology (ECO). We will work with ECO as possible to extend their model to include evidence types relevant to variant pathogenicity interpretation.
Evidence classes need to accommodate evidence types that are implicit or explicit in frameworks like the ACMG guidelines and Invitae's Sherloc.
Evidence classes need to be consistent with the granular metadata being represented in models like ClinGen's - and created at a level that is easy to apply in practice.

SEPIO:0000111 deprecated

the term SEPIO:0000111 with label(s)
'is_assertion_supported_by' or
"is_assertion_supported_by_evidence"

has been published in code and is now absent

Please publish that it is deprecated and include alternative(s) to consider

PURL does not resolve

SEPIO doesn't seem to be available at http://purl.obolibrary.org/obo/sepio.owl. Is this intentional?

dcterms:source punned as both ObjectProperty and AnnotationProperty

sepio.owl includes http://purl.org/dc/terms/source as both an ObjectProperty and an AnnotationProperty

This leads to an error message when opening the ontology in Protege:

Illegal redeclarations of entities: reuse of entity http://purl.org/dc/terms/source in punning not allowed [Declaration(ObjectProperty(<http://purl.org/dc/terms/source>)), Declaration(AnnotationProperty(<http://purl.org/dc/terms/source>))]

And also throws a TypeError when trying to load using owlready2:

TypeError: Property 'http://purl.org/dc/terms/source' is both an ObjectProperty and an AnnotationProperty!
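
A quick way to look for this kind of redeclaration before loading the file into Protege or owlready2 is to scan the RDF/XML for IRIs declared under more than one property type. The sketch below uses only the Python standard library and an inline sample document; it assumes the ontology is serialized as RDF/XML with typed element declarations (owl:ObjectProperty and so on), and it would miss declarations written as rdf:Description elements with an rdf:type child.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
PROPERTY_KINDS = ("ObjectProperty", "DatatypeProperty", "AnnotationProperty")

# Tiny stand-in for sepio.owl, reproducing the reported redeclaration.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty rdf:about="http://purl.org/dc/terms/source"/>
  <owl:AnnotationProperty rdf:about="http://purl.org/dc/terms/source"/>
  <owl:AnnotationProperty rdf:about="http://purl.org/dc/terms/creator"/>
</rdf:RDF>"""


def find_punned_properties(rdf_xml):
    """Return {iri: [kinds]} for IRIs declared as more than one property kind."""
    declared = defaultdict(set)
    root = ET.fromstring(rdf_xml)
    for kind in PROPERTY_KINDS:
        for element in root.iter(f"{{{OWL}}}{kind}"):
            iri = element.get(f"{{{RDF}}}about")
            if iri:
                declared[iri].add(kind)
    return {iri: sorted(kinds) for iri, kinds in declared.items() if len(kinds) > 1}


punned = find_punned_properties(SAMPLE)
for iri, kinds in punned.items():
    print(iri, "declared as", " and ".join(kinds))
```

To run it against a real file, read the file contents and pass them to find_punned_properties in place of SAMPLE.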

Make sure sepio.owl does not re-declare dc:description as a data property

https://github.com/monarch-initiative/SEPIO-ontology/blob/master/src/ontology/sepio.owl#L1706

That causes massive confusion for some downstream uses.

ClinGen cmap: assessment strength, outcome, confidence clarification

@mbrush Can we dig into the strength, confidence and outcome associations that have been drafted on the 3-15-17 ClinGen cmap at our next Tuesday meeting?

I'd like to sort this out.

Is confidence a generalizable association? And if so, should it hang off of evidence lines, assertions or data or some combination?
The outcome and strength associations were created by ClinGen originally as a single "outcome" concept to capture the assessor's evaluation of whether a particular ACMG rule/criterion was met/unmet(insufficient)/refuting and, if met, the strength and direction (e.g. strong path, very strong path, supporting benign, moderate benign, etc.) of that particular rule and associated data based on that assessor's objectivity.

We discussed putting the strength on the evidence which is the product of the criterion assessment, but wouldn't that be yet another assertion and would it be done independently of the criterion assessment?

I suppose we can separate the strength assessment from the criterion/data evaluation/assessment (i.e. outcome) but I'm not sure I want to add "specialized" strength codes to evidence lines. It seems more logical to capture this with the criterion assessment since that ends up being a unique form of data that can be used as evidence when making an interpretation.

Maybe the "confidence" (which seems more generic) should be the concept which is associated to evidence. It seems that this is something that all users of evidence would want to be able to make on their own, not relying on a third party to assess. Unless of course someone wanted to use someone else's evidence line as supporting data for their own evidence. Even so, the agent that defines the evidenceline should (IMO) be the one that owns the confidence call of that evidence line.

I'll defer to your judgement to help sort this out and settle this for our near term goals.

We do need another pass at this to get it in a draft state that we can start documenting (IMO).

Label and definition modifications for SEPIO:0000187 Confidence level

Confidence level is currently defined as: "A data item that quantifies the level of confidence an agent has that a particular piece of information is true."
A confidence level quantifying a level of confidence seems circular to us.

Moreover, since confidence level is asserted as a subclass of measurement datum, its definition should read: "A measurement datum that [...]"

Suggested label: confidence level measurement datum
This can avoid a confusion with a possible mental entity that would be the confidence level of some agent.

Include Confidence Information Ontology?

Hi,

do you plan to include a concept of "confidence"? See http://database.oxfordjournals.org/content/2015/bav043.long and https://github.com/BgeeDB/confidence-information-ontology

SEPIO:0000019

published in code, but no longer exists.
was ! 'created_at_location'

it is possible but not proven that
although defined, it may not be used now.

in sepio_developer.owl it is mentioned as:
! "x-created_at_location"

PURL returns invalid document

curl -L http://purl.obolibrary.org/obo/sepio.owl

Returns:

Code: SignatureDoesNotMatch
Message: The request signature we calculated does not match the signature you provided. Check your key and signing method.
AWSAccessKeyId: AKIAIWNJYAX4CSVEH53A
StringToSign:
AWS4-HMAC-SHA256
20200507T221904Z
20200507/us-east-1/s3/aws4_request
012342f73cd12b2b3172a527d54d10bf6224be2244b84cc8753f6f3957815dde
SignatureProvided: 587eb603ecfd48b688ac854e54a485d5c4f1cceb41e8e048a1e2d06b983c1f43
CanonicalRequest:
GET
/53617485/62b3e300-c000-11e9-855f-9a8d07626370
X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200507%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200507T221904Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&actor_id=0&repo_id=53617485&response-content-disposition=attachment%3B%20filename%3Dsepio.owl&response-content-type=application%2Foctet-stream
host:github-production-release-asset-2e65be.s3.amazonaws.com

host
UNSIGNED-PAYLOAD
RequestId: F6F9557FC4D94B2D
HostId: DWfrgYgcB4J7IrYQ3S6JRibncMhpEo+bD+KLzTgpawF51HQIbvsuuqjMVO0XXs8YicLvIcdxoJw=
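
A downstream consumer can guard against this failure mode by checking what the PURL actually returned before handing it to an OWL parser. The following is a rough sketch under two assumptions: that a valid sepio.owl is served as RDF/XML (root element rdf:RDF), and that the storage backend reports failures as an XML document with an Error root and a Code child, as in the response above. The function name is made up for illustration, and fetching the body (for example with curl -L as above) is left out so the check can run offline.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def classify_purl_body(body):
    """Classify a downloaded response body without fully parsing the ontology."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "not-xml"
    if root.tag == RDF_ROOT:
        return "rdf-xml"
    if root.tag == "Error":
        # Storage-level error document; surface its code for the log.
        return "error:" + root.findtext("Code", default="unknown")
    return "other-xml"


error_body = (
    b"<Error><Code>SignatureDoesNotMatch</Code>"
    b"<Message>The request signature we calculated does not match "
    b"the signature you provided.</Message></Error>"
)
rdf_body = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'

print(classify_purl_body(error_body))
print(classify_purl_body(rdf_body))
```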

Provenance: Agent, Contribution competing approaches

In order to have Agent more closely reflect the "Agent" concept from W3C Provenance's model we should consider

a recursive "actedOnBehalfOf" relationship between Agents to allow for agents that are part of a hierarchical organization structure.
the notion of a role to specify the part the agent played (I realize this is in contribution, but I'm unclear how not having it in the direct Agent to Entity or Agent to Activity relationship is useful)
we should segregate "activity" entities from "entity" entities more strictly so as to not conflate "entities" with activity-like information such as date/times, either that or we should strictly define the built-in activity information "created", "modified", "approved/asserted", "published", "curated", etc...

Contribution is a better bundling of Agent and Activity concepts from W3C Prov (IMO): agent, role and date/time of the activity. But it is confusing when similar attributes are also provided on the entities themselves. I like flexibility, but not at the loss of consistency.

For example,
Assertion has (stated_by and date_stated) as well as (validated_by and date_validated) built in.
EvidenceLine has (evidence_strength_ assessed_by and date_evidence_strength_ assessed) built in.
EvidenceItem has (date and specifiedBy) built in (not sure if these go together).
Only EvidenceItem has a relationship with Activity - is that intentional?

If Assertion and EvidenceLine are to contain the precise provenance information for the agent and date/time of the activities "stating"/"validating" and "strength_assessing", respectively, then why do we have the Contribution entity? Is it for additional kinds of contributions? If so, then it should be clarified. I don't think it is right to have alternative approaches to representing the same information (at least not such significantly structural differences).

I recommend that we standardize how provenance data is structured throughout the model so that we can reduce complexity and confusion for adopters.

Draft Entity Attribute Tables

Testing format and content for these, which will live in Wiki. Would ultimately like these to be auto-generated from some central source of truth (perhaps the owl file itself).

Utility of capturing why an ACMG Criterion is unmet

In the course of writing the ticket below, I came to the conclusion that it is actually not useful to distinguish between a criterion being unmet due to insufficient data versus the criterion being definitively refuted - at least not from the perspective of using the refuted criterion as meaningful evidence for a target VariantInterpretation.

Feel free to skip to the punchline in the last paragraph - but I recorded my complete thoughts and path toward changing my tune here, as it may prove useful for others.

There has been recent debate about the practicality and utility of distinguishing between the notions of a criterion being unmet-insufficient (when there is not sufficient information to determine if the criterion is met), versus unmet-refuted (when there is sufficient information to definitively say that the criterion is not met).

One argument against making this distinction was that the ACMG calculator does not use info about unmet criteria so it is not necessary to capture in this context. My counter to this is that in a broader context, outside of this framework, this distinction provides very useful information that is relevant to evaluating the validity of a target VariantInterpretation. Capturing it presents a fuller picture of the evidence for a given VariantInterpretation, that can be used by other types of evaluation frameworks outside the ACMG - e.g. those being developed by Monarch/SEPIO.

The value of this distinction of course is that an unmet-refuted outcome means that the CriterionAssessment can be used as meaningful evidence to evaluate the proposition the target VariantInterpretation represents - i.e. the outcome is strong enough to impact the probability of accepting the proposition put forth in the VariantInterpretation. For example, consider a VariantInterpretation asserting a variant to be pathogenic, and a PM2 (absent in control pops) CriterionAssessment that is unmet (because the variant was present at significant frequency in general populations). If we know that it is unmet definitively (i.e. unmet-refuted, rather than unmet because of insufficient information), then this becomes meaningful as evidence that disputes to some degree the validity of the pathogenic VariantInterpretation. If we only know that it is unmet, then we are unable to use this information as evidence in evaluating the VariantInterpretation.

Now of course, many criteria such as the PM2 example above have counterparts with the opposite default interpretation/direction. For example, PM2 (absent in control pops) has BA1 (present above 5% in population) as counterpart. Saying that PM2 is refuted is roughly the same as saying that BA1 is met - so recording both is duplicative. In such cases, it is less important to know that a P criterion is refuted because there will presumably be a corresponding B criterion that is met and gives us the same information. Other examples of paired counterparts are PP1 and BS4 (does/does not segregate with disease), and PS3 and BS3 (damaging/not damaging effect demonstrated by functional studies).

But many criteria don't have counterparts - such that we miss meaningful information by not capturing that a criterion is definitively unmet (i.e. refuted).

OR DO WE?? . . . Below I evaluate each criterion that is not paired with a counterpart, to determine if refution of these criteria provides meaningful evidence against the criterion's default interpretation (I realize that 'refution' is not a word, but using it anyway :)

Criteria with no counterpart:

PVS1 - Null variant in known LOF genes: refution based on not being predicted as 'null' would presumably be complemented with a B criterion that describes non-null impact. If refuted because not in a known LOF gene, this provides no definitive evidence because the gene may become linked to a disease in the future.
PS1 - same as previous pathogenic amino acid change: refution provides no meaningful evidence against a pathogenic interpretation (just because the affected aa is not currently known to be pathogenic doesn't mean it won't one day be so)
PS2/PM6 - confirmed de novo / assumed de novo: refution provides no meaningful evidence against a pathogenic interpretation . . . the fact that a variant is not de novo doesn't provide evidence against pathogenicity
PM1 - Located in hot spot w/out benign variation: refution provides no meaningful evidence against a pathogenic interpretation (just because the affected region is not currently a known hotspot doesn't mean it isn't one, and even if it is not, this provides no meaningful evidence against pathogenicity, since affecting a hotspot is not a necessary condition for pathogenicity)
PM5 - Novel missense change at a residue where a different pathogenic missense change has been seen: refution provides no meaningful evidence against a pathogenic interpretation (affecting a known pathogenic aa is not a necessary condition for pathogenicity - so no evidence provided by this being refuted)
PP4 - disease specific phenotype and family history: refution provides no meaningful evidence against a pathogenic interpretation (patient exhibiting disease specific phenotypes is not a necessary condition for a variant's pathogenicity, so no evidence provided by this being refuted)
BP5 - alternate cause of disease: refution provides no meaningful evidence against a benign interpretation (finding a known alternate cause of the disease is not a necessary condition for benign-ness, so no evidence provided by this being refuted . . . i.e. the fact that no alternate cause was found does not meaningfully argue against the benign-ness of the variant)
BP7 - silent variant no splicing impact and low conservation: refution based on variant having such predicted impacts would presumably be complemented with a P criterion that describes such predicted impacts

Punchline: The take home from my analysis above of the ACMG criteria not paired with a counterpart in opposite direction (P vs B) is that the ACMG criteria are smart/complete enough that if the creators thought that refution of a criterion signified evidence in the other direction (i.e. refution of a P criterion provided meaningful evidence against pathogenicity), then a counterpart criterion was created to capture this. From my analysis - all criteria for which there was no counterpart are cases where refution of the criterion provides no meaningful evidence against the default interpretation (P vs B). If this is not the case, perhaps the solution is to create a new criterion rather than force capture of refution.
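
The conclusion above can be restated as a small lookup. The Python sketch below is only an illustration of the reasoning in this ticket, not part of any ACMG tooling: the outcome names mirror the met / unmet-insufficient / unmet-refuted distinction discussed here, the pairs are the three counterpart examples named above, and the function name is hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    MET = "met"
    UNMET_INSUFFICIENT = "unmet-insufficient"
    UNMET_REFUTED = "unmet-refuted"


# Counterpart pairs named in this ticket (pathogenic code -> benign code).
COUNTERPARTS = {"PM2": "BA1", "PP1": "BS4", "PS3": "BS3"}
# Make the lookup symmetric so benign codes map back to pathogenic ones.
COUNTERPARTS.update({b: p for p, b in list(COUNTERPARTS.items())})


def counterpart_carrying_evidence(criterion, outcome):
    """Return the counterpart criterion expected to carry the opposing evidence.

    Returns None when the unmet outcome gives no usable evidence: either the
    data were insufficient, or the criterion has no counterpart and, per the
    analysis above, its refutation is not meaningful evidence.
    """
    if outcome is not Outcome.UNMET_REFUTED:
        return None
    return COUNTERPARTS.get(criterion)


print(counterpart_carrying_evidence("PM2", Outcome.UNMET_REFUTED))
print(counterpart_carrying_evidence("PS1", Outcome.UNMET_REFUTED))
```

Under this reading, the unmet-insufficient versus unmet-refuted flag never changes what can be used as evidence: the paired cases are already covered by the counterpart being met, and the unpaired cases yield nothing either way.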

SEPIO:0000412 lacks label

I'm pretty sure this is the 'Value Set' class we've included for our ClinGen extension, but I wanted to make you aware of this.

Consider use of SEPIO for representing evidence for expertise

The VIVO ontology and related projects are concerned with understanding what evidence exists to suggest a person has expertise, and what degree of expertise, in a particular subject. Consider if aspects of the SEPIO model of evidence may be relevant here. See gdoc here: https://docs.google.com/document/d/1T92B9H7c7R7zrbyZ0DubAc49CxVzt__FNKk3K6bYpys/edit

Statement WIKI text sentence needs rewording.

In the Statements wiki page within the Scope & Usage Section under item 1. Study Findings, there is a sentence in the middle of the paragraph that reads poorly

...
This does not mean that the observations a Finding reports are necessarily accurate reflections of reality - only that they were indeed made in a particular study.
...

Please consider rewording.

Questions re Evidence Item and Statements?

Statements wiki page shows no super type but has subtypes of Study Finding and Assertion.

Q1 - Why is there no supertype on Statement?

According to the wiki page Evidence Item it has a supertype of ICE and subtypes of Evidence Statement and Evidence Data

Q2 - Can you clarify the differences and relationship of the following types?

Evidence Statement, Statement
Evidence Data, Study Finding

SEPIO cannot be released as OBO or obographs JSON due to various illegal statements

Prioritise after #29

new object property: is described in

Need a way of connecting an ECO (inference) with a publication IAO instance.

Was thinking named InverseOf(ioa:isAbout)

But perhaps something constrained to have a domain of an ECO class would be better.

SEPIO defining several dc properties as DPs, causing illegal punning when integrating

SEPIO is defining several dc properties as DPs (data properties), causing illegal punning when integrating with other ontologies.

see also information-artifact-ontology/ontology-metadata#90 for a lengthy discussion why that is a problem.

Before we get into the details on how to fix this: @mbrush (or anyone else), what are the use cases for all of these different date properties? Would it be a problem to redefine them as annotation properties?

Modeling 'direction' of evidence as a relationship vs an attribute of the evidence line

SEPIO needs to be able to describe evidence 'direction' - to indicate when a particular line of evidence is supporting, refuting, or provides inconclusive information relative to a target assertion. The current SEPIO model captures this information via the relationship used to link an assertion to an evidence line (has_supporting_evidence, has_refuting_evidence, has_inconclusive evidence).

An alternate approach is to describe directionality as an attribute of an EvidenceLine. Here we would create attribute terms (Supporting, Refuting, Inconclusive) that can be hung from an EvidenceLine to describe the direction of support it provides its target assertion.
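
The two options can be compared side by side in a few lines of Python. This is a sketch for discussion only: it assumes the third relation is spelled has_inconclusive_evidence, and the generic has_evidence_line relation and the direction field are hypothetical names introduced here, not existing SEPIO terms.

```python
from dataclasses import dataclass
from typing import Optional

# Relationship-based encoding: the predicate itself carries the direction.
DIRECTION_BY_RELATION = {
    "has_supporting_evidence": "Supporting",
    "has_refuting_evidence": "Refuting",
    "has_inconclusive_evidence": "Inconclusive",
}


@dataclass
class EvidenceLine:
    id: str
    direction: Optional[str] = None  # attribute-based encoding of direction


def relation_to_attribute(triples):
    """Rewrite (assertion, relation, evidence_line) triples into the attribute form.

    Each specific relation becomes one generic link plus a direction value
    stored on the EvidenceLine.
    """
    links, lines = [], []
    for assertion_id, relation, line_id in triples:
        lines.append(EvidenceLine(line_id, DIRECTION_BY_RELATION[relation]))
        links.append((assertion_id, "has_evidence_line", line_id))
    return links, lines


triples = [
    ("assertion:1", "has_supporting_evidence", "line:1"),
    ("assertion:1", "has_refuting_evidence", "line:2"),
]
links, lines = relation_to_attribute(triples)
print(links)
print(lines)
```

One consequence visible in the sketch: in the attribute form an EvidenceLine carries a single direction, so the same line could not support one assertion while refuting another unless it is duplicated, whereas the relationship form allows that naturally.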

Characterization of the 'Evidence Line' concept

To date we have considered four perspectives on what an Evidence Line is and how it should be described in SEPIO: (1) as a collection of information; (2) as an argument; (3) as a reified relationship; and (4) as an interpretation.

This ticket proposes definitions from each perspective, with the goal of settling on one as the primary perspective and definition for this concept. But we can still describe evidence lines from these other perspectives to further clarify and complement the primary definition.

1. Evidence Lines as Collections of Evidence Items

Def 1A = An Evidence Line represents a set of one or more Evidence Items interpreted together to provide an independent and meaningful argument for evaluating the validity of a particular Proposition.
Def 1B = An Evidence Line represents a grouping of Evidence Items evaluated together to make an independent argument for or against a particular Proposition.

2. Evidence Lines as 'Arguments'

Def 2A = An Evidence Line represents an independent and meaningful argument for or against a particular Proposition, that is based on the interpretation of one or more pieces of information as evidence.
Def 2B = An Evidence Line represents an independent, evidence-based argument about the validity of a particular Proposition.

3. Evidence Lines as Reified Relationships

Def 3A = An Evidence Line describes a relationship between a set of one or more Evidence Items and a particular Proposition, wherein the Evidence Items are interpreted together as one independent and meaningful argument for evaluating the Proposition's validity.
Def 3B = An Evidence Line describes a relationship between a set of one or more Evidence Items and a particular Proposition, where the set of Evidence Items are interpreted to make an independent and meaningful argument relevant to the Proposition's validity.

4. Evidence lines as Interpretations

Def 4 = An evidence line represents an agent's interpretation of one or more pieces of information as an independently meaningful argument relevant to the validity of a particular proposition.

Consider also the utility of the following descriptions and examples:

Description:
An Evidence Line is created through the interpretation of one or more pieces of information that collectively support a meaningful argument for or against a proposition. To qualify as an Evidence Line, this argument must be independently significant as evidence - i.e. it must be capable of affecting the probability of accepting the target proposition as true. This does not mean, however, that it is independently sufficient to establish belief in the Proposition, as additional Evidence Lines may be required to ultimately accept the Proposition as true.

For example, the ACMG framework establishes 'absence in population databases' as a type of Evidence Line that can argue for the pathogenicity of a particular variant. But this argument alone is not considered sufficient to establish a variant's pathogenicity, as the other types of evidence are additionally required to establish the truth of this Proposition (e.g. a line of evidence demonstrating the variant to have a deleterious effect on protein function, or showing it to segregate with disease features in a family tree).

Example:
An example of an evidence line would be the argument that a finding such as "Lepr1 KO mice exhibit lower blood glucose levels than matched WT controls" makes in support of the Proposition that "Lepr1 gene is involved in diabetes". The Evidence Items supporting this line of evidence could include experimental data from a study exploring blood glucose levels in Lepr1 KO mice, such as a 548.5 mg/dl measurement of blood glucose in a Lepr^tm1b/tmb1 mutant mouse, or a 1.3951e-24 p-value indicating this measure to be significantly different from wild-type mice.

Here, the finding and its supporting data exist independently of their use as evidence. An Evidence Line instance based on this finding comes into existence only when an agent interprets it as an independently meaningful argument for a particular Proposition, in the act of making an Assertion.

A Note on Propositions:
Recall that "Propositions" represent the sharable meaning expressed in a particular Assertion. They are abstract entities that, like numbers, are independent of space and time. They represent the core fact or meaning that is put forth as true in an Assertion. They are 'sharable' in the sense that the same Proposition can be expressed in separate Assertions made by different Agents, on different occasions. So a single Proposition that "Lepr receptor inactivation causes increased blood glucose levels" can be put forth in separate Assertions made by Dr. Smith and Dr. Jones.

Strictly speaking, evidence is evaluated to assess the truth of the Proposition put forth in a particular Assertion, but we will often speak of evidence as being used to evaluate an Assertion, as a succinct shorthand for this idea.
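The relationships described above can be sketched as a small data structure. This is only an illustration in Python: the class and field names (`Proposition`, `EvidenceItem`, `EvidenceLine`, `Assertion`, `direction`) are assumptions made for this sketch and are not SEPIO term identifiers or an official schema. It shows two Agents making separate Assertions that share one Proposition, with one of them backed by an Evidence Line built from the Evidence Items in the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Proposition:
    """The sharable meaning put forth as true, independent of who asserts it."""
    text: str


@dataclass(frozen=True)
class EvidenceItem:
    """A piece of information that exists independently of its use as evidence."""
    description: str


@dataclass
class EvidenceLine:
    """An argument about a Proposition, built from one or more Evidence Items."""
    proposition: Proposition
    items: List[EvidenceItem]
    direction: str = "supports"  # illustrative label, not a SEPIO relation


@dataclass
class Assertion:
    """A statement by an Agent that a Proposition is true."""
    agent: str
    proposition: Proposition
    evidence_lines: List[EvidenceLine] = field(default_factory=list)


# One Proposition, taken from the example text above.
prop = Proposition("Lepr1 gene is involved in diabetes")

# Evidence Items exist on their own; they become evidence only via an Evidence Line.
items = [
    EvidenceItem("548.5 mg/dl blood glucose measurement in a mutant mouse"),
    EvidenceItem("p-value of 1.3951e-24 versus wild-type mice"),
]

# Two Agents express the same Proposition in separate Assertions.
smith = Assertion("Dr. Smith", prop, [EvidenceLine(prop, items)])
jones = Assertion("Dr. Jones", prop)

print(smith.proposition is jones.proposition)
```

Keeping `evidence_lines` as a list reflects that an Assertion may rest on several independent lines of evidence, while the Proposition object is shared rather than copied.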

ClinVar introduced MethodType "phenotyping only"

Previously, all (provided) ClinVar accessions were typed as "literature only", and we tagged them with SEPIO:0000080.

With this release, a new type appears in half a dozen instances (3 RCV and 3 SCV):

 115657         <MethodType>literature only</MethodType>
      6         <MethodType>phenotyping only</MethodType>

I do not find an existing SEPIO term which seems appropriate for this new ClinVar method type.

For reference, the ClinVar accessions with the new type are:

RCV000490816
- SCV000502994
RCV000494689
- SCV000579222
RCV000494691
- SCV000574694
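A tally like the one above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the `<MethodType>` elements each sit on a single line of the ClinVar XML, as they appear in the counts shown. The function name `count_method_types` and the sample lines are made up for illustration and are not ClinVar records.

```python
import re
from collections import Counter

# Matches the text content of a MethodType element on a single line.
METHOD_TYPE = re.compile(r"<MethodType>(.*?)</MethodType>")


def count_method_types(lines):
    """Count how often each MethodType value occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(METHOD_TYPE.findall(line))
    return counts


# Illustrative input only; a real run would iterate over an open file handle.
sample = [
    "<MethodType>literature only</MethodType>",
    "<MethodType>phenotyping only</MethodType>",
    "<MethodType>literature only</MethodType>",
]

counts = count_method_types(sample)
for method, n in counts.most_common():
    print(f"{n:7d} {method}")
```

Passing an open file object instead of `sample` streams the file line by line, which matters for an export of this size.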

License info disparity

License info for SEPIO in README.md says Creative Commons 3.0 BY:

License
The SEPIO Ontology and Framework is an open source project, free to re-use and re-mix under a Creative Commons 3.0 BY license.

But the link in said text points to Creative Commons 2.0 BY
(i.e. https://creativecommons.org/licenses/by/2.0/)

What is the development status of SEPIO?

Dear SEPIO team,

For my ongoing work on the Academic Event Ontology I was looking into SEPIO, hoping to find terms and relations I could reuse for modeling contributions to academic events (planned processes). But from looking at the open issues and last commits, it seems as if this repo is not really being worked on anymore. Am I correct in this assumption?

Review and refactor modeling of ClinGen 'Data' objects

The ClinGen model currently describes ~20 types of 'Data' objects that are used to describe information curated from a specific source and used as evidence supporting CriterionAssessments. The semantics of these are informally described using a large number of ad hoc properties to organize terms or data values under a given Data object. One issue for SEPIO alignment is that the ontological nature of these objects in the context of SEPIO is not clear. Specifically, we would want to characterize them as being either:

(1) 'Assertions' in cases where the object simply conveys a statement of purported fact, in the absence of more foundational evidence information supporting this statement;
or
(2) 'StudyFindings' in cases where the object represents the outcome of a specific study that is directly relevant to the validity of the target assertion (and often captures data/metadata from this study).

Making such a distinction would facilitate extension of the model to describe evidence for Data objects that map to Assertions, through addition of evidence lines that organize the underlying evidence for these claims.

We have started a google doc here that evaluates each 'Data' type to determine whether it would best be represented as an Assertion, or as more foundational Study Data.

Define rules for determining 'direction' of evidence provided by Criterion Assessments

In the ClinGen Interpretation data model that is currently being aligned with SEPIO, CriterionAssessments are assertions based on ACMG Criteria that are used as a first tier of evidence for Variant Interpretations. A number of different possibilities can result from a given CriterionAssessment, depending on: (1) whether the ACMG criterion used is determined to be met, unmet, or refuted; and (2) how the class of criterion used (P vs B) aligns with the final call made in the target VariantInterpretation.

In short, we will have to translate the outcome for a CriterionAssessment used as evidence (i.e. whether a B or P criterion is met or unmet for a VariantInterpretation of benign or pathogenic), into a directionality for the relationship used in linking the VariantInterpretation to its CA-supported EvidenceLine (i.e. whether the CA provides supporting, refuting, or inconclusive evidence).

We have had initial discussions on this topic, as documented in the 2-23-17 cmap here.
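To make the translation described above concrete, here is a hypothetical sketch in Python. The rules it encodes are NOT an agreed ClinGen or SEPIO rule set, which this issue leaves open. They are one possible reading, assumed purely for illustration: a met criterion argues in the direction of its class, a refuted criterion argues against it, and an unmet criterion is inconclusive. The names `Direction` and `evidence_direction` are invented for this sketch.

```python
from enum import Enum


class Direction(Enum):
    SUPPORTING = "supporting"
    REFUTING = "refuting"
    INCONCLUSIVE = "inconclusive"


def evidence_direction(criterion_class: str, outcome: str, call: str) -> Direction:
    """Map a CriterionAssessment outcome to an evidence direction.

    criterion_class: 'P' or 'B'
    outcome: 'met', 'unmet', or 'refuted'
    call: 'pathogenic' or 'benign' (the VariantInterpretation's final call)
    """
    if criterion_class not in ("P", "B"):
        raise ValueError(f"unknown criterion class: {criterion_class!r}")
    if outcome not in ("met", "unmet", "refuted"):
        raise ValueError(f"unknown outcome: {outcome!r}")
    if call not in ("pathogenic", "benign"):
        raise ValueError(f"unknown call: {call!r}")

    # Assumed rule: an unmet criterion says nothing either way.
    if outcome == "unmet":
        return Direction.INCONCLUSIVE

    # True when a P criterion meets a pathogenic call, or B meets benign.
    aligned = (criterion_class == "P") == (call == "pathogenic")

    # Assumed rule: 'met' argues for the criterion's class, 'refuted' against it.
    argues_for_call = aligned if outcome == "met" else not aligned
    return Direction.SUPPORTING if argues_for_call else Direction.REFUTING


print(evidence_direction("P", "met", "pathogenic").value)
print(evidence_direction("B", "met", "pathogenic").value)
print(evidence_direction("P", "unmet", "benign").value)
```

Writing the mapping as a single function keeps the open question (how 'refuted' and 'unmet' should be treated) in one place that can be swapped out once the rules are decided.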

Update relationships between Assertions and Evidence Lines in figure in readme

The figure in the readme (https://github.com/monarch-initiative/SEPIO-ontology/blob/master/docs/SEPIO%20Wiki%201.jpg) is not in line with current thoughts on the overall model (at least with regard to discussions regarding the ClinGen interpretation model).

Two things that immediately come to mind:

(1) The diagram only shows Assertions being related to Evidence Lines through Assertion Process nodes (which you've discussed eliminating from the model).
(2) The relationships in point 1 above are each one-to-one, which leads to an implicit one-to-one relationship between an Assertion and an Evidence Line, but my understanding is that an Assertion may be supported by multiple Evidence Lines.

SEPIO use cases related to globalbioticinteractions.org / GloBI

Hi!

I was pretty excited to stumble across SEPIO and I was hoping to get some guidance on adopting SEPIO by sharing the following use cases:

- modeling evidence supporting, refuting or capturing "reasonable doubt" related to species interaction statements: an example of a past discussion about a claim that Sea Otters (Enhydra) eat American Beavers (Castor canadensis) can be found here: globalbioticinteractions/globalbioticinteractions#118. Rather than describing the statements made in free-form text in an issue or encoded in some Java code, I'd like to be able to model the statements and their relationships as part of our data model.
- capturing the taxonomic name correction and linking process: a big chunk of GloBI concerns taxonomic name / term matching. Currently, a rudimentary model exists in GloBI to record how a taxonomic name was originally described in a data publication and how the name was eventually linked to external name sources like itis.gov or NCBI's taxonomy. Rather than implicitly documenting the name matching, correction and linking process in source code and a wiki page (e.g. https://github.com/jhpoelen/eol-globi-data/wiki/Taxonomy-Matching), I was hoping to explicitly encode the process and its outcome as part of the data model.

Please let me know if these use cases make sense and whether they are suitable to help drive the ongoing development of SEPIO. I'd be happy to try and adopt the ontology, but will probably need some guidance in getting started.

Curious to hear your thoughts on possible next steps.