
geneontology / obographs


Basic and Advanced OBO Graphs: specification and reference implementation

Java 97.54% Shell 2.46%
obofoundry owl json ontology-standards ontologies graphs gene-ontology geneontology ontology

obographs's Introduction


OBO Graphs : Developer-friendly graph-oriented ontology JSON/YAML

This repo contains both a specification for a JSON/YAML format for ontology exchange and a reference Java object model and OWL converter.

The core is a simple graph model allowing the expression of ontology relationships like forelimb SubClassOf limb:

  "nodes" : [
    {
      "id" : "UBERON:0002102",
      "lbl" : "forelimb"
    }, {
      "id" : "UBERON:0002101",
      "lbl" : "limb"
    }
  ],
  "edges" : [
    {
      "subj" : "UBERON:0002102",
      "pred" : "is_a",
      "obj" : "UBERON:0002101"
    }
  ]

Additional optional fields allow increased expressivity, without adding complexity to the core.

For more examples, see examples/ in this repo or, for real-world examples, this drive. Soon we hope to have this incorporated into release tools and visible at standard PURLs.

For the JSON Schema, see the schema/ folder

If you are familiar with OWL, skip straight to the OWL mapping specification

Motivation

Currently if a developer needs to add ontologies into a software framework or tool, there are two options for formats: obo-format and OWL (technically obo is an OWL syntax, but for pragmatic purposes we can separate these two).

This presents a number of problems: obo is simple, but employs its own syntax, resulting in a proliferation of ad-hoc parsers that are generally incomplete. It is also less expressive than OWL (but expressive enough for the majority of bioinformatics tasks). OWL is a W3 standard, but can be difficult to work with. Typically OWL is layered on RDF, but RDF level libraries can be too low-level to work with (additionally: rdflib for Python is very slow). For JVM languages, the OWLAPI can be used, but this can be abstruse for many routine tasks, leading to variety of simplifying facades each with their own assumptions (e.g. BRAIN).

Overview

OBO Graphs (OGs) are a graph-oriented way of representing ontologies or portions of ontologies in a developer-friendly JSON (or YAML) format. A typical consumer may be a Python developer using ontologies to enhance an analysis tool, database search/infrastructure etc.

The model can be understood as two levels: A basic level, that is intended to satisfy 99% of bioinformatics use cases, and is essentially a cytoscape-like nodes and edges model. On top of this is an expressive level that allows the representation of more esoteric OWL axioms.

Basic OBO Graphs (BOGs)

The core model is a property-labeled graph, comparable to the data model underlying graph databases such as Neo4j. The format is the same as BBOP-Graphs.

The basic form is:

"graphs": [
  {
     "nodes" : [...],
     "edges" : [
     ],
  },
  ...
]

Here is an example of a subgraph of Uberon consisting of four nodes, two part-of and two is_a edges:

{
  "nodes" : [
    {
      "id" : "UBERON:0002470",
      "lbl" : "autopod region"
    }, {
      "id" : "UBERON:0002102",
      "lbl" : "forelimb"
    }, {
      "id" : "UBERON:0002101",
      "lbl" : "limb"
    }, {
      "id" : "UBERON:0002398",
      "lbl" : "manus"
    }
  ],
  "edges" : [
    {
      "subj" : "UBERON:0002102",
      "pred" : "is_a",
      "obj" : "UBERON:0002101"
    }, {
      "subj" : "UBERON:0002398",
      "pred" : "part_of",
      "obj" : "UBERON:0002102"
    }, {
      "subj" : "UBERON:0002398",
      "pred" : "is_a",
      "obj" : "UBERON:0002470"
    }, {
      "subj" : "UBERON:0002470",
      "pred" : "part_of",
      "obj" : "UBERON:0002101"
    }
   ]
}
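Because edges are plain subject/predicate/object records, tasks such as collecting every ancestor of a term reduce to a short graph walk. The following is a minimal Python sketch, assuming the document has already been parsed (e.g. with the standard json module) into dicts shaped like the example above; following both is_a and part_of edges is an illustrative choice, not something the spec prescribes.

```python
from collections import deque

# The Uberon subgraph from the example above (nodes omitted for brevity).
graph = {
    "edges": [
        {"subj": "UBERON:0002102", "pred": "is_a", "obj": "UBERON:0002101"},
        {"subj": "UBERON:0002398", "pred": "part_of", "obj": "UBERON:0002102"},
        {"subj": "UBERON:0002398", "pred": "is_a", "obj": "UBERON:0002470"},
        {"subj": "UBERON:0002470", "pred": "part_of", "obj": "UBERON:0002101"},
    ]
}


def ancestors(graph, start, preds=("is_a", "part_of")):
    """Return every node reachable from `start` via edges whose pred is in `preds`."""
    # Index outgoing edges by subject so each lookup is a dict access.
    parents = {}
    for edge in graph["edges"]:
        if edge["pred"] in preds:
            parents.setdefault(edge["subj"], []).append(edge["obj"])
    # Breadth-first walk; `seen` also guards against cycles.
    seen = set()
    queue = deque([start])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors(graph, "UBERON:0002398")))
# ['UBERON:0002101', 'UBERON:0002102', 'UBERON:0002470']
```

Passing preds=("is_a",) restricts the walk to the subclass hierarchy only.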

The short forms in the above (e.g. UBERON:0002470 and part_of) are mapped to unambiguous PURLs using a JSON-LD context (see below).
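At its simplest, a JSON-LD context is a map from prefixes to URI namespaces. The sketch below uses a plain Python dict in its place; the CONTEXT entries are hypothetical, modeled on the usual OBO PURL pattern (http://purl.obolibrary.org/obo/PREFIX_LOCAL), and the sketch ignores JSON-LD features such as nested or remote contexts.

```python
# Hypothetical prefix map standing in for a JSON-LD "@context"; the entries
# follow the usual OBO PURL pattern but are illustrative only.
CONTEXT = {
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
}


def expand(curie, context=CONTEXT):
    """Expand PREFIX:LOCAL to a full URI; return the input unchanged if the prefix is unknown."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return curie


print(expand("UBERON:0002470"))  # http://purl.obolibrary.org/obo/UBERON_0002470
print(expand("part_of"))         # part_of (unchanged: no known prefix)
```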

Edges can also be decorated with Meta objects (corresponding to reification in RDF/OWL, or edge properties in graph databases).

Formally, the set of edges corresponds to OWL SubClassOf axioms of two forms:

  1. C SubClassOf D (aka is_a in obo-format)
  2. C SubClassOf P some D (aka relationship in obo-format)
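The two forms can be made concrete with a small Python sketch that renders a basic edge as an OWL functional-syntax-style string. It assumes is_a is the only predicate denoting form 1 and leaves identifiers as short forms rather than expanding them, so the output is illustrative rather than loadable OWL.

```python
def edge_to_axiom(edge):
    """Render a basic edge as an OWL functional-syntax-style SubClassOf string."""
    subj, pred, obj = edge["subj"], edge["pred"], edge["obj"]
    if pred == "is_a":
        # Form 1: C SubClassOf D
        return f"SubClassOf({subj} {obj})"
    # Form 2: C SubClassOf P some D
    return f"SubClassOf({subj} ObjectSomeValuesFrom({pred} {obj}))"


print(edge_to_axiom({"subj": "UBERON:0002398", "pred": "part_of", "obj": "UBERON:0002102"}))
# SubClassOf(UBERON:0002398 ObjectSomeValuesFrom(part_of UBERON:0002102))
```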

For a full description, see the JSON Schema in the schema/ folder.

Nodes collect all OWL annotations about an entity.

Typically nodes will be OWL classes, but they can also be OWL individuals or OWL properties (in which case edges can also correspond to SubPropertyOf axioms).

Nodes, edges and graphs can have optional meta objects for additional metadata (or annotations in OWL speak).

Here is an example of a meta object for a GO class (shown in YAML, for compactness):

  - id: "http://purl.obolibrary.org/obo/GO_0044464"
    meta:
      definition:
        val: "Any constituent part of a cell, the basic structural and functional\
          \ unit of all organisms."
        xrefs:
        - "GOC:jl"
      subsets:
      - "http://purl.obolibrary.org/obo/go/subsets/nucleus#goantislim_grouping"
      - "http://purl.obolibrary.org/obo/go/subsets/nucleus#gosubset_prok"
      - "http://purl.obolibrary.org/obo/go/subsets/nucleus#goslim_pir"
      - "http://purl.obolibrary.org/obo/go/subsets/nucleus#gocheck_do_not_annotate"
      xrefs:
      - val: "NIF_Subcellular:sao628508602"
      synonyms:
      - pred: "hasExactSynonym"
        val: "cellular subcomponent"
        xrefs:
        - "NIF_Subcellular:sao628508602"
      - pred: "hasRelatedSynonym"
        val: "protoplast"
        xrefs:
        - "GOC:mah"
    type: "CLASS"
    lbl: "cell part"
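Since meta objects are ordinary nested maps, pulling out synonyms grouped by scope needs only a few lines. A Python sketch, assuming the YAML above has been loaded into equivalent dicts (shown here as a literal trimmed to the synonym fields, so the example runs without a YAML library):

```python
def synonyms_by_scope(node):
    """Group a node's synonym strings by their pred (e.g. hasExactSynonym)."""
    grouped = {}
    for syn in node.get("meta", {}).get("synonyms", []):
        grouped.setdefault(syn["pred"], []).append(syn["val"])
    return grouped


node = {
    "id": "http://purl.obolibrary.org/obo/GO_0044464",
    "lbl": "cell part",
    "type": "CLASS",
    "meta": {
        "synonyms": [
            {"pred": "hasExactSynonym", "val": "cellular subcomponent",
             "xrefs": ["NIF_Subcellular:sao628508602"]},
            {"pred": "hasRelatedSynonym", "val": "protoplast", "xrefs": ["GOC:mah"]},
        ]
    },
}

print(synonyms_by_scope(node))
# {'hasExactSynonym': ['cellular subcomponent'], 'hasRelatedSynonym': ['protoplast']}
```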

Expressive OBO Graphs (ExOGs)

These provide ways of expressing logical axioms not covered in the subset above.

Currently the spec does not provide a complete translation of all OWL axioms. This will be driven by comments on the spec.

Currently two axiom patterns are defined:

  • equivalenceSet
  • logicalDefinitionAxiom

Note that these do not necessarily correspond 1:1 to OWL axiom types. Both are forms of the equivalent classes axiom. The former suits cases where multiple ontologies represent the same concept with a different URI in each (for example, a DOID:nnn URI and an Orphanet:nnn URI with a direct equivalence axiom between them).

The latter is for so-called 'cross-product' or 'genus-differentia' definitions found in most well-behaved bio-ontologies.

See README-owlmapping.md for more details.
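To show what a genus-differentia definition carries, here is a Python sketch that renders one as Manchester-style text. The field names definedClassId, genusIds, restrictions, propertyId and fillerId reflect our reading of the obographs JSON and should be checked against the JSON Schema in schema/. The EX: identifiers are hypothetical.

```python
def render_logical_definition(axiom):
    """Render a genus-differentia definition as 'C EquivalentTo G and (R some F) ...'."""
    # Genus classes first, then one existential restriction per differentia.
    parts = list(axiom.get("genusIds", []))
    for r in axiom.get("restrictions", []):
        parts.append(f"({r['propertyId']} some {r['fillerId']})")
    return f"{axiom['definedClassId']} EquivalentTo {' and '.join(parts)}"


# Hypothetical axiom: EX:0000001 = EX:0000002 that is part_of some EX:0000003.
axiom = {
    "definedClassId": "EX:0000001",
    "genusIds": ["EX:0000002"],
    "restrictions": [{"propertyId": "part_of", "fillerId": "EX:0000003"}],
}
print(render_logical_definition(axiom))
# EX:0000001 EquivalentTo EX:0000002 and (part_of some EX:0000003)
```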

Comparison with BBOP-Graphs

See bbop-graph

  • Top-level object in a bbop-graph is a graph object; in obographs a GraphDocument is a holder for multiple graphs
  • meta objects are underspecified in bbop-graphs

Comparison with SciGraph

See Neo4jMapping

The mapping is similar, particularly with respect to how SubClassOf axioms map to edges. However, for SciGraph, more advanced axioms such as EquivalenceAxioms are mapped to graph edges. In obographs, anything outside the BOG pattern is mapped to a custom object.

Note also that SciGraph returns bbop-graph objects by default from graph query operations.

Running the converter

mvn install
./bin/ogger src/test/resources/basic.obo

Note that the conversion will be rolled into tools like ROBOT, obviating the need for this. We can also make it such that the JSON is available from a standard PURL, e.g.

Including obographs in your code:

The library is split into two modules: obographs-core, which contains the model and code for reading and writing JSON and YAML graphs, and obographs-owlapi, which requires obographs-core and includes the OWLAPI and code for converting OWL to obographs.

Maven

<dependency>
    <groupId>org.geneontology.obographs</groupId>
    <artifactId>obographs-core</artifactId>
    <version>${project.version}</version>
</dependency>

Gradle

compile 'org.geneontology.obographs:obographs-core:${project.version}'

Installing a development snapshot

When developing against an unreleased snapshot version of the API, you can use Maven to install it in your local m2 repository:

mvn clean install

Developing obographs

If you find that your IDE cannot load any of the concrete classes e.g. Graph or GraphDocument you should check that your IDE has Annotation Processing enabled. Obographs uses the immutables library which requires annotation processing in the IDE. See https://immutables.github.io/apt.html for how to enable this in your IDE. You might need to restart your IDE or re-import the maven projects for this to work fully. It is not required for projects using obographs as a pre-built library.

Releasing to Central

mvn clean deploy -P release

Javascript

See bbop-graph

obographs's People

Contributors

althonos, balhoff, cmpich, cmungall, julesjacobsen, nathandunn, yy20716


obographs's Issues

Over-enthusiastic escaping of double quotes?

There are a few cases where it looks like double quote characters embedded in a term's definition are being escaped with a bit too much zeal.

For example, this from the definition of GO_0061232:

...is a specialized epithelial cell that contains \"feet\" that interdigitate with the \"feet\" of other glomerular epithelial cells in the mesonephros.

Other terms that appear to be similarly affected are:
GO:0061256
GO:0061257
GO:0061258

Gather feedback from developers of bioinformatics tools that consume ontology files

Many developers have written redundant obo parsers in various languages. Due to the details of the spec, it is difficult to write a robust parser. OWL is very difficult to work with, and many people use the wrong level of abstraction, e.g. XML parsing. obographs is designed with these developers in mind; see https://douroucouli.wordpress.com/2016/10/04/a-developer-friendly-json-exchange-format-for-ontologies/

This meta-ticket is for gathering feedback from knowledgeable bioinformatics coders who have struggled with obo or OWL issues.

Python

Perl

R

  • goTools
  • [MANY MORE, TO BE FILLED IN]

Ruby

Java

C

EquivalentNodesSet.representativeNodeId() is not used in production code

EquivalentNodesSet.representativeNodeId() is used in the unit test, but never set in the production code. The EquivalentNodesSet is only used in FromOwl:

// all classes in equivalence axiom are named
// TODO: merge pairwise assertions into a clique
EquivalentNodesSet enset =
new EquivalentNodesSet.Builder().nodeIds(xClassIds).build();

Is this method really required as a setter? Seems it could be a default method to just choose the first if never set.
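The suggested fallback could look like the following Python sketch, which operates on the JSON form of an equivalence set. It assumes the JSON keys representativeNodeId and nodeIds mirror the Java accessors. Because nodeIds may be an unordered set, the sketch picks the lexicographically smallest ID rather than "the first", so the choice stays deterministic. The identifiers in the example are hypothetical.

```python
def representative(enset):
    """Return the explicit representativeNodeId, else a deterministic default."""
    explicit = enset.get("representativeNodeId")
    if explicit:
        return explicit
    node_ids = enset.get("nodeIds") or []
    # min() gives a stable choice even if nodeIds came from an unordered set.
    return min(node_ids) if node_ids else None


print(representative({"nodeIds": ["Orphanet:0000001", "DOID:0000001"]}))  # DOID:0000001
```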

running mvn package doesn't work on Java 11

Building obographs requires Java 8 at the moment. Running on Java 11 gives this error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.9:jar (attach-javadocs) on project obographs: MavenReportException: Error while creating archive: Unable to resolve artifact:groupId = 'ch.raffael.pegdown-doclet'
[ERROR] artifactId = 'pegdown-doclet'
[ERROR] version = '1.1': Missing:
[ERROR] ----------
[ERROR] 1) com.sun.tools:tools:jar:11.0.4
[ERROR] 
[ERROR]   Try downloading the file manually from the project website.
[ERROR] 
[ERROR]   Then, install it using the command: 
[ERROR]       mvn install:install-file -DgroupId=com.sun.tools -DartifactId=tools -Dversion=11.0.4 -Dpackaging=jar -Dfile=/path/to/file
[ERROR] 
[ERROR]   Alternatively, if you host your own repository you can deploy the file there: 
[ERROR]       mvn deploy:deploy-file -DgroupId=com.sun.tools -DartifactId=tools -Dversion=11.0.4 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR] 
[ERROR]   Path to dependency: 
[ERROR]   	1) ch.raffael.pegdown-doclet:pegdown-doclet:jar:1.1
[ERROR]   	2) com.sun.tools:tools:jar:11.0.4
[ERROR] 
[ERROR] ----------
[ERROR] 1 required artifact is missing.
[ERROR] 
[ERROR] for artifact: 
[ERROR]   ch.raffael.pegdown-doclet:pegdown-doclet:jar:1.1
[ERROR] 
[ERROR] from the specified remote repositories:
[ERROR]   sonatype-nexus-snapshots (https://oss.sonatype.org/content/repositories/snapshots, releases=false, snapshots=true),
[ERROR]   central (https://repo.maven.apache.org/maven2, releases=true, snapshots=false)
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

Define behavior when punning is used

Current spec assumes no punning. E.g. the type field of a node is one of INDIVIDUAL, CLASS, ...

Should we explicitly forbid punning or explicitly say behavior is undefined?

Casting error needs to be more informative

This is unfortunately impossible to debug other than by binary search (deleting portions of the ontology until it works).

uk.ac.manchester.cs.owl.owlapi.OWLAnonymousIndividualImpl cannot be cast to org.semanticweb.owlapi.model.OWLLiteral
java.lang.ClassCastException: uk.ac.manchester.cs.owl.owlapi.OWLAnonymousIndividualImpl cannot be cast to org.semanticweb.owlapi.model.OWLLiteral
	at org.geneontology.obographs.owlapi.FromOwl.generateGraph(FromOwl.java:477)
	at org.geneontology.obographs.owlapi.FromOwl.generateGraphDocument(FromOwl.java:110)
	at org.obolibrary.robot.IOHelper.saveOntologyFile(IOHelper.java:1214)
	at org.obolibrary.robot.IOHelper.saveOntology(IOHelper.java:638)
	at org.obolibrary.robot.CommandLineHelper.maybeSaveOutput(CommandLineHelper.java:663)
	at org.obolibrary.robot.ConvertCommand.execute(ConvertCommand.java:141)
	at org.obolibrary.robot.CommandManager.executeCommand(CommandManager.java:248)
	at org.obolibrary.robot.CommandManager.execute(CommandManager.java:192)
	at org.obolibrary.robot.CommandManager.main(CommandManager.java:139)
	at org.obolibrary.robot.CommandLineInterface.main(CommandLineInterface.java:56)
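Until the error message names the offending axiom, the manual binary search can at least be automated. A Python sketch, assuming exactly one offending item and a hypothetical callback fails(subset) that reruns the conversion on a subset of axioms and reports whether it raises:

```python
def find_culprit(items, fails):
    """Binary-search `items` for the single element that makes `fails(subset)` return True.

    Assumes exactly one offending element: `fails` must return True for any
    subset that contains it and False for any subset that does not.
    """
    candidates = list(items)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        # Keep whichever half still reproduces the failure.
        candidates = candidates[:mid] if fails(candidates[:mid]) else candidates[mid:]
    # Confirm the survivor really fails; otherwise nothing in `items` did.
    return candidates[0] if candidates and fails(candidates) else None


# Toy stand-in for "convert these axioms": fails whenever item 42 is present.
print(find_culprit(range(100), lambda subset: 42 in subset))  # 42
```

Each round halves the candidates, so roughly log2(n) conversion attempts are needed instead of deleting portions by hand.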

Check round-trip JSON <-> Object model is possible

Attempting to read a JSON ontology with Jackson like so:

ObjectMapper objectMapper = new ObjectMapper();
GraphDocument graphDocument = objectMapper.readValue(Files.newInputStream(Paths.get("src/test/resources/hp.json")), GraphDocument.class);

results in:

com.fasterxml.jackson.databind.JsonMappingException: Can not construct instance of org.geneontology.obographs.model.GraphDocument: no suitable constructor found, can not deserialize from Object value (missing default constructor or creator, or perhaps need to add/enable type information?)
 at [Source: sun.nio.ch.ChannelInputStream@471a9022; line: 2, column: 3]

This is likely due to the model missing '@JsonDeserialize(builder = Node.Builder.class)' annotations.

Missing property nodes for preds in go basic

There appear to be some pred definitions in go basic that don't have corresponding property nodes. I believe there were also some val values that were missing property nodes as well that Seth found - IAO_0000277 I think? That could be wrong though.

go-basic was downloaded from obo foundry a few hours ago.

~/SCIENCE/ontology/GO$ ipython
Python 3.6.8 (default, Jan 14 2019, 11:02:34) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import json                                                             

In [2]: gobasic = json.loads(open('go-basic.json').read())                      

In [3]: g = gobasic['graphs'][0]                                                

In [4]: prop_map = {n['id']: n['lbl'] for n in g['nodes'] if n['type'] == 'PROPERTY'}

In [5]: len(prop_map)                                                           
Out[5]: 24

In [6]: # checked all edge preds against prop_map, no issues. No edges have meta
   ...:                                                                         

In [7]: def recurse_obj(obj, preds): 
   ...:     for field in obj: 
   ...:         if field == 'pred': 
   ...:             preds.add(obj['pred']) 
   ...:         elif isinstance(obj[field], list): 
   ...:             for o in obj[field]: 
   ...:                 # no lists of lists in obograph schema 
   ...:                 if isinstance(o, dict): 
   ...:                     recurse_obj(o, preds) 
   ...:         elif isinstance(obj[field], dict): 
   ...:             recurse_obj(obj[field], preds) 
   ...:                                                                         

In [8]: preds = set()                                                           

In [9]: for o in g['nodes']: 
   ...:     recurse_obj(o, preds) 
   ...:                                                                         

In [10]: for p in preds: 
    ...:     print(p, ' -> ', prop_map.get(p)) 
    ...:                                                                        
http://www.geneontology.org/formats/oboInOwl#hasScope  ->  has_scope
http://www.geneontology.org/formats/oboInOwl#hasAlternativeId  ->  has_alternative_id
hasBroadSynonym  ->  None
http://www.geneontology.org/formats/oboInOwl#hasOBONamespace  ->  has_obo_namespace
http://purl.obolibrary.org/obo/IAO_0000231  ->  None
http://www.geneontology.org/formats/oboInOwl#consider  ->  consider
http://www.geneontology.org/formats/oboInOwl#shorthand  ->  shorthand
http://www.geneontology.org/formats/oboInOwl#is_metadata_tag  ->  None
http://www.geneontology.org/formats/oboInOwl#is_class_level  ->  None
http://purl.obolibrary.org/obo/IAO_0100001  ->  term replaced by
hasRelatedSynonym  ->  None
hasExactSynonym  ->  None
hasNarrowSynonym  ->  None

In [11]: # IAO_231 is missing, which seems concerning. Not sure if the missing is_* are as concerning
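The check in the session above can be packaged as a reusable function. A Python sketch under the same assumptions (a single graph, with 'pred' keys appearing anywhere in node data); unlike the session, it reads the node type with .get(), so nodes that lack a type field do not raise KeyError:

```python
def preds_without_property_nodes(graph):
    """Return every 'pred' used anywhere in node data that has no PROPERTY node in the graph."""
    property_ids = {n["id"] for n in graph.get("nodes", []) if n.get("type") == "PROPERTY"}
    preds = set()

    def walk(obj):
        # Recurse through nested dicts and lists, recording any 'pred' values.
        if isinstance(obj, dict):
            if "pred" in obj:
                preds.add(obj["pred"])
            for value in obj.values():
                walk(value)
        elif isinstance(obj, list):
            for item in obj:
                walk(item)

    for node in graph.get("nodes", []):
        walk(node)
    return preds - property_ids
```

With the session's data this would be called as preds_without_property_nodes(gobasic['graphs'][0]).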

Multiple inheritance not parsed to JSON

I converted the EFO ontology from OWL to JSON using obographs but I came across an issue when dealing with the representation of multiple inheritance in this ontology. For example, the class http://www.ebi.ac.uk/efo/EFO_0005117 has its parents represented as:

<owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0005117">
        <rdfs:subClassOf>
            <owl:Class>
                <owl:intersectionOf rdf:parseType="Collection">
                    <rdf:Description rdf:about="http://www.ebi.ac.uk/efo/EFO_0001444"/>
                    <owl:Restriction>
                        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000136"/>
                        <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/GO_0006954"/>
                    </owl:Restriction>
                    <owl:Restriction>
                        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000136"/>
                        <owl:someValuesFrom rdf:resource="http://www.ebi.ac.uk/efo/EFO_0003786"/>
                    </owl:Restriction>
                </owl:intersectionOf>
            </owl:Class>
        </rdfs:subClassOf>
...

However, in the JSON file that obographs produces I can't find reference to these edges anywhere. Is the type of inheritance represented in some way that I'm not finding or is it not being parsed?

Commentary on obographs proposal (@kltm)

@cmungall I'm unsure of the format that you want, so I'm making an actionable super/epic (bleh); I can split these up as well if that works for you.

"type": [
             {
               "type": "class",
               "id": "CL:0000312",
               "label": "keratinocyte"
             },
  • As well, we allow for compound and nested types; you seem to just have a choice of three?
  • I was wondering about the caps enums: "enum" : [ "CLASS", "INDIVIDUAL", "PROPERTY" ]. We use lowercase.
  • We have annotations on entities like:
"annotations": [
          {
            "key": "contributor",
            "value": "http://orcid.org/0000-0001-7476-6306"
           },
  • I think ideally we could work out programmatic reusability, even if the exact specs were a little different. For example, we have quirks where we allow nested graphs and the like, which is not relevant for your use case.

ncbitaxon_import.owl missing some classes

The following classes are missing from ncbitaxon_import.owl:

NCBITaxon_147554
NCBITaxon_27896
NCBITaxon_28009
NCBITaxon_3312
NCBITaxon_3378
NCBITaxon_4895
NCBITaxon_4896
NCBITaxon_4930
NCBITaxon_4932
NCBITaxon_6237
NCBITaxon_8782

They appear to be classes that are referenced only in never_in_taxon constraints.

Also, there don't seem to be any definitions of the NCBITaxon_Union classes anywhere.

Improve documentation on IDs/URIs

https://github.com/geneontology/obographs/blob/master/README-owlmapping.md
"Every identifier in an OBO Graph is interpreted as a URI. At the JSON/YAML level, these may be compacted URIs. The expansion to a full URI is specified via a JSON-LD context object. Context objects may be at the level of the graph document or an individual graph."

This is a bit opaque. We should have a clear normative spec along the lines of:

  1. Every object is identified by an IRI http://www.ietf.org/rfc/rfc3987.txt
  2. IRIs can be written as CURIES. https://www.w3.org/TR/curie/
  3. Prefixes for CURIEs must be explicitly declared in the JSON using a JSON-LD context https://www.w3.org/TR/json-ld/

TBD: JSON-LD contexts can be recursive which complicates things for consumers. Should this be limited to a non-recursive subset?

We should also have an informative section for people not familiar with W3C specs, giving some background on the two ways of identifying things.

Implement ID/CURIE<->URI expansion and contraction

We want to be able to expand and contract URIs using either a simple YAML CURIE map or a JSON-LD file. The latter has certain advantages, such as being a standard, and also allows recursive application of rules.

Doing this using a YAML map should be very easy. However, since we end up implementing this code multiple times, it might be worth abstracting it into a separate Java library. This could live in the org.prefixcommons space. Having a common library might be useful in other ways, e.g. centralization of validation.
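Contraction is the harder direction, because namespaces can overlap (e.g. http://purl.obolibrary.org/obo/ and http://purl.obolibrary.org/obo/GO_). A Python sketch using a simple CURIE map, assuming the longest matching namespace should win; the prefix map in the example is hypothetical:

```python
def contract(uri, prefix_map):
    """Contract `uri` to PREFIX:LOCAL using the longest matching namespace, else return it unchanged."""
    matches = [(len(ns), prefix) for prefix, ns in prefix_map.items() if uri.startswith(ns)]
    if not matches:
        return uri
    # The longest namespace is the most specific match.
    _, prefix = max(matches)
    return f"{prefix}:{uri[len(prefix_map[prefix]):]}"


PREFIX_MAP = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "obo": "http://purl.obolibrary.org/obo/",
}
print(contract("http://purl.obolibrary.org/obo/GO_0044464", PREFIX_MAP))   # GO:0044464
print(contract("http://purl.obolibrary.org/obo/IAO_0000231", PREFIX_MAP))  # obo:IAO_0000231
```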

These are some places where this is already implemented:

Add some way to distinguish object properties from annotation properties

There is no obvious way to distinguish between OPs and APs in the current version of OBO graphs.

 {'id': 'http://purl.obolibrary.org/obo/fbbt#IUPAC_NAME',
  'lbl': 'IUPAC NAME',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/RO_0002004',
  'lbl': 'tracheates',
  'meta': {'definition': {'val': 'The relationships that holds between a trachea or tracheole and an anatomical structure that is contained in (and so provides an oxygen supply to).',
    'xrefs': ['FBC:DOS']},
   'xrefs': [{'val': 'RO:0002004'}]},
  'type': 'PROPERTY'}

It would be very useful if there were. If it is too late to change the allowable contents of the 'type' slot, could this instead be done by adding a new field?

Support for individuals

Is it legal in obographs to include individuals in the 'existential graph'? Could just have edge translation to OWL depend on owl entity type of object and subject:

C:C
SubClassOf to named class:
"subj" : "Class A",
"pred" : "Is_a",
"obj" : "Class B"

SubClassOf to simple anonymous class (A R some B):
"subj" : "Class A",
"pred" : "part_of",
"obj" : "Class B"

I:C
Type statement to named class:
"subj" : "Individual A",
"pred" : "InstanceOf",
"obj" : "Class B"

Type statement, simple anonymous class (R some B):
"subj" : "Individual A",
"pred" : "part_of",
"obj" : "Class B"

I:I
FACT:
"subj" : "Individual A",
"pred" : "part_of",
"obj" : "Individual B"

C:I
Simple reification pattern (A R Value B)
"subj" : "Class A",
"pred" : "has_exemplar",
"obj" : "Individual B"

(Edited)
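The proposed table amounts to dispatching on the entity types at either end of an edge. A Python sketch of that dispatch, emitting OWL functional-syntax-style strings; the node_types lookup is an assumed input (for example, built from each node's type field), and the sketch uses the spec's lowercase is_a together with the InstanceOf predicate from the table.

```python
def edge_to_owl(edge, node_types):
    """Translate an edge to an OWL axiom string based on the types of its subject and object."""
    subj, pred, obj = edge["subj"], edge["pred"], edge["obj"]
    kinds = (node_types[subj], node_types[obj])
    if kinds == ("CLASS", "CLASS"):
        if pred == "is_a":
            return f"SubClassOf({subj} {obj})"
        return f"SubClassOf({subj} ObjectSomeValuesFrom({pred} {obj}))"
    if kinds == ("INDIVIDUAL", "CLASS"):
        if pred == "InstanceOf":
            return f"ClassAssertion({obj} {subj})"
        return f"ClassAssertion(ObjectSomeValuesFrom({pred} {obj}) {subj})"
    if kinds == ("INDIVIDUAL", "INDIVIDUAL"):
        # FACT: a plain object property assertion
        return f"ObjectPropertyAssertion({pred} {subj} {obj})"
    if kinds == ("CLASS", "INDIVIDUAL"):
        # Simple reification pattern: A SubClassOf R value B
        return f"SubClassOf({subj} ObjectHasValue({pred} {obj}))"
    raise ValueError(f"no translation defined for {kinds}")
```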

Help needed: oboInOwl spec vs obographs

I need a recommendation of what to do when people physically include the oboinowl ontology in their ontology; can I safely recommend to remove it in all cases?

The reason I am posting this here is that running

robot -vvv convert -I http://www.geneontology.org/formats/oboInOwl -f json -o out.json

causes #49

And I don't understand whether this is intentional (i.e. the oboInOwl ontology is somehow broken) or whether this is a limitation of obographs. Thanks for the help! @dosumis @cmungall

GCIs in obographs

Any suggestions for how one might encode simple GCIs (e.g. R1 some C1 SubClassOf C2; R1 some C1 SubClassOf R2 some C2) in OBOgraphs?

Or is this a step too far?

Consider using immutables for core classes

http://immutables.github.io/

import java.util.List;
import java.util.Optional;
import java.util.Set;
import org.immutables.value.Value;

// Define abstract value type
@Value.Immutable
public interface ValueObject {
  String name();
  List<Integer> counts();
  Optional<String> description();
}

// Use generated immutable implementation
ValueObject valueObject =
    ImmutableValueObject.builder()
        .name("My value")
        .addCounts(1)
        .addCounts(2)
        .build();

// Or you can configure a different @Value.Style; here the "Abstract" prefix is
// stripped from the generated type name and get*/set* naming is used
@Value.Style(typeAbstract = "Abstract*", typeImmutable = "*", get = "get*", init = "set*")
@Value.Immutable
abstract class AbstractItem {
  abstract String getName();
  abstract Set<String> getTags();
  abstract Optional<String> getDescription();
}

// Use generated value object
Item namelessItem = Item.builder()
    .setName("Nameless")
    .addTags("important", "relevant")
    .setDescription("Description provided")
    .build();

Item namedValue = namelessItem.withName("Named");

Null pointer with Node.getType()

I am using phenol (https://github.com/monarch-initiative/phenol) to parse the example GO obo file in that repository. It seems to be picking up this as a Node:

synonymtypedef: syngo_official_label "label approved by the SynGO project"

This leads to a null pointer error; in the debugger, I can see that the "type" attribute of the corresponding node object is null.

obographs should distinguish between annotationProperty, objectProperty & dataProperty

Current specification for node types has only:

                "type" : {
                  "type" : "string",
                  "enum" : [ "CLASS", "INDIVIDUAL", "PROPERTY" ]
                },

I guess that changing this enum now to distinguish OP, AP & DP is not possible as it would be a pretty major breaking change. Would it be possible instead to add a property_type key with enum:

["annotationProperty", "objectProperty", "dataProperty"]

?

(I'm sure I requested this several years ago, but not sure where - can't find ticket)

Failed tests: PrefixHelperTest.testPrefixHandling

This test is failing using Java "1.8.0_241".
See inline comments below.

$ mvn package

Failed tests:
  PrefixHelperTest.testPrefixHandling:78 Check JSON-LD expected:<{[
  "@context" : {
    "foo" : "http://example.com#"
  }]     **<-- this line is different syntactically from..**.
}> but was:<{[
  "@context" : {
    "foo" : "http://example.com#"
] }     **<-- this line. It should be "}]" instead of "]}"**
}>

Tests run: 23, Failures: 1, Errors: 0, Skipped: 0

serializing to JSON leads to OutOfMemory exception

Hi,
I have a 5.5 GB OWL file "umls.owl". When I run

robot convert --input umls.owl --output umls.json

on a machine with 392 GB of RAM (and with '-Xmx390G' passed to java), I get the following exception:

Exception in thread "main" java.lang.OutOfMemoryError
        at java.base/java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:188)
        at java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:180)
        at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:147)
        at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:660)
        at java.base/java.lang.StringBuilder.append(StringBuilder.java:195)
        at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:399)
        at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:83)
        at com.fasterxml.jackson.databind.ObjectWriter.writeValueAsString(ObjectWriter.java:1037)
        at org.geneontology.obographs.io.OgJsonGenerator.prettyJsonString(OgJsonGenerator.java:18)
        at org.geneontology.obographs.io.OgJsonGenerator.render(OgJsonGenerator.java:11)
        at org.obolibrary.robot.IOHelper.saveOntologyFile(IOHelper.java:1122)
        at org.obolibrary.robot.IOHelper.saveOntology(IOHelper.java:583)
        at org.obolibrary.robot.IOHelper.saveOntology(IOHelper.java:524)
        at org.obolibrary.robot.ConvertCommand.execute(ConvertCommand.java:167)
        at org.obolibrary.robot.CommandManager.executeCommand(CommandManager.java:248)
        at org.obolibrary.robot.CommandManager.execute(CommandManager.java:192)
        at org.obolibrary.robot.CommandManager.main(CommandManager.java:139)
        at org.obolibrary.robot.CommandLineInterface.main(CommandLineInterface.java:55)

I wonder if the problem is that OgJsonGenerator.prettyJsonString is calling mapper.writerWithDefaultPrettyPrinter which may be trying to return a massive String object that is bigger than Java can handle (maximum size of 2147483647 bytes, I believe). Is there a way to circumvent this issue so robot can write out larger JSON files using the obographs package?
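Writing to a stream instead of building a single string sidesteps that limit. Jackson's ObjectWriter has writeValue overloads that take a File, OutputStream or Writer, which would be the analogous change on the Java side (an assumption about a possible fix, not a description of what ROBOT currently does). The same principle as a Python sketch: json.dump writes the encoder's output to the file object chunk by chunk, whereas json.dumps would return the whole document as one string.

```python
import io
import json


def write_graph_document(doc, fp):
    """Serialize `doc` to an open text stream without building one large string first."""
    # json.dump feeds the encoder's chunks to fp.write() as they are produced.
    json.dump(doc, fp, indent=2)


# Example: an in-memory buffer stands in for a real output file.
buffer = io.StringIO()
write_graph_document({"graphs": [{"nodes": [], "edges": []}]}, buffer)
```

For a real file, open it with open(path, "w", encoding="utf-8") and pass the handle instead of the buffer.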

Term comments not present in go-plus.json

As well as the issue (reported elsewhere) of the "is_obsolete" flag not being present in the JSON representation, I've also spotted that term comments are missing too.

References associated with taxon constraints are missing from go-plus.json

References that are associated with taxon constraints, and which are visible in go-plus.obo, are not finding their way into go-plus.json.

A couple of examples from go-plus.obo:

[Term]
id: GO:0045271
name: respiratory chain complex I
...
relationship: never_in_taxon NCBITaxon:4896 {id="GOTAX:0000525", source="PMID:21597881"} ! Schizosaccharomyces pombe
relationship: never_in_taxon NCBITaxon:4932 {id="GOTAX:0000524", source="PMID:21597881"} ! Saccharomyces cerevisiae

[Term]
id: GO:0019819
name: P1 peroxisome
...
relationship: only_in_taxon NCBITaxon:4952 {id="GOTAX:0000526", source="PMID:10629216", source="PMID:14504266"} ! Yarrowia lipolytica
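Checking which of these qualifiers survive conversion is easier once they are pulled out of the obo text. The following Python sketch parses a relationship line with trailing qualifiers. It is a simplification that assumes qualifier values are double-quoted without escaped quotes. It returns qualifiers as (key, value) pairs because keys such as source can repeat:

```python
import re

# relationship: <rel> <target> {<qualifiers>} ! <comment>
RELATIONSHIP = re.compile(
    r'^relationship:\s+(?P<rel>\S+)\s+(?P<target>\S+)'
    r'(?:\s+\{(?P<quals>[^}]*)\})?'
    r'(?:\s+!\s*(?P<comment>.*))?$'
)
QUALIFIER = re.compile(r'(\w+)="([^"]*)"')


def parse_relationship(line):
    """Return rel, target, qualifiers and comment for an obo relationship line, or None."""
    match = RELATIONSHIP.match(line.strip())
    if match is None:
        return None
    return {
        "rel": match.group("rel"),
        "target": match.group("target"),
        "qualifiers": QUALIFIER.findall(match.group("quals") or ""),
        "comment": match.group("comment"),
    }
```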

Add a guide for command line tool hackers

One feature of obo is the ease of doing quick command line hacks, e.g.

While not to be encouraged in production pipelines, these are certainly useful for quick ad-hoc operations.

With obographs being JSON or YAML, it's much easier to write short, robust programs that do the equivalent. Nevertheless, sometimes it's handy to do a quick ad-hoc series of pipes on the command line. This ticket is for gathering ideas on ways to do this. @kltm @dosumis @mcourtot any ideas? Seth, what is your favorite command line JSON query tool?

Inconsistency between go-plus.json and go-plus.owl/obo

In go-plus.json, GO:0080039 is shown as deprecated, while in go-plus.owl/obo it still appears as a valid, live term.
{
  "id" : "http://purl.obolibrary.org/obo/GO_0080039",
  "meta" : {
    "basicPropertyValues" : [
      { "pred" : "http://purl.obolibrary.org/obo/IAO_0000231", "val" : "http://purl.obolibrary.org/obo/IAO_0000227" },
      { "pred" : "http://purl.obolibrary.org/obo/IAO_0100001", "val" : "http://purl.obolibrary.org/obo/GO_0016762" }
    ],
    "deprecated" : true
  },
  "type" : "CLASS"
}
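A quick way to compare deprecation status across formats is to list deprecated nodes, with any replacements, from the JSON. A Python sketch, assuming deprecation is signalled by meta.deprecated and replacements by basicPropertyValues whose pred is IAO_0100001 (term replaced by), as in the node above:

```python
TERM_REPLACED_BY = "http://purl.obolibrary.org/obo/IAO_0100001"


def deprecated_terms(graph):
    """Map each deprecated node id to the ids of its replacement terms (possibly empty)."""
    result = {}
    for node in graph.get("nodes", []):
        meta = node.get("meta", {})
        if meta.get("deprecated"):
            result[node["id"]] = [
                bpv["val"]
                for bpv in meta.get("basicPropertyValues", [])
                if bpv.get("pred") == TERM_REPLACED_BY
            ]
    return result


# The GO:0080039 node from the go-plus.json excerpt above.
graph = {"nodes": [{
    "id": "http://purl.obolibrary.org/obo/GO_0080039",
    "meta": {
        "basicPropertyValues": [
            {"pred": "http://purl.obolibrary.org/obo/IAO_0000231",
             "val": "http://purl.obolibrary.org/obo/IAO_0000227"},
            {"pred": TERM_REPLACED_BY, "val": "http://purl.obolibrary.org/obo/GO_0016762"},
        ],
        "deprecated": True,
    },
    "type": "CLASS",
}]}
print(deprecated_terms(graph))
```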

Improve semantics of synonym types and subset defs

Hi! This is related to my current work on fastobo-graphs to provide OBO JSON support in native Rust and Python.

Currently, the synonym type case is handled the following way in ROBOT: a new synonym type is declared as a new node in the OBO graph, and that node does not have a type, which is the only case of a missing node type I found in OBO graphs. Same goes for subset definitions in header frames.

I propose that:

  1. A new SYNONYMTYPE node type is added to the specification
  2. A new SUBSET node type is added to the specification
  3. The node type is made a mandatory field on all nodes.
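The proposal can be sketched as a node model in which SYNONYMTYPE and SUBSET sit alongside the existing node types. CLASS and PROPERTY appear in the JSON dumps on this page, and INDIVIDUAL is assumed to be the remaining existing type. Under this model a node cannot be constructed without a type. This is a hypothetical illustration and is not the obographs Java model.

```java
import java.util.Objects;

// Hypothetical model of the proposal (not the obographs classes): two extra node
// types, and the type becomes mandatory at construction time.
class ProposedNodeModel {
    enum NodeType { CLASS, INDIVIDUAL, PROPERTY, SYNONYMTYPE, SUBSET }

    record Node(String id, String lbl, NodeType type) {
        Node {
            Objects.requireNonNull(id, "id is required");
            Objects.requireNonNull(type, "type is mandatory under this proposal");
        }
    }

    public static void main(String[] args) {
        // Illustrative synonym type node; the IRI is made up for this example.
        Node synonymType = new Node("http://example.org/obo/my_synonym_type",
                "my synonym type", NodeType.SYNONYMTYPE);
        System.out.println(synonymType);
        try {
            new Node("http://example.org/obo/untyped", "untyped", null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```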

Consider unabbreviated field names for node labels and edge fields (lbl, sub, obj)

Moved discussion from #15 from @cmpich

Consider lbl->label, sub->subject, obj->object, pred->relation/predicate

Pros:

  • correspond to terms used in the RDF/RDFS vocabulary
  • greater clarity

Cons:

  • breaks compatibility with JS code that accepts bbop-graphs, and with software that produces bbop-graphs (e.g. SciGraph)
  • increased size of payload

Compatibility is the larger concern. If payload size is a concern, there are better ways to reduce it for simple bbop-graphs, and in particular for edge lists:

  • compression (bigger gains for larger subgraphs?)
  • use of protobuf, BSON, etc. instead of JSON
  • a more compact graph representation

Consider the owlsim internal representation: each node is assigned an integer, and edges are grouped into sets indexed by predicate, with each edge represented as an integer pair. This reduces a GO bbop-graph from 11 MB to 3.6 MB. Most of the remaining size is labels; the graph itself is encoded in under 1 MB.

This compact representation would be something negotiated between client and server, with the default being the human-readable but more voluminous form.
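Below is a rough Java sketch of that style of encoding. It interns node ids to integers and groups edges by predicate, storing each edge as a (subject, object) pair of ints in a flat list. The layout is an assumption based on the description above, not the actual owlsim code. Values are boxed here for brevity; a serialized form would write them out as plain integer arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of an owlsim-style compact edge list (assumed layout, not owlsim code).
class CompactGraph {
    final List<String> nodeIds = new ArrayList<>();              // index -> node id
    private final Map<String, Integer> index = new HashMap<>();  // node id -> index
    // predicate -> flattened pairs [s0, o0, s1, o1, ...]
    final Map<String, List<Integer>> edgesByPredicate = new LinkedHashMap<>();

    int intern(String id) {
        return index.computeIfAbsent(id, k -> {
            nodeIds.add(k);
            return nodeIds.size() - 1;
        });
    }

    void addEdge(String subj, String pred, String obj) {
        int s = intern(subj);
        int o = intern(obj);
        List<Integer> pairs = edgesByPredicate.computeIfAbsent(pred, k -> new ArrayList<>());
        pairs.add(s);
        pairs.add(o);
    }

    int[] edges(String pred) {
        List<Integer> pairs = edgesByPredicate.getOrDefault(pred, List.of());
        return pairs.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        CompactGraph g = new CompactGraph();
        g.addEdge("UBERON:0002102", "is_a", "UBERON:0002101");  // forelimb is_a limb
        System.out.println(g.nodeIds);                          // [UBERON:0002102, UBERON:0002101]
        System.out.println(Arrays.toString(g.edges("is_a")));   // [0, 1]
    }
}
```

Labels would travel once in the nodeIds-aligned table, which is why they dominate the remaining size.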

Property node missed for http://xmlns.com/foaf/0.1/depicts

This OWL file, http://purl.obolibrary.org/obo/fbbt/vfb/vfb_ext.owl, has the property http://xmlns.com/foaf/0.1/depicts.

But it appears to be missing from the JSON produced by owltools:

Complete node set:

[{'id': 'http://purl.obolibrary.org/obo/OBI_0000312',
  'lbl': 'is_specified_output_of',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/CARO_0030002',
  'lbl': 'expression pattern',
  'type': 'CLASS'},
 {'id': 'http://purl.obolibrary.org/obo/RO_0002350',
  'lbl': 'member_of',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/fbbt/vfb/VFBext_0000001',
  'lbl': 'has_channel',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/RO_0002351',
  'lbl': 'has_member',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/C888C3DB-AEFA-447F-BD4C-858DFE33DBE7',
  'lbl': 'has_exemplar',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/fbbt/vfb/VFBext_0000002',
  'lbl': 'has_background_channel',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/fbbt/vfb/VFBext_0000003',
  'lbl': 'has_signal_channel',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/fbbt/vfb/VFBext_0000014',
  'lbl': 'channel',
  'type': 'CLASS'},
 {'id': 'http://purl.obolibrary.org/obo/c099d9d6-4ef3-11e3-9da7-b1ad5291e0b0',
  'lbl': 'exemplar_of',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/fbbt/vfb/VFBext_0000005',
  'lbl': 'source_data_link',
  'type': 'PROPERTY'},
 {'id': 'http://purl.obolibrary.org/obo/fbbt/vfb/VFBext_0000006',
  'lbl': 'multi channel image',
  'type': 'CLASS'},
 {'id': 'http://purl.obolibrary.org/obo/fbbt/vfb/VFB_10000005',
  'lbl': 'cluster',
  'type': 'CLASS'}]

obographs 2 OWL conversion

Conversion from obographs to OWL would be very useful, e.g. for a pure Python implementation of DOS-DPs (limited to suitable simple design patterns), or for a simple, OWLAPI-independent OWL writer for the VFB KB. Are there any prospects for this?
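As a starting point, simple edges can be written out as OWL functional-syntax axioms without any OWLAPI dependency. This minimal sketch follows the usual OBO convention: is_a maps to SubClassOf, and any other relation maps to SubClassOf with an ObjectSomeValuesFrom restriction. It assumes every id is a full IRI, so CURIEs would need prefix expansion first. It covers only basic edges, not the optional fields such as synonyms or definitions. EdgesToOwl is hypothetical and not an existing obographs API.

```java
import java.util.List;

// Hypothetical sketch: render basic obographs-style edges as OWL functional-syntax axioms.
class EdgesToOwl {
    record Edge(String sub, String pred, String obj) {}

    static String axiom(Edge e) {
        String sub = "<" + e.sub() + ">";
        String obj = "<" + e.obj() + ">";
        if (e.pred().equals("is_a")) {
            return "SubClassOf(" + sub + " " + obj + ")";            // plain subclass edge
        }
        // any other predicate is treated as an existential restriction
        return "SubClassOf(" + sub + " ObjectSomeValuesFrom(<" + e.pred() + "> " + obj + "))";
    }

    public static void main(String[] args) {
        String obo = "http://purl.obolibrary.org/obo/";
        List<Edge> edges = List.of(
                new Edge(obo + "UBERON_0002102", "is_a", obo + "UBERON_0002101"),
                // A part_of B, with made-up example classes
                new Edge("http://example.org/A", obo + "BFO_0000050", "http://example.org/B"));
        edges.forEach(e -> System.out.println(axiom(e)));
    }
}
```

Wrapping the output lines in an Ontology(...) header, and emitting Declaration axioms, would be the next step towards a complete document.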

FromOwl.generateGraph class cast exception

From @dosumis, migrated from ontodev/robot#260

I'd like to generate a JSON version of http://purl.obolibrary.org/obo/geno.owl (there doesn't appear to be a release with this conversion). JSON conversion works fine for the various OBO ontologies I've tried, but for geno.owl it fails with:

java.lang.ClassCastException: uk.ac.manchester.cs.owl.owlapi.OWLAnonymousIndividualImpl cannot be cast to org.semanticweb.owlapi.model.OWLLiteral
at org.geneontology.obographs.owlapi.FromOwl.generateGraph(FromOwl.java:477)
at org.geneontology.obographs.owlapi.FromOwl.generateGraphDocument(FromOwl.java:110)
at org.obolibrary.robot.IOHelper.saveOntology(IOHelper.java:441)
at org.obolibrary.robot.IOHelper.saveOntology(IOHelper.java:414)
at org.obolibrary.robot.ConvertCommand.execute(ConvertCommand.java:130)
at org.obolibrary.robot.CommandManager.executeCommand(CommandManager.java:219)
at org.obolibrary.robot.CommandManager.execute(CommandManager.java:161)
at org.obolibrary.robot.CommandManager.main(CommandManager.java:121)
at org.obolibrary.robot.CommandLineInterface.main(CommandLineInterface.java:53)
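The trace shows an anonymous individual being cast to OWLLiteral. In OWLAPI, an annotation value can be an IRI, a literal or an anonymous individual. Code that casts every value to a literal therefore fails on the other two kinds. Below is a sketch of the failure and a defensive alternative. It uses hypothetical stand-in types rather than the OWLAPI interfaces; the actual code at FromOwl.java:477 is not reproduced here.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins mirroring the three kinds of annotation value OWLAPI can
// return. These are NOT the OWLAPI interfaces.
class AnnotationValueHandling {
    interface AnnotationValue {}
    record Iri(String iri) implements AnnotationValue {}
    record Literal(String lexicalForm) implements AnnotationValue {}
    record AnonymousIndividual(String nodeId) implements AnnotationValue {}

    // Failure pattern: assumes every value is a literal, so an IRI or an
    // anonymous individual triggers a ClassCastException.
    static String unsafeValue(AnnotationValue v) {
        return ((Literal) v).lexicalForm();
    }

    // Defensive pattern: check the runtime type first (pattern matching for
    // instanceof, Java 16+) and skip values that have no simple string form.
    static Optional<String> safeValue(AnnotationValue v) {
        if (v instanceof Literal lit) {
            return Optional.of(lit.lexicalForm());
        }
        if (v instanceof Iri iri) {
            return Optional.of(iri.iri());
        }
        return Optional.empty(); // anonymous individual: needs separate handling
    }

    public static void main(String[] args) {
        List<AnnotationValue> values = List.of(
                new Literal("a label"),
                new Iri("http://example.org/x"),
                new AnonymousIndividual("_:b0"));
        for (AnnotationValue v : values) {
            System.out.println(safeValue(v).orElse("(skipped anonymous individual)"));
        }
    }
}
```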

About getting nodes in obographs

Suppose that we have the following OWL snippet.

    <owl:Class rdf:resource="http://purl.obolibrary.org/obo/MONDO_0005017">
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">d6</rdfs:label>
    </owl:Class>

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0007648">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0005017"/>
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">hereditary diffuse gastric cancer</rdfs:label>
    </owl:Class>

When I parse the above snippet using obographs and request the list of nodes from the graph object, the list I get contains only the subject node. In this case that is MONDO_0007648, without MONDO_0005017. I would like to know whether this is intended behavior or whether I have misunderstood something. Is this because of subClassOf? I am using the following code to retrieve the node objects:

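    import java.io.File;
    import java.util.List;
    import org.geneontology.obographs.model.*;       // GraphDocument, Graph, Node
    import org.geneontology.obographs.owlapi.FromOwl;
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.*;           // OWLOntologyManager, OWLOntology
    OWLOntologyManager m = OWLManager.createOWLOntologyManager();
    OWLOntology ontology = m.loadOntologyFromOntologyDocument(file); // file is a java.io.File pointing at the OWL file
    FromOwl fromOwl = new FromOwl();
    GraphDocument gd = fromOwl.generateGraphDocument(ontology);     // convert the whole ontology
    Graph obograph = gd.getGraphs().get(0);                         // first graph in the document
    List<Node> gNodes = obograph.getNodes();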

Any suggestions or tips would be appreciated.
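Whatever the intended behavior is, one way to see the effect is to compare the ids referenced by edges with the ids in the node list. Below is a minimal sketch over hypothetical simplified types: plain id strings and an Edge record, not the obographs Node and Edge classes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical diagnostic over simplified types (not the obographs model classes):
// collects ids that appear as the subject or object of an edge but have no node entry.
class DanglingReferences {
    record Edge(String sub, String pred, String obj) {}

    static Set<String> referencedButMissing(List<String> nodeIds, List<Edge> edges) {
        Set<String> known = Set.copyOf(nodeIds);
        Set<String> missing = new LinkedHashSet<>(); // keeps first-seen order
        for (Edge e : edges) {
            if (!known.contains(e.sub())) {
                missing.add(e.sub());
            }
            if (!known.contains(e.obj())) {
                missing.add(e.obj());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String obo = "http://purl.obolibrary.org/obo/";
        List<String> nodes = List.of(obo + "MONDO_0007648");
        List<Edge> edges = List.of(new Edge(obo + "MONDO_0007648", "is_a", obo + "MONDO_0005017"));
        // Prints [http://purl.obolibrary.org/obo/MONDO_0005017]
        System.out.println(referencedButMissing(nodes, edges));
    }
}
```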
