biothings / biothings_explorer

TRAPI service for BioThings Explorer

Home Page: https://explorer.biothings.io

License: Apache License 2.0

JavaScript 12.32% Dockerfile 1.56% HTML 19.59% Shell 3.01% Smarty 1.47% Vue 54.45% CSS 1.03% TypeScript 6.57%
ncats-translator biothings-explorer

biothings_explorer's Introduction

BioThings Explorer TRAPI API

Introduction

This repository serves as the development workspace for the TRAPI implementation of BioThings Explorer (BTE). BTE is an engine for autonomously querying a distributed knowledge graph. The distributed knowledge graph is made up of biomedical APIs that have been annotated with semantically precise descriptions of their inputs and outputs in the SmartAPI registry. This project is primarily funded by the NCATS Translator project. There is also an older Python version of BioThings Explorer that is now deprecated.

The figure below shows an older version of the meta knowledge graph consumed by BTE. Although dated, it gives a useful conceptual picture of API interoperability:

[Figure: BTE Meta-KG]

What's TRAPI?

TRAPI stands for Translator Reasoner API. It is a standard defined for APIs developed within the NCATS Biomedical Translator project to facilitate information exchange between resources. BTE exports results via TRAPI to maintain interoperability with other Translator tools. BTE can consume knowledge resources that expose the TRAPI interface, as well as APIs that have been annotated in the SmartAPI registry using the x-bte extension to the OpenAPI specification.

TRAPI API Implementation

Below is a process diagram depicting BTE's internal workflow when processing a query.

sequenceDiagram
autonumber
participant I as index.js - query()
participant QG as query_graph.js
participant BEQ as batch_edge_query.js
participant Q2A as qedge2apiedge.js
participant R as query_results.js
participant C as call-apis module

note over I, R: query_graph_handler module

I->>QG: processQueryGraph()
QG->>QG: Process TRAPI Query Graph Object into <br/> internal qEdge and qXEdge representation
note right of QG: qEdge - Edge in TRAPI query graph <br/> qXEdge - Internal UpdatedExeEdge representation <br/> of a qEdge to be executed
QG->>I: return qXEdges

I->>I: Inferred Mode: create <br/> templated queries

loop Executing with Edge Manager
I->>I: while there are unexecuted qXEdges, <br/> get next qXEdge

I->>BEQ: BatchEdgeQueryHandler()
BEQ->>BEQ: NodesUpdateHandler(): get equivalent IDs
BEQ->>BEQ: cacheHandler(): fetch cached records

alt if there are uncached qXEdges
BEQ->>Q2A: QEdge2APIEdgeHandler()
Q2A->>Q2A: convert qXEdges into API calls by using <br/> metaKG to get metaEdges for qXEdge
Q2A->>BEQ: return metaXEdges
note right of BEQ: metaEdge - An edge in the metaKG <br/> metaXEdge - A metaEdge pair with a qXEdge
BEQ->>C: query()
C->>C: make API calls in batches <br/> and merge results
C->>BEQ: return records from APIs
end

BEQ->>BEQ: cacheHandler(): cache result records
note right of BEQ: record - A single unit of transformed <br/> data from a sub-query response

BEQ->>I: return records

I->>I: Store records/update edge manager
I->>I: Mark Edge as Executed
end

I->>R: trapiResultsAssembler
R->>R: assemble and convert records into <br/> final return results
R->>I: put results in bteGraph
note left of R: result - 1 item of the array in the <br/> TRAPI response (message.results)
I->>I: bteGraph: prune not fully connected <br/> results from graph

Try it Out!

Live TRAPI Instance

We maintain a live instance of this application at https://api.bte.ncats.io/ that can be used for testing. Query Examples can be found here.
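
As a minimal sketch of how a client might call the live instance, the snippet below builds a one-hop query graph in the same shape as the example queries on this page and POSTs it. The helper names `buildOneHopQuery` and `postQuery` are illustrative and not part of BTE. The `/v1/query` path is taken from the endpoint named in the test output further down this page; that it applies to this host is an assumption. A global `fetch` (Node 18+) is also assumed.

```javascript
// Build a one-hop TRAPI query graph: a fixed input node (n0) linked to an open node (n1).
// Field names ("id", "category", "predicate") follow the example queries on this page.
function buildOneHopQuery(inputId, inputCategory, outputCategory, predicate) {
  const edge = { subject: "n0", object: "n1" };
  if (predicate) edge.predicate = predicate; // the predicate is optional
  return {
    message: {
      query_graph: {
        nodes: {
          n0: { id: inputId, category: inputCategory },
          n1: { category: outputCategory },
        },
        edges: { e01: edge },
      },
    },
  };
}

// POST the query. The base URL is the live instance mentioned above;
// the /v1/query path is an assumption (see the lead-in).
async function postQuery(query, baseUrl = "https://api.bte.ncats.io") {
  const response = await fetch(`${baseUrl}/v1/query`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(query),
  });
  if (!response.ok) throw new Error(`BTE returned HTTP ${response.status}`);
  return response.json();
}

// Building the query needs no network access; call postQuery(query) to send it.
const query = buildOneHopQuery("NCBIGENE:6844", "biolink:Gene", "biolink:BiologicalProcess");
console.log(JSON.stringify(query, null, 2));
```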

Local installations

BTE can be run locally with Docker, or by installing the workspace for further tinkering. See the Installation documentation.

Usage

Using BTE can be as simple or complex as you'd like. See the Usage documentation to get started using BTE!

biothings_explorer's People

Contributors

andrewsu, ariutta, colleenxu, dependabot[bot], ericz1803, kannabhargav, kevinxin90, marcodarko, mnarayan1, newgene, pahmadi8740, rjawesome, sengineer0, tokebe


biothings_explorer's Issues

Investigate MongoDB as persistent data storage

It would be a good feature to store user requests persistently, so users can come back and look up their results using just the answer id we assign to them.

We could also hook this up with the web interface. Given an answer id, the UI can fetch results directly from MongoDB and display them as a graph/table for exploration.

This query is not working

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "MONDO:0005132",
					"category":"biolink:Disease"
				},
				"n1": {
					"category": "biolink:ChemicalSubstance"
				},
				"n2": {
					"id": "UMLS:C0032961",
					"category":"biolink:Disease"
				}
			},
			"edges": {
				"e01": {
					"subject": "n1",
					"object": "n0",
					"predicate":"biolink:treats"
				},
				"e02": {
					"subject": "n1",
					"object": "n2",
					"predicate": "biolink:contraindicated_for"
				}
			}
		}
	}
}

Add GO qualifiers to mygene.info record in SmartAPI

BTE is not correctly interpreting mygene.info output on GO annotations because it is ignoring the qualifiers. I believe the fix involves a modification of the mygene.info SmartAPI record (and hopefully TRAPI has a way of expressing qualifiers). Example below...

I issued this query to get BiologicalProcesses related to the gene VAMP2:

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "NCBIGENE:6844",
					"category":"biolink:Gene"
				},
				"n1": {
					"category": "biolink:BiologicalProcess"
				}
			},
			"edges": {
				"e01": {
					"subject": "n0",
					"object": "n1"
				}
			}
		}
	}
}

The following edge linking VAMP2 to neutrophil degranulation (GO:0043312) is returned in the output:

                "NCBIGENE:6844-GO:0043312-MyGene.info API-NCBI Gene": {
                    "predicate": "biolink:participates_in",
                    "subject": "NCBIGENE:6844",
                    "object": "GO:0043312",
                    "attributes": [
                        {
                            "name": "provided_by",
                            "value": "NCBI Gene",
                            "type": "biolink:provided_by"
                        },
                        {
                            "name": "api",
                            "value": "MyGene.info API",
                            "type": "bts:api"
                        },
                        {
                            "name": "evidence",
                            "value": "IMP",
                            "type": "bts:evidence"
                        },
                        {
                            "name": "publications",
                            "value": [
                                "PMID:16677249"
                            ],
                            "type": "biolink:publications"
                        }
                    ]
                },

The original content from http://mygene.info/v3/gene/6844?fields=go looks like this:

{
   "evidence": "IMP",
   "gocategory": "BP",
   "id": "GO:0043312",
   "pubmed": 16677249,
   "qualifier": "NOT",
   "term": "neutrophil degranulation"
},

Critically, the NOT qualifier in the mygene.info record is not being shown in the TRAPI BTE output, which completely reverses the interpretation.
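
One way the qualifier could be carried through is sketched below. This is an illustration only, not BTE's actual transformation code: the function name `goAnnotationToEdge` is hypothetical, and representing NOT as an attribute named `negated` with type `biolink:negated` is an assumption about how it might be expressed. The other attribute shapes mirror the edge output shown above.

```javascript
// Convert one mygene.info GO annotation into an edge object shaped like the
// BTE output shown above, keeping track of a NOT qualifier.
// ASSUMPTION: the "negated" attribute and its "biolink:negated" type are
// illustrative; how TRAPI/BTE should express this is the open question here.
function goAnnotationToEdge(geneCurie, annotation) {
  // The qualifier may be missing, a string, or a list; normalize to an upper-case array.
  const qualifiers = [].concat(annotation.qualifier || []).map((q) => String(q).toUpperCase());
  const attributes = [
    { name: "evidence", value: annotation.evidence, type: "bts:evidence" },
    {
      name: "publications",
      value: [].concat(annotation.pubmed || []).map((id) => `PMID:${id}`),
      type: "biolink:publications",
    },
  ];
  // Only an exact "NOT" qualifier is treated as negation in this sketch.
  if (qualifiers.includes("NOT")) {
    attributes.push({ name: "negated", value: true, type: "biolink:negated" });
  }
  return {
    subject: geneCurie,
    object: annotation.id,
    predicate: "biolink:participates_in",
    attributes,
  };
}

// The annotation from the mygene.info response quoted above.
const edge = goAnnotationToEdge("NCBIGENE:6844", {
  evidence: "IMP",
  gocategory: "BP",
  id: "GO:0043312",
  pubmed: 16677249,
  qualifier: "NOT",
  term: "neutrophil degranulation",
});
console.log(edge.attributes.map((a) => a.name));
```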

Refactor load meta-kg

Screen Shot 2021-02-07 at 9 37 26 PM

As shown above, the meta-kg can sometimes take up to 20s to load, which causes a serious performance issue on the BTE API side. We need to refactor the smartapi-kg package so that it can take a list of specs from a file instead of making real-time API queries.

We also need to implement a cron job on the TRAPI side to periodically fetch SmartAPI specs from the SmartAPI API.
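
The file-based approach might look roughly like the sketch below: specs are read from a local JSON file at load time, and a background timer refreshes that file. All names here (`saveSpecs`, `loadSpecs`, `scheduleRefresh`, and the `fetchSpecs` callback) are hypothetical and not part of smartapi-kg. A plain `setInterval` stands in for a real cron job, and the spec content is made up.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write the specs to a temp file first, then rename, so readers never see a partial file.
function saveSpecs(filePath, specs) {
  const tmpPath = `${filePath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(specs));
  fs.renameSync(tmpPath, filePath);
}

// Load specs from disk instead of querying the registry at request time.
function loadSpecs(filePath) {
  return JSON.parse(fs.readFileSync(filePath, "utf8"));
}

// Refresh the file periodically. `fetchSpecs` is a hypothetical async function
// that would download the current specs from the SmartAPI registry.
function scheduleRefresh(fetchSpecs, filePath, intervalMs) {
  const timer = setInterval(async () => {
    try {
      saveSpecs(filePath, await fetchSpecs());
    } catch (err) {
      // On failure, keep serving the previous file.
      console.error(`spec refresh failed: ${err.message}`);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for the refresh
  return timer;
}

// Usage with made-up data.
const specFile = path.join(os.tmpdir(), "smartapi_specs_example.json");
saveSpecs(specFile, [{ info: { title: "Example API" }, paths: {} }]);
const specs = loadSpecs(specFile);
console.log(`loaded ${specs.length} spec(s)`);
```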

Query with unexpected exceptions

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "WIKIPATHWAYS:Pathway:WP195",
					"category": "biolink:Pathway"
				},
				"n1": {
					"category": "biolink:Gene"
				},
				"n2": {
					"category": "biolink:ChemicalSubstance"
				}
			},
			"edges": {
				"e01": {
					"subject": "n0",
					"object": "n1"
				},
				"e02": {
					"subject": "n1",
					"object": "n2"
				}
			}
		}
	}
}

slice

Investigate Timeout Error

{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "name:Imatinib",
          "category": "biolink:ChemicalSubstance"
        },
        "n01": {
          "category": "biolink:Disease"
        },
        "n02": {
            "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicate":"biolink:treats"
        },
        "e01": {
          "subject": "n01",
          "object": "n02",
          "predicate":"biolink:caused_by"
        }
      }
    }
  }
}

The above query results in a 504 timeout error in the current BTE app. We need to investigate how that happens and how to set a timeout on either the express.js side or the nginx side.

Fix wrong URL in CHANGELOG

Right now, the commit URL and compare URL in the CHANGELOG are wrong. We need to fix them, as well as the .versionrc.json file that helps generate them automatically.

Handle explain type of query

{
    "message": {
        "query_graph": {
            "nodes": {
                "a": {
                    "category": "biolink:Disease",
                    "id": "MESH:D015464"
                },
                "b": {
                    "category": "biolink:ChemicalSubstance",
                    "id": "CHEBI:45783"
                },
                "c": {
                    "category": "biolink:Gene"
                }
            },
            "edges": {
                "ac": {
                    "subject": "a",
                    "object": "c"
                },
                "bc": {
                    "subject": "c",
                    "object": "b"
                }
            }
        }
    },
    "knowledge_graph": {
        "nodes": [],
        "edges": []
    },
    "results": []
}

Create regression testing infrastructure

We would like to create a regression testing framework to quantitatively assess BTE's performance. As a gold standard, we can use the orphan drug indication dataset mentioned in NCATSTranslator/Relay#123 or the mechanistic paths from https://sulab.github.io/DrugMechDB/. For each of those gold standards, we should create a TRAPI query (examples), send it to BTE using a small library of plausible metapaths focused on drug repurposing, and then assess whether BTE was able to retrieve the right drug among the results. (Later we can also assess where that drug ranked among all potential drugs retrieved.) We would want to execute this test on a regular basis (weekly?), and then have a simple web page where results can be viewed/browsed.
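
The scoring step of such a framework could be sketched as follows. This assumes each gold-standard entry pairs one disease with one expected drug, and that BTE's results have already been reduced to a ranked list of drug IDs per disease. The function name `scoreRun` and all identifiers in the example data are made up for illustration.

```javascript
// Score one regression run.
// goldStandard: array of { disease, drug } pairs (the expected answers).
// resultsByDisease: map from disease ID to a ranked array of retrieved drug IDs.
function scoreRun(goldStandard, resultsByDisease) {
  const details = goldStandard.map(({ disease, drug }) => {
    const ranked = resultsByDisease[disease] || [];
    const index = ranked.indexOf(drug);
    return {
      disease,
      drug,
      retrieved: index !== -1,
      rank: index === -1 ? null : index + 1, // 1-based rank, null if not retrieved
    };
  });
  const hits = details.filter((d) => d.retrieved).length;
  return {
    total: details.length,
    hits,
    recall: details.length ? hits / details.length : 0,
    details,
  };
}

// Made-up example data; these are placeholder IDs, not real CURIEs.
const gold = [
  { disease: "DISEASE:1", drug: "DRUG:A" },
  { disease: "DISEASE:2", drug: "DRUG:B" },
];
const results = {
  "DISEASE:1": ["DRUG:X", "DRUG:A"],
  "DISEASE:2": ["DRUG:Y"],
};
const report = scoreRun(gold, results);
console.log(report.recall, report.details.map((d) => d.rank));
```

Recording the rank alongside the hit/miss flag keeps the later ranking assessment possible without rerunning the queries.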

tagging @ariutta and @AlexanderPico

Need a nodejs package handling BioLink model

The package needs to be separate from current TRAPI code repo.

It should perform:

  1. Given a specific node type (e.g. biolink:GeneOrGeneProduct), return all descendants/ancestors of that node type.
  2. Given a specific node type, return all available ID prefixes defined in the BioLink model.
  3. Given a specific ID prefix, return all node types that can have this ID prefix.
  4. Given a specific predicate, return all of its descendant/ancestor predicates.
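
The four lookups above can be sketched over a small in-memory hierarchy. The model below is a toy: the class names, predicate names, parent links, and ID prefixes are illustrative stand-ins and should not be read as the actual BioLink model content. The function names are likewise hypothetical.

```javascript
// Toy hierarchy: each entry points to its parent (null for a root).
// ASSUMPTION: this content is illustrative, not the real BioLink model.
const toyModel = {
  classes: {
    NamedThing: { parent: null, idPrefixes: [] },
    BiologicalEntity: { parent: "NamedThing", idPrefixes: [] },
    DiseaseOrPhenotypicFeature: { parent: "BiologicalEntity", idPrefixes: [] },
    Disease: { parent: "DiseaseOrPhenotypicFeature", idPrefixes: ["MONDO", "UMLS"] },
    PhenotypicFeature: { parent: "DiseaseOrPhenotypicFeature", idPrefixes: ["HP", "UMLS"] },
  },
  predicates: {
    related_to: { parent: null },
    correlated_with: { parent: "related_to" },
    positively_correlated_with: { parent: "correlated_with" },
  },
};

// Walk parent links upward; works for both classes and predicates.
function getAncestors(tree, name) {
  const ancestors = [];
  let current = tree[name] ? tree[name].parent : null;
  while (current) {
    ancestors.push(current);
    current = tree[current].parent;
  }
  return ancestors;
}

// Collect children recursively (depth-first).
function getDescendants(tree, name) {
  const children = Object.keys(tree).filter((key) => tree[key].parent === name);
  return children.flatMap((child) => [child, ...getDescendants(tree, child)]);
}

// ID prefixes declared for a class.
function getIdPrefixes(classes, name) {
  return classes[name] ? classes[name].idPrefixes : [];
}

// Classes that declare a given ID prefix.
function getClassesForPrefix(classes, prefix) {
  return Object.keys(classes).filter((key) => classes[key].idPrefixes.includes(prefix));
}

console.log(getAncestors(toyModel.classes, "Disease"));
console.log(getDescendants(toyModel.predicates, "related_to"));
console.log(getClassesForPrefix(toyModel.classes, "UMLS"));
```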

Improve logging module

Current logging only shows how a TRAPI query is parsed and how the SmartAPI kg is used. It should include additional information such as:

  1. What query is made to each API.
  2. How many responses we get from each API call.
  3. How many responses we get after merging the results from different KPs.

The above needs support from other BTE-related nodejs packages.
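
A rough sketch of a collector for these three kinds of entries is below. The entry shape (timestamp, level, message, code) follows the log excerpt quoted in the LINCS issue on this page; the class `QueryLogger`, its method names, and the message wording are hypothetical.

```javascript
// Build one log entry in the same shape as the log excerpt shown on this page.
function makeLogEntry(level, message) {
  return { timestamp: new Date().toISOString(), level, message, code: null };
}

// Hypothetical collector covering the three items listed above.
class QueryLogger {
  constructor() {
    this.logs = [];
  }

  // 1. The query made to an API.
  logApiQuery(apiName, requestConfig) {
    this.logs.push(
      makeLogEntry("DEBUG", `call-apis: query sent to ${apiName}: ${JSON.stringify(requestConfig)}`)
    );
  }

  // 2. The number of records returned by one API call.
  logApiResponse(apiName, records) {
    this.logs.push(makeLogEntry("DEBUG", `call-apis: ${apiName} returned ${records.length} record(s)`));
  }

  // 3. The number of records left after merging results from all KPs.
  logMerged(records) {
    this.logs.push(makeLogEntry("DEBUG", `call-apis: ${records.length} record(s) after merging`));
  }
}

// Usage with made-up values.
const logger = new QueryLogger();
logger.logApiQuery("Example API", { url: "https://example.org/query", method: "get" });
logger.logApiResponse("Example API", [{ id: 1 }, { id: 2 }]);
logger.logMerged([{ id: 1 }]);
console.log(logger.logs.map((entry) => entry.message));
```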

Include additional node attributes in TRAPI Knowledge Graph

  1. Chemical:
    • chembl_max_phase
    • chembl_molecule_type
    • chembl_drug_category
    • drugbank_class
    • drugbank_groups
    • drugbank_kingdom
    • drugbank_superclass
    • contraindications
    • mesh_pharmacology_class
    • fda_epc_pharmacology_class
  2. Gene:
    • interpro
    • type_of_gene
  3. Pathway:
    • number_of_participants
  4. BiologicalProcess:
    • number_of_participants
  5. CellularComponent:
    • number_of_participants
  6. MolecularActivity:
    • number_of_participants

Node related info should not appear in edge data in TRAPI response

Current behavior in edge response:

"attributes": [
                        {
                            "name": "provided_by",
                            "value": "Text Mining KP",
                            "type": "biolink:provided_by"
                        },
                        {
                            "name": "api",
                            "value": "Text Mining Targeted Association API",
                            "type": "bts:api"
                        },
                        {
                            "name": "CHEBI",
                            "value": "CHEBI:32630",
                            "type": "bts:CHEBI"
                        },
                        {
                            "name": "object_spans",
                            "value": [
                                "start: 91, end: 96",
                                "start: 62, end: 67"
                            ],
                            "type": "bts:object_spans"
                        },
                        {
                            "name": "relation_spans",
                            "value": [
                                "",
                                ""
                            ],
                            "type": "bts:relation_spans"
                        },
                        {
                            "name": "score",
                            "value": [
                                "0.9994468",
                                "0.97133327"
                            ],
                            "type": "bts:score"
                        },
                        {
                            "name": "sentence",
                            "value": [
                                "Dietary restriction of leucine for at least three days could result in the inactivation of Hsf-1, leading to a reduction in Hsp70 synthesis.",
                                "However, in cells that were leucine starved for 3 and 4 days, Hsf-1 activity and Hsp70 synthesis level was dramatically decreased."
                            ],
                            "type": "bts:sentence"
                        },
                        {
                            "name": "subject_spans",
                            "value": [
                                "start: 23, end: 30",
                                "start: 28, end: 35"
                            ],
                            "type": "bts:subject_spans"
                        },
                        {
                            "name": "publications",
                            "value": [
                                "PMID:31397439",
                                "PMID:31397439"
                            ],
                            "type": "biolink:publications"
                        }
                    ]

Node-level information such as CHEBI does not belong in the edge attributes and needs to be removed.
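
A minimal sketch of such a filter is shown below, assuming node-level attributes can be recognized by name. The function name `stripNodeAttributes` and the contents of the name list are assumptions; only CHEBI is named in this issue.

```javascript
// Attribute names treated as node-level. ASSUMPTION: only "CHEBI" comes from
// the issue text; a real list would need to be agreed on separately.
const NODE_LEVEL_ATTRIBUTE_NAMES = new Set(["CHEBI"]);

// Return a copy of the edge attributes without node-level entries.
function stripNodeAttributes(attributes, nodeLevelNames = NODE_LEVEL_ATTRIBUTE_NAMES) {
  return attributes.filter((attr) => !nodeLevelNames.has(attr.name));
}

// A shortened version of the attribute list shown above.
const edgeAttributes = [
  { name: "provided_by", value: "Text Mining KP", type: "biolink:provided_by" },
  { name: "CHEBI", value: "CHEBI:32630", type: "bts:CHEBI" },
  { name: "publications", value: ["PMID:31397439"], type: "biolink:publications" },
];
const cleaned = stripNodeAttributes(edgeAttributes);
console.log(cleaned.map((attr) => attr.name));
```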

Query not working

{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "MONDO:0002715",
          "category": "biolink:Disease"
        },
        "n01": {
          "category": "biolink:ChemicalSubstance"
        },
        "n02": {
          "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "predicate": "biolink:correlated_with",
          "subject": "n00",
          "object": "n01"
        },
        "e01": {
          "predicate": "biolink:related_to",
          "subject": "n01",
          "object": "n02"
        }
      }
    }
  }
}

Error:

{
    "error": "TypeError: Cannot convert undefined or null to object"
}

Accessing LINCS data portal API thru BTE

Summary: I think BTE is making an error in setting up the API request for the LINCS data portal API. We are required to provide the input ID as a CURIE, so I set it as a ChemicalSubstance with the id "LINCS:LSM-1023" (which is imatinib). The logs show that the LINCS API query is then as follows (note the prefixed id in "params"):

    {
      "timestamp": "2021-03-24T04:11:46.587Z",
      "level": "DEBUG",
      "message": "call-apis: Succesfully made the following query: {\"url\":\"http://lincsportal.ccs.miami.edu/dcic/api/drugindication\",\"params\":{\"id\":\"LINCS:LSM-1023\"},\"method\":\"get\",\"timeout\":50000}",
      "code": null
    },

According to the SmartAPI page for the LINCS data portal, the id field should not have a prefix; it should contain only the id "LSM-1023".


The situation: I tried to query the LINCS data portal API thru BTE's /v1/smartapi/{smartapi_id}/query endpoint.

The smartapi_id is 9ee398a738916a98b612068cc022454f, the request body is:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "category": "biolink:ChemicalSubstance",
          "id": "LINCS:LSM-1023"
        },
        "n01": {
          "category": "biolink:Disease"
        }
      }
    }
  }
}

It returns no hits.


However, if I query the LINCS Data portal endpoint directly with the id as "LSM-1023", I get multiple results like:

{"documents": [
{
"lsm_id":"LSM-1023",
"efo_id":"Orphanet:44890",
"efo_term":"GASTROINTESTINAL STROMAL TUMOR",
"max_fda_phase_for_ind":"4",
"mesh_heading":"GASTROINTESTINAL STROMAL TUMORS",
"mesh_id":"D046152"
}
,
{
"lsm_id":"LSM-1023",
"efo_id":"EFO:0000691",
"efo_term":"SARCOMA",
"max_fda_phase_for_ind":"3",
"mesh_heading":"SARCOMA",
"mesh_id":"D012509"
}

Note: I'm not sure if the BTE Python client has an issue with this API too, since it accepts only LINCS IDs and I'm not sure if BTE will ever end up querying it.
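
The prefix handling described in this issue could be sketched as below: strip the CURIE prefix before filling in the request parameters when the target API expects a bare ID. The helper names and the `apiWantsPrefix` flag are hypothetical; how BTE actually decides this per API is not shown here.

```javascript
// Remove everything up to and including the first colon of a CURIE.
// IDs without a colon are returned unchanged.
function stripCuriePrefix(curie) {
  const index = curie.indexOf(":");
  return index === -1 ? curie : curie.slice(index + 1);
}

// Build the request params. `apiWantsPrefix` is a hypothetical per-API setting.
function buildParams(curie, apiWantsPrefix) {
  return { id: apiWantsPrefix ? curie : stripCuriePrefix(curie) };
}

console.log(buildParams("LINCS:LSM-1023", false));
```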

BTE doesn't handle predicate as a list

According to TRAPI, predicate may be given as a list or as a string.

However, the current BTE implementation doesn't support lists.

The following query fails:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00",
          "predicate": ["biolink:physically_interacts_with"]
        }
      },
      "nodes": {
        "n00": {
          "category": "biolink:ChemicalSubstance",
          "id": "DRUGBANK:DB00188"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}

The error message is:

{
    "error": "TypeError: this.predicate.startsWith is not a function"
}
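
A possible fix is to normalize the predicate to an array before any string methods are called on it. The sketch below is an assumption about the shape of such a fix, not the actual BTE patch; the function name `normalizePredicates` is hypothetical, and stripping the "biolink:" prefix is assumed only because the error involves a `startsWith` call.

```javascript
// Accept a string, an array of strings, or nothing, and always return an array.
// ASSUMPTION: removing the "biolink:" prefix mirrors what the failing
// startsWith() call was presumably used for.
function normalizePredicates(predicate) {
  if (predicate === undefined || predicate === null) return [];
  const list = Array.isArray(predicate) ? predicate : [predicate];
  return list.map((p) => (p.startsWith("biolink:") ? p.slice("biolink:".length) : p));
}

console.log(normalizePredicates(["biolink:physically_interacts_with"]));
console.log(normalizePredicates("biolink:treats"));
```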

Use NodeNormalizer to resolve QNodes with only id specified

Currently, BTE uses BioThings APIs to resolve identifiers, which requires the category (e.g. Gene, ChemicalSubstance) to be specified.

The TRAPI standard does allow users to specify a query without category info.

To support that, we should include NodeNormalizer as a fallback.

Support handling list as value for category

One ID might belong to multiple semantic types,
e.g. UMLS:C0008780 can be mapped as a Disease or a PhenotypicFeature.

So when a user provides the following query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "category": ["biolink:Disease", "biolink:PhenotypicFeature"],
          "id": "UMLS:C0008780"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}

We should look for Genes that are related to UMLS:C0008780 as a Disease or as a PhenotypicFeature.

Create a module to transform TRAPI Query Graph

  1. Expand node by its id, e.g. if user provides a MONDO ID as input, we will traverse MONDO hierarchy to get all its descendants.
  2. Expand node by its category, e.g. if user provides a NamedThing category, we will traverse BioLink class hierarchy to get all descendants of NamedThing class.
  3. Expand predicate, e.g. if user provides a related_to predicate, we will traverse BioLink predicate hierarchy to get all descendants of related_to predicate.
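
Steps 2 and 3 above can be sketched as a pure function over the query graph, with the hierarchy lookups injected as callbacks. The function name `expandQueryGraph` and the callback names are hypothetical, and the toy lookup tables stand in for a real BioLink hierarchy. Step 1 (ID expansion) would follow the same pattern with an ontology lookup and is left out here.

```javascript
// Return a copy of the query graph in which every category and predicate is
// replaced by a list holding the original value plus its descendants.
// `classDescendants` and `predicateDescendants` are injected lookup functions.
function expandQueryGraph(queryGraph, { classDescendants, predicateDescendants }) {
  const expanded = JSON.parse(JSON.stringify(queryGraph)); // deep copy; input stays untouched
  const expand = (value, lookup) => {
    const items = [].concat(value); // accept a string or a list
    return [...new Set(items.flatMap((item) => [item, ...lookup(item)]))];
  };
  for (const node of Object.values(expanded.nodes)) {
    if (node.category) node.category = expand(node.category, classDescendants);
  }
  for (const edge of Object.values(expanded.edges)) {
    if (edge.predicate) edge.predicate = expand(edge.predicate, predicateDescendants);
  }
  return expanded;
}

// Toy lookup tables (illustrative only, not the real BioLink hierarchy).
const toyClasses = {
  "biolink:DiseaseOrPhenotypicFeature": ["biolink:Disease", "biolink:PhenotypicFeature"],
};
const toyPredicates = { "biolink:related_to": ["biolink:correlated_with"] };

const inputGraph = {
  nodes: {
    n0: { category: "biolink:DiseaseOrPhenotypicFeature", id: "UMLS:C0008780" },
    n1: { category: "biolink:Gene" },
  },
  edges: { e00: { subject: "n0", object: "n1", predicate: "biolink:related_to" } },
};
const expandedGraph = expandQueryGraph(inputGraph, {
  classDescendants: (c) => toyClasses[c] || [],
  predicateDescendants: (p) => toyPredicates[p] || [],
});
console.log(JSON.stringify(expandedGraph, null, 2));
```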

Query returns unexpected exceptions

{
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "category": "biolink:correlated_with"
                }
            },
            "nodes": {
                "n00": {
                    "category": "biolink:ChemicalSubstance",
                    "id": "CAS:121999-58-4"
                },
                "n01": {
                    "category": "biolink:ChemicalSubstance"
                }
            }
        }
    }
}

According to Ryan, this query returns:


{
    "error": "TypeError: Cannot read property 'slice' of undefined"
}

Test with clinical risk KP fails because data source changes

FAIL __test__/integration/TRAPIv1.test.js (97.982 s)
● Testing endpoints › POST /v1/query with clinical risk kp query

expect(received).toHaveProperty(path)

Expected path: "MONDO:0005249"
Received path: []

Received value: {}

  69 |                 expect(response.body.message.knowledge_graph).toHaveProperty("nodes");
  70 |                 expect(response.body.message.knowledge_graph).toHaveProperty("edges");
> 71 |                 expect(response.body.message.knowledge_graph.nodes).toHaveProperty("MONDO:0005249")
     |                                                                     ^
  72 |             })
  73 |     })
  74 | 

  at __test__/integration/TRAPIv1.test.js:71:69
  at Object.<anonymous> (__test__/integration/TRAPIv1.test.js:60:9)

Use Singleton Design Pattern for BioLink reversal class

Currently, the BioLink reversal class (including its file read) has to be instantiated every time predicates are processed. It should be modified to follow the Singleton design pattern, so that it is instantiated only once, which speeds the program up.
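
A lazy singleton along these lines might look like the sketch below. The class name `BioLinkReversal`, the accessor `getBioLinkReversal`, and the one-entry mapping are hypothetical; in the real code the loader would be the file read mentioned above.

```javascript
// Hypothetical reversal class: the expensive work happens in the constructor.
class BioLinkReversal {
  constructor(loadMapping) {
    this.mapping = loadMapping(); // stands in for the file read
  }

  reverse(predicate) {
    return this.mapping[predicate];
  }
}

let instance = null;
let loadCount = 0; // only here to show that loading happens once

// Create the instance on first use and reuse it afterwards.
function getBioLinkReversal() {
  if (instance === null) {
    instance = new BioLinkReversal(() => {
      loadCount += 1;
      return { treats: "treated_by" }; // made-up mapping for illustration
    });
  }
  return instance;
}

const first = getBioLinkReversal();
const second = getBioLinkReversal();
console.log(first === second, loadCount, first.reverse("treats"));
```

In Node.js, module caching gives a similar effect: exporting one instance from a module means every `require` of that module shares it.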

Missing type for node attributes

Screen Shot 2021-02-01 at 10 07 13 AM

Type is a required field in the TRAPI 1.0 standard. Currently, we have a type for all edge attributes, but not for node attributes.
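
One way to fill in the missing field is sketched below. It assumes node attributes can reuse the `bts:<name>` pattern seen on edge attributes elsewhere on this page (for example `bts:evidence`); whether that is the right type for each node attribute is an open question. The function name `addAttributeTypes` and the sample values are hypothetical.

```javascript
// Add a "type" to every attribute that lacks one, leaving existing types alone.
// ASSUMPTION: the "<prefix>:<name>" pattern is borrowed from the edge attributes.
function addAttributeTypes(attributes, prefix = "bts") {
  return attributes.map((attr) =>
    attr.type ? attr : { ...attr, type: `${prefix}:${attr.name}` }
  );
}

// Made-up node attributes for illustration.
const nodeAttributes = [
  { name: "type_of_gene", value: "protein-coding" },
  { name: "publications", value: [], type: "biolink:publications" },
];
const typed = addAttributeTypes(nodeAttributes);
console.log(typed.map((attr) => attr.type));
```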

Query Fails

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "category": "biolink:Drug",
                    "id": "RXCUI:466423"
                },
                "n3": {
                    "category": "biolink:Disease"
                }
            },
            "edges": {
                "e03": {
                    "subject": "n0",
                    "object": "n3"
                }
            }
        }
    }
}

Error message:


{
    "error": "TypeError: Cannot read property 'id' of undefined"
}

how to query by UniProtKB CURIE?

The issue at NCATSTranslator/testing#10 reports that BTE does not return any results for the following query:

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "UniProtKB:P52788",
					"category":"biolink:Gene"
				},
				"n1": {
					"category": "biolink:ChemicalSubstance"
				}
			},
			"edges": {
				"e01": {
					"subject": "n0",
					"object": "n1"
				}
			}
		}
	}
}

If I convert UniProtKB:P52788 to NCBIGENE:6611 (based on http://mygene.info/v3/query?q=P52788&fields=entrezgene,uniprot), the query returns many results as expected. I tried adjusting the category for n0 to biolink:Protein and biolink:GenomicEntity, but those queries also return zero results. What is the proper way to form a BTE TRAPI query for a UniProtKB CURIE?

Add additional node attributes including nodeDegree

  1. How many unique source KG nodes this KG node connects from.
  2. How many unique target KG nodes this KG node connects to.
  3. How many unique edges (source-predicate-target) this KG node connects from.
  4. How many unique edges (source-predicate-target) this KG node connects to.
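
The four counts above could be computed from the knowledge graph edges roughly as follows. This is a sketch: the function name `computeNodeDegrees`, the output field names, and the example edges are made up. Edges are assumed to be an object keyed by edge ID with subject, predicate, and object fields, as in the edge output shown elsewhere on this page.

```javascript
// For every node, count unique neighbours and unique source-predicate-target
// triples, separately for incoming and outgoing edges.
function computeNodeDegrees(edges) {
  const stats = {};
  const statsFor = (id) =>
    stats[id] ||
    (stats[id] = { sources: new Set(), targets: new Set(), inEdges: new Set(), outEdges: new Set() });

  for (const { subject, predicate, object } of Object.values(edges)) {
    const triple = `${subject}-${predicate}-${object}`; // duplicates collapse in the Sets
    statsFor(subject).targets.add(object);
    statsFor(subject).outEdges.add(triple);
    statsFor(object).sources.add(subject);
    statsFor(object).inEdges.add(triple);
  }

  const degrees = {};
  for (const [id, s] of Object.entries(stats)) {
    degrees[id] = {
      source_nodes: s.sources.size, // 1. unique nodes it connects from
      target_nodes: s.targets.size, // 2. unique nodes it connects to
      source_edges: s.inEdges.size, // 3. unique incoming triples
      target_edges: s.outEdges.size, // 4. unique outgoing triples
    };
  }
  return degrees;
}

// Made-up edges; the first two share a triple but come from different APIs.
const kgEdges = {
  "A-B-api1": { subject: "A", predicate: "biolink:treats", object: "B" },
  "A-B-api2": { subject: "A", predicate: "biolink:treats", object: "B" },
  "A-B-api3": { subject: "A", predicate: "biolink:related_to", object: "B" },
  "C-B-api1": { subject: "C", predicate: "biolink:treats", object: "B" },
};
const degrees = computeNodeDegrees(kgEdges);
console.log(degrees);
```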
