biothings / biothings_explorer

TRAPI service for BioThings Explorer

Home Page: https://explorer.biothings.io

License: Apache License 2.0

JavaScript 12.32% Dockerfile 1.56% HTML 19.59% Shell 3.01% Smarty 1.47% Vue 54.45% CSS 1.03% TypeScript 6.57%

ncats-translator biothings-explorer

biothings_explorer's Introduction

BioThings Explorer TRAPI API

Introduction

This repository serves as the development workspace for the TRAPI implementation of BioThings Explorer (BTE). BTE is an engine for autonomously querying a distributed knowledge graph. The distributed knowledge graph is made up of biomedical APIs that have been annotated with semantically precise descriptions of their inputs and outputs in the SmartAPI registry. This project is primarily funded by the NCATS Translator project. There is also an older Python version of BioThings Explorer that is now deprecated.

An older version of the meta knowledge graph that is consumed by BTE is in this figure (which, although older, gives a nice conceptual visualization of API interoperability):

What's TRAPI?

TRAPI stands for Translator Reasoner API. It is a standard defined for APIs developed within the NCATS Biomedical Translator project to facilitate information exchange between resources. BTE exports results via TRAPI to maintain interoperability with other Translator tools. BTE can consume knowledge resources that expose the TRAPI interface, as well as APIs that have been annotated in the SmartAPI registry using the x-bte extension to the OpenAPI specification.

TRAPI API Implementation

Below is a process diagram depicting BTE's internal workflow when processing a query.

sequenceDiagram
autonumber
participant I as index.js - query()
participant QG as query_graph.js
participant BEQ as batch_edge_query.js
participant Q2A as qedge2apiedge.js
participant R as query_results.js
participant C as call-apis module

note over I, R: query_graph_handler module

I->>QG: processQueryGraph()
QG->>QG: Process TRAPI Query Graph Object into <br/> internal qEdge and qXEdge representation
note right of QG: qEdge - Edge in TRAPI query graph <br/> qXEdge - Internal UpdatedExeEdge representation <br/> of a qEdge to be executed
QG->>I: return qXEdges

I->>I: Inferred Mode: create <br/> templated queries

loop Executing with Edge Manager
I->>I: while there are unexecuted qXEdges, <br/> get next qXEdge

I->>BEQ: BatchEdgeQueryHandler()
BEQ->>BEQ: NodesUpdateHandler(): get equivalent IDs
BEQ->>BEQ: cacheHandler(): fetch cached records

alt if there are uncached qXEdges
BEQ->>Q2A: QEdge2APIEdgeHandler()
Q2A->>Q2A: convert qXEdges into API calls by using <br/> metaKG to get metaEdges for qXEdge
Q2A->>BEQ: return metaXEdges
note right of BEQ: metaEdge - An edge in the metaKG <br/> metaXEdge - A metaEdge pair with a qXEdge
BEQ->>C: query()
C->>C: make API calls in batches <br/> and merge results
C->>BEQ: return records from APIs
end

BEQ->>BEQ: cacheHandler(): cache result records
note right of BEQ: record - A single unit of transformed <br/> data from a sub-query response

BEQ->>I: return records

I->>I: Store records/update edge manager
I->>I: Mark Edge as Executed
end

I->>R: trapiResultsAssembler
R->>R: assemble and convert records into <br/> final return results
R->>I: put results in bteGraph
note left of R: result - 1 item of the array in the <br/> TRAPI response (message.results)
I->>I: bteGraph: prune not fully connected <br/> results from graph

Try it Out!

Live TRAPI Instance

We maintain a live instance of this application at https://api.bte.ncats.io/ that can be used for testing. Query Examples can be found here.

Local installations

BTE can be used locally using Docker, or by installing the workspace for further tinkering. See the Installation documentation.

Usage

Using BTE can be as simple or complex as you'd like. See the Usage documentation to get started using BTE!

biothings_explorer's People

Contributors

kevinxin90 naveen584 ariutta ericz1803 newgene smartniz mnarayan1 pahmadi8740 andrewsu

biothings_explorer's Issues

Investigate MongoDB as persistent data storage

It would be a good feature to store user requests persistently, so users can come back and look up their results using just the answer id we assign to them.

We could also hook this up with the web interface. Given an answer id, the UI can fetch results directly from mongodb and display the results as graph/table for exploration.

Speed up the nodejs application

profiling: https://nodejs.org/en/docs/guides/simple-profiling/

This query is not working

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "MONDO:0005132",
					"category":"biolink:Disease"
				},
				"n1": {
					"category": "biolink:ChemicalSubstance"
				},
				"n2": {
					"id": "UMLS:C0032961",
					"category":"biolink:Disease"
				}
			},
			"edges": {
				"e01": {
					"subject": "n1",
					"object": "n0",
					"predicate":"biolink:treats"
				},
				"e02": {
					"subject": "n1",
					"object": "n2",
					"predicate": "biolink:contraindicated_for"
				}
			}
		}
	}
}

Error: "TypeError: Promise.allSettled is not a function"

On initial installation in WSL, I got the following error when executing a test query: "TypeError: Promise.allSettled is not a function"
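
Promise.allSettled was added in Node.js 12.9.0, so this error usually means an older runtime is in use, and upgrading Node.js is the simplest fix. Where upgrading is not an option, a fallback along the following lines could be used. This is only a sketch: the helper names allSettled and settle are hypothetical and not part of BTE.

```javascript
// Hypothetical fallback for runtimes without Promise.allSettled (Node.js < 12.9).
function allSettled(promises) {
  return Promise.all(
    Array.from(promises, (p) =>
      // Wrap each input so a rejection becomes a normal result object.
      Promise.resolve(p).then(
        (value) => ({ status: "fulfilled", value }),
        (reason) => ({ status: "rejected", reason })
      )
    )
  );
}

// Prefer the native implementation when it exists.
const settle =
  typeof Promise.allSettled === "function"
    ? (ps) => Promise.allSettled(ps)
    : allSettled;
```

Calling settle([...]) then behaves the same on old and new runtimes: it never rejects and returns one status object per input.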

CI test should run on all branches

Knowledge Graph Edges should be grouped based on subject-predicate-object

Currently edges are grouped by (subject-object-api-source)
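
A minimal sketch of the proposed grouping, assuming each record exposes subject, predicate, object, api and source fields (the record shape and the function name groupEdges are assumptions, not BTE's actual data model):

```javascript
// Sketch: merge records that share subject, predicate and object into one edge.
// The record shape ({subject, predicate, object, api, source}) is assumed.
function groupEdges(records) {
  const edges = new Map();
  for (const rec of records) {
    // JSON-encode the triple so IDs containing "-" or ":" cannot collide.
    const key = JSON.stringify([rec.subject, rec.predicate, rec.object]);
    if (!edges.has(key)) {
      edges.set(key, {
        subject: rec.subject,
        predicate: rec.predicate,
        object: rec.object,
        apis: new Set(),
        sources: new Set(),
      });
    }
    const edge = edges.get(key);
    // API and source become provenance on the merged edge instead of part of the key.
    edge.apis.add(rec.api);
    edge.sources.add(rec.source);
  }
  return edges;
}
```

With this keying, two APIs that report the same subject-predicate-object triple contribute to a single edge rather than producing two.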

Add GO qualifiers to mygene.info record in SmartAPI

BTE is not correctly interpreting mygene.info output on GO annotations because it is ignoring the qualifiers. I believe the fix involves a modification of the mygene.info SmartAPI record (and hopefully TRAPI has a way of expressing qualifiers). Example below...

I issued this query to get BiologicalProcesses related to the gene VAMP2:

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "NCBIGENE:6844",
					"category":"biolink:Gene"
				},
				"n1": {
					"category": "biolink:BiologicalProcess"
                }
			},
			"edges": {
				"e01": {
					"subject": "n0",
                    "object": "n1"
                }
			}
		}
	}
}

The following edge linking VAMP2 to neutrophil degranulation (GO:0043312) is returned in the output:

                "NCBIGENE:6844-GO:0043312-MyGene.info API-NCBI Gene": {
                    "predicate": "biolink:participates_in",
                    "subject": "NCBIGENE:6844",
                    "object": "GO:0043312",
                    "attributes": [
                        {
                            "name": "provided_by",
                            "value": "NCBI Gene",
                            "type": "biolink:provided_by"
                        },
                        {
                            "name": "api",
                            "value": "MyGene.info API",
                            "type": "bts:api"
                        },
                        {
                            "name": "evidence",
                            "value": "IMP",
                            "type": "bts:evidence"
                        },
                        {
                            "name": "publications",
                            "value": [
                                "PMID:16677249"
                            ],
                            "type": "biolink:publications"
                        }
                    ]
                },

The original content from http://mygene.info/v3/gene/6844?fields=go looks like this:

{
   "evidence": "IMP",
   "gocategory": "BP",
   "id": "GO:0043312",
   "pubmed": 16677249,
   "qualifier": "NOT",
   "term": "neutrophil degranulation"
},

Critically, the NOT qualifier in the mygene.info record is not being shown in the TRAPI BTE output, which completely reverses the interpretation.
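
One way the qualifier could be carried through is sketched below. Everything beyond the fields shown in the mygene.info record is an assumption: the function name goAnnotationToEdge, the negated flag and the bts:qualifier attribute type are hypothetical and are not how BTE or TRAPI currently represent qualifiers.

```javascript
// Sketch only: the `negated` flag and "bts:qualifier" type are assumptions,
// not BTE's or TRAPI's actual representation.
function goAnnotationToEdge(geneCurie, annotation) {
  const attributes = [
    { name: "evidence", value: annotation.evidence, type: "bts:evidence" },
  ];
  if (annotation.qualifier) {
    // Keep the raw qualifier so downstream consumers can see it.
    attributes.push({
      name: "qualifier",
      value: annotation.qualifier,
      type: "bts:qualifier",
    });
  }
  return {
    subject: geneCurie,
    object: annotation.id,
    predicate: "biolink:participates_in",
    // True when the qualifier starts with NOT (e.g. "NOT" or "NOT|...").
    negated: /^NOT\b/i.test(annotation.qualifier || ""),
    attributes,
  };
}
```

Applied to the VAMP2 record above, the resulting edge would carry negated: true, so the annotation is no longer read as a positive association.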

Disable ID Resolution for Text Mining KPs

Individual SmartAPI TRAPI interface should enable id resolution by default.

If the SmartAPI is from text mining teams, disable the id resolution module.

Use git stash to shelve any changes made on the prod/dev server so git pull doesn't fail

Refactor load meta-kg

As shown above, the meta-kg can sometimes take up to 20s to load. This causes a serious performance issue on the BTE API end. We need to refactor the smartapi-kg package so that it can take a list of specs supplied as a file instead of making real-time API queries.

We also need to implement a cron job on the TRAPI end to fetch SmartAPI specs periodically from the SmartAPI API.

Query with unexpected exceptions

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "WIKIPATHWAYS:Pathway:WP195",
					"category": "biolink:Pathway"
				},
				"n1": {
					"category": "biolink:Gene"
				},
				"n2": {
					"category": "biolink:ChemicalSubstance"
				}
			},
			"edges": {
				"e01": {
					"subject": "n0",
					"object": "n1"
				},
				"e02": {
					"subject": "n1",
					"object": "n2"
				}
			}
		}
	}
}

slice

Deprecate TRAPI v0.9.2 support

The /query endpoint should have the same implementation as /v1/query.

We should probably use a regex when specifying routing, e.g. (v1)?/query

See expressjs routing mechanism: https://expressjs.com/en/guide/routing.html
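
Express accepts a regular expression as a route path, so a single pattern can serve both endpoints. The pattern below is a sketch of that idea; the constant name QUERY_ROUTE and the handler in the commented usage line are hypothetical.

```javascript
// Matches "/query" and "/v1/query" (with an optional trailing slash) and nothing else.
const QUERY_ROUTE = /^\/(?:v1\/)?query\/?$/;

// Hypothetical usage with an Express app (app and handleQuery are assumed to exist):
// app.post(QUERY_ROUTE, handleQuery);
```

Anchoring the pattern with ^ and $ keeps other versions such as /v2/query or longer paths from matching by accident.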

Investigate Timeout Error

{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "name:Imatinib",
          "category": "biolink:ChemicalSubstance"
        },
        "n01": {
          "category": "biolink:Disease"
        },
        "n02": {
            "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicate":"biolink:treats"
        },
        "e01": {
          "subject": "n01",
          "object": "n02",
          "predicate":"biolink:caused_by"
        }
      }
    }
  }
}

The above query results in a 504 timeout error in the current BTE app. We need to investigate how that happens and how to set the timeout on either the express.js end or the nginx end.

Fix wrong URL in CHANGELOG

Right now, the commit URL and compare URL in CHANGELOG are wrong. We need to fix them, as well as the .versionrc.json file which helps automatically generate them.

Handle explain type of query

{
    "message": {
        "query_graph": {
            "nodes": {
                "a": {
                    "category": "biolink:Disease",
                    "id": "MESH:D015464"
                },
                "b": {
                    "category": "biolink:ChemicalSubstance",
                    "id": "CHEBI:45783"
                },
                "c": {
                    "category": "biolink:Gene"
                }
            },
            "edges": {
                "ac": {
                    "subject": "a",
                    "object": "c"
                },
                "bc": {
                    "subject": "c",
                    "object": "b"
                }
            }
        }
    },
    "knowledge_graph": {
        "nodes": [],
        "edges": []
    },
    "results": []
}

Create regression testing infrastructure

We would like to create a regression testing framework to quantitatively assess BTE's performance. As a gold standard, we can use the orphan drug indication dataset mentioned in NCATSTranslator/Relay#123 or the mechanistic paths from https://sulab.github.io/DrugMechDB/. For each of those gold standards, we should create a TRAPI query (examples), send it to BTE using a small library of plausible metapaths focused on drug repurposing, and then assess whether BTE was able to retrieve the right drug among the results. (Later we can also assess where that drug ranked among all potential drugs retrieved.) We would want to execute this test on a regular basis (weekly?), and then have a simple web page where results can be viewed/browsed.

tagging @ariutta and @AlexanderPico

Need a nodejs package handling BioLink model

The package needs to be separate from current TRAPI code repo.

It should perform:

Given a specific node type (e.g. biolink:GeneOrGeneProduct), return all descendants/ancestors of that node type.
Given a specific node type, return all available ID Prefixes defined in BioLink model
Given a specific ID Prefix, return all node types which can have this ID Prefix.
Given a specific predicate, return all its descendants/ancestors predicates
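
The ancestor and descendant lookups can be sketched over a simple child-to-parent map. The map below is a toy excerpt for illustration only and is not the actual BioLink hierarchy; a real package would load the model definition instead. The function names are hypothetical.

```javascript
// Toy child -> parent map; illustrative only, NOT the real BioLink model.
const PARENTS = {
  "biolink:Gene": "biolink:GeneOrGeneProduct",
  "biolink:GeneProduct": "biolink:GeneOrGeneProduct",
  "biolink:GeneOrGeneProduct": "biolink:BiologicalEntity",
  "biolink:BiologicalEntity": "biolink:NamedThing",
};

// Walk upward from a type, nearest ancestor first.
function getAncestors(type, parents = PARENTS) {
  const out = [];
  let current = parents[type];
  while (current !== undefined) {
    out.push(current);
    current = parents[current];
  }
  return out;
}

// Breadth-first walk downward from a type.
function getDescendants(type, parents = PARENTS) {
  const out = [];
  const queue = [type];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const [child, parent] of Object.entries(parents)) {
      if (parent === current) {
        out.push(child);
        queue.push(child);
      }
    }
  }
  return out;
}
```

The same two functions would work for predicates if they are given a predicate parent map instead of a class parent map.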

Add feature to allow user to test local SmartAPI specs

/query endpoint should fetch SmartAPI Specs dynamically

The current /query endpoint uses a static copy of SmartAPI specs from the smartapi-kg nodejs package.

It should dynamically query SmartAPI API for specs at run time.

Expand Query Graph node (without curie) based on BioLink model hierarchy

See details in this issue: NCATSTranslator/testing#12

Improve logging module

Current logging only shows how a TRAPI query is parsed and how the SmartAPI kg is used. It should include additional information such as:

what query is made to each API
how many responses we get from each API call
how many responses we get after merging the results from different KPs

The above needs support from other BTE-related nodejs packages.

Disease(s) Treated By Drug

see: NCATSTranslator/testing#20

Add support for symmetric biolink predicate

Include additional node attributes in TRAPI Knowledge Graph

Chemical:
- chembl_max_phase
- chembl_molecule_type
- chembl_drug_category
- drugbank_class
- drugbank_groups
- drugbank_kingdom
- drugbank_superclass
- contraindications
- mesh_pharmacology_class
- fda_epc_pharmacology_class
Gene:
- interpro
- type_of_gene
Pathway:
- number_of_participants
BiologicalProcess:
- number_of_participants
CellularComponent:
- number_of_participants
MolecularActivity:
- number_of_participants

Add support for reverse biolink predicate

Node related info should not appear in edge data in TRAPI response

Current behavior in edge response:

"attributes": [
                        {
                            "name": "provided_by",
                            "value": "Text Mining KP",
                            "type": "biolink:provided_by"
                        },
                        {
                            "name": "api",
                            "value": "Text Mining Targeted Association API",
                            "type": "bts:api"
                        },
                        {
                            "name": "CHEBI",
                            "value": "CHEBI:32630",
                            "type": "bts:CHEBI"
                        },
                        {
                            "name": "object_spans",
                            "value": [
                                "start: 91, end: 96",
                                "start: 62, end: 67"
                            ],
                            "type": "bts:object_spans"
                        },
                        {
                            "name": "relation_spans",
                            "value": [
                                "",
                                ""
                            ],
                            "type": "bts:relation_spans"
                        },
                        {
                            "name": "score",
                            "value": [
                                "0.9994468",
                                "0.97133327"
                            ],
                            "type": "bts:score"
                        },
                        {
                            "name": "sentence",
                            "value": [
                                "Dietary restriction of leucine for at least three days could result in the inactivation of Hsf-1, leading to a reduction in Hsp70 synthesis.",
                                "However, in cells that were leucine starved for 3 and 4 days, Hsf-1 activity and Hsp70 synthesis level was dramatically decreased."
                            ],
                            "type": "bts:sentence"
                        },
                        {
                            "name": "subject_spans",
                            "value": [
                                "start: 23, end: 30",
                                "start: 28, end: 35"
                            ],
                            "type": "bts:subject_spans"
                        },
                        {
                            "name": "publications",
                            "value": [
                                "PMID:31397439",
                                "PMID:31397439"
                            ],
                            "type": "biolink:publications"
                        }
                    ]

Information such as CHEBI does not belong here. Needs to be removed.
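
A possible filter is sketched below. It assumes that an attribute whose name is a known ID prefix describes a node rather than the edge; that rule, the prefix list and the function name stripNodeAttributes are all assumptions for illustration.

```javascript
// Assumption: attributes named after an ID prefix describe a node, not the edge.
// The prefix list is illustrative and would need to come from the real model.
const NODE_ID_PREFIXES = new Set(["CHEBI", "NCBIGENE", "MONDO", "UMLS"]);

// Return a copy of the edge attributes without node-level identifiers.
function stripNodeAttributes(attributes) {
  return attributes.filter((attr) => !NODE_ID_PREFIXES.has(attr.name));
}
```

For the example above, this would drop the CHEBI entry and keep provided_by, api, score, sentence, the span attributes and publications.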

Query not working

{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "MONDO:0002715",
          "category": "biolink:Disease"
        },
        "n01": {
          "category": "biolink:ChemicalSubstance"
        },
        "n02": {
          "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "predicate": "biolink:correlated_with",
          "subject": "n00",
          "object": "n01"
        },
        "e01": {
          "predicate": "biolink:related_to",
          "subject": "n01",
          "object": "n02"
        }
      }
    }
  }
}

Error:

{
    "error": "TypeError: Cannot convert undefined or null to object"
}

Refactor KnowledgeGraph module

Check whether the spread operation used to update the kg object causes a performance issue.

Accessing LINCS data portal API thru BTE

Summary: I think BTE is making an error when setting up the API request for the LINCS data portal API. BTE requires the input ID to be a CURIE, so I set it as a ChemicalSubstance with the id "LINCS:LSM-1023" (which is imatinib). The logs show that the LINCS API query is then as follows (the error is in the "params" value):

    {
      "timestamp": "2021-03-24T04:11:46.587Z",
      "level": "DEBUG",
      "message": "call-apis: Succesfully made the following query: {\"url\":\"http://lincsportal.ccs.miami.edu/dcic/api/drugindication\",\"params\":{\"id\":\"LINCS:LSM-1023\"},\"method\":\"get\",\"timeout\":50000}",
      "code": null
    },

Looking at the smartapi page for LINCS data portal, the id field should not have a prefix...it should only have the id "LSM-1023".
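
Removing the prefix before building the request could look like the sketch below. The helper name stripCuriePrefix is hypothetical, and whether a given API wants the prefixed or bare form would have to come from its SmartAPI annotation.

```javascript
// Drop everything up to and including the first colon of a CURIE.
// IDs without a colon are returned unchanged.
function stripCuriePrefix(curie) {
  const idx = curie.indexOf(":");
  return idx === -1 ? curie : curie.slice(idx + 1);
}
```

Only the first colon is treated as the separator, so a local ID that itself contains a colon stays intact.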

The situation: I tried to query the LINCS data portal API thru BTE's /v1/smartapi/{smartapi_id}/query endpoint.

The smartapi_id is 9ee398a738916a98b612068cc022454f, the request body is:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "category": "biolink:ChemicalSubstance",
          "id": "LINCS:LSM-1023"
        },
        "n01": {
          "category": "biolink:Disease"
        }
      }
    }
  }
}

It returns no hits.

However, if I query the LINCS Data portal endpoint directly with the id as "LSM-1023", I get multiple results like:

{"documents": [
{
"lsm_id":"LSM-1023",
"efo_id":"Orphanet:44890",
"efo_term":"GASTROINTESTINAL STROMAL TUMOR",
"max_fda_phase_for_ind":"4",
"mesh_heading":"GASTROINTESTINAL STROMAL TUMORS",
"mesh_id":"D046152"
}
,
{
"lsm_id":"LSM-1023",
"efo_id":"EFO:0000691",
"efo_term":"SARCOMA",
"max_fda_phase_for_ind":"3",
"mesh_heading":"SARCOMA",
"mesh_id":"D012509"
}

Note: I'm not sure if the BTE Python client has an issue with this API too, since it accepts only LINCS IDs and I'm not sure if BTE will ever end up querying it.

Add test /v1/team/{team_name}/query endpoint

Currently no tests are implemented for the /v1/team/{team_name}/query endpoint. We need to implement tests to ensure it's working correctly.

BTE doesn't handle predicate as a list

According to TRAPI, predicate can be given as a list or as a string.

However, the current BTE implementation doesn't support lists.

The following query fails:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00",
          "predicate": ["biolink:physically_interacts_with"]
        }
      },
      "nodes": {
        "n00": {
          "category": "biolink:ChemicalSubstance",
          "id": "DRUGBANK:DB00188"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}

The error message is:

{
    "error": "TypeError: this.predicate.startsWith is not a function"
}
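
The error suggests that a string method is being called on an array. Normalizing the value to an array before any string handling would avoid that. The sketch below shows the idea; the function names toArray and stripBiolinkPrefix are hypothetical and do not reflect BTE's internals.

```javascript
// Accept a string, a list of strings, or nothing, and always return an array.
function toArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Remove the "biolink:" prefix from each predicate, whatever the input form.
function stripBiolinkPrefix(predicate) {
  const prefix = "biolink:";
  return toArray(predicate).map((p) =>
    p.startsWith(prefix) ? p.slice(prefix.length) : p
  );
}
```

Both "biolink:physically_interacts_with" and ["biolink:physically_interacts_with"] then yield the same one-element list.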

Use NodeNormalizer to resolve QNodes with only id specified

Currently, BTE uses BioThings APIs to resolve identifiers, which requires a category (e.g. Gene, ChemicalSubstance) to be specified.

The TRAPI standard does allow users to specify a query without category info.

To support that, we should include NodeNormalizer as a fallback.

Investigate redis in-memory database for caching query results

This would help speed up the nodejs app when there are multiple queries asking for the same edge.

Use redis docker image for easier deployment.

Use .env to store redis url/password info.

Support handling list as value for category

One ID might belong to multiple semantic types,
e.g. UMLS:C0008780 can be mapped as a Disease or a PhenotypicFeature

So when a user provides the following query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "category": ["biolink:Disease", "biolink:PhenotypicFeature"],
          "id": "UMLS:C0008780"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}

We should look for Genes which are related to UMLS:C0008780 as a Disease or as a PhenotypicFeature.
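
One way to handle this is to expand a node with several categories into one node per category and query each in turn. The sketch below assumes a plain query node object; the function name expandNodeCategories is hypothetical.

```javascript
// Produce one copy of the query node per category.
// A single string category yields one copy; a missing category yields none.
function expandNodeCategories(qNode) {
  const categories = [].concat(qNode.category ?? []);
  return categories.map((category) => ({ ...qNode, category }));
}
```

For the n00 node above this gives two nodes with the same id, one typed biolink:Disease and one typed biolink:PhenotypicFeature, whose results can then be merged.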

Create a module to transform TRAPI Query Graph

Expand node by its id, e.g. if user provides a MONDO ID as input, we will traverse MONDO hierarchy to get all its descendants.
Expand node by its category, e.g. if user provides a NamedThing category, we will traverse BioLink class hierarchy to get all descendants of NamedThing class.
Expand predicate, e.g. if user provides a related_to predicate, we will traverse BioLink predicate hierarchy to get all descendants of related_to predicate.

/performance endpoint is showing path not found

Query returns unexpected exceptions

{
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "category": "biolink:correlated_with"
                }
            },
            "nodes": {
                "n00": {
                    "category": "biolink:ChemicalSubstance",
                    "id": "CAS:121999-58-4"
                },
                "n01": {
                    "category": "biolink:ChemicalSubstance"
                }
            }
        }
    }
}

According to Ryan,

This query returns


{
    "error": "TypeError: Cannot read property 'slice' of undefined"
}

Add an optional parameter to export results as a csv table

Test with clinical risk KP fails because data source changes

FAIL test/integration/TRAPIv1.test.js (97.982 s)
● Testing endpoints › POST /v1/query with clinical risk kp query

expect(received).toHaveProperty(path)

Expected path: "MONDO:0005249"
Received path: []

Received value: {}

  69 |                 expect(response.body.message.knowledge_graph).toHaveProperty("nodes");
  70 |                 expect(response.body.message.knowledge_graph).toHaveProperty("edges");
> 71 |                 expect(response.body.message.knowledge_graph.nodes).toHaveProperty("MONDO:0005249")
     |                                                                     ^
  72 |             })
  73 |     })
  74 | 

  at __test__/integration/TRAPIv1.test.js:71:69
  at Object.<anonymous> (__test__/integration/TRAPIv1.test.js:60:9)

Use Singleton Design Pattern for BioLink reversal class

Currently, the BioLink reversal class (including a file read) has to be instantiated every time predicates are processed. It should adopt the Singleton design pattern so that it is instantiated only once, which speeds up the program.
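
A minimal sketch of the pattern follows. The class name BioLinkReversal, the accessor getBioLinkReversal and the loadMapping callback (standing in for the file read) are hypothetical, and the mapping in the usage comment is a toy example.

```javascript
// Hypothetical reversal class; loadMapping stands in for the expensive file read.
class BioLinkReversal {
  constructor(loadMapping) {
    this.mapping = loadMapping();
  }

  // Return the reverse predicate, or undefined when none is known.
  reverse(predicate) {
    return this.mapping[predicate];
  }
}

let instance = null;

// Create the instance on first use and reuse it afterwards,
// so the mapping is loaded only once per process.
function getBioLinkReversal(loadMapping) {
  if (instance === null) {
    instance = new BioLinkReversal(loadMapping);
  }
  return instance;
}

// Usage with a toy mapping:
// getBioLinkReversal(() => ({ treats: "treated_by" })).reverse("treats");
```

In a Node.js module, exporting only getBioLinkReversal would keep callers from constructing extra instances.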

hook up with new OOP designed id resolver module

Missing type for node attributes

Type is a required field in the TRAPI 1.0 standard. Currently, we have type for all edge attributes, but we don't have type for node attributes.

Query Fails

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "category": "biolink:Drug",
                    "id": "RXCUI:466423"
                },
                "n3": {
                    "category": "biolink:Disease"
                }
            },
            "edges": {
                "e03": {
                    "subject": "n0",
                    "object": "n3"
                }
            }
        }
    }
}

Error message:


{
    "error": "TypeError: Cannot read property 'id' of undefined"
}

Add UserID, groupID environment variable in Docker Compose file.

The TRAPI service needs to access and modify ./log folder. Need to set the UID & GID to be the same as the UID & GID for ./log folder in our server.

Performance test should be run on test server on Github actions instead of dev/prod server

Use scp to transfer test results: https://github.com/appleboy/scp-action

set up ci to deploy to test server with a commit hash id

Set up Development branch and deploy to the dev.api.bte.ncats.io server

how to query by UniProtKB CURIE?

The issue at NCATSTranslator/testing#10 reports that BTE does not return any results for the following query:

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "UniProtKB:P52788",
					"category":"biolink:Gene"
				},
				"n1": {
					"category": "biolink:ChemicalSubstance"
                }
			},
			"edges": {
				"e01": {
					"subject": "n0",
                                        "object": "n1"
                                }
			}
		}
	}
}

If I convert UniProtKB:P52788 to NCBIGENE:6611 (based on http://mygene.info/v3/query?q=P52788&fields=entrezgene,uniprot), the query returns many results as expected. I tried adjusting the category for n0 to biolink:Protein and biolink:GenomicEntity, but those queries also return zero results. What is the proper way to form a BTE TRAPI query for a UniProtKB CURIE?

Separate out the logics of TRAPI Query Graph Handling from this repo

Add additional node attributes including nodeDegree

How many unique source KG nodes this KG node is connected from.
How many unique target KG nodes this KG node connects to.
How many unique edges (source-predicate-target) this KG node is connected from.
How many unique edges (source-predicate-target) this KG node connects to.
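
The four counts can be computed in one pass over the knowledge graph edges. The sketch below assumes each edge has subject, predicate and object fields; the function name nodeDegrees and the output field names are hypothetical.

```javascript
// Compute per-node degree statistics from a list of {subject, predicate, object} edges.
function nodeDegrees(edges) {
  const stats = new Map();
  const entry = (id) => {
    if (!stats.has(id)) {
      stats.set(id, {
        sources: new Set(),
        targets: new Set(),
        inEdges: new Set(),
        outEdges: new Set(),
      });
    }
    return stats.get(id);
  };

  for (const { subject, predicate, object } of edges) {
    // Sets de-duplicate repeated triples automatically.
    const key = JSON.stringify([subject, predicate, object]);
    const s = entry(subject);
    s.targets.add(object);
    s.outEdges.add(key);
    const o = entry(object);
    o.sources.add(subject);
    o.inEdges.add(key);
  }

  const result = {};
  for (const [id, s] of stats) {
    result[id] = {
      source_nodes: s.sources.size,
      target_nodes: s.targets.size,
      incoming_edges: s.inEdges.size,
      outgoing_edges: s.outEdges.size,
    };
  }
  return result;
}
```

Each resulting entry could then be attached to the corresponding KG node as attributes.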