Comments (11)
Daniel, it was great catching up with you. Could you also please let us know if you come across a database with aggregated queries or views? Your help with our project is highly appreciated.
Thank you!
from hetionet.
Here is a query you can run on the Hetionet online browser at https://neo4j.het.io/browser/.
The query investigates which biological processes the drug Topiramate may affect. It looks for paths where Topiramate binds a Gene that participates in a biological process. Each path receives a different weight, called a PDP or path-degree product, based on its specificity. The BINDS_CbG relationship has pubmed_ids metadata. Therefore you could assign each path to zero or more source studies based on these pubmed_ids. Here's the query.
// Search for CbGpBP paths starting with Topiramate
MATCH path = (n0:Compound)-[e1:BINDS_CbG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'Topiramate'
// Collect the node degrees along each path (used to compute the PDP)
WITH
  [
    size((n0)-[:BINDS_CbG]-()),
    size(()-[:BINDS_CbG]-(n1)),
    size((n1)-[:PARTICIPATES_GpBP]-()),
    size(()-[:PARTICIPATES_GpBP]-(n2))
  ] AS degrees, e1, path, n1, n2
RETURN
  // Return the gene symbol and the GO process name
  n1.name AS gene_symbol,
  n2.name AS biological_process,
  e1.pubmed_ids AS pubmed_ids,
  // Compute the path-degree product
  reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4) AS PDP,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
ORDER BY PDP DESC
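As a rough illustration of the arithmetic in the `reduce(...)` expression above, here is a minimal Python sketch of the path-degree product. The function name `path_degree_product` and the example degrees are hypothetical and are not taken from Hetionet. Only the exponent (-0.4) comes from the query.

```python
from functools import reduce

# Damping exponent taken from the query above (d ^ -0.4).
DAMPING = 0.4


def path_degree_product(degrees, damping=DAMPING):
    """Multiply degree ** -damping over every degree along one path."""
    return reduce(lambda pdp, d: pdp * d ** -damping, degrees, 1.0)


# Hypothetical degrees for one Compound-binds-Gene-participates-BiologicalProcess
# path, in the same order as the `degrees` list built by the query.
example_degrees = [2, 8, 50, 120]
print(f"PDP = {path_degree_product(example_degrees):.6f}")
```

Paths through high-degree nodes get a smaller PDP, which is how the weight reflects specificity.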
What we usually do is aggregate all PDPs for the same target node (in this case, a biological process). We sum the PDPs to compute DWPCs (degree-weighted path counts).
Here's a query for the top five DWPCs:
// Search for CbGpBP paths starting with Topiramate
MATCH path = (n0:Compound)-[e1:BINDS_CbG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'Topiramate'
// Collect the node degrees along each path (used to compute the DWPC)
WITH
  [
    size((n0)-[:BINDS_CbG]-()),
    size(()-[:BINDS_CbG]-(n1)),
    size((n1)-[:PARTICIPATES_GpBP]-()),
    size(()-[:PARTICIPATES_GpBP]-(n2))
  ] AS degrees, e1, path, n2
WITH
  // Group by the GO Process ID and name
  n2.identifier AS go_id,
  n2.name AS go_name,
  // Count the paths per GO Process
  count(path) AS PC,
  collect(e1.pubmed_ids) AS pubmed_ids,
  // Compute the DWPC as the sum of the PDPs
  sum(reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4)) AS DWPC,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
WHERE n_genes >= 5 AND PC >= 2
RETURN
  go_id, go_name, pubmed_ids, PC, DWPC, n_genes
ORDER BY DWPC DESC
LIMIT 5
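The aggregation step in the query above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `dwpc_by_target`, the input format (pairs of target name and degree list), and the example data are all hypothetical. The `n_genes >= 5` filter is left out because it depends on the graph itself; only the `PC >= 2` filter is reproduced.

```python
from collections import defaultdict


def dwpc_by_target(paths, damping=0.4, min_paths=2):
    """Sum PDPs per target node and keep targets with at least min_paths paths.

    `paths` is an iterable of (target, degrees) pairs, one pair per path.
    Returns a list of (target, {"PC": ..., "DWPC": ...}) sorted by DWPC descending.
    """
    totals = defaultdict(lambda: {"PC": 0, "DWPC": 0.0})
    for target, degrees in paths:
        pdp = 1.0
        for d in degrees:
            pdp *= d ** -damping
        totals[target]["PC"] += 1       # path count
        totals[target]["DWPC"] += pdp   # degree-weighted path count
    kept = {t: v for t, v in totals.items() if v["PC"] >= min_paths}
    return sorted(kept.items(), key=lambda item: item[1]["DWPC"], reverse=True)


# Hypothetical paths: two reach "process A", one reaches "process B".
example_paths = [
    ("process A", [2, 3, 10, 40]),
    ("process A", [2, 5, 20, 40]),
    ("process B", [2, 3, 10, 6]),
]
for target, stats in dwpc_by_target(example_paths):
    print(target, stats["PC"], round(stats["DWPC"], 6))
```

With this example input only "process A" is printed, because "process B" has a single path and is dropped by the path-count filter.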
If you want to see the paths that get aggregated to compute the DWPCs for these top five biological processes, you can run the following query:
// Search for CbGpBP paths starting with Topiramate
MATCH path = (n0:Compound)-[e1:BINDS_CbG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'Topiramate'
// Collect the node degrees along each path (used to compute the DWPC)
WITH
  [
    size((n0)-[:BINDS_CbG]-()),
    size(()-[:BINDS_CbG]-(n1)),
    size((n1)-[:PARTICIPATES_GpBP]-()),
    size(()-[:PARTICIPATES_GpBP]-(n2))
  ] AS degrees, e1, path, n2
WITH
  n2.name AS biological_process,
  count(path) AS PC,
  // Compute the DWPC as the sum of the PDPs
  sum(reduce(pdp = 1.0, d IN degrees | pdp * d ^ -0.4)) AS DWPC,
  // Collect the paths for each biological process
  collect(path) AS paths,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
WHERE n_genes >= 5 AND PC >= 2
RETURN
  paths
ORDER BY DWPC DESC
LIMIT 5
The result looks like: [screenshot not preserved]
Each path is based on different pubmed_ids for its BINDS_CbG relationship. You could then assign each source the weight of the PDP divided by the total number of pubmed_ids for that path.
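A minimal Python sketch of that weighting, assuming each path is given as a PDP together with the list of pubmed_ids on its BINDS_CbG relationship. The function name `source_weights`, the placeholder identifiers, and the PDP values are hypothetical. Summing a source's shares across paths is an additional assumption that goes beyond the per-path rule stated above.

```python
from collections import defaultdict


def source_weights(paths):
    """Split each path's PDP evenly among its pubmed_ids.

    `paths` is an iterable of (pdp, pubmed_ids) pairs. Shares are summed per
    source across paths (an assumption, not something the thread specifies).
    """
    weights = defaultdict(float)
    for pdp, pubmed_ids in paths:
        if not pubmed_ids:
            continue  # a path may map to zero source studies
        share = pdp / len(pubmed_ids)
        for pmid in pubmed_ids:
            weights[pmid] += share
    return dict(weights)


# Hypothetical paths with placeholder source identifiers (not real PubMed IDs).
example_paths = [
    (0.6, ["source_a", "source_b"]),  # PDP shared by two studies
    (0.3, ["source_a"]),              # PDP credited to one study
    (0.1, []),                        # no supporting study recorded
]
for source, weight in sorted(source_weights(example_paths).items()):
    print(source, round(weight, 6))
```

Paths without any pubmed_ids contribute nothing here. Whether that is the right choice depends on how uncited edges should be treated.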
from hetionet.
Oh, yes, that is our recent paper, which is still under review. I have attached the paper here in case you want to read it early: vldb-2019-conference.pdf.
Thanks!
from hetionet.
Thanks very much. We really appreciate it.
from hetionet.
Hello, Dr. Daniel Himmelstein! Sorry to disturb you again. Thanks again for the information you provided last time. It is very helpful!
Currently we are using the data and query that you provided, which will be very important for our experiments. Our goal is to make our work more convincing. Would it be possible for you to provide more aggregate user queries against this database, or to point us to any other databases that you know of or work with where aggregated queries exist?
Thanks in advance for your help!
from hetionet.
@thuwuyinjun to make sure I spend my time providing examples that are actually useful, can you be more specific about exactly what you would like? What characteristics would you like these aggregation queries to have?
from hetionet.
Thanks for the quick response! I think the aggregate queries that we want should be very similar to the one you provided last time. They should have one important characteristic: the query result should be curated data or be linked to citation information such as DOIs.
Our goal is simply to have more aggregate queries so that we can convince readers of the applicability of our techniques. I think the information you provide will be very helpful for that.
Thanks in advance for your help.
from hetionet.
Aggregated GWAS associations
This file, named gene-associations.tsv, contains gene-disease associations from GWAS. GWAS measures disease associations with SNPs. This file aggregates SNP associations to genes. Some disease-gene associations are supported by multiple GWAS studies reporting significant p-values; these studies are listed in the pubmed_ids column. Perhaps it would be nice to weight the contribution of each study by its p-value... you'd have to play with the source code to output that information. The algorithm to aggregate these associations is rather complex.
from hetionet.
Cool, thanks very much! I will figure it out.
from hetionet.
You could do something similar with Drug-binds-Protein relationships from BindingDB. See this dataset named bindings-drugbank-collapsed.tsv.
from hetionet.
I got a Google Scholar notification about "ProvCite: Provenance-based Data Citation", which I assume is related to this, but it appears to have been crawled from the academic social network that shall not be named and is no longer available there.
I assume this will become available elsewhere at some point in the future? Looking forward to reading it and possibly presenting it at the Greene Lab journal club.
from hetionet.