Are there any good aggregated queries on the database? about hetionet HOT 11 CLOSED

hetio commented on August 27, 2024 1

Are there any good aggregated queries on the database?

from hetionet.

Comments (11)

alawinia commented on August 27, 2024 1

Daniel, it was great catching up with you. Would you also please let us know if you come across a database with aggregated queries or views? Your help with our project is highly appreciated.
Thank you!

dhimmel commented on August 27, 2024 1

Here is a query you can run on the Hetionet online browser at https://neo4j.het.io/browser/.

The query investigates which biological processes the drug Topiramate may affect. It looks for paths where Topiramate binds a Gene that participates in a biological process. Each path receives a weight, called a PDP or path-degree product, based on its specificity: paths through high-degree nodes receive lower weights. The BINDS_CbG relationship has pubmed_ids metadata, so you could assign each path to zero or more source studies based on these pubmed_ids. Here's the query.

// Search for CbGpBP paths starting with Topiramate
MATCH path = (n0:Compound)-[e1:BINDS_CbG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'Topiramate'
// Collect the node degrees along each path, used below to compute the PDP
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n2))
] AS degrees, e1, path, n1, n2
RETURN
  // Return the gene symbol, biological process name, and supporting PubMed IDs
  n1.name AS gene_symbol,
  n2.name AS biological_process,
  e1.pubmed_ids AS pubmed_ids,
  // Compute the path-degree product
  reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4) AS PDP,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
ORDER BY PDP DESC

What we usually do is to aggregate all PDPs for the same target node (in this case biological process). We sum PDPs to compute DWPCs (degree-weighted path counts).
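
To make the arithmetic concrete, here is a minimal Python sketch of the same calculation outside the database. It assumes each path has already been reduced to its target node and the four degrees collected by the query; the function names and the example degrees are hypothetical and chosen only for illustration. The exponent 0.4 is the one used in the `reduce` expression above.

```python
from collections import defaultdict

DAMPING = 0.4  # same exponent as `d ^ -0.4` in the Cypher query


def path_degree_product(degrees, damping=DAMPING):
    """Multiply each degree raised to -damping, mirroring the Cypher reduce()."""
    pdp = 1.0
    for degree in degrees:
        pdp *= degree ** -damping
    return pdp


def dwpc_by_target(paths, damping=DAMPING):
    """Sum the PDPs of all paths that end at the same target node."""
    totals = defaultdict(float)
    for target, degrees in paths:
        totals[target] += path_degree_product(degrees, damping)
    return dict(totals)


# Hypothetical paths: (target biological process, [four degrees along the path]).
example_paths = [
    ("process A", [2, 1, 1, 4]),
    ("process A", [2, 4, 1, 1]),
    ("process B", [2, 16, 8, 32]),
]

dwpcs = dwpc_by_target(example_paths)
for target, dwpc in sorted(dwpcs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{target}: {dwpc:.4f}")
```

In this toy input, "process A" ranks first because it is reached by two paths through low-degree nodes, while the single path to "process B" passes through high-degree nodes and is down-weighted.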

Here's a query for the top five DWPCs:

// Search for CbGpBP paths starting with Topiramate
MATCH path = (n0:Compound)-[e1:BINDS_CbG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'Topiramate'
// Implement the DWPC to adjust for node degree along paths
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n2))
] AS degrees, e1, path, n2
WITH
  // Return the GO Process ID and name
  n2.identifier AS go_id,
  n2.name AS go_name,
  count(path) AS PC,
  collect(e1.pubmed_ids) AS pubmed_ids,
  // Compute the DWPC
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
  WHERE n_genes >= 5 AND PC >= 2
RETURN
  go_id, go_name, pubmed_ids, PC, DWPC, n_genes
ORDER BY DWPC DESC
LIMIT 5

If you want to see the paths that are aggregated to compute the DWPCs for these top five biological processes, you can run the following query:

// Search for CbGpBP paths starting with Topiramate
MATCH path = (n0:Compound)-[e1:BINDS_CbG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'Topiramate'
// Implement the DWPC to adjust for node degree along paths
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n2))
] AS degrees, e1, path, n2
WITH
  n2.name AS biological_process,
  count(path) AS PC,
  // Compute the DWPC
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC,
  // Collect the paths
  collect(path) AS paths,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
  WHERE n_genes >= 5 AND PC >= 2
RETURN
  paths
ORDER BY DWPC DESC
LIMIT 5

Each path is based on different pubmed_ids for its BINDS_CbG. You could then assign each source the weight of the PDP divided by the total number of pubmed_ids for that path.
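
As a rough sketch of that weighting scheme, the Python below splits each path's PDP equally among the PubMed IDs on its binding edge and sums the shares per source. The function name, the input layout, and the placeholder IDs are assumptions made for illustration; they are not part of Hetionet or of the queries above.

```python
from collections import defaultdict


def credit_sources(paths):
    """Split each path's PDP equally among the PubMed IDs on its binding edge."""
    credit = defaultdict(float)
    for pdp, pubmed_ids in paths:
        if not pubmed_ids:
            continue  # a path with no source studies contributes no credit
        share = pdp / len(pubmed_ids)
        for pubmed_id in pubmed_ids:
            credit[pubmed_id] += share
    return dict(credit)


# Hypothetical (PDP, pubmed_ids) pairs; the IDs are placeholders, not real citations.
example = [
    (0.50, ["PMID-A", "PMID-B"]),
    (0.25, ["PMID-A"]),
    (0.10, []),
]

source_credit = credit_sources(example)
print(source_credit)
```

With this equal split, the credit over all sources sums to the total PDP of the paths that have at least one PubMed ID. Paths without any source are skipped, so their weight is not attributed to anyone.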

wuyinjun-1993 commented on August 27, 2024 1

Oh, yes, that is our recent paper, which is still under review. I have attached it here in case you want to read it early: vldb-2019-conference.pdf.

Thanks!

wuyinjun-1993 commented on August 27, 2024

Thanks very much. We really appreciate it.

wuyinjun-1993 commented on August 27, 2024

Hello, Dr. Daniel Himmelstein! Sorry to disturb you again. Thanks again for the information you provided last time. It was very helpful!

We are currently using the data and query that you provided, which will be very important for our experiments. Our goal is to make our work more convincing. Would it be possible for you to provide more aggregate user queries against this database, or to point us to any other databases you know of or work with where aggregated queries exist?

Thanks in advance for your help!

dhimmel commented on August 27, 2024

@thuwuyinjun to make sure I'm spending my time providing actually useful examples, can you be more specific about what exactly you would like? What characteristics would you like these aggregation queries to have?

wuyinjun-1993 commented on August 27, 2024

Thanks for the quick response! I think the aggregate queries that we want should be very similar to the one that you provided last time. They should have one important characteristic: the query result should be curated data or be linked to citation information such as DOIs.

Our goal is simply to have more aggregate queries so that we can convince readers of the applicability of our techniques. I think the information you provide will be very helpful for that.

Thanks in advance for your help.

dhimmel commented on August 27, 2024

Aggregated GWAS associations

This file named gene-associations.tsv contains gene-disease associations from GWAS. GWAS measures disease associations with SNPs. This file aggregates SNP associations to genes. Some disease-gene associations have multiple GWAS studies reporting significant p-values, which are given in the pubmed_ids column. Perhaps it would be nice to weight the contribution of each study by its p-value... you'd have to play with the source code to output that information. The algorithm to aggregate these associations is rather complex.

wuyinjun-1993 commented on August 27, 2024

Cool, thanks very much! I will figure it out.

dhimmel commented on August 27, 2024

You could do something similar with Drug-binds-Protein relationships from BindingDB. See this dataset named bindings-drugbank-collapsed.tsv.

dhimmel commented on August 27, 2024

I got a Google Scholar notification about "ProvCite: Provenance-based Data Citation", which I assume is related to this, but it appears to have been crawled from the academic social network that shall not be named and is no longer available there.

I assume it will become available elsewhere at some point in the future? Looking forward to reading it and possibly presenting it at the Greene Lab journal club.
