Comments (11)

alawinia commented on August 27, 2024

Daniel, it was great catching up with you. Would you also please let us know if you come across a database with aggregated queries or views? Your help with our project is highly appreciated.
Thank you!

from hetionet.

dhimmel commented on August 27, 2024

Here is a query you can run on the Hetionet online browser at https://neo4j.het.io/browser/.

The query investigates which biological processes the drug Topiramate may affect. It looks for paths where Topiramate binds a gene that participates in a biological process. Each path receives a weight, called a PDP (path-degree product), based on its specificity. BINDS_CbG relationships have pubmed_ids metadata, so you could assign each path to zero or more source studies based on these pubmed_ids. Here's the query.

// Search for CbGpBP paths starting with Topiramate
MATCH path = (n0:Compound)-[e1:BINDS_CbG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'Topiramate'
// Implement the DWPC to adjust for node degree along paths
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n2))
] AS degrees, e1, path, n1, n2
RETURN
  // Return the gene symbol, the biological process name, and the supporting PubMed IDs
  n1.name AS gene_symbol,
  n2.name AS biological_process,
  e1.pubmed_ids AS pubmed_ids,
  // Compute the path-degree product
  reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4) AS PDP,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
ORDER BY PDP DESC

What we usually do is aggregate all PDPs for the same target node (in this case, the biological process): we sum the PDPs to compute DWPCs (degree-weighted path counts).
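As a rough illustration of the arithmetic outside the database, here is a minimal Python sketch. It assumes each path is given as a target name plus its list of four degrees. The function names and the example degrees are hypothetical and made up for illustration; only the -0.4 exponent comes from the queries in this comment.

```python
from collections import defaultdict
from math import prod

# Exponent taken from the Cypher queries in this thread (d ^ -0.4).
DAMPING = 0.4


def path_degree_product(degrees, damping=DAMPING):
    """Multiply together every degree along the path, each raised to -damping."""
    return prod(d ** -damping for d in degrees)


def degree_weighted_path_counts(paths, damping=DAMPING):
    """Sum the PDPs of all paths that end at the same target node."""
    dwpc = defaultdict(float)
    for target, degrees in paths:
        dwpc[target] += path_degree_product(degrees, damping)
    return dict(dwpc)


# Hypothetical (target, degrees) pairs; these numbers are not from Hetionet.
example_paths = [
    ("process A", [1, 1, 1, 1]),
    ("process A", [16, 1, 1, 1]),
    ("process B", [32, 32, 1, 1]),
]
dwpcs = degree_weighted_path_counts(example_paths)
```

Paths through high-degree nodes get smaller PDPs, so in this made-up example "process A" ends up with a larger DWPC than "process B".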

Here's a query for the top five DWPCs:

// Search for CbGpBP paths starting with Topiramate
MATCH path = (n0:Compound)-[e1:BINDS_CbG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'Topiramate'
// Implement the DWPC to adjust for node degree along paths
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n2))
] AS degrees, e1, path, n2
WITH
  // Return the GO Process ID and name
  n2.identifier AS go_id,
  n2.name AS go_name,
  count(path) AS PC,
  collect(e1.pubmed_ids) AS pubmed_ids,
  // Compute the DWPC
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
  WHERE n_genes >= 5 AND PC >= 2
RETURN
  go_id, go_name, pubmed_ids, PC, DWPC, n_genes
ORDER BY DWPC DESC
LIMIT 5

If you want to see the paths that get aggregated to compute DWPCs for these top five biological processes, you can run the following query:

// Search for CbGpBP paths starting with Topiramate
MATCH path = (n0:Compound)-[e1:BINDS_CbG]-(n1)-[:PARTICIPATES_GpBP]-(n2:BiologicalProcess)
WHERE n0.name = 'Topiramate'
// Implement the DWPC to adjust for node degree along paths
WITH
[
  size((n0)-[:BINDS_CbG]-()),
  size(()-[:BINDS_CbG]-(n1)),
  size((n1)-[:PARTICIPATES_GpBP]-()),
  size(()-[:PARTICIPATES_GpBP]-(n2))
] AS degrees, e1, path, n2
WITH
  n2.name AS biological_process,
  count(path) AS PC,
  // Compute the DWPC
  sum(reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.4)) AS DWPC,
  // Collect the paths
  collect(path) AS paths,
  // Count the number of genes in the GO Process
  size((n2)-[:PARTICIPATES_GpBP]-()) AS n_genes
  WHERE n_genes >= 5 AND PC >= 2
RETURN
  paths
ORDER BY DWPC DESC
LIMIT 5

The result looks like:

[image: topiramate]

Each path is based on different pubmed_ids for its BINDS_CbG relationship. You could then assign each source a weight equal to the path's PDP divided by the total number of pubmed_ids for that path.
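The weighting idea above can be sketched in a few lines of Python. This assumes each path is represented as a PDP plus its list of pubmed_ids; the function name, the placeholder IDs, and the PDP values are hypothetical. Skipping paths with no pubmed_ids is also an assumption, since such a path has no source to credit.

```python
from collections import defaultdict


def source_weights(paths):
    """Split each path's PDP equally among the PubMed IDs supporting it."""
    weights = defaultdict(float)
    for pdp, pubmed_ids in paths:
        if not pubmed_ids:
            continue  # assumption: a path without sources credits no study
        share = pdp / len(pubmed_ids)
        for pmid in pubmed_ids:
            weights[pmid] += share
    return dict(weights)


# Hypothetical (PDP, pubmed_ids) pairs; IDs and values are placeholders.
example_paths = [
    (0.5, ["pmid-1", "pmid-2"]),
    (0.25, ["pmid-2"]),
    (0.125, []),
]
weights = source_weights(example_paths)
```

A study that supports several paths accumulates a share from each of them, so its total weight reflects every path it contributes to.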

wuyinjun-1993 commented on August 27, 2024

Oh, yes, that is our recent paper, which is still under review. I've attached the paper here in case you want to read it early: vldb-2019-conference.pdf.

Thanks!

wuyinjun-1993 commented on August 27, 2024

Thanks very much. We really appreciate it.

wuyinjun-1993 commented on August 27, 2024

Hello, Dr. Daniel Himmelstein! Sorry to disturb you again. Thanks again for the information you provided last time. It is very helpful!

Currently we are using the data and query that you provided, which will be very important for our experiments. Our goal is to make our work more convincing. Would it be possible for you to provide more aggregate user queries against this database, or to point us to any other databases that you know of or work with where aggregated queries exist?

Thanks in advance for your help!

dhimmel commented on August 27, 2024

@thuwuyinjun to make sure I'm spending my time providing actually useful examples, can you be more specific about what exactly you would like? What characteristics would you like these aggregation queries to have?

wuyinjun-1993 commented on August 27, 2024

Thanks for the quick response! I think the aggregate queries that we want should be very similar to the one that you provided last time. They should have one important characteristic: the query result should be curated data or be linked to citation information such as DOIs.

Our goal is simply to have more aggregate queries so that we can convince readers of the applicability of our techniques. The information you provide will be very helpful for that.

Thanks in advance for your help.

dhimmel commented on August 27, 2024

Aggregated GWAS associations

The file gene-associations.tsv contains gene-disease associations from GWAS. GWAS measures disease associations with SNPs, and this file aggregates those SNP associations to genes. Some disease-gene associations have multiple GWAS studies reporting significant p-values; these studies are listed in the pubmed_ids column. Perhaps it would be nice to weight the contribution of each study by its p-value, but you'd have to play with the source code to output that information. The algorithm to aggregate these associations is rather complex.

wuyinjun-1993 commented on August 27, 2024

Cool, thanks very much! I will figure it out.

dhimmel commented on August 27, 2024

You could do something similar with Drug-binds-Protein relationships from BindingDB. See this dataset named bindings-drugbank-collapsed.tsv.

dhimmel commented on August 27, 2024

I got a Google Scholar notification about "ProvCite: Provenance-based Data Citation", which I assume is related to this, but it appears to have been crawled from the academic social network that shall not be named and is no longer available there.

Assuming this will become available from elsewhere at some point in the future? Looking forward to reading it and possibly presenting it at the Greene Lab journal club.
