
aplbrain / dotmotif


A performant, powerful query framework to search for network motifs

Home Page: https://bossdb.org/tools/dotmotif

License: Apache License 2.0

Python 100.00%
graph query-language motifs motif-discovery python connectomics connectome neo4j graph-database subgraph-isomorphism graph-matching networkx vf2 aplbrain bossdb dotmotif

dotmotif's Introduction

d o t m o t i f

Find graph motifs using intuitive notation



DotMotif is a library that identifies subgraphs or motifs in a large graph. It looks like this:

# Look for all motifs of the form,

# Neuron A synapses on Neuron B:
A -> B
# ...and B inhibits C:
B -> C [type = "inhibitory"]

Examples

| Notebook | Description |
|---|---|
| Open In Colab | Looking for motifs in the IARPA MICrONS Pinky100 Dataset |
| Open In Colab | Motif search in a custom graph |
| Open In Colab | Subgraph search in the Janelia Hemibrain dataset |

Get Started

If you have DotMotif, a NetworkX graph, and a curious mind, you already have everything you need to start using DotMotif:

from dotmotif import Motif, GrandIsoExecutor

executor = GrandIsoExecutor(graph=my_networkx_graph)

triangle = Motif("""
A -> B
B -> C
C -> A
""")

results = executor.find(triangle)

Parameters

You can also pass optional parameters into the Motif constructor. Those arguments are:

| Argument | Type, Default | Behavior |
|---|---|---|
| ignore_direction | bool: False | Whether to disregard edge direction when generating the database query |
| limit | int: None | A limit (if any) to impose on the number of query results |
| enforce_inequality | bool: False | Whether to require distinct motif nodes to map to distinct graph nodes; in other words, whether two motif nodes are forbidden from being aliases for the same graph node. For example, in A -> B -> C, set this to True to require A != C |
| exclude_automorphisms | bool: False | Whether to return only a single example for each detected automorphism. See more in the documentation |
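
To make the last two options concrete, here is a brute-force sketch in plain Python. It is not how DotMotif searches; `find_motif` and its arguments are hypothetical names used only for illustration, and deduplicating by the set of matched graph edges is a simplification that happens to coincide with automorphism exclusion for the small edge-only motifs shown.

```python
from itertools import product


def find_motif(motif_edges, graph_edges,
               enforce_inequality=False, exclude_automorphisms=False):
    """Try every assignment of graph nodes to motif nodes (toy-sized graphs only)."""
    motif_nodes = sorted({n for edge in motif_edges for n in edge})
    graph_nodes = sorted({n for edge in graph_edges for n in edge})
    graph_edge_set = set(graph_edges)
    results, seen_images = [], set()
    for combo in product(graph_nodes, repeat=len(motif_nodes)):
        # enforce_inequality: no two motif nodes may share a graph node.
        if enforce_inequality and len(set(combo)) < len(combo):
            continue
        mapping = dict(zip(motif_nodes, combo))
        image = frozenset((mapping[u], mapping[v]) for u, v in motif_edges)
        if not image <= graph_edge_set:
            continue
        # exclude_automorphisms (simplified): keep one mapping per matched edge set.
        if exclude_automorphisms:
            if image in seen_images:
                continue
            seen_images.add(image)
        results.append(mapping)
    return results


path = [("A", "B"), ("B", "C")]
triangle = [("A", "B"), ("B", "C"), ("C", "A")]
two_cycle = [(1, 2), (2, 1)]
three_cycle = [(1, 2), (2, 3), (3, 1)]

aliased = find_motif(path, two_cycle)                          # A and C may coincide
distinct = find_motif(path, two_cycle, enforce_inequality=True)
rotations = find_motif(triangle, three_cycle, enforce_inequality=True)
unique = find_motif(triangle, three_cycle, enforce_inequality=True,
                    exclude_automorphisms=True)
```

On the two-node cycle, the path A -> B -> C matches only when A and C are allowed to be the same node, and the single directed triangle is reported once per rotation unless automorphisms are excluded.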

For more details on how to write a query, see Getting Started.


Citing

If this tool is helpful to your research, please consider citing it with:

# https://doi.org/10.1038/s41598-021-91025-5
@article{Matelsky_Motifs_2021, 
    title={{DotMotif: an open-source tool for connectome subgraph isomorphism search and graph queries}},
    volume={11}, 
    ISSN={2045-2322}, 
    url={http://dx.doi.org/10.1038/s41598-021-91025-5}, 
    DOI={10.1038/s41598-021-91025-5}, 
    number={1}, 
    journal={Scientific Reports}, 
    publisher={Springer Science and Business Media LLC}, 
    author={Matelsky, Jordan K. and Reilly, Elizabeth P. and Johnson, Erik C. and Stiso, Jennifer and Bassett, Danielle S. and Wester, Brock A. and Gray-Roncal, William},
    year={2021}, 
    month={Jun}
}

dotmotif's People

Contributors

dependabot[bot], j6k4m8, jakobtroidl, jtpdowns


dotmotif's Issues

Add Impossible Constraints validator

We should be able to automatically catch things like this:

A.type = 4
A.type != 4

Right now, we'll catch them in certain instances, but not when constraints are inherited from automorphisms (see #118). Getting smarter about this will likely improve runtime considerably.
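
A minimal sketch of such a check, assuming constraints are available as `(node, attribute, operator, value)` tuples. That representation and the `find_contradictions` name are hypothetical and are not DotMotif's internal format; only `=` and `!=` are considered.

```python
def find_contradictions(constraints):
    """Return a list of (node/attribute key, reason, offending values)."""
    equal, not_equal = {}, {}
    for node, attr, op, value in constraints:
        if op == "=":
            equal.setdefault((node, attr), set()).add(value)
        elif op == "!=":
            not_equal.setdefault((node, attr), set()).add(value)

    problems = []
    for key, values in equal.items():
        # Two different required values can never both hold.
        if len(values) > 1:
            problems.append((key, "multiple '=' values", sorted(values, key=repr)))
        # A value that is both required and forbidden can never hold.
        clash = values & not_equal.get(key, set())
        if clash:
            problems.append((key, "'=' and '!=' on same value", sorted(clash, key=repr)))
    return problems


impossible = find_contradictions([("A", "type", "=", 4), ("A", "type", "!=", 4)])
satisfiable = find_contradictions([("A", "type", "=", 4), ("A", "type", "!=", 5)])
```

Running a check like this after constraints have been propagated across automorphisms would let a query be rejected before any search starts.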

Macro attribute support

Separate from #16 and #19:

Macro support is a bit more complicated, because it means that "virtual" edges need to support attributes, which are hardened down to real edges when the macro is instantiated.

Remove named variables from negative edges

We don't need to assign a variable / memory-allocation to negative edges (i.e. A~>B) in the Neo4jExecutor cypher generator. Doing so also introduces a spare "WHERE" clause that is unnecessary.

Move special imports to `extras_require`

You should be able to use DotMotif without installing neo4j- and docker-flavored dependencies.

Proposal:

# Only install core dotmotif for networkx and grandiso:
pip install dotmotif

# Install cypher extensions for neuPrint, and docker 
# extensions for creating a local neo4j database:
pip install 'dotmotif[neo4j]'
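
One way the proposal could be expressed in packaging metadata is sketched below. The dependency lists are hypothetical placeholders, not the project's actual requirements; only the `neo4j` extra name comes from the proposal above. The two collections would be handed to `setuptools.setup()` as `install_requires` and `extras_require`.

```python
# Hypothetical dependency lists; the real package names and pins may differ.
CORE_REQUIRES = ["networkx", "grandiso", "lark"]
EXTRAS_REQUIRE = {
    # Installed only by: pip install 'dotmotif[neo4j]'
    "neo4j": ["py2neo", "docker"],
}


def requirements_for(extras=()):
    """Return the full requirement list pip would resolve for the given extras."""
    requirements = list(CORE_REQUIRES)
    for extra in extras:
        requirements.extend(EXTRAS_REQUIRE[extra])
    return requirements


core_only = requirements_for()
with_neo4j = requirements_for(["neo4j"])
```

With this split, importing the Neo4j-backed executor should fail with a clear message when the extra is missing, rather than at `import dotmotif` time.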

Non-edit pip install fails to import

Installing with pip using the -e (edit) flag allows the user to import dotmotif from another directory. However, installing with pip without the -e flag results in an error message, specifically: ModuleNotFoundError: No module named 'dotmotif.parsers'.

Error in motif calculation

I was running the notebook DotMotif-Search-in-Pinky100.ipynb and the following cell gives an error.

motif = Motif("""
A -> B [spine_vol_um3 > 0.25]
B -> C
C -> A
""")

Output:

UnexpectedCharacters                      Traceback (most recent call last)
<ipython-input-9-eaabd7ae9c73> in <cell line: 1>()
----> 1 motif = Motif("""
      2 A -> B [spine_vol_um3 > 0.25]
      3 B -> C
      4 C -> A
      5 """)

7 frames
/usr/local/lib/python3.10/dist-packages/lark/parsers/xearley.py in scan(i, to_scan)
    116             if not next_set and not delayed_matches and not next_to_scan:
    117                 considered_rules = list(sorted(to_scan, key=lambda key: key.rule.origin.name))
--> 118                 raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
    119                                            set(to_scan), state=frozenset(i.s for i in to_scan),
    120                                            considered_rules=considered_rules

UnexpectedCharacters: No terminal matches '_' in the current parser context, at line 2 col 14

A -> B [spine_vol_um3 > 0.25]
             ^
Expected one of: 
	* IN
	* OPERATOR
	* __ANON_5
	* __ANON_6
	* CONTAINS

Removing [spine_vol_um3 > 0.25] fixes the issue. Perhaps a recent update hasn't been tested for such filtering?

Propagate constraints on formally-notated node symmetries

If I formally declare that two nodes are automorphic to one another, constraints from one should propagate to the other.

A -> B
B -> A

A === B

A.size = "big"

That is, B should also receive the constraint B.size = "big".

This should happen BEFORE constraint validation steps, since there may be collisions:

A -> B
B -> A

A === B

A.size = "big"
B.size = "small" # collision!
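
The propagate-then-validate order can be sketched as follows, assuming equality constraints are stored as `{node: {attribute: value}}` and symmetries as a list of node pairs. Both structures and the `propagate_equalities` name are hypothetical; this single pass does not handle chains of symmetries or operators other than `=`.

```python
def propagate_equalities(constraints, symmetric_pairs):
    """Copy '=' constraints across each symmetric pair, raising on collisions."""
    merged = {node: dict(attrs) for node, attrs in constraints.items()}
    for a, b in symmetric_pairs:
        for src, dst in ((a, b), (b, a)):
            for attr, value in constraints.get(src, {}).items():
                existing = merged.setdefault(dst, {}).get(attr, value)
                if existing != value:
                    raise ValueError(
                        f"collision: {src}.{attr} = {value!r} "
                        f"but {dst}.{attr} = {existing!r}"
                    )
                merged[dst][attr] = value
    return merged


propagated = propagate_equalities({"A": {"size": "big"}}, [("A", "B")])

try:
    propagate_equalities({"A": {"size": "big"}, "B": {"size": "small"}}, [("A", "B")])
    collided = False
except ValueError:
    collided = True
```

The first call gives B the constraint inherited from A; the second reproduces the collision case and fails before any search is attempted.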

Neuprint Executor - Labeling Edges by ROI

Hi Jordan,

Do you see an easy way to assign ROI labels to edges in the neuprint executor? Let's say I want to query something like this:

A -> B [weight > 20, ROI == "CX"]
A -> B [weight > 30, ROI == "CRE(L)"] 

So basically, there are two things here—multigraphs, which you address already in the docs, and encoding edge ROIs. I wonder if that's rather a hard thing to do or not. The data should be there as neuprint-python fetch_synapse_connections returns something like this

    bodyId_pre  bodyId_post roi_pre roi_post  x_pre  y_pre  z_pre  x_post  y_post  z_post  confidence_pre  confidence_post
0    792368888    754547386  PED(R)   PED(R)  14013  27747  19307   13992   27720   19313           0.996         0.401035
1    792368888    612742248  PED(R)   PED(R)  14049  27681  19417   14044   27662   19408           0.921         0.881487
2    792368888   5901225361  PED(R)   PED(R)  14049  27681  19417   14055   27653   19420
...

According to this issue it looks like it's possible. My observation is that the physical location of a connection between two neurons is an important feature of a motif. Looking forward to hearing what you say.

EDIT: Maybe an indirect way to support multiple edges between two nodes is by grouping edge attributes. Does something like this seem plausible? You are doing something similar in the multigraph docs already: A -> B [synapse_count > 2]. But what exactly is synapse_count?

A -> B [[weight >= 20, ROI == "CX"], [weight > 30, ROI == "CRE(L)"]]

Best, Jakob

Upgrade grandiso version to use limits and iterable

In grandiso v1.1.0 and above, there is an optional limit argument to the find_motifs call which short-circuits motif counting if a certain number of valid mappings are found.

Right now, NetworkX and GrandIso executors implement the dotmotif limit parameter by finding all motifs and then downselecting, which is super inefficient and lame. We could pretty substantially improve performance by supporting the GrandIso limit arg.

A notable challenge: We perform an additional downselect after running grandiso (to double-check attribute filters). So we may need to store a list of mappings temporarily in order to backfill the results list if candidate mappings are filtered out.
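
One way to sidestep the backfill bookkeeping is to treat candidate mappings as a lazy stream and apply the limit after the attribute filter. The sketch below is library-agnostic: `limited_matches`, `candidate_stream`, and the filter are hypothetical stand-ins, and it assumes the underlying search can yield mappings one at a time.

```python
from itertools import islice


def limited_matches(candidates, passes_filters, limit=None):
    """Filter lazily, then stop pulling candidates once `limit` results exist."""
    valid = (mapping for mapping in candidates if passes_filters(mapping))
    # islice(valid, None) yields everything, so limit=None means "no limit".
    return list(islice(valid, limit))


def candidate_stream(consumed):
    """Stand-in for a motif search that yields mappings one at a time."""
    for i in range(1000):
        consumed.append(i)
        yield {"A": i, "B": i + 1}


consumed = []
results = limited_matches(
    candidate_stream(consumed),
    passes_filters=lambda m: m["A"] % 2 == 0,  # stand-in attribute filter
    limit=3,
)
```

Only five candidates are generated to produce three filtered results here, instead of all one thousand. If the search only accepts a hard limit and returns a list, the temporary backfill list described above would still be needed.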

Node- and edge-attribute support in DSL

Proposed syntax concepts:

Nodes

Inline maplike:

Node1 { type="GABA", z<12 } -> Node2

Pros:

  • Succinct

Cons:

  • Possible duplication or conflicting attributes if map is included on multiple lines for the same node

Postfix where-like:

Node1 -> Node2 | Node1.type = "GABA", Node1.z < 12

Pros:

  • Succinct

Cons:

  • Possible duplication or conflicting attributes if attrs are included on multiple lines for the same node

Footnote constraints

Node1 -> Node2

Node1.type = "GABA"
Node1.z < 12

Pros:

  • Reduces possibility of conflicting constraints
  • Clear syntax; can be standalone in its own macro

Cons:

  • Linecount verbose
  • Decouples attributes from connectivity clauses

Edges

Inline maplike:

A ->{type: "excitatory", neurotransmitter: "ACh"} B

Pros:

  • Inline

Cons:

  • Reduces clarity of language

Postfix where-like:

A -> B | [type: "excitatory", neurotransmitter: "ACh"]

Pros:

  • Inline

Cons:

  • Reduces clarity of language

Infix maplike:

A -[type: "excitatory", neurotransmitter: "ACh"]> B

Pros:

  • Inline

Cons:

  • Reduces clarity of language

Performance on large graphs

Hello, I have a huge graph i want to calculate motifs from but for years i have been struggling to find a scalable library to calculate motifs. How does dotmotif scale ? Is there a Big O complexity analysis for motifs with participation and without participation ?

CSVImport performance improvements

In many cases, edges may be imported in parallel (though not always!). In these cases, it would be useful to have a (default-True) flag to parallelize graph construction.

Aliased edges and inter-edge comparisons

# Alias is of the form `as <name>`
A -> B as AB_edge
# other constraints go before alias:
C -> B [foo=bar] as CB_otheredge

# Comparison constraints:
AB_edge.length > CB_otheredge.length

# You can also use constraints on non-comparisons:
AB_edge != 12

my_edge(a, b) {
    a -> b as ab
    b -> a as ba

    # comparison constraints in macros:
    ab.length > ba.length
}
foo_node -> bar_node [length > 5] as my_new_edge

Filtering By Properties w/ Invalid Characters in the Name

Hey There,
I'm using dotmotif to query the neuPrint dataset and have found some of the neurons have properties that aren't accepted in the query string format
e.g. 'AVLP(R)': True,

Is there a way to still query w/ these params?
I tried adding directly to the _node_constraints but that doesn't seem to work either e.g.

motif._node_constraints['A']['AVLP(R)'] = {}
motif._node_constraints['A']['AVLP(R)']['='] = [True]

Variable `R` not defined (line 2, column 83 (offset: 156))
"    WHERE B.status = "Traced" AND A.status = "Orphan" AND A.INP = True AND A.AVLP(R) = True"

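The Cypher error arises because `A.AVLP(R)` is parsed as a call involving a variable `R`. Cypher itself accepts property names containing special characters when they are wrapped in backticks, so a query generator could quote such names. The helper below is a hypothetical sketch, not part of DotMotif, and whether the DotMotif grammar can be made to accept these names is a separate question.

```python
import re

# Names that are safe to emit unquoted in Cypher.
_PLAIN_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def cypher_property(node, prop):
    """Return a Cypher property reference, backtick-quoting unusual names."""
    if _PLAIN_NAME.match(prop):
        return f"{node}.{prop}"
    # A literal backtick inside a quoted name is escaped by doubling it.
    escaped = prop.replace("`", "``")
    return f"{node}.`{escaped}`"


quoted = cypher_property("A", "AVLP(R)")
plain = cypher_property("A", "status")
clause = f"WHERE {plain} = \"Orphan\" AND {quoted} = True"
```

The resulting clause refers to the property as A.`AVLP(R)` rather than A.AVLP(R).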
Error on first query

Tried to run the query from the tutorial:

motif = Motif("""
# My Awesome Motif

Nose_Cell -> Brain_Cell
Brain_Cell -> Arm_Cell
""")

But got this error:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-3a88159c0a0c> in <module>
----> 1 import dotmotif
      2 import networkx
      3 
      4 motif = Motif("""
      5 # My Awesome Motif

~\anaconda3\lib\site-packages\dotmotif\__init__.py in <module>
     24 from networkx.algorithms import isomorphism
     25 
---> 26 from .parsers.v2 import ParserV2
     27 from .validators import DisagreeingEdgesValidator
     28 

~\anaconda3\lib\site-packages\dotmotif\parsers\v2\__init__.py in <module>
     11 
     12 
---> 13 dm_parser = Lark(open(os.path.join(os.path.dirname(__file__), "grammar.lark"), "r"))
     14 
     15 

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\xxxx\\anaconda3\\lib\\site-packages\\dotmotif\\parsers\\v2\\grammar.lark'

Non-string ids not supported by Neo4jExecutor

Ingesting a NetworkX graph with integer ids results in an error: ValueError: Could not export graph: unsupported operand type(s) for +: 'int' and 'str'. It should be straightforward to handle integers, though NetworkX allows a node to be "any hashable Python object except None", so other id types need handling too. Maybe just cast with repr.
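
A minimal sketch of the cast-with-repr idea, operating on a plain edge list so that it stays independent of any executor. `stringify_edges` is a hypothetical helper; it leaves string ids untouched, keeps a reverse lookup so results can be translated back, and refuses inputs where two different ids would collapse to the same string.

```python
def stringify_edges(edges):
    """Return (edges with string ids, mapping from string id back to original)."""
    to_text = {}

    def as_text(node):
        if node not in to_text:
            # Strings pass through; everything else uses its repr.
            to_text[node] = node if isinstance(node, str) else repr(node)
        return to_text[node]

    text_edges = [(as_text(u), as_text(v)) for u, v in edges]
    to_node = {text: node for node, text in to_text.items()}
    if len(to_node) != len(to_text):
        raise ValueError("two different ids map to the same string")
    return text_edges, to_node


text_edges, to_node = stringify_edges([(1, 2), (2, 3)])
```

The reverse mapping matters because motif results would otherwise come back keyed by the string form rather than the caller's original ids.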

Better type-guessing for Neo4j imports

Right now, certain edge-cases fail when Python infers a type that doesn't match a Neo4j-friendly type (e.g. 0x0 is not a string). Investigate better type-guessing libraries.
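
As a baseline to compare libraries against, one conservative policy is to accept only plain decimal integers, floats, and booleans, and to leave everything else as a string. The `guess_type` function below is a hypothetical sketch of that policy, not the importer's current behavior.

```python
def guess_type(raw):
    """Convert a CSV cell to int, float, or bool when unambiguous; else keep the string."""
    text = raw.strip()
    for caster in (int, float):
        try:
            return caster(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    # Anything else (including hex-looking text such as "0x0") stays a string.
    return raw


examples = [guess_type(cell) for cell in ("12", "1.5", "True", "0x0")]
```

Because `int()` parses base 10 by default, a value such as "0x0" falls through both numeric casts and is preserved as text.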

Module Not Found

Hi,

I am getting the following error while executing the following code:

from dotmotif import Motif
from dotmotif.executors.NeuPrintExecutor import NeuPrintExecutor

ModuleNotFoundError Traceback (most recent call last)
in <cell line: 2>()
1 from dotmotif import Motif
----> 2 from dotmotif.executors.NeuPrintExecutor import NeuPrintExecutor

/usr/local/lib/python3.10/dist-packages/dotmotif/executors/NeuPrintExecutor.py in
1 import pandas as pd
----> 2 from neuprint import Client
3 from neuprint import fetch_all_rois
4
5 from .. import Motif

ModuleNotFoundError: No module named 'neuprint'

Add docker-compose instructions

If dotmotif were to be deployed in a Docker container, the default no-Neo4j-instance behavior would be to try to launch another container inside of the container. It would be good to describe the required steps to orchestrate an application container with a Neo4j container using docker-compose, which would allow you to network the two together and avoid the container-in-container problem.

Throw exception for incompatible kwargs for Neo4j

The Neo4jExecutor accepts a graph keyword and also db_bolt_uri and password, but it handles the two cases separately. Passing in a graph will cause the constructor to ignore the db_bolt_uri and password at present, so it should throw an exception if all three are provided.
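
A sketch of the argument check, written as a standalone function rather than a patch to the executor. `check_connection_args` is a hypothetical name, and it is slightly stricter than the issue asks: it rejects a graph combined with either connection argument, not only with both.

```python
def check_connection_args(graph=None, db_bolt_uri=None, password=None):
    """Return which mode the arguments select, or raise if they conflict."""
    connection_given = db_bolt_uri is not None or password is not None
    if graph is not None and connection_given:
        raise ValueError(
            "Pass either `graph` or `db_bolt_uri`/`password`, not both: "
            "the connection arguments would be silently ignored."
        )
    return "graph" if graph is not None else "bolt"


mode = check_connection_args(db_bolt_uri="bolt://localhost:7687", password="secret")

try:
    check_connection_args(graph=object(), db_bolt_uri="bolt://localhost:7687",
                          password="secret")
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

Failing loudly here avoids the case where a user believes they are querying an existing database while a new one is being built from the graph.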

Handle memory limits

When a user has insufficient memory dedicated to the docker container to initialize the jvm, the neo4j instance seems to crash without logging a failure. This results in an infinite hang on the user side. It seems like maybe a user should be given a heads up or time estimate when the docker container is spun up, or perhaps we should provide a bold note in the documentation.
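
Independently of documentation, the infinite hang could be bounded by polling for readiness with a deadline. The sketch below is generic: `wait_until_ready` and the `is_ready` callable are hypothetical, and how readiness is actually probed (for example, attempting a connection) is left to the caller.

```python
import time


def wait_until_ready(is_ready, timeout_s=60.0, poll_s=0.5):
    """Poll `is_ready()` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_s)
    raise TimeoutError(
        f"Database not ready after {timeout_s:.0f}s; "
        "check that the container has enough memory to start the JVM."
    )


ready = wait_until_ready(lambda: True, timeout_s=0.1, poll_s=0.01)

try:
    wait_until_ready(lambda: False, timeout_s=0.05, poll_s=0.01)
    timed_out = False
except TimeoutError:
    timed_out = True
```

A timeout error that names the likely cause gives the user something actionable instead of a silent hang.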

Graph Backend Master List

This is a single issue to track graph databases for which we may consider support. Comments/edits in this thread are better than standalone issues.

| GDB | QL | Comments |
|---|---|---|
| Cayley | | |
| Nebula | nGQL | Written in C++, custom query language. |
| Neptune | Gremlin | As of the time of writing, there is no way to run this cost-effectively without provisioning and managing VM-equivalents, which is out-of-scope for this work. |
| python-igraph | (No QL) | Does it make sense to have two Python libraries supported (vs supporting another gdb)? |

| GDB | QL | Comments |
|---|---|---|
| neuPrint | Cypher | Supported July 2020, #76 |
| Neo4j | Cypher | Supported |
| networkx | (No QL) | Supported |

More graph databases here.

Anonymous motif participants

Anonymous motif participants:

A -> _hidden
_hidden -> B

Anonymous node participants in macros:

two_hop(A, B) {
    A -> _i
    _i -> B
}

two_hop(neuron1, neuron2)
