aplbrain / dotmotif

A performant, powerful query framework to search for network motifs

Home Page: https://bossdb.org/tools/dotmotif

License: Apache License 2.0

Python 100.00%

graph query-language motifs motif-discovery python connectomics connectome neo4j graph-database subgraph-isomorphism graph-matching networkx vf2 aplbrain bossdb dotmotif

dotmotif's Introduction

d o t m o t i f

Find graph motifs using intuitive notation

DotMotif is a library that identifies subgraphs or motifs in a large graph. It looks like this:

# Look for all motifs of the form,

# Neuron A synapses on Neuron B:
A -> B
# ...and B inhibits C:
B -> C [type = "inhibitory"]

Examples

Notebook	Description
	Looking for motifs in the IARPA MICrONS Pinky100 Dataset
	Motif search in a custom graph
	Subgraph search in the Janelia Hemibrain dataset

Get Started

If you have DotMotif, a NetworkX graph, and a curious mind, you already have everything you need to start using DotMotif:

from dotmotif import Motif, GrandIsoExecutor

executor = GrandIsoExecutor(graph=my_networkx_graph)

triangle = Motif("""
A -> B
B -> C
C -> A
""")

results = executor.find(triangle)

Parameters

You can also pass optional parameters into the constructor for the dotmotif object. Those arguments are:

Argument	Type, Default	Behavior
`ignore_direction`	`bool`: `False`	Whether to disregard direction when generating the database query
`limit`	`int`: `None`	A limit (if any) to impose on the query results
`enforce_inequality`	`bool`: `False`	Whether to require that distinct motif nodes match distinct graph nodes; in other words, when `False`, two motif nodes may be aliases for the same graph node. For example, in `A->B->C`, set this to `True` if `A` and `C` must not be the same node
`exclude_automorphisms`	`bool`: `False`	Whether to return only a single example for each detected automorphism. See more in the documentation
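
To make the effect of `enforce_inequality` concrete, here is a minimal brute-force sketch in plain Python. It does not use DotMotif and is not how any executor works internally; the `find_paths` helper and the tiny edge set are assumptions made up for illustration of the `A->B->C` example in the table above.

```python
from itertools import product


def find_paths(edges, enforce_inequality=False):
    """Brute-force matches for the motif A -> B, B -> C in a small directed edge set."""
    nodes = sorted({node for edge in edges for node in edge})
    matches = []
    for a, b, c in product(nodes, repeat=3):
        if (a, b) in edges and (b, c) in edges:
            # With inequality enforced, A, B and C must be three different nodes.
            if enforce_inequality and len({a, b, c}) < 3:
                continue
            matches.append({"A": a, "B": b, "C": c})
    return matches


# Hypothetical graph: 1 <-> 2 and 2 -> 3.
edges = {(1, 2), (2, 1), (2, 3)}
loose = find_paths(edges)
strict = find_paths(edges, enforce_inequality=True)
```

Without the flag, the mapping `A=1, B=2, C=1` counts as a match because `A` and `C` alias the same node; with the flag, only `A=1, B=2, C=3` remains.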

For more details on how to write a query, see Getting Started.

Citing

If this tool is helpful to your research, please consider citing it with:

# https://doi.org/10.1038/s41598-021-91025-5
@article{Matelsky_Motifs_2021, 
    title={{DotMotif: an open-source tool for connectome subgraph isomorphism search and graph queries}},
    volume={11}, 
    ISSN={2045-2322}, 
    url={http://dx.doi.org/10.1038/s41598-021-91025-5}, 
    DOI={10.1038/s41598-021-91025-5}, 
    number={1}, 
    journal={Scientific Reports}, 
    publisher={Springer Science and Business Media LLC}, 
    author={Matelsky, Jordan K. and Reilly, Elizabeth P. and Johnson, Erik C. and Stiso, Jennifer and Bassett, Danielle S. and Wester, Brock A. and Gray-Roncal, William},
    year={2021}, 
    month={Jun}
}

dotmotif's Issues

Negation for "contains" and "in" operators

"Foo" in "Foobar", but what if I want to search for "Bar" !in "Foobar"?

Add ! negation operator to the contains and in edge/node attribute constraint operators.
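
One way to picture the requested behavior is an operator table in which every negated operator is derived from its positive counterpart. The sketch below is plain Python, not DotMotif's implementation; the `OPERATORS` table, the `negate` and `evaluate` helpers, and the `!contains` / `!in` spellings are assumptions that follow the syntax proposed in this issue.

```python
def negate(fn):
    """Return a function that gives the opposite answer to `fn`."""
    return lambda value, operand: not fn(value, operand)


# Hypothetical operator table; names and spellings are illustrative only.
OPERATORS = {
    "contains": lambda value, operand: operand in value,
    "in": lambda value, operand: value in operand,
}

# Derive a "!"-prefixed negated form of every operator defined so far.
for name, fn in list(OPERATORS.items()):
    OPERATORS["!" + name] = negate(fn)


def evaluate(value, operator, operand):
    """Apply a named operator to an attribute value and a constraint operand."""
    return OPERATORS[operator](value, operand)
```

Deriving the negated forms in one place keeps each positive and negative operator pair consistent by construction.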

Add Impossible Constraints validator

We should be able to automatically catch things like this:

A.type = 4
A.type != 4

Right now, we'll catch them in certain instances, but not when constraints are inherited from automorphisms (see #118). Getting smarter about this will likely improve runtime considerably.
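
A minimal sketch of such a check, assuming constraints are available as `(entity, attribute, operator, value)` tuples. That tuple layout and the `find_contradictions` name are hypothetical and do not describe DotMotif's internal representation; only the `=` / `!=` pair from the example above is handled.

```python
def find_contradictions(constraints):
    """Return (entity, attribute, value) triples that are both required and forbidden.

    `constraints` is an iterable of (entity, attribute, operator, value) tuples.
    Only the "=" / "!=" pair is checked in this sketch.
    """
    required = set()
    forbidden = set()
    for entity, attribute, operator, value in constraints:
        key = (entity, attribute, value)
        if operator == "=":
            required.add(key)
        elif operator == "!=":
            forbidden.add(key)
    # Sort by repr so mixed value types (e.g. 4 and "big") do not break ordering.
    return sorted(required & forbidden, key=repr)


conflicts = find_contradictions([
    ("A", "type", "=", 4),
    ("A", "type", "!=", 4),
])
```

A non-empty result means the motif can never match, so a query could be rejected before it is run.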

Automatically enrich node attribute constraints when an automorphism is listed

In the following case:

A -> C
B -> C

A.size >= 4

A === B

We should automatically enrich the B.size attribute constraints accordingly. This is also an opportunity for validators to sanity-check constraints:

A -> C
B -> C

A.size >= 4
B.size < 4

A === B

...should throw a validation error.
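
Once the constraints of `A` and `B` are merged, detecting this kind of conflict amounts to checking whether the combined bounds leave a non-empty interval. The following is a sketch under that assumption; the `is_satisfiable` helper and its `(operator, value)` input format are hypothetical, and only the four ordering operators are covered.

```python
import math


def is_satisfiable(bounds):
    """Return True if some number satisfies every (operator, value) bound."""
    low, low_inclusive = -math.inf, True
    high, high_inclusive = math.inf, True
    for operator, value in bounds:
        if operator in (">", ">="):
            inclusive = operator == ">="
            # Keep the tightest lower bound; a strict bound beats an inclusive one.
            if value > low or (value == low and not inclusive):
                low, low_inclusive = value, inclusive
        elif operator in ("<", "<="):
            inclusive = operator == "<="
            # Keep the tightest upper bound in the same way.
            if value < high or (value == high and not inclusive):
                high, high_inclusive = value, inclusive
        else:
            raise ValueError(f"unsupported operator: {operator}")
    if low < high:
        return True
    return low == high and low_inclusive and high_inclusive


# Merged constraints from the example: A.size >= 4 and B.size < 4 with A === B.
merged_ok = is_satisfiable([(">=", 4), ("<", 4)])
```

Here `merged_ok` is `False`, which is the situation in which a validator would raise.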

Macro attribute support

Separate from #16 and #19:

Macro support is a bit more complicated, because it means that "virtual" edges need to support attributes, which are hardened down to real edges when the macro is instantiated.

Container doesn't pause incoming connections when it crashes

To reproduce:

Break a container by asking for a big (out-of-ram) response
Try any other request within a few seconds (pre-reboot)

Remove named variables from negative edges

We don't need to assign a variable (and its memory allocation) to negative edges (i.e. A~>B) in the Neo4jExecutor Cypher generator. Doing so also introduces a spare, unnecessary "WHERE" clause.

Simplify motif construction code

Constructor can take a string param instead of **kwargs.

Support n constraints on each edge value-operator pair

Currently, the parser overwrites an earlier constraint if the same operator is redefined for the same value:

A -> B [value<=5, value<=2]

...will yield a constraint operator of

{ "value": { "lte": 2.0 } }

(i.e. overwriting the first rule).
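
One possible fix is to store a list of operands per operator rather than a single value. The sketch below shows that data structure in plain Python; `collect_constraints` and `edge_satisfies` are hypothetical names, and the `lte` key simply mirrors the output shown above.

```python
import operator
from collections import defaultdict

# Only the comparisons needed for this sketch.
COMPARATORS = {"lte": operator.le, "gte": operator.ge}


def collect_constraints(triples):
    """Group (attribute, operator, value) triples without overwriting earlier ones."""
    constraints = defaultdict(lambda: defaultdict(list))
    for attribute, op_name, value in triples:
        constraints[attribute][op_name].append(value)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {attr: dict(ops) for attr, ops in constraints.items()}


def edge_satisfies(edge_attributes, constraints):
    """Check that an edge meets every stored operand for every operator."""
    return all(
        COMPARATORS[op_name](edge_attributes[attribute], operand)
        for attribute, ops in constraints.items()
        for op_name, operands in ops.items()
        for operand in operands
    )


parsed = collect_constraints([("value", "lte", 5.0), ("value", "lte", 2.0)])
```

With this layout both `value<=5` and `value<=2` survive parsing, and an edge must satisfy each of them.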

enforce_inequality WHERE clause conflicts with constraint clauses

Add "vanilla" triangle and n-clique count aliases from Hočevar & Demšar 2017

I've no idea how to output the motifs

Move special imports to `extras_require`

You should be able to use DotMotif without installing neo4j- and docker-flavored dependencies.

Proposal:

# Only install core dotmotif for networkx and grandiso:
pip install dotmotif

# Install cypher extensions for neuPrint, and docker 
# extensions for creating a local neo4j database:
pip install 'dotmotif[neo4j]'

Improve utils#draw_motif

Perhaps a NetworkX API change(?) has resulted in no edges being rendered with the default parameters.

No WHERE when enforce_inequality and no constraints

The reverse of #23; enforce_inequality=True with no constraints fails to add a WHERE to the cypher query, which is wrong.
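
Both this bug and the conflict it mirrors go away if every condition is collected into one list and a single WHERE clause is emitted from it. The following is a sketch of that idea, not the Neo4jExecutor's actual query builder; `build_where` and its arguments are hypothetical.

```python
from itertools import combinations


def build_where(node_names, constraint_clauses, enforce_inequality):
    """Return one Cypher-style WHERE clause, or "" if there is nothing to filter."""
    conditions = list(constraint_clauses)
    if enforce_inequality:
        # One "<>" condition per unordered pair of motif nodes.
        conditions.extend(
            f"{a} <> {b}" for a, b in combinations(sorted(node_names), 2)
        )
    if not conditions:
        return ""
    return "WHERE " + " AND ".join(conditions)


only_inequality = build_where(["A", "B"], [], enforce_inequality=True)
both = build_where(["A", "B"], ['A.type = "x"'], enforce_inequality=True)
```

Because the keyword is added exactly once, after all conditions are known, it can be neither missing nor duplicated.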

Add circleCI test suite to GitHub repo

Support numbers in variable names

All words of non-space characters should be valid.

Non-edit pip install fails to import

Installing with pip using the -e (edit) flag allows the user to import dotmotif from another directory. However, installing with pip without the -e flag results in an error message, specifically: ModuleNotFoundError: No module named 'dotmotif.parsers'.

Add offset/limit depagination support to Neo4jExecutor

Don't require 'action' etc when creating a motif from_nx

cc @morganschuyler

Error in motif calculation

I was running the notebook DotMotif-Search-in-Pinky100.ipynb, and the following cell gives an error.

motif = Motif("""
A -> B [spine_vol_um3 > 0.25]
B -> C
C -> A
""")

Output:

UnexpectedCharacters                      Traceback (most recent call last)
<ipython-input-9-eaabd7ae9c73> in <cell line: 1>()
----> 1 motif = Motif("""
      2 A -> B [spine_vol_um3 > 0.25]
      3 B -> C
      4 C -> A
      5 """)

7 frames
/usr/local/lib/python3.10/dist-packages/lark/parsers/xearley.py in scan(i, to_scan)
    116             if not next_set and not delayed_matches and not next_to_scan:
    117                 considered_rules = list(sorted(to_scan, key=lambda key: key.rule.origin.name))
--> 118                 raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
    119                                            set(to_scan), state=frozenset(i.s for i in to_scan),
    120                                            considered_rules=considered_rules

UnexpectedCharacters: No terminal matches '_' in the current parser context, at line 2 col 14

A -> B [spine_vol_um3 > 0.25]
             ^
Expected one of: 
	* IN
	* OPERATOR
	* __ANON_5
	* __ANON_6
	* CONTAINS

Removing [spine_vol_um3 > 0.25] fixes the issue. Perhaps a recent update hasn't been tested with this kind of filtering?

Neo4j: User-defined attribute indices

Allow the user to specify node attributes to be set as indices:

Neo4jExecutor(..., index=["neurotransmitter", "radius"])

Expose Neo4jExecutor from default package imports

Propagate constraints on formally-notated node symmetries

If I formally declare that two nodes are automorphic to one another, constraints from one should propagate to the other.

A -> B
B -> A

A === B

A.size = "big"

That is, B.size should also have the constraint of B.size = "big".

This should happen BEFORE constraint validation steps, since there may be collisions:

A -> B
B -> A

A === B

A.size = "big"
B.size = "small" # collision!
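
The propagate-then-validate order described here could be sketched as follows. This is plain Python over a hypothetical `{node: {attribute: {operator: value}}}` layout, not DotMotif's internal one; it makes a single pass over directly declared pairs and only detects collisions between `=` constraints.

```python
import copy


def propagate_constraints(node_constraints, automorphisms):
    """Copy constraints between nodes declared automorphic (e.g. `A === B`).

    Raises ValueError when two "=" constraints on the same attribute disagree.
    """
    result = copy.deepcopy(node_constraints)
    for left, right in automorphisms:
        # Propagate in both directions, reading from the original constraints.
        for source, target in ((left, right), (right, left)):
            for attribute, operators in node_constraints.get(source, {}).items():
                target_ops = result.setdefault(target, {}).setdefault(attribute, {})
                for op_name, value in operators.items():
                    existing = target_ops.get(op_name)
                    if op_name == "=" and existing is not None and existing != value:
                        raise ValueError(
                            f"{source}.{attribute} = {value!r} collides with "
                            f"{target}.{attribute} = {existing!r}"
                        )
                    target_ops.setdefault(op_name, value)
    return result


propagated = propagate_constraints({"A": {"size": {"=": "big"}}}, [("A", "B")])
```

Running the same function on the second example, where `B.size = "small"` is also given, raises `ValueError` before any later validation step sees the motif.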

Neuprint Executor - Labeling Edges by ROI

Hi Jordan,

Do you see an easy way to assign ROI labels to edges in the neuprint executor? Let's say I want to query something like this:

A -> B [weight > 20, ROI == "CX"]
A -> B [weight > 30, ROI == "CRE(L)"]

So basically, there are two things here—multigraphs, which you address already in the docs, and encoding edge ROIs. I wonder if that's rather a hard thing to do or not. The data should be there as neuprint-python fetch_synapse_connections returns something like this

    bodyId_pre  bodyId_post roi_pre roi_post  x_pre  y_pre  z_pre  x_post  y_post  z_post  confidence_pre  confidence_post
0    792368888    754547386  PED(R)   PED(R)  14013  27747  19307   13992   27720   19313           0.996         0.401035
1    792368888    612742248  PED(R)   PED(R)  14049  27681  19417   14044   27662   19408           0.921         0.881487
2    792368888   5901225361  PED(R)   PED(R)  14049  27681  19417   14055   27653   19420
...

According to this issue it looks like it's possible. My observation is that the physical location of a connection between two neurons is an important feature of a motif. Looking forward to hearing what you say.

EDIT: Maybe an indirect way to support multiple edges between two nodes is by grouping edge attributes. Does something like this seem plausible? You are doing something similar in the multigraph docs already: A -> B [synapse_count > 2]. But what exactly is synapse_count?

A -> B [[weight >= 20, ROI == "CX"], [weight > 30, ROI == "CRE(L)"]]

Best, Jakob

MultiDiGraph support

Hey. Is it possible to add support for multidigraph?

homogenize code for automorphisms in count/find/to_cypher

Upgrade grandiso version to use limits and iterable

In grandiso v1.1.0 and above, there is an optional limit argument to the find_motifs call which short-circuits motif counting if a certain number of valid mappings are found.

Right now, NetworkX and GrandIso executors implement the dotmotif limit parameter by finding all motifs and then downselecting, which is super inefficient and lame. We could pretty substantially improve performance by supporting the GrandIso limit arg.

A notable challenge: We perform an additional downselect after running grandiso (to double-check attribute filters). So we may need to store a list of mappings temporarily in order to backfill the results list if candidate mappings are filtered out.
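
If candidate mappings can be consumed lazily, the backfill problem can be sidestepped by filtering the stream and stopping once enough mappings have passed. The sketch below shows that pattern with `itertools.islice`; `find_with_limit`, the candidate generator, and the filter are hypothetical stand-ins, and no grandiso call is made.

```python
from itertools import islice


def find_with_limit(candidate_mappings, passes_filters, limit=None):
    """Lazily filter candidate mappings and stop once `limit` results are found."""
    accepted = (m for m in candidate_mappings if passes_filters(m))
    # islice with a stop of None yields everything, matching "no limit".
    return list(islice(accepted, limit))


consumed = []


def candidates():
    """Stand-in for a matcher that yields structural matches one at a time."""
    for i in range(100):
        consumed.append(i)
        yield {"A": i}


results = find_with_limit(candidates(), lambda m: m["A"] % 2 == 0, limit=3)
```

In this run only five candidates are generated to produce three accepted results, rather than all one hundred.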

Node- and edge-attribute support in DSL

Proposed syntax concepts:

Nodes

Inline maplike:

Node1 { type="GABA", z<12 } -> Node2

Pros:

Succinct

Cons:

Possible duplication or conflicting attributes if map is included on multiple lines for the same node

Postfix where-like:

Node1 -> Node2 | Node1.type = "GABA", Node1.z < 12

Pros:

Succinct

Cons:

Possible duplication or conflicting attributes if attrs are included on multiple lines for the same node

Footnote constraints

Node1 -> Node2

Node1.type = "GABA"
Node1.z < 12

Pros:

Reduces possibility of conflicting constraints
Clear syntax; can be standalone in its own macro

Cons:

Linecount verbose
Decouples attributes from connectivity clauses

Edges

Inline maplike:

A ->{type: "excitatory", neurotransmitter: "ACh"} B

Pros:

Inline

Cons:

Reduces clarity of language

Postfix where-like:

A -> B | [type: "excitatory", neurotransmitter: "ACh"]

Pros:

Inline

Cons:

Reduces clarity of language

Infix maplike:

A -[type: "excitatory", neurotransmitter: "ACh"]> B

Pros:

Inline

Cons:

Reduces clarity of language

Performance on large graphs

Hello, I have a huge graph that I want to calculate motifs from, but for years I have been struggling to find a scalable library for it. How does dotmotif scale? Is there a Big O complexity analysis for motifs with and without participation?

CSVImport performance improvements

In many cases, edges may be imported in parallel (though not always!). In these cases, it would be useful to have a (default-True) flag to parallelize graph construction.

Upgrade to latest version of networkx API

Rewrite tests to parametrize executors

To avoid test duplication.

See grand codebase for good example.

Aliased edges and inter-edge comparisons

# Alias is of the form `as <name>`
A -> B as AB_edge
# other constraints go before alias:
C -> B [foo=bar] as CB_otheredge

# Comparison constraints:
AB_edge.length > CB_otheredge.length

# You can also use constraints on non-comparisons:
AB_edge != 12

my_edge(a, b) {
    a -> b as ab
    b -> a as ba

    # comparison constraints in macros:
    ab.length > ba.length
}

foo_node -> bar_node [length > 5] as my_new_edge

Filtering By Properties w/ Invalid Characters in the Name

Hey There,
I'm using dotmotif to query the neuPrint dataset and have found some of the neurons have properties that aren't accepted in the query string format
e.g. 'AVLP(R)': True,

Is there a way to still query w/ these params?
I tried adding directly to the _node_constraints but that doesn't seem to work either e.g.

motif._node_constraints['A']['AVLP(R)'] = {}
motif._node_constraints['A']['AVLP(R)']['='] = [True]

Variable `R` not defined (line 2, column 83 (offset: 156))
"    WHERE B.status = "Traced" AND A.status = "Orphan" AND A.INP = True AND A.AVLP(R) = True"

Don't cast ingest datatypes to string

dotmotif/dotmotif/ingest/__init__.py

Line 44 in 7c5a21c

dtype={u_id_column: str, v_id_column: str},

This can result in very-large-numbers being converted into scientific notation and losing precision.
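
The general hazard can be reproduced in plain Python, without pandas: once a large integer ID has passed through a float, its text form is in scientific notation and the original value cannot be recovered. This illustrates the failure mode only and is not a trace of what the ingest code does; the ID below is made up.

```python
large_id = 792368888123456789  # made-up 18-digit identifier

# Going through a float loses the low-order digits and changes the text form.
as_float_text = str(float(large_id))
round_tripped = int(float(as_float_text))

# Keeping the value as an integer (or its exact decimal string) is lossless.
as_int_text = str(large_id)

print(as_float_text)
print(round_tripped == large_id)
print(int(as_int_text) == large_id)
```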

Error on first query

Tried to run the query from the tutorial:

motif = Motif("""
# My Awesome Motif

Nose_Cell -> Brain_Cell
Brain_Cell -> Arm_Cell
""")

But got this error:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-3a88159c0a0c> in <module>
----> 1 import dotmotif
      2 import networkx
      3 
      4 motif = Motif("""
      5 # My Awesome Motif

~\anaconda3\lib\site-packages\dotmotif\__init__.py in <module>
     24 from networkx.algorithms import isomorphism
     25 
---> 26 from .parsers.v2 import ParserV2
     27 from .validators import DisagreeingEdgesValidator
     28 

~\anaconda3\lib\site-packages\dotmotif\parsers\v2\__init__.py in <module>
     11 
     12 
---> 13 dm_parser = Lark(open(os.path.join(os.path.dirname(__file__), "grammar.lark"), "r"))
     14 
     15 

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\xxxx\\anaconda3\\lib\\site-packages\\dotmotif\\parsers\\v2\\grammar.lark'

Non-string ids not supported by Neo4jExecutor

Ingesting a NetworkX graph with integer ids results in an error: ValueError: Could not export graph: unsupported operand type(s) for +: 'int' and 'str'. It should be straightforward to handle integers, though note that "a node can be any hashable Python object except None." Maybe just cast with repr.
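
A small sketch of why `repr` may be a safer cast than `str` for arbitrary hashable IDs: it keeps the integer `1` and the string `"1"` distinct. The `node_key` helper is hypothetical and is not part of the Neo4jExecutor.

```python
def node_key(node_id):
    """Render any hashable node ID as text; repr keeps 1 and "1" distinct."""
    return repr(node_id)


ids = [1, "1", (2, 3)]
keys = [node_key(node_id) for node_id in ids]

# With str(), the integer 1 and the string "1" would collapse into one key.
str_keys = {str(node_id) for node_id in ids}
```

The trade-off is that string IDs gain quote characters in their exported form, which would need to be undone when results are mapped back to the original graph.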

Better type-guessing for Neo4j imports

Right now, certain edge-cases fail when Python infers a type that doesn't match a Neo4j-friendly type (e.g. 0x0 is not a string). Investigate better type-guessing libraries.

Module Not Found

Hi,

I am getting the error below when executing the following code:

from dotmotif import Motif
from dotmotif.executors.NeuPrintExecutor import NeuPrintExecutor

ModuleNotFoundError Traceback (most recent call last)
in <cell line: 2>()
1 from dotmotif import Motif
----> 2 from dotmotif.executors.NeuPrintExecutor import NeuPrintExecutor

/usr/local/lib/python3.10/dist-packages/dotmotif/executors/NeuPrintExecutor.py in
1 import pandas as pd
----> 2 from neuprint import Client
3 from neuprint import fetch_all_rois
4
5 from .. import Motif

ModuleNotFoundError: No module named 'neuprint'

Guess which executor to use based on graph size

This will have to get smarter once multiple graph databases are supported out of the box, but for now, small things can be run in memory and all larger things will need to use Neo4j.

Add docker-compose instructions

If dotmotif were to be deployed in a Docker container, the default no-Neo4j-instance behavior would be to try to launch another container inside of the container. It would be good to describe the required steps to orchestrate an application container with a Neo4j container using docker-compose, which would allow you to network the two together and avoid the container-in-container problem.

Support multiple simultaneous running Neo4j docker containers

Currently, only one Neo4j executor can run at a time. Users may want multiple (on HCP, perhaps.)

Filing this issue now for awareness.

NetworkXExecutor occasionally does not identify any isomorphisms

import networkx as nx
from dotmotif import dotmotif, NetworkXExecutor

E = NetworkXExecutor(graph=nx.complete_graph(20, nx.Graph))
motif = dotmotif().from_motif("A -> B")
E.count(motif)

The above returns 0.

Throw exception for incompatible kwargs for Neo4j

The Neo4jExecutor accepts a graph keyword and also db_bolt_uri and password, but it handles the two cases separately. Passing in a graph will cause the constructor to ignore the db_bolt_uri and password at present, so it should throw an exception if all three are provided.
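
A sketch of the kind of guard this issue asks for, written as a standalone function so it can be read in isolation. The `check_neo4j_kwargs` name is hypothetical, and rejecting `graph` combined with either connection argument (rather than only all three together) is an assumption that goes slightly beyond the wording above.

```python
def check_neo4j_kwargs(graph=None, db_bolt_uri=None, password=None):
    """Reject ambiguous combinations of the two ways to configure the executor."""
    connection_given = db_bolt_uri is not None or password is not None
    if graph is not None and connection_given:
        # Assumption: fail loudly instead of silently ignoring the connection.
        raise ValueError(
            "Pass either `graph` or `db_bolt_uri`/`password`, not both."
        )


# Either style on its own is accepted.
check_neo4j_kwargs(graph=object())
check_neo4j_kwargs(db_bolt_uri="bolt://localhost:7687", password="example")
```

Raising early makes the conflict visible at construction time instead of leaving the user connected to a different database than intended.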

Upgrade neo4j

Bring neo4j executor default version up to date

Handle memory limits

When a user has insufficient memory dedicated to the docker container to initialize the jvm, the neo4j instance seems to crash without logging a failure. This results in an infinite hang on the user side. It seems like maybe a user should be given a heads up or time estimate when the docker container is spun up, or perhaps we should provide a bold note in the documentation.

Graph Backend Master List

This is a single issue to track graph databases for which we may consider support. Comments/edits in this thread are better than standalone issues.

GDB	QL	Comments
Cayley
Nebula	nGQL	Written in C++, custom query language.
Neptune	Gremlin	As of the time of writing, there is no way to run this cost-effectively without provisioning and managing VM-equivalents, which is out-of-scope for this work.
python-igraph	(No QL)	Does it make sense to have two Python libraries supported (vs supporting another gdb)?

GDB	QL	Comments
neuPrint	Cypher	Supported July 2020, #76
Neo4j	Cypher	Supported
networkx	(No QL)	Supported

More graph databases here.

Anonymous motif participants

Anonymous motif participants:

A -> _hidden
_hidden -> B

Anonymous node participants in macros:

two_hop(A, B) {
    A -> _i
    _i -> B
}

two_hop(neuron1, neuron2)
