jcrozum / biobalm
The biologist's Boolean attractor landscape mapper, building Waddington landscapes from Boolean networks.
Home Page: https://jcrozum.github.io/biobalm/
License: MIT License
A few ideas for how to improve the UX of the SuccessionDiagram object.
We want to know whether we can use biodivine_aeon to import rules.
I am not sure whether or not the control code properly handles the possibility of overriding constant nodes (see #74). An example that should clarify this is:
A, B
B, A
C, A | D
D, C | E
E, false
The question is: when driving toward A=B=E=0, C=D=1, can E=1 be used to enter the C=D=1 trap space or not? It should be a valid intervention, but I suspect the code will not find it. This needs more testing.
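A quick way to sanity-check the example is to percolate the intervention E=1 by hand or with a throwaway script. The sketch below is not the project's percolation code; it is a naive brute-force version of logical percolation (LDOI), hard-coding the five rules above:

```python
# Naive logical percolation (LDOI) sketch for the five-rule example above.
# Illustrative only -- not the project's percolation implementation.
rules = {
    "A": lambda s: s["B"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] or s["D"],
    "D": lambda s: s["C"] or s["E"],
    "E": lambda s: False,
}

def percolate(fixed, rules):
    """Fix every variable whose update function is constant over all
    completions of the currently free variables (brute-force enumeration)."""
    fixed = dict(fixed)
    changed = True
    while changed:
        changed = False
        free = [v for v in rules if v not in fixed]
        for var in list(free):
            values = set()
            for bits in range(2 ** len(free)):
                state = dict(fixed)
                for i, v in enumerate(free):
                    state[v] = bool((bits >> i) & 1)
                values.add(rules[var](state))
            if len(values) == 1:
                fixed[var] = values.pop()
                changed = True
                break  # recompute the free list after each newly fixed variable
    return fixed

# Holding the intervention E=1 fixed percolates to C=1 and D=1, suggesting
# E=1 should indeed drive the system into the C=D=1 trap space.
print(percolate({"E": True}, rules))
```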
This was mentioned in #28
Motif avoidant attractors must lie in the terminal restriction space, which satisfies:
See #55 for context.
This sub-section should define "full" succession diagrams and the notion of a "partially expanded" (or "partial" for short) succession diagram.
Basically, the idea is that such "partial" diagram must have nodes corresponding to all minimal trap spaces and all such nodes are reachable from the root, but it can have nodes that are not expanded, and as such have no outgoing edges ("stubs").
The question right now is whether these "stub" nodes should have some "fast-forward" edges that lead to the largest fully-expanded node which is their proper subspace. This might be useful for things like control (it is immediately obvious which trap spaces, and by extension attractors (mostly), are reachable from a specific stub node), but it complicates things slightly because it introduces another special type of edge. Furthermore, for the purposes of control, this information can be easily computed by the actual control algorithm (instead of tracking this inside the diagram).
See find_single_drivers(all_traps) in motif_avoidant.py.
A note on optimizing this step:
It may be wise to compute and store the LDOI (logical percolation) of every node state so we don't have to recompute for every trap space. Note that this can be done for the initial network, and then the LDOIs can be "grown" as we consider successive reduced networks.
We need to create a data structure to store the succession diagram, along with any metadata that is potentially relevant for control.
As the algorithm runs, the succession diagram object will be expanded.
This object should also have methods for control and for converting between reduced-network-based and stable-motif-based representations (basically, the edges of one representation are the nodes of the other and vice versa).
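As a starting point for discussion, a minimal sketch of such a structure might look like the following (class and method names are placeholders, not a proposed final API). The key point is that edges carry the stable motif, so the stable-motif-based dual representation can be read off the edge labels:

```python
# Hypothetical sketch of the succession diagram data structure.
# Nodes hold trap spaces (dict[str, int]); each edge is labelled by the
# stable motif leading from parent to child, so the edges of this
# representation are the nodes of the stable-motif-based one.
class SuccessionDiagram:
    def __init__(self, root_space):
        self.nodes = {0: {"space": root_space, "expanded": False}}
        self.edges = {}  # (parent_id, child_id) -> stable motif

    def add_child(self, parent_id, motif, child_space):
        """Record a newly expanded child trap space under `parent_id`."""
        child_id = max(self.nodes) + 1
        self.nodes[child_id] = {"space": child_space, "expanded": False}
        self.edges[(parent_id, child_id)] = motif
        return child_id

    def motifs_from(self, node_id):
        """Stable motifs on the outgoing edges of a node (the dual view)."""
        return [m for (p, _), m in self.edges.items() if p == node_id]
```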
This issue relies on robust signed interaction graph computation (see issue #1).
Given a signed interaction graph and an FVS, compute an NFVS that is a subset of the FVS. This must be done in static.py, and possibly elsewhere.
This is a "tracking issue" for everything that needs to be completed in order to (hopefully) have a nice paper about nfvs-motifs in the end. I'll try to gradually create smaller issues for things that are not blocked by other tasks. Please use this issue for "high level" comments about paper structure and goals. I'll probably forget to add some things, so if you see that something is missing, feel free to edit my post and add it (GitHub should let you do it as repo admins).
We might need to alter the structure to accommodate journal constraints (e.g. move some of this to the supplement). But until we have the actual submission guidelines, I propose the following high level structure.
trappist/ASP.
Right now, I haven't made any contributions/experiments regarding control. I'll think about what we can add in this regard. Some kind of benchmark is definitely possible, but something more "biological" would be nice too...
I've fixed a lot of the type hinting stuff so that we can be quite strict with our type checkers, but there are still a couple of issues to work out:
In space_utils.expression_to_space_list, the types of the pyeda objects are still mysterious to me. I just ignore them during type checking for now.
In particular, we want to know how common are:
The problem is that we probably shouldn't use syntax for this, because the function expressions can be very weird depending on format. So we actually need some more sophisticated method (this should be a fast test on the BDD representation once we figure out the right conditions, but maybe there is an easier way to test this even if it takes more time to run it).
There are some missing tests from #20 by @daemontus
I still don't have any tests for the reverse_time, ensure_subspace and avoid_subspaces parameters. Before merging, I'd like to add these as well, at least in some limited form, as I'm struggling to come up with a truly general test that could be performed for any network.
Also, I need a test that would check that maximal trap spaces are correctly computed when the network has inputs (i.e. they should just immediately fix inputs and not enumerate all combinations of everything).
I can use PyBoolNet and some functions from pystablemotifs to make an 'answer sheet' for reverse_time. The result of Trappist can be checked against this.
I think this is fixed in this branch: infinite-succession-loop-fix, but I need to do more testing.
There were two issues: 1) some 0 and 1 values were still written as "0" and "1", and 2) the percolation was only percolating the stable motif currently considered, and not the "upstream" stable motifs.
The fix I have in the branch above is a little bit inelegant; I basically just make it percolate everything from scratch. A better solution would probably be to add an option to the percolate_network function to remove constant nodes.
See get_self_neg_tr_trap_spaces(tr_trap_spaces) in motif_avoidant.py. See also issue #3.
Note that self-negation is not a property that is restricted to time reversal trap spaces, so it is probably a good idea to build a general-purpose function to identify whether a set of variable states is self-negating by logical percolation.
Since we use Pint https://loicpauleve.name/pint/doc/index.html to check the reachability in ABNs, we need to create an asynchronous automaton corresponding to the original ABN. This asynchronous automaton is the required input of Pint. The structure of an asynchronous automaton is similar to a Petri net. Hence, we can try to convert the Petri net encoding of the ABN into the asynchronous automaton.
This issue is required for #9.
See #55 for context.
This should cover important properties of influence graphs. So far, I believe this means:
See #55 for context.
This is a sub-section of the Theory where we introduce basic concepts about Boolean networks:
Other things that I'm not 100% sure we need but will be probably useful:
I was testing to see if I could speed up the percolation by using a BDD-based approach instead of an expression-based approach, and I noticed that using the two approaches sometimes results in a different succession diagram size, meaning there is an error in one or the other. However, both versions pass all the tests. The new version is in the branch discussed in #49, and the old version is commented out in space_utils.py. I will investigate further.
We can improve the terminal restriction space calculation (in some cases dramatically) by identifying the single node drivers of stable motifs. I think we already have most of the functions necessary to do this, it's just a matter of putting them all together in the expand_attractors function.
See static.py. This step is necessary for computation of an NFVS. Currently, redundant inputs are not necessarily removed. We must also be careful about how ambiguous signs are handled (e.g., in the xor function).
One approach we have discussed is to use BDDs to remove redundant inputs.
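As a baseline for testing the BDD approach, redundant inputs can also be detected by brute force: an input is redundant iff flipping it never changes the function's output (with BDDs this is a support-set query). A hedged reference sketch:

```python
from itertools import product

# Illustrative brute-force check for redundant inputs of a Boolean function.
# A BDD support-set query does the same thing more efficiently; this is just
# a simple reference implementation for cross-checking.
def redundant_inputs(f, arity):
    redundant = []
    for i in range(arity):
        # Input i matters iff flipping it changes the output for some assignment.
        depends = any(
            f(*bits) != f(*(bits[:i] + (1 - bits[i],) + bits[i + 1:]))
            for bits in product((0, 1), repeat=arity)
        )
        if not depends:
            redundant.append(i)
    return redundant

# xor genuinely depends on both inputs...
assert redundant_inputs(lambda a, b: a ^ b, 2) == []
# ...while a function that ignores its second input has input 1 as redundant.
assert redundant_inputs(lambda a, b: 1 - a, 2) == [1]
```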
Currently, some files (e.g., trappist.py) have type hints, and some (e.g., motif_avoidant.py) do not. We should add type hints to all the files. At the very least, new functions should use type hints.
I want to raise a "meta issue" regarding data structures. As we are adding new code, we should probably discuss a "reasonable" representation for commonly used concepts.
In particular, I want to talk about states and spaces. I started representing spaces as dictionaries mapping a variable name to "0" and "1" strings (with a missing key indicating "any" value). Meanwhile, PR #24 represents states as dictionaries mapping a variable name to 0/1 integers (with the implicit assumption that a state has all network variables specified).
This is a rather unfortunate dichotomy, since states and spaces are very closely related :D As such, I would propose to unify this using the integer representation (admittedly, integers should be a bit more efficient than strings):
- state and space are represented as dict[str, int].
- state must have all variable names specified.
- space can have missing variables, which implicitly means that the value can be arbitrary.
- Only 0/1 integers are allowed as values.
So my first discussion topic is: What do you think about this? Do you also consider this to be an issue? Do you have a different idea for representing states/spaces?
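For illustration, here is what the proposed unified representation looks like in practice (a throwaway sketch, not committed code; the helper name is hypothetical):

```python
# Sketch of the proposed representation: both states and spaces are
# dict[str, int]; a space may omit variables, meaning "any value".
def state_in_space(state: dict, space: dict) -> bool:
    """A state lies in a space iff it agrees on every variable the space fixes."""
    return all(state.get(v) == val for v, val in space.items())

state = {"A": 1, "B": 0, "C": 1}   # a state fixes every network variable
space = {"A": 1, "C": 1}           # B is free in this space

assert state_in_space(state, space)
assert not state_in_space({"A": 0, "B": 0, "C": 1}, space)
```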
The other concern I have right now is the Petri net encoding. We use the DiGraph data structure with "canonical" names and attributes for places and transitions. However, it seems that we will actually want to perform a lot of operations on this Petri net (e.g. PR #23). As such, it might be a bit unsafe and impractical to work with just the DiGraph?
I am imagining two options: either (a) have a wrapper around the DiGraph object that actually safely implements the more involved PN transformations, or (b) create a custom PetriNetEncoding class which will implement the encoding and transformations.
Personally, I find PetriNetEncoding slightly more appealing, because right now we are doing a lot of stuff where the DiGraph is largely redundant. First, we build the BDDs for update functions, then we enumerate the implicants from these BDDs, then we create PN transitions inside the DiGraph from these implicants, and finally, we use the transitions to create ASP rules. But the concept of "transition" is almost entirely redundant. The ASP rules can be built directly from the implicants. Similarly, if we want to "restrict" the PN to a particular subspace, it is easier to implement this transformation at the level of implicants instead of generic PN transitions.
So, again, discussion topic: Would you prefer a PetriNetEncoding class that implements these things, or are you fine with a DiGraph representation? Any other ideas about representation?
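To make option (b) concrete, here is a minimal sketch (the class shape and field names are hypothetical, not a committed design) that keeps implicants as the primary data and implements restriction at that level:

```python
# Hypothetical sketch of option (b): keep the implicants directly instead of
# materialising generic Petri-net transitions in a DiGraph.
class PetriNetEncoding:
    def __init__(self, implicants):
        # implicants: (variable, target_value) -> list of partial states
        # (dict[str, int]) under which the variable can switch to target_value.
        self.implicants = implicants

    def restrict(self, subspace):
        """Drop implicants that contradict the given subspace. Restriction is
        much simpler here than as a transformation of generic PN transitions."""
        def consistent(imp):
            return all(imp.get(v, val) == val for v, val in subspace.items())
        return PetriNetEncoding({
            key: [imp for imp in imps if consistent(imp)]
            for key, imps in self.implicants.items()
        })
```

ASP rules could then be emitted by iterating over `self.implicants` directly, without the intermediate transition objects.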
Do you foresee some similar discussions happening for other data structures?
In some cases, source SCCs (strongly connected components with no incoming edges) completely determine the attractors and the succession diagram. Even when this is not true, treating the source SCCs first can greatly reduce the SD size by reducing feed-forward loops that come from different orders of independent choices (of fixing a stable motif in each source SCC).
This is an extension of the current practice of fixing source node combinations. Expected benefits:
As shown, in many models, the SD of the source SCCs completely determines the whole SD.
In the above example, I only used the source SCCs of the original model. However, once the attractors of the source SCCs are found, we can percolate them and then find the new source SCCs in each reduced model. My guess is that these iterations will greatly reduce possible permutations in the SD.
The algorithm would work like this:
Note: Motif avoidance will be treated in the expansion of the SDs of the source SCCs, or in the remaining expansion of the full SD. If there is motif avoidance in source SCCs, they are simply ignored.
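The first step of such an algorithm, finding the source SCCs of the interaction graph, can be sketched as follows (adjacency-dict input, Kosaraju's two-pass algorithm; purely illustrative, not the project's implementation):

```python
# Illustrative sketch: find the source SCCs (SCCs with no incoming edges from
# outside the component) of an interaction graph given as an adjacency dict.
def sccs(adj):
    """All strongly connected components, via Kosaraju's algorithm."""
    order, seen = [], set()
    def dfs(u):
        seen.add(u)
        for v in adj.get(u, ()):
            if v not in seen:
                dfs(v)
        order.append(u)
    for u in adj:
        if u not in seen:
            dfs(u)
    # Second pass on the reversed graph, in reverse finishing order.
    radj = {}
    for u in adj:
        for v in adj[u]:
            radj.setdefault(v, []).append(u)
    components, assigned = [], set()
    for u in reversed(order):
        if u in assigned:
            continue
        stack, component = [u], set()
        while stack:
            x = stack.pop()
            if x not in assigned:
                assigned.add(x)
                component.add(x)
                stack.extend(radj.get(x, ()))
        components.append(component)
    return components

def source_sccs(adj):
    """Keep only the SCCs with no edge entering from outside the component."""
    result = []
    for c in sccs(adj):
        has_incoming = any(v in c and u not in c for u in adj for v in adj[u])
        if not has_incoming:
            result.append(c)
    return result

# A <-> B feeds into the self-loop C; only {A, B} is a source SCC.
assert source_sccs({"A": ["B"], "B": ["A", "C"], "C": ["C"]}) == [{"A", "B"}]
```

Once the attractors of these source SCCs are found and percolated, the same routine can be re-run on each reduced model, as described above.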
The control algorithm can be improved slightly.
Point 1 requires knowing what attractors are reachable from each SD node, so it's only viable if we pass in an SD that already has all the attractors identified.
See compute_candidate_set(U_neg: List[str], B, trs) in motif_avoidant.py.
Please also provide additional details to this issue, as I am not sure I understand what exactly needs to be done here.
We already have all the real-world models and an implementation that seems to work reasonably well on them. Which means we can start computing preliminary results.
The core issue is that right now, we only test one input combination for each model. In many cases, this does not really matter. However, there are models where this causes drastic differences in attractors.
The goal of this issue is to compute the succession diagram and attractors for as many input combinations as possible (for many networks, we can just enumerate all of them, for some we'll need to sample randomly).
Ideally, we should actually save the results somewhere. So the goal is to also figure out how to print the succession diagram in some sensible way. For the actual attractors, if we only have a single "seed" state, we can print that. If we have the larger symbolic set, we can dump the whole BDD (AEON supports this). The advantage is that we don't have to recompute the attractors ever again once we have these somewhere.
We will definitely need to visualise succession diagrams at some point. It would help tremendously if we could just dump a succession diagram into a valid graphviz or tikz file.
Ideally, the method should be also able to output the stable motifs on each edge, and if there are attractors, some kind of meta info about these (if a node has attractors, how many, how large, etc.)
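A minimal version of such an export could be as simple as the following sketch (a throwaway function; the real method would also attach the attractor metadata mentioned above):

```python
# Illustrative sketch: dump a succession diagram as Graphviz DOT text.
# Nodes are labelled by their trap space, edges by their stable motif.
def sd_to_dot(nodes, edges):
    def fmt(space):
        # A space like {"A": 1} renders as "A=1"; the empty (root) space as "*".
        return ",".join(f"{k}={v}" for k, v in sorted(space.items())) or "*"
    lines = ["digraph SD {"]
    for node_id, space in nodes.items():
        lines.append(f'  {node_id} [label="{fmt(space)}"];')
    for (u, v), motif in edges.items():
        lines.append(f'  {u} -> {v} [label="{fmt(motif)}"];')
    lines.append("}")
    return "\n".join(lines)

print(sd_to_dot({0: {}, 1: {"A": 1}}, {(0, 1): {"A": 1}}))
```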
See PreprocessingSSF(F, all_traps) in motif_avoidant.py.
As @daemontus noted in the latest PR, Pint does not have Windows support. I propose we look into replacing the Pint requirement with something else, if possible.
What alternatives are available?
It appears we are only using Pint in motif_avoidant.py right now.
See pnml_to_asp(name: str) -> str in trappist.py.
... or prove that such a network does not exist.
Currently we have no unit tests set up. The earlier we do this, the easier it will be to debug the code going forward.
I can start by making a few simple tests, then everyone can add to them as needed. Unless someone objects, I will use pytest.
Function find_minimum_NFVS may not return the same NFVS for the same network.
As long as everything else in the algorithm works, this should not be a problem for correctness. However, this makes it impossible to reliably test/benchmark the algorithm because:
(detect_motif_avoidant_attractors in particular).
I don't consider this to be super high priority, but at least for testing we shouldn't use this method, otherwise our results won't be comparable.
Here are some thoughts on what we want to look at when we construct SDs and find the attractors.
This means that there is a motif avoidant attractor
So far we have not found a single motif avoidant attractor.
This means there are more than 2 attractors in a single minimal trapspace. So far we found none, as number of attractors always matched the number of minimal trapspaces.
This means that the attractor does not span the whole minimal trapspace. I think there will be many such cases, but I don't know how to find and count them.
NOTE: even for the full expansion, source nodes are treated separately. This may make it difficult to compare random models with empirical models, because the source nodes are already fixed in empirical models.
We found that empirical models do not show significant correlation between N and size/depth/n_att, at least in the range of N = 10~300.
We found that the size and n_att scale exponentially with depth, i.e. d ~ log(size) and d ~ log(n_att). Depth tends to be larger than log2(n_att) in both the empirical models and the random models. This means that the models' decisions tend to be inefficient compared to a binary search.
We need size, depth, and n_att for larger random NK models and random NCF models, to build a better figure for the succession diagram scaling. This should be done with full expansion.
We need n_att and depth for larger random NK models, to reproduce the n_att vs N graph made with pystablemotifs. We can also look for correlations between depth and N for the random NK model. Since there is a correlation between n_att and N for random NK models (n_att ~ N^0.12), we can expect d ~ 0.12 log(N). We can use source SCC expansion for this.
We are almost certain that the succession diagram can come in any shape. Hence the properties of the SD should be coming from the properties of the ensembles.
Decision points
Suppose we do a breadth-first search from each attractor. For each pair of attractors, the lowest trapspace where they meet could be called the decision point of the pair. What's the portion of the nodes that are decision points?
For example, the simplified SD of the cell cycle model
The green nodes are the decision points.
If the system is monostable, there will be no decision points.
If the system only consists of sources (and if we don't fix sources when we construct the SD), every node in the SD will be a decision point.
Reducing nodes with out-degree 1
These nodes will not be decision points. Maybe reducing them will reveal more?
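One possible formalisation (not necessarily the right final definition): call a node a decision point if at least two attractors are reachable from it and some outgoing edge loses at least one of them. Assuming we already know which attractors are reachable from each SD node, this is easy to compute:

```python
# Hypothetical sketch: find decision points of a succession diagram, given
# for each node the set of attractors reachable from it.
def decision_points(successors, reachable):
    # successors: node -> list of child nodes
    # reachable:  node -> set of reachable attractor ids
    points = set()
    for node, children in successors.items():
        if len(reachable[node]) < 2:
            continue  # monostable from here: nothing left to decide
        for child in children:
            if reachable[child] < reachable[node]:  # proper subset: a choice was made
                points.add(node)
                break
    return points

# The root can still reach both attractors; each child commits to one.
successors = {0: [1, 2], 1: [], 2: []}
reachable = {0: {"att1", "att2"}, 1: {"att1"}, 2: {"att2"}}
assert decision_points(successors, reachable) == {0}
```

Consistently with the remarks above, a monostable system yields no decision points under this definition.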
An issue regarding how percolation (LDOI) is computed has arisen in #28. Specifically, how should constant nodes be handled in space_utils.percolate_space? As an example, consider
When percolating, it is useful to see that
As the title says, building a bdd from a state dict (or list of state dicts) is very slow. I have a branch state-bdd-optimizations that tries to deal with this. So far I have tried two improvements:
1. caching (via the @lru_cache decorator)
2. changing motif_avoidant._preprocess_candidates to use the candidates dictionary directly, without converting to a BDD
I'm still testing, but preliminary results look very good. On the network I'm using for tests (148.bnet), I go from ~20-30 seconds to ~2-3 seconds.
I still need to carefully check that I haven't broken the motif_avoidant._preprocess_candidates function, but it seems OK because I'm still passing the tests.
Even with just the caching (improvement 1), I'm down to ~8 seconds on this network, so if I have to scrap improvement 2, it will likely still be a substantial improvement.
This is due to rule construction on line ~250. If the node is constant, there are no conditions and there is a trailing when attached to it.
See get_tr_trap_spaces(petri_net, subspace, source_nodes) in motif_avoidant.py.
For #67 and for all future analysis of succession diagrams (including control and so on), it'll be nice to have the SDs of bbms and random models stored somewhere.
With my machine, some of the SDs take an unreasonably long time to compute.
First I will focus on full SDs, but we can later on store lazy SD, partial SD for control, and so on.
graphml
Probably the easiest is to store the networkx.DiGraph in graphml. We may want to visualize some of the SDs in the future, so it'll be nice to have them. I can work on this.
Pickle SuccessionDiagram class
https://docs.python.org/3/library/pickle.html
I'm not familiar with this, and @daemontus mentioned that there should be some modifications to allow Rust data to be pickled. Not a big rush, but could be quite useful.
When I use
terminal_restriction_space = get_terminal_restriction_space(child_spaces,
self.network,
ensure_subspace=node_space,
use_single_node_drivers=True,
use_tr_trapspaces=True)
to find the terminal restriction space, it takes several seconds and I eventually get the following error:
$ python3 bench_attractor_search.py bbm-bnet-inputs-true/145.bnet
[0] Expanding: 1 fixed vars.
Found minimum trap space: {'v_UVB': 1}.
Total expanded: 1/1. Fixed vars 1/62 at depth 0.
Succession diagram size: 1
[id=0;children=0] Candidates: 2
Traceback (most recent call last):
File "/mnt/c/Users/jcroz/github/nfvs-motifs/nfvs-motifs/bench_attractor_search.py", line 30, in <module>
attr = sd.expand_attractors(i)
File "/mnt/c/Users/jcroz/github/nfvs-motifs/nfvs-motifs/nfvsmotifs/SuccessionDiagram.py", line 365, in expand_attractors
attractors = detect_motif_avoidant_attractors(
File "/mnt/c/Users/jcroz/github/nfvs-motifs/nfvs-motifs/nfvsmotifs/motif_avoidant.py", line 57, in detect_motif_avoidant_attractors
return _filter_candidates(petri_net, candidates, terminal_restriction_space)
File "/mnt/c/Users/jcroz/github/nfvs-motifs/nfvs-motifs/nfvsmotifs/motif_avoidant.py", line 163, in _filter_candidates
if _Pint_reachability(petri_net, state, avoid_states) == False:
File "/mnt/c/Users/jcroz/github/nfvs-motifs/nfvs-motifs/nfvsmotifs/motif_avoidant.py", line 193, in _Pint_reachability
return pint_model.reachability(goal=goal, fallback='mole')
File "/usr/local/lib/python3.10/dist-packages/pypint/tools.py", line 561, in reachability
cp = _run_tool("pint-reach", goal, *sa_args, input_model=self, timeout=timeout)
File "/usr/local/lib/python3.10/dist-packages/pypint/tools.py", line 77, in _run_tool
return subprocess.run(args, **run_opts)
File "/usr/lib/python3.10/subprocess.py", line 501, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.10/subprocess.py", line 969, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.10/subprocess.py", line 1845, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: 'pint-reach'
Setting use_single_node_drivers=False and use_tr_trapspaces=False does not give the error, and it runs much faster.
... or prove that such a network cannot exist.
See FilteringProcess(F, all_traps) in motif_avoidant.py.
We need a generator for random networks that satisfies our requirements. We can definitely adapt something existing, but I am not aware of anything that would implement all of this. If you know about something that would be a good start, please comment below.
Of course, this does not have to be a single generator. We just need the networks.
From attractors(network) in main.py:
# TODO: This is not correct, because it will look for sets of states that are
# terminal within the candidates set. I.e. if there is a transition out from
# candidates into a state that is not in candidates, this will not find that transition.
I am not sure what "This" refers to, and I don't quite understand what the error is. Please provide additional information on this issue here, and feel free to update the labels, etc. as appropriate.
See #55 for context.
This should describe various random models of Boolean networks that we are considering.
The basis is N-K networks. However, we can probably just fix K=2 from the beginning, as this is where the critical transition occurs.
We also need to differentiate between networks with constant in-degree and exponential/Poisson in-degree.
The second aspect is the considered update functions. The "baseline" are completely random functions. However, we would like to also test something more biologically realistic, like locally-monotonic functions. Since these are hard to generate randomly, we might want to consider something like threshold or nested-canalising functions.
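A baseline generator for the completely random case could look like this sketch (constant in-degree K, uniformly random truth tables; the locally-monotonic and nested-canalising variants would replace only the truth-table sampling):

```python
import random

# Illustrative generator: a random N-K Boolean network with constant
# in-degree k and fully random update functions (uniform truth tables).
def random_nk_network(n, k=2, seed=None):
    rng = random.Random(seed)
    network = {}
    for node in range(n):
        inputs = rng.sample(range(n), k)                  # k distinct regulators
        table = [rng.randint(0, 1) for _ in range(2 ** k)]  # random truth table
        network[node] = (inputs, table)
    return network

def step(network, state):
    """One synchronous update of every node."""
    new_state = list(state)
    for node, (inputs, table) in network.items():
        index = sum(state[v] << j for j, v in enumerate(inputs))
        new_state[node] = table[index]
    return new_state

net = random_nk_network(6, k=2, seed=42)
print(step(net, [0, 1, 0, 1, 0, 1]))
```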
See set_retained_set(U_neg: List[str], trs) in motif_avoidant.py.
Currently, the retained set is chosen randomly.
I plan to implement the control in a few stages:
I will update this plan as appropriate.
Models with SD depth 0 are expected to have SD size 1.
They do indeed have SD size 1 after #71, but used to have bigger SD size in the older version for the full expansions or the DFS.
My hope and guess is that this issue is already solved and does not require any further action.
name | full | DFS | depth
---|---|---|---
149.bnet | size=7 | size=7 | 0
157.bnet | size=3 | size=3 | 0
161.bnet | size=12 | size=8 | 0
176.bnet | size=2 | size=2 | 0
190.bnet | size=2 | size=2 | 0