biodivine-lib-bdd's People

Contributors

daemontus


biodivine-lib-bdd's Issues

Random valuation/path picking.

Add methods which will accept a random number generator and allow randomly picking a witness path/valuation from a Bdd.

Specifically, var_pick_random, pick_random, random_valuation, and random_path.

Allow BDD operations to execute as interruptible async functions

Most BDD operations can potentially be long-running. This can cause issues for larger deployments where we run many BDD operations concurrently (or in parallel) and potentially need to cancel some of them mid-flight. We might want to integrate with standard Rust async/await mechanisms to enable safe cancellation of BDD operations.

The other advantage is that we can evaluate BDD operations lazily or in parallel through async, opening up to simple optimizations that would otherwise need to be implemented by hand. For now, this issue is meant as a place to store the ideas and resources regarding this change.

Implementation of `infer_regulatory_graph` can use too much memory

The problem is that infer_regulatory_graph must technically completely rewrite the Boolean expressions that describe the update functions. The reason is that the update functions may use non-observable variables that are no longer used by the regulatory graph.

However, for some reasonably large update functions, the BDD translates to a very ugly Boolean expression even if the function is described in a reasonable format in the original model. As such, we need to come up with a different way of computing this equivalent Boolean expression.

A potential way I can imagine now is building a DNF of the BDD, but I can't seem to think of an algorithm that would produce a reasonably small DNF for now. Another option is to use the current implication-based scheme, but instead of using the BDD variable ordering, pick one dynamically using some greedy optimisation.

For now, this issue is mitigated by first trying the old update function and then stringifying only if the old function is reported as invalid in the new network.

Bdd valuation manipulation and iteration.

Extend Bdd API with the following:

  • is_single_valuation for testing if a Bdd is exactly one valuation.
  • is_single_path for testing if a Bdd is exactly one "path" (i.e. BddPartialValuation).
  • min_valuation, max_valuation to obtain least/greatest valuation in terms of lexicographic ordering.
  • most_positive_valuation/path, most_negative_valuation/path for getting valuation/path with most positive/negative literals. Note that the most positive/negative valuation does not have to be a max/min valuation because lexicographic ordering favours ones in high places.
  • most_fixed_path and least_fixed_path for getting a path with most/least variables fixed.
  • Add a BddPathIterator which will go through all paths as BddPartialValuations.
  • A rework of BddValuationIterator so that it is based on BddPathIterator instead of the current situation. Also, it should be possible to create an iterator just from a partial valuation and variable count.
  • A tutorial text which explains BddValuation and how it can be extracted from a Bdd and then converted back into a Bdd.

Explicitly test Bdd graph ordering

Currently, we do not test anywhere that the BddNodes are really stored in the order we are claiming. We are testing that Bdds adhere to syntactic equality rules, so it is safe to say the order should be deterministic, but we don't have explicit tests for this.

Basic under/over approximation operations.

An over/under approximation of a BDD can be easily obtained by erasing the smallest cube. Other heuristics can be considered as well, for example the cube can be pseudo-randomly selected (probably with some weight given to the size of the cube). This reduces the size of a BDD while minimising loss of information, which is useful for approximative techniques.

Reconsider `Ord` trait on `BddValuation`

Right now, this is a "derived" trait implementation. But this may be slightly unintuitive, because it is just a "lexicographic" ordering. On the surface, this is reasonable because it follows the normal iteration order. But there are a few considerations to be made:

  • Is this even useful? Do we expect people to compare valuations like this?
  • Should ordering fail on incompatible valuations? (i.e. different variable count).
  • Shouldn't there rather be a PartialOrd between BDD valuations and partial valuations that is instead based on "inclusion"? Wouldn't this be a more natural ordering on valuations?
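The inclusion-based ordering from the last point could be sketched as follows, assuming partial valuations are represented as slices of `Option<bool>` (`None` = free variable); the function name and the "more specific is greater" convention are hypothetical, not the library's API:

```rust
use std::cmp::Ordering;

/// Inclusion-based comparison of two partial valuations over the same
/// variable count. `Some(Ordering::Less)` means `a` fixes a strict
/// subset of the literals fixed by `b` (i.e. `b` is more specific);
/// `None` means the valuations conflict or fix incomparable literal sets.
fn inclusion_cmp(a: &[Option<bool>], b: &[Option<bool>]) -> Option<Ordering> {
    assert_eq!(a.len(), b.len(), "incompatible variable counts");
    let mut a_extra = false; // `a` fixes a variable that `b` leaves free
    let mut b_extra = false;
    for (x, y) in a.iter().zip(b.iter()) {
        match (x, y) {
            (Some(p), Some(q)) if p != q => return None, // conflicting literal
            (Some(_), None) => a_extra = true,
            (None, Some(_)) => b_extra = true,
            _ => {}
        }
    }
    match (a_extra, b_extra) {
        (false, false) => Some(Ordering::Equal),
        (true, false) => Some(Ordering::Greater), // `a` is more specific
        (false, true) => Some(Ordering::Less),
        (true, true) => None, // incomparable fixed-variable sets
    }
}
```

Note that this is only a partial order (hence `PartialOrd` rather than `Ord`), since two conflicting or incomparable valuations compare as `None`.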

Support computation of exact valuation count.

At the moment, we can only measure the number of valuations in a BDD in floating point numbers. That quickly starts to fail with 1000+ variables. We should at least have the option to compute the number exactly.
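For reference, exact model counting is a single bottom-up pass over the node table. Below is a minimal sketch using `u128`; the node layout and terminal encoding are simplified assumptions (not the library's actual representation), and a real implementation would need big integers (e.g. the `num-bigint` crate) beyond ~127 variables:

```rust
/// Toy node layout: terminals are nodes 0 (false) and 1 (true) and
/// carry `var == num_vars`; children always appear before parents,
/// with the root last (mirroring a post-order layout).
struct Node { var: u16, low: usize, high: usize }

fn sat_count(nodes: &[Node]) -> u128 {
    // count[i] = number of valuations of variables var(i)..num_vars
    // that lead from node i to the `true` terminal.
    let mut count = vec![0u128; nodes.len()];
    for i in 0..nodes.len() {
        match i {
            0 => count[0] = 0, // false terminal
            1 => count[1] = 1, // true terminal
            _ => {
                let n = &nodes[i];
                // Every decision level skipped on an edge doubles the count.
                let low_gap = nodes[n.low].var - n.var - 1;
                let high_gap = nodes[n.high].var - n.var - 1;
                count[i] = (count[n.low] << low_gap) + (count[n.high] << high_gap);
            }
        }
    }
    // Finally, scale by the levels skipped above the root.
    let root = nodes.len() - 1;
    count[root] << nodes[root].var
}
```

For example, the BDD of `x0 | x1` over two variables has exactly three satisfying valuations, which this pass recovers without any floating point rounding.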

Add algorithm to build BDD satisfied by valuations of size `k`.

In some cases, it is useful to have a BDD that is satisfied by valuations with exactly k (or up to k) variables from a given subset. For example, this can be useful when running search with some size/cardinality criteria (e.g. all perturbations of size k that work in a Boolean network B).

The "naive" way to build such BDD is to run through all combinations of k variables and build a DNF-like formula based on these. But this is obviously very inefficient. Instead, there is a much faster algorithm, which is not particularly hard to implement, but can be tricky to figure out for anyone not well versed in BDD manipulation. We should provide this algorithm as part of BddVariableSet.

The algorithm is as follows:

variables = ... input ...
bdd = [BDD of a conjunctive clause where all variables are false]
for _i in 1..(k+1):
    new_bdd = false
    for var in variables:
        var_is_false = [BDD with var=false]
        x = Bdd::binary_with_flip(bdd, var_is_false, flip_output=var, op=and)
        new_bdd = new_bdd | x
    bdd = new_bdd

For "up to k" variant, simply use bdd = bdd.union(new_bdd) instead of bdd = new_bdd.
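A brute-force oracle for the semantics this algorithm should produce (a sketch for testing, not library code): the resulting BDD must be satisfied by exactly "n choose k" valuations over the n chosen variables.

```rust
/// Brute-force reference: how many assignments to `n` variables set
/// exactly `k` of them to true. A BDD built by the algorithm above
/// should have exactly this many satisfying valuations.
fn count_exactly_k(n: u32, k: u32) -> u64 {
    (0u64..(1u64 << n)).filter(|mask| mask.count_ones() == k).count() as u64
}

/// The "up to k" variant simply sums the exact counts.
fn count_up_to_k(n: u32, k: u32) -> u64 {
    (0..=k).map(|j| count_exactly_k(n, j)).sum()
}
```

Comparing these counts against the BDD's cardinality is a cheap sanity check for an implementation of the iterative algorithm above.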

Question about order in `Bdd`.

if flip_out_if == Some(decision_var) {
    // If we are flipping output, we have to compute subtasks in the right order.
    if new_high.is_none() {
        stack.push(comp_high);
    }
    if new_low.is_none() {
        stack.push(comp_low);
    }
} else {
    if new_low.is_none() {
        stack.push(comp_low);
    }
    if new_high.is_none() {
        stack.push(comp_high);
    }
}

You seem to keep the order in Bdd.0. But I think that, except for the first two nodes and the last one (the root), the order of the other BddNodes doesn't matter.

Motivation

I'm trying to rewrite the project function to improve performance, but my approach will mess up the BddNode order.
I want to know whether the order is useful in certain cases.

Issues with BDD serialisation on buffered reader/writer

Right now, "raw" de-serialisation is written such that it must always consume input in 10-byte chunks. However, some readers will not return full 10-byte chunks. For example, a buffered reader will only return 10-byte chunks if its buffer size is a multiple of 10. Otherwise, it will clip the "last" chunk to match the length of the buffer, in which case reading can still continue, but the loop in our de-serialisation routine will stop because it was expecting a 10-byte chunk.
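One way to make the loop robust against short reads (a sketch, not the library's actual routine): `std::io::Read::read_exact` already loops internally until the buffer is full, so each 10-byte chunk is either filled completely or fails with `UnexpectedEof`.

```rust
use std::io::{ErrorKind, Read};

/// Read a stream as 10-byte chunks, tolerating readers that return
/// short reads (e.g. a BufReader whose buffer is not a multiple of 10).
/// Note: this sketch treats `UnexpectedEof` as a clean end of stream;
/// a stricter version would distinguish a truncated final chunk.
fn read_chunks<R: Read>(mut input: R) -> std::io::Result<Vec<[u8; 10]>> {
    let mut chunks = Vec::new();
    loop {
        let mut buf = [0u8; 10];
        match input.read_exact(&mut buf) {
            Ok(()) => chunks.push(buf),
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
    }
    Ok(chunks)
}
```

Because `read_exact` retries internally, a clipped read from a buffered reader no longer terminates the loop early.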

Support for prime implicants (cubes) / minterms (valuations?)

In logic, we refer to a path in the BDD as a prime implicant (or cube), and to a valuation as a minterm. We should be able to add support for this notation to the library. At the moment, we can iterate over valuations, but we cannot iterate over cubes. We should also be able to select the largest/smallest cube and minterm. A cube should be iterable (yielding minterms) and there should be a conversion process between minterms, cubes and BDDs (this now exists only partially).

BddValuation as BitVector in std-lib

Currently, BddValuation is just a fixed-length vector of Boolean values. This is a very common data structure, also called a bitvector. We need to consider adding bitvector support directly to the biodivine standard library and then just using bitvectors.

Related to #1.

Improve Bdd export to BooleanExpression

Currently, the export from Bdd to BooleanExpression is a trivial graph traversal algorithm. There are several extra techniques that can improve this algorithm:

  • If we know all valid paths have x set to true, we can start the formula with x & ... and later skip x completely (this can also be done for each smaller Bdd in each expansion step). Technically, we are looking for a variable ordering that minimises the number of nodes in this specific Bdd.
  • We can try to find implication/equivalence pairs. That is, if a <=> b on all paths, we can put this as a condition at the beginning and then skip b completely.

Remove guarantees on node ordering from `Bdd` representation.

In some cases, it is much more beneficial to produce Bdd nodes in a non-DFS order (for example #24, but other "custom" operations as well). Therefore, in the future, we should implement operations in such a way that they do not assume the Bdd to be sorted based on DFS post-order.

The other question is whether we can also lift the requirement of a Bdd being canonical, but that seems to be much harder to do, because some algorithms are easy exactly because Bdd is canonical. So for now, we only want to remove the ordering requirement.

In particular, this requires equivalence checks to be re-implemented, and other algorithms to be audited to check that they do not have a problem with different orderings.

A greedy `pick` implementation

At the moment, var_pick and pick prefer witness valuations with variables set to false. However, in most use cases for pick, the intention is actually to minimise the number of unique witness valuations.

This is a hard problem to solve exactly, but we should at least be able to give a greedy heuristic which, for each variable, fixes the Boolean value that is less prominent in the given context (resulting in fewer witnesses overall). Globally this may not be optimal, but it could be interesting to test how this affects algorithms that depend on pick.

BooleanExpression using a binary operation "action enum".

BooleanExpression should use a BinaryOp for the binary Boolean operations instead of explicitly specifying all five options. Also, it might be a good idea to move BooleanExpression to the standard library entirely, because it can be used by other crates that need to work with formulas as well. In fact, the standard library should provide some kind of extensible generic parser where one can define Boolean expressions with custom atomic propositions.
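The proposed shape could look roughly like the following (a hedged sketch; the variant and type names are hypothetical, not the crate's current API):

```rust
/// One "action enum" for all five binary Boolean connectives.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BinaryOp { And, Or, Xor, Imp, Iff }

/// Expression tree with a single `Binary` variant instead of five
/// dedicated ones (And(..), Or(..), etc.).
#[allow(dead_code)]
enum BooleanExpression {
    Const(bool),
    Variable(String),
    Not(Box<BooleanExpression>),
    Binary(BinaryOp, Box<BooleanExpression>, Box<BooleanExpression>),
}

impl BinaryOp {
    /// Evaluation collapses into one match on the operator.
    fn apply(self, l: bool, r: bool) -> bool {
        match self {
            BinaryOp::And => l && r,
            BinaryOp::Or => l || r,
            BinaryOp::Xor => l != r,
            BinaryOp::Imp => !l || r,
            BinaryOp::Iff => l == r,
        }
    }
}
```

The main payoff is that evaluation, printing, and parser glue each handle one `Binary` case instead of five near-identical ones.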

Fast CNF/DNF algorithm.

At the moment, we can create CNF/DNF using clauses, where creation of each clause is fast, but the creation of the formula can be slow. It is however possible to implement a faster algorithm based on #19. This could be an interesting addition to speed up serialisation/deserialisation.

Rethink invert_input

Currently, we have an experimental invert_input operation on Bdds, but we need to rethink how this should be described and presented to the user. Also, the algorithm breaks the DFS order of the Bdd; this needs to be reworked so that the result remains valid.

Add a restrict algorithm

I am missing a restrict functionality, where one can restrict one variable to a given truth value.
The var_select method is not doing that, as it computes B \land (var <=> val).

A couple of easy examples:

a xor b with a -> true, b -> false should result in true
a or b with a -> false should result in b
a and b with a -> true, b -> true should result in true
a and b with a -> false, b-> false should result in false
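The requested semantics can be pinned down with a truth-table reference implementation (a sketch for testing, not library code): `restrict` fixes the variable to a constant and eliminates it, producing a residual function over one fewer variable.

```rust
/// `restrict` on the truth table of an n-variable function, indexed so
/// that bit `i` of the table index is the value of variable `i`.
/// Fixing `var := val` yields the residual (n-1)-variable function.
fn restrict(table: &[bool], n: u32, var: u32, val: bool) -> Vec<bool> {
    assert_eq!(table.len(), 1usize << n);
    let mut out = Vec::with_capacity(1 << (n - 1));
    for m in 0u32..(1u32 << (n - 1)) {
        // Re-insert the fixed bit at position `var` to index the full table.
        let low = m & ((1u32 << var) - 1);
        let high = (m >> var) << (var + 1);
        let idx = high | ((val as u32) << var) | low;
        out.push(table[idx as usize]);
    }
    out
}
```

For example, restricting `a or b` (table `[false, true, true, true]`, bit 0 = a) with `a := false` yields `[false, true]`, i.e. the residual function `b`, matching the second example above; `var_select` would instead intersect with `a <=> false`.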

Use `AsRef` to accept both `Bdd` and `&Bdd` in core methods

In many methods for Bdd manipulation, we often expect a reference to another Bdd (like the and function). This makes the API slightly awkward to use because we have to put & everywhere. However, there is a useful built-in trait AsRef that we can implement for Bdd to handle this easily.
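The pattern can be sketched on a stand-in type (hypothetical; not the real Bdd struct or its `and` implementation):

```rust
#[derive(Clone)]
struct Bdd(Vec<u64>); // stand-in representation, not the real one

// The identity impl is what makes an owned `Bdd` usable where the
// bound expects `AsRef<Bdd>`; `&Bdd` then works too, via the standard
// library's blanket `impl<T: AsRef<U> + ?Sized> AsRef<U> for &T`.
impl AsRef<Bdd> for Bdd {
    fn as_ref(&self) -> &Bdd { self }
}

impl Bdd {
    /// Accepts both `bdd.and(&other)` and `bdd.and(other)`.
    fn and(&self, other: impl AsRef<Bdd>) -> Bdd {
        let _other: &Bdd = other.as_ref();
        // A real implementation would run the binary `and` apply here.
        self.clone()
    }
}
```

Callers can then write `a.and(&b)` when they want to keep `b`, or `a.and(b)` when they are done with it, without any change on the implementation side.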

Faster projection/selection/restriction algorithms.

At the moment, projection, selection, restriction and picking are implemented using standard logical operations (albeit with minor optimisations using fused flip-ops). However, it should be possible to design specialised algorithms for these problems which should significantly outperform current solutions.

Python bindings.

We should provide python bindings that would fit the style of https://github.com/tulip-control/dd and primarily benchmark the library using these bindings instead of the built-in benchmarks (i.e. there can be microbenchmarks in rust, but comparison with other libraries should be done via Python).

Ensure uniform sampling for randomized operations

At the moment, the distribution with which valuations are sampled in randomized operations depends on the structure of the BDD. Instead, we should adjust the algorithm to ensure that the sampling is uniform across the valuations of the BDD.

Currently, the randomized operations use a "fair coin" to pick a variable value (i.e. each pick is equally likely). This over-represents results from branches with few valuations. To make this "uniform" across all valuations, the coin should be biased based on the number of valuations in each branch. We can easily compute this number of valuations (it's the same algorithm as for the total cardinality number). We just need to bias the coin correctly based on this number.

Nevertheless... it could be interesting to also provide different sampling strategies. Because sometimes, we don't actually want to be uniform. For example, we might rather sample from the more "extreme" values...
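The biased coin itself reduces to one probability computation per node (a sketch; `r` is a uniform sample from [0, 1) supplied by the caller's RNG, and the counts are the exact branch cardinalities mentioned above):

```rust
/// Decide whether to descend into the `high` branch so that sampling
/// becomes uniform over satisfying valuations: the probability of
/// going high must be high_count / (low_count + high_count).
fn pick_high_branch(low_count: u128, high_count: u128, r: f64) -> bool {
    let total = (low_count + high_count) as f64;
    r * total < high_count as f64
}
```

Replacing the current fair coin with this rule at each decision node weights every branch by the number of valuations it contains, which is exactly what uniform sampling requires.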

Self-referential substitution

At the moment, substitution is not really correct if the substituted function depends on the substituted variable (e.g. substituting x := x | y, where the new x should refer to the old value of x). We should fix this.

BddValuation improvements

Currently, the official way of creating BddValuations is to initialise them with a bool vector. This is fine in terms of performance, but error-prone. We should at least be able to create valuations based on a specified BddVariableSet.
