tno / gltsdiff Goto Github PK
View Code? Open in Web Editor NEWGeneralized LTS differencing
License: MIT License
Generalized LTS differencing
License: MIT License
All scorers, matchers, etc. have separate getLhs/getRhs methods. Why do these need to be stored in fields? Wouldn't it make more sense to pass these as arguments to the compute methods?
Think about this, and if needed, refactor the code accordingly.
Current, getStates
exists, but getTransitions
does not.
Currently, LTS
is the base class of all representations (essentially gLTS
from the paper we're writing). In the theory, we can have any types of state properties, but the implementation requires initial states are part of that type.
We should generalize it further, to not have initial states in the most basic form, and add them for LTSs (and by inheritance also NFA, etc).
This will require that all places where we use initial states, this becomes configurable as well. This includes scorers, writers, etc.
Scorers could default for LTSs to include the initial states, and consider state combinability, and the surrounding network of transitions, etc. But not in the most basic form, initial states are not part of this.
Some matchers, like the brute force matcher, don't consider initial states yet, but should do so.
After contributing our existing tests (#12), we should add more of them to get better coverage.
If combinable(x, -)
and combinable(-, y)
, then by transitivity also combinable(x, y)
must hold. But that depends on combinability as defined for the type of x
and y
. We should change the optional combiner to always allow combining optional values, but requiring that the concrete values are then always combinable. That is, for an optional combiner of Optional<T>
, the combiner for T
must have a combinable
notion that always returns true
.
The Combiner::includes
method got removed from Combiner
as part of #4, and was moved to the post processing rewriters instead. We should think of what we want to do with the inclusion relation. We could make it a separate operator, like combiners, hiders, etc. We could completely remove it, if it's only useful for the post-processing rewriters (which at the moment may not make effective use of such a relation). Perhaps it fits somewhere else somehow?
Probably first we should have a look at the mathematics of the post-processing rewriters, before deciding what to do here.
The current examples require quite a bit of code to set everything up. Ideally, we find a way to make that easier and shorter.
See here:
// The 'leftTarget' and 'rightTarget' must have combinable properties.
if (!statePropertyCombiner.areCombinable(leftSource.getProperty(), rightSource.getProperty())) {
return false;
}
I spotted some mistakes that a good warnings configuration will spot. Also, a bunch of files have trailing whitespace, that is easily eliminated with cleanup on save actions. Etc. I propose to use the ones from Eclipse ESCET, which I'm personally quite content with.
We need a way to be able to set up a development environment for people that want to use and/or develop gLTSdiff.
Currently, the glts
package has a bunch of classes. If we add more representations, especially ones with many classes like EFAs, we may want more structure. For instance:
glts
glts.automaton
glts.automaton.diff
glts.automaton.efa
Or so.
Makes it easier to read/review/understand the code.
Currently, examples
is not a source folder. FeatureModelTest
does not compile, which is now hidden as it is not compiled.
WalkinshawGlobalScorer
indicates big LTSs
with a few dozes
states. Since we become more and more efficient, what is big changes over time. Maybe change it to bigger LTSs
with more
states or so?
Maybe we have similar statements elsewhere?
Currently, it is hard-coded. Allow to configure it.
The behavioral comparison results sometimes contain "tangle states". These are states colored black that only have red and green incoming/outgoing transitions. Such tangle states are strange, in the sense that they clearly don't match (otherwise they would have black incoming/outgoing transitions as well), but are nevertheless still matched. For this reason, a disentangle operation should be introduced that split such tangle states into a red and green state.
This issue proposes to:
The WalkinshawMatcher
has several fields that can't be set from the constructor, like landmarkThreshold
and landmarkRatio
. Change this class to allow setting these fields. (Perhaps also at other classes?).
Some time ago we changed 'scoring' to 'scorer' for the algorithms that produce a 'scoring'. But we still refer to 'scorers' as 'scorings' in various places, at least in the matchers
package. For instance, some examples from ScoringMatcher
:
SimilarityScorer<S, T, U> scoring
-> SimilarityScorer<S, T, U> scorer`scoring The algorithm
-> scorer The algorithm
SimilarityScorer<S, T, U> scoring
-> SimilarityScorer<S, T, U> scorer
We should check the entire matchers
package, and the rest of the whole repo.
Not sure whether we want 'scores' or 'scoring' or 'scoring function'. But all are used for the results that scorers produce. We may also make that consistent.
An optimization to the Walkinshaw scoring algorithms will be introduced by #2. However, there is not a clean way to disable any of these pre-processing optimizations, other than manually changing a few lines of code in the WalkinshawGlobalScorer
. There should be a better (more configurable) way to disable/enable these optimizations, possibly via a Boolean flag (or by providing such functionality as a lambda, etc.)
Make it easy for users to render the generated dot files.
Eclipse warns that it is missing.
The re-assignment of state identifiers in LTS::removeState
is done incorrectly. This should be fixed.
Improve the README
file:
Probably it shouldn't all be in the readme, but refer to some sub-documentation files. Maybe we need a docs
dir or so?
Release:
The BruteForceMatcher
tries all matchings, considering only transitions for scorings, not states. In case of a tangle, the tangle has the same number of red and green transitions as a non-tangle version. Since it ignores matchings with the same score as the current best matching, it depends on the order in which tangle vs non-tangle matchings are encountered, which one will be chosen. We could improve this by detecting if the scores are equal, which matching has less tangles, and considering that one better. Essentially, we first score/compare on number of red/green, and secondly on number of tangles.
If you know certain matches, you want them to be taken into account. This avoids having to compute all alternatives, score them, etc. Note that you want the matches to be taken into account when scoring (for score-based matchers). This differs from omitting the matches from the input, as then those states are not present in the input, as are the transitions to/from those states, which would influence the scores of other state pairs.
Similarly, if you know the scores of some pairs of states, you want to be able to specify them. These state pairs then don't have to be scored. But, they should still be taken into account when scoring other state pairs.
Strictly speaking, for uncombinable transitions, LTSDiff counts unique symbols (not number of transitions), and we count properties (not equivalence classes of uncombinable transitions/properties). Is that on purpose? Note that LTSDiff does use all possible combinations of common transitions, but not for uncommon ones, which could be considered inconsistent.
This should be changed. We want to update the encoding of the equation proposed by Walkinshaw et al., to be only on the level of transitions. (Now the equations seem to consider a combination of transitions and transition properties.) Moreover, we may consider using lists instead of sets, in considering transitions, to also counts for any duplicates if that's useful.
Currently the DefaultMerger
has a comment saying "Note: all matched state pairs have combinable state properties (otherwise there is a bug)". It would be better to actually check this, via an extra assertion.
We depend on 3.6.1, but colleagues are using older versions, and gLTSdiff compiles fine against that. We can change the minimum required version to 3.5.0.
It seems the output by gLTSdiff is sometimes non-deterministic. After some investigation it seems that BruteForceMatcher
and WalkinshawMatcher
use Collectors.toMap(...)
, which gives a collector that collects in a HashMap
. We should make a version that writes to a LinkedHashMap
instead.
Currently DynamicMatcher
is less flexible than DynamicScorer
, since the former has a fixed implementation determining whether certain thresholds are being met, while the latter is not fixed. Moreover, they are inconsistent. They should be made consistent.
In issue #13, a development environment with Checkstyle is configured. Configure Checkstyle such that it is actually used.
LTSUtils
.hasCommonIncoming
/OutgoingTransitions
could be sped up, as one don't need to compute all incoming/outgoing transitions (it is possible to short circuit).
I propose to add:
@Override
public String toString() {
return Integer.toString(id);
}
This allows debugging, to get for instance the following output:
bestScoringPairs=[[[0, 0], 0.48419500145306593], [[1, 1], 0.48155090574445414]]
landmarks=[[0, 0], [1, 1]]
matches=[[0, 0], [1, 1]]
Without the toString
, you get Object.toString
output rather than the integer numbers, which is not helpful at all.
It is in JavaDocs, etc. Most are gone, but some remain. E.g., in the State
class, the enclosed state property
can become the state's state property
. Etc.
Currently the implementations of WalkinshawGlobalScorer
and WalkinshawLocalScorer
are centered on how the computation is performed (i.e., either as a refinement operation, or by solving a system of linear equation). As a result, there is a lot of unnecessary duplication between them. Both these scorers effectively aim to calculate the same set of equations (as described in Walkinshaw's article), but in different ways.
To make matters worse, the outcome of #2 will essentially be a pre-processing step to already statically determine as many similarity scores as possible, which would introduce a third way to compute these equations. Moreover, #15 proposes to calculate different equations, based on whether or not initial state information is available in the input gLTSs. This complicates the implementation even more.
I think there are ways to implement all this more elegantly. As an alternative design, we could for example introduce a WalkinshawFormula
class, that contains all information of a similarity scoring formula, as described by Walkinshaw's article (i.e., containing a number of unknown variables, some coefficient like the attenuation factor, some constant numerator, some constant denominator, etc.). A set of such formulae would then represent an equation system, as Walkinshaw proposes it.
Secondly, we could have WalkinshawFormulaBuilder
s, that construct formulas in a particular way. For example, we could have a GLTSWalkinshawFormulaBuilder
, that construct formulas as described in the paper, and a LTSWalkinshawFormulaBuilder
that includes initial state information, etc. Other builders can be thought of, and making custom builders would be possible.
Lastly, we could have different WalkinshawSolver
s, e.g., a local and global one.
This design would split the problem to compute, from the computation itself, leading to a cleaner design. Additionally, we could implement simplification procedures, to simplify the formulae directly. In this way, the "zero-scoring optimization" / pre-processing ideas from #2 can be implemented in a much cleaner way.
Currently, there is no newline after the last }
.
Currently WalkinshawMatcher
performs a combinability check in the identifyLandmarks
function, before adding any state pair to candidates
. This check can be moved to the getScorePairs
function, which might be a small improvement since then incompatible pairs will be filtered out a bit earlier.
It combines two properties to a fixed property.
Currently the implementation of the Walkinshaw (local/global) scorers consider every possible pair of (LHS, RHS)-states. The Walkinshaw global scorer even constructs an N*N matrix for encoding the linear system, with N the number of such state pairs.
However, for some state pairs the similarity score can already be determined without e.g. having to solve a linear equation system. For example, states that disagree on state acceptance will never be merged onto one another, and therefore the score can already be determined statically. The same goes with state pairs that do not have common incoming/outgoing transitions (depending on whether a forward or backward score is computed).
The proposal is to predetermine all similarity scores that can already be determined statically, and use those to reduce the computation time of the more expensive operations like solving systems of linear equations.
We can look at JavaBDD for how to add this.
We have tests, but have not yet added them to this open source repo.
Blocked by #13
Currently the DefaultMerger
is merging combinable transitions one-by-one, while it would be conceptually cleaner and implementation-wise nicer to merge them collectively (i.e., construct the set of all combinable transitions, and combine them into a single combined transition).
Currently the LTSStatePropertyCombiner
is an 'or' combiner. It should be an equality combiner. For LTSs, there is no diff kind, so it does not make sense to combine initial states with non-initial states. We should check state combiners as well.
Currently combiners have an includes
method, that intuitively determines if all information from one combinable property is contained in another. However, conceptually this method doesn't properly fit as a member of Combiner
. Instead, it should move to the implementation of the post processing operations.
More technically 'includes(e1,e2)' is now fixed to be 'e2 = c(e1, e2)'. However, this may not always be the case. Sometimes it may be desired to define 'includes' differently, for example as 'c(e1, e2) = e1', depending on the combiner. So this should be configurable. And it should be configurable on the level of post processing operations.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.