Coder Social home page Coder Social logo

gltsdiff's People

Contributors

dhendriks avatar wytseoortwijn avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

gltsdiff's Issues

Get rid of getLhs/getRhs methods in structural comparison

All scorers, matchers, etc. have separate getLhs/getRhs methods. Why do these need to be stored in fields? Wouldn't it make more sense to pass these as arguments to the compute methods?

Think about this, and if needed, refactor the code accordingly.

Rethink representation and handling of initial states

Currently, LTS is the base class of all representations (essentially gLTS from the paper we're writing). In the theory, we can have any types of state properties, but the implementation requires initial states are part of that type.

We should generalize it further, to not have initial states in the most basic form, and add them for LTSs (and by inheritance also NFA, etc).

This will require that all places where we use initial states, this becomes configurable as well. This includes scorers, writers, etc.

Scorers could default for LTSs to include the initial states, and consider state combinability, and the surrounding network of transitions, etc. But not in the most basic form, initial states are not part of this.

Some matchers, like the brute force matcher, don't consider initial states yet, but should do so.

Add more unit tests

After contributing our existing tests (#12), we should add more of them to get better coverage.

Optional combiner's combinability relation is not an equivalence relation (it is not transitive)

If combinable(x, -) and combinable(-, y), then by transitivity also combinable(x, y) must hold. But that depends on combinability as defined for the type of x and y. We should change the optional combiner to always allow combining optional values, but requiring that the concrete values are then always combinable. That is, for an optional combiner of Optional<T>, the combiner for T must have a combinable notion that always returns true.

Rethink the property inclusion relation

The Combiner::includes method got removed from Combiner as part of #4, and was moved to the post processing rewriters instead. We should think of what we want to do with the inclusion relation. We could make it a separate operator, like combiners, hiders, etc. We could completely remove it, if it's only useful for the post-processing rewriters (which at the moment may not make effective use of such a relation). Perhaps it fits somewhere else somehow?

Probably first we should have a look at the mathematics of the post-processing rewriters, before deciding what to do here.

Add frontend for ease of use

The current examples require quite a bit of code to set everything up. Ideally, we find a way to make that easier and shorter.

Improve JDT settings of gLTSdiff plugin

I spotted some mistakes that a good warnings configuration will spot. Also, a bunch of files have trailing whitespace, that is easily eliminated with cleanup on save actions. Etc. I propose to use the ones from Eclipse ESCET, which I'm personally quite content with.

Rethink `glts` package structure

Currently, the glts package has a bunch of classes. If we add more representations, especially ones with many classes like EFAs, we may want more structure. For instance:

  • glts
  • glts.automaton
  • glts.automaton.diff
  • glts.automaton.efa

Or so.

Reconsider what we think is 'big'

WalkinshawGlobalScorer indicates big LTSs with a few dozes states. Since we become more and more efficient, what is big changes over time. Maybe change it to bigger LTSs with more states or so?

Maybe we have similar statements elsewhere?

Introduce disentangle operation

The behavioral comparison results sometimes contain "tangle states". These are states colored black that only have red and green incoming/outgoing transitions. Such tangle states are strange, in the sense that they clearly don't match (otherwise they would have black incoming/outgoing transitions as well), but are nevertheless still matched. For this reason, a disentangle operation should be introduced that split such tangle states into a red and green state.

This issue proposes to:

  • Implement a disentangle operation.
  • Implement some unit/regression tests to make sure nothing breaks later.
  • Further validate the operation on real cases that have tangles.

Scorer vs scoring vs scores inconsistencies need resolving

Some time ago we changed 'scoring' to 'scorer' for the algorithms that produce a 'scoring'. But we still refer to 'scorers' as 'scorings' in various places, at least in the matchers package. For instance, some examples from ScoringMatcher:

  • SimilarityScorer<S, T, U> scoring -> SimilarityScorer<S, T, U> scorer`
  • scoring The algorithm -> scorer The algorithm
  • SimilarityScorer<S, T, U> scoring -> SimilarityScorer<S, T, U> scorer

We should check the entire matchers package, and the rest of the whole repo.

Not sure whether we want 'scores' or 'scoring' or 'scoring function'. But all are used for the results that scorers produce. We may also make that consistent.

Add a way to disable scoring optimizations in Walkinshaw global scorer

An optimization to the Walkinshaw scoring algorithms will be introduced by #2. However, there is not a clean way to disable any of these pre-processing optimizations, other than manually changing a few lines of code in the WalkinshawGlobalScorer. There should be a better (more configurable) way to disable/enable these optimizations, possibly via a Boolean flag (or by providing such functionality as a lambda, etc.)

Improve documentation

Improve the README file:

  • Consider using some of the text of the paper being written for the introduction.
  • A bit more introduction in the readme would be helpful for potential new users, I think.
  • Explain what we have in terms of representations, combiners, algorithms, etc.
  • Explain the structure of the library.
  • Explain how to use the library, with examples for different property types, post-processing, etc.
  • Explain how to extend the library, with examples.
  • Refer to our own paper, once that is published, rather than to Walkinshaw's paper.

Probably it shouldn't all be in the readme, but refer to some sub-documentation files. Maybe we need a docs dir or so?

Perform a first release

Release:

  • In GitHub (if the repo is public or private)
  • On Maven Central (if the repo is public)

Brute force matcher could favor tangles

The BruteForceMatcher tries all matchings, considering only transitions for scorings, not states. In case of a tangle, the tangle has the same number of red and green transitions as a non-tangle version. Since it ignores matchings with the same score as the current best matching, it depends on the order in which tangle vs non-tangle matchings are encountered, which one will be chosen. We could improve this by detecting if the scores are equal, which matching has less tangles, and considering that one better. Essentially, we first score/compare on number of red/green, and secondly on number of tangles.

Allow using predefined scores and matches

If you know certain matches, you want them to be taken into account. This avoids having to compute all alternatives, score them, etc. Note that you want the matches to be taken into account when scoring (for score-based matchers). This differs from omitting the matches from the input, as then those states are not present in the input, as are the transitions to/from those states, which would influence the scores of other state pairs.

Similarly, if you know the scores of some pairs of states, you want to be able to specify them. These state pairs then don't have to be scored. But, they should still be taken into account when scoring other state pairs.

Resolve inconsistencies in the Walkinshaw scoring system

Strictly speaking, for uncombinable transitions, LTSDiff counts unique symbols (not number of transitions), and we count properties (not equivalence classes of uncombinable transitions/properties). Is that on purpose? Note that LTSDiff does use all possible combinations of common transitions, but not for uncommon ones, which could be considered inconsistent.

This should be changed. We want to update the encoding of the equation proposed by Walkinshaw et al., to be only on the level of transitions. (Now the equations seem to consider a combination of transitions and transition properties.) Moreover, we may consider using lists instead of sets, in considering transitions, to also counts for any duplicates if that's useful.

Adding extra assertion in `DefaultMerger`

Currently the DefaultMerger has a comment saying "Note: all matched state pairs have combinable state properties (otherwise there is a bug)". It would be better to actually check this, via an extra assertion.

Change Commons Math version to 3.5.0

We depend on 3.6.1, but colleagues are using older versions, and gLTSdiff compiles fine against that. We can change the minimum required version to 3.5.0.

Resolve non-determinism in matchers

It seems the output by gLTSdiff is sometimes non-deterministic. After some investigation it seems that BruteForceMatcher and WalkinshawMatcher use Collectors.toMap(...), which gives a collector that collects in a HashMap. We should make a version that writes to a LinkedHashMap instead.

Making DynamicMatcher more flexible

Currently DynamicMatcher is less flexible than DynamicScorer, since the former has a fixed implementation determining whether certain thresholds are being met, while the latter is not fixed. Moreover, they are inconsistent. They should be made consistent.

Configure Checkstyle

In issue #13, a development environment with Checkstyle is configured. Configure Checkstyle such that it is actually used.

Add 'toString' to 'State' class

I propose to add:

    @Override
    public String toString() {
        return Integer.toString(id);
    }

This allows debugging, to get for instance the following output:

bestScoringPairs=[[[0, 0], 0.48419500145306593], [[1, 1], 0.48155090574445414]]
landmarks=[[0, 0], [1, 1]]
matches=[[0, 0], [1, 1]]

Without the toString, you get Object.toString output rather than the integer numbers, which is not helpful at all.

Get rid of 'enclosed'

It is in JavaDocs, etc. Most are gone, but some remain. E.g., in the State class, the enclosed state property can become the state's state property. Etc.

Redesign implementation of Walkinshaw scorers

Currently the implementations of WalkinshawGlobalScorer and WalkinshawLocalScorer are centered on how the computation is performed (i.e., either as a refinement operation, or by solving a system of linear equation). As a result, there is a lot of unnecessary duplication between them. Both these scorers effectively aim to calculate the same set of equations (as described in Walkinshaw's article), but in different ways.

To make matters worse, the outcome of #2 will essentially be a pre-processing step to already statically determine as many similarity scores as possible, which would introduce a third way to compute these equations. Moreover, #15 proposes to calculate different equations, based on whether or not initial state information is available in the input gLTSs. This complicates the implementation even more.

I think there are ways to implement all this more elegantly. As an alternative design, we could for example introduce a WalkinshawFormula class, that contains all information of a similarity scoring formula, as described by Walkinshaw's article (i.e., containing a number of unknown variables, some coefficient like the attenuation factor, some constant numerator, some constant denominator, etc.). A set of such formulae would then represent an equation system, as Walkinshaw proposes it.

Secondly, we could have WalkinshawFormulaBuilders, that construct formulas in a particular way. For example, we could have a GLTSWalkinshawFormulaBuilder, that construct formulas as described in the paper, and a LTSWalkinshawFormulaBuilder that includes initial state information, etc. Other builders can be thought of, and making custom builders would be possible.

Lastly, we could have different WalkinshawSolvers, e.g., a local and global one.

This design would split the problem to compute, from the computation itself, leading to a cleaner design. Additionally, we could implement simplification procedures, to simplify the formulae directly. In this way, the "zero-scoring optimization" / pre-processing ideas from #2 can be implemented in a much cleaner way.

Move combinability check in Walkinshaw matcher for better performance

Currently WalkinshawMatcher performs a combinability check in the identifyLandmarks function, before adding any state pair to candidates. This check can be moved to the getScorePairs function, which might be a small improvement since then incompatible pairs will be filtered out a bit earlier.

Improve performance of structural comparison scorers

Currently the implementation of the Walkinshaw (local/global) scorers consider every possible pair of (LHS, RHS)-states. The Walkinshaw global scorer even constructs an N*N matrix for encoding the linear system, with N the number of such state pairs.

However, for some state pairs the similarity score can already be determined without e.g. having to solve a linear equation system. For example, states that disagree on state acceptance will never be merged onto one another, and therefore the score can already be determined statically. The same goes with state pairs that do not have common incoming/outgoing transitions (depending on whether a forward or backward score is computed).

The proposal is to predetermine all similarity scores that can already be determined statically, and use those to reduce the computation time of the more expensive operations like solving systems of linear equations.

Improve merging of transitions

Currently the DefaultMerger is merging combinable transitions one-by-one, while it would be conceptually cleaner and implementation-wise nicer to merge them collectively (i.e., construct the set of all combinable transitions, and combine them into a single combined transition).

LTSStatePropertyCombiner should be equality combiner

Currently the LTSStatePropertyCombiner is an 'or' combiner. It should be an equality combiner. For LTSs, there is no diff kind, so it does not make sense to combine initial states with non-initial states. We should check state combiners as well.

Move Combiner::includes to post processing operations

Currently combiners have an includes method, that intuitively determines if all information from one combinable property is contained in another. However, conceptually this method doesn't properly fit as a member of Combiner. Instead, it should move to the implementation of the post processing operations.

More technically 'includes(e1,e2)' is now fixed to be 'e2 = c(e1, e2)'. However, this may not always be the case. Sometimes it may be desired to define 'includes' differently, for example as 'c(e1, e2) = e1', depending on the combiner. So this should be configurable. And it should be configurable on the level of post processing operations.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.