
hoaf's Issues

Support for non-AP alphabets

In most cases I use automata whose alphabet is not the powerset of some set.
Hence in HOA my transitions look like [0 & !1 & !2 & !3 & ...]

Jan Strejcek already explained to me that I can use aliases for large formulas. This already reduces the file size drastically, but basically only shifts the problem from the body to the header.

My main application is an automata-based approach to software verification. Without optimizations we have there one letter in the alphabet for each statement in the program that we want to verify. Even with optimizations, we often have hundreds of letters in our alphabet.
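To make the blow-up concrete, here is a minimal Python sketch of the one-hot encoding described above (the helper name is mine): the label for a single letter grows linearly with the size of the alphabet.

```python
def one_hot_label(i, n):
    """One-hot HOA label for letter i of an n-letter alphabet:
    atomic proposition i holds, every other proposition is negated."""
    lits = [str(j) if j == i else "!" + str(j) for j in range(n)]
    return "[" + " & ".join(lits) + "]"

print(one_hot_label(0, 4))  # [0 & !1 & !2 & !3]
```

With hundreds of letters, every transition label contains hundreds of literals, which is exactly the problem reported here.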

Negation in accepting conditions?

We say that labels are Boolean formulas over AP. And labels can contain negation.
At the same time, we say that the acceptance condition is a Boolean formula over F(S_i), I(S_i), ... . But this time, negation is not allowed. We should either allow negation in the acceptance condition or call it a "positive Boolean formula" instead.

better encoding for parity acceptance

I'm implementing some support for generating and recognizing parity acceptance conditions, and I don't like the encoding currently suggested in the specifications:

acc-name: parity min even 5
Acceptance: 5 Inf(0) | (Fin(0)&Fin(1)&Inf(2)) | (Fin(0)&Fin(1)&Fin(2)&Fin(3)&Inf(4))

I would prefer this to be

acc-name: parity min even 5
Acceptance: 5 Inf(0) | (Fin(1)&Inf(2)) | (Fin(1)&Fin(3)&Inf(4))

The two formulas are logically equivalent, but the second

  • is an irredundant sum of products,
  • does not use the same set number in both Fin and Inf primitives.

In particular, I have found the latter point to be important in several algorithms. I can think of at least three algorithms in Spot where either I have special cases to handle acceptance with sets that are used as both Inf and Fin, or the algorithm simply requires disjoint Fin/Inf numbers as a prerequisite. (My typical simple "horror case" is Acceptance: 2 (Fin(0)&Inf(1)) | (Inf(0)&Fin(1)), which basically encodes Inf(0) xor Inf(1).)

What do you think? Can I change the parity examples to use the irredundant encoding?
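As a sanity check, the following Python sketch (names are mine) brute-forces the equivalence of the two encodings over every possible set of acceptance sets visited infinitely often:

```python
from itertools import chain, combinations

def parity_current(s):
    # Inf(0) | (Fin(0)&Fin(1)&Inf(2)) | (Fin(0)&Fin(1)&Fin(2)&Fin(3)&Inf(4))
    return (0 in s or
            (0 not in s and 1 not in s and 2 in s) or
            (0 not in s and 1 not in s and 2 not in s and 3 not in s and 4 in s))

def parity_irredundant(s):
    # Inf(0) | (Fin(1)&Inf(2)) | (Fin(1)&Fin(3)&Inf(4))
    return (0 in s or
            (1 not in s and 2 in s) or
            (1 not in s and 3 not in s and 4 in s))

def subsets(n):
    return chain.from_iterable(combinations(range(n), r) for r in range(n + 1))

# Both formulas agree on all 32 subsets of {0, ..., 4}.
assert all(parity_current(set(s)) == parity_irredundant(set(s))
           for s in subsets(5))
```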

No clear way to specify dead-end states

The documentation states:

If one state has no label, and no labeled edges, then there should be exactly $2^a$ edges listed, where $a$ is the number of atomic propositions. In this case, each edge corresponds to a transition, with the same order as in ltl2dstar. If a transition $t$ is the $i$-th transition of a state (starting with 0), then the label of $t$ can be deduced by interpreting $i$ as a bitset. The label is a set of atomic propositions such that the atomic proposition $j$ is in the set if the $j$-th least significant bit of $i$ is set to 1.
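For illustration, the bitset decoding described in the quoted paragraph could be sketched as follows (the function name is mine):

```python
def implicit_label(i, aps):
    """Label of the i-th implicit edge (counting from 0): atomic
    proposition j is in the label iff the j-th least significant
    bit of i is set."""
    return {ap for j, ap in enumerate(aps) if (i >> j) & 1}

# With AP: 3 "a" "b" "c", the edge at index 5 (binary 101)
# carries the label {a, c}.
```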

So it seems if one would like to specify a state with no exit arcs, the seemingly obvious code with no state label and no outgoing arcs is in fact incorrect, as 2^a arcs are necessary. So it looks like the only way to do that is to have the label [f] on the state, which is awkward for automata where only transitions carry labels.

automata without initial state

The format states that it is not necessary to specify any initial state. Do we want to support automata without any initial state? Or do we plan to add some rule like "if no initial state is specified, then 0 is initial"? Or is it there just for the case of the empty automaton without any state at all?

byte-order-mark

Joachim points out that if the comments or strings in our format are allowed to contain UTF-8 characters, the file (or stream) should also be allowed to contain a BOM at the very beginning, and that this BOM should be propagated to the output.

I suggest we ignore this for now.

chaos in labels

The current document admits that one state of the automaton can be state-labeled, another one can have labels on outgoing transitions, and another can have implicit labels (à la ltl2dstar). Is this what we want? I don't think so.

We should clearly state that the whole automaton is either state-labelled or explicit transition-labelled or implicit transition-labelled.

Do you agree?

dealing with automata without acceptance set

Sometimes (e.g., when translating any safety formula) we obtain an automaton without any acceptance set: in generalized Büchi acceptance it means that all words recognized by the automaton are accepting.

Presently the format does not allow for such empty acceptance, and I think this is a flaw. The Acceptance: line requires a non-empty acceptance-cond:

header-item ::= … | "Acceptance:" INT acceptance-cond
acceptance-cond ::= "I" "!"? INT
                 | "F" "!"? INT
                 | "(" acceptance-cond ")"
                 | acceptance-cond "&" acceptance-cond
                 | acceptance-cond "|" acceptance-cond

Should we allow Acceptance: 0 without the acceptance-cond part in this case? Or should we prefer something like Acceptance: 0 t? (with t for true, as in labels).

A related question is the definition of parametric acceptance names when the parameter is 0. Assuming Acceptance: 0 is allowed, do we agree it corresponds to the following two names?

acc-name: generalized-Buchi 0
acc-name: Streett 0

Should we introduce a dedicated name for Acceptance: 0? (e.g. acc-name: monitor.)

Also, should we allow those?

acc-name: generalized-co-Buchi 0 
acc-name: Rabin 0
acc-name: generalized-Rabin /* no digit */

Because of the disjunctive nature of the above three acceptance conditions, their empty variant should be equivalent to false, i.e., an automaton rejecting all words. Does it make any sense to support it?

Support for weighted automata

Should we make some room for representation of weighted omega-automata?

We could have some optional header

Weights: …

that somehow describes the semi-ring used for weights.

And allow initial weight to be specified with something like:

Start: 0 <0.2> 1 <0.3>

and transitions could have weights

[!0 & 1] 1 <0.2> {1}
[0 & !1] 0 <0.5> {1 2}

Notice that the above suggestion is not compatible with the change suggested for alternation (#2), since multiple initial states are now represented with a state formula such as 0|1. It is also not clear to me how weights are used in alternating automata. I suspect that weights, if any, should appear in the state-formula syntax.

None of our tools support weights, so working on this does not seem a priority at the moment.

terminal property

We have weak and inherently weak (#22). When discussing with Joachim, it seemed it would make sense to have something like terminal. All guarantee properties can be expressed by terminal automata. A terminal automaton is an automaton in which every accepting state has a true self-loop and no other successor. In practice all these states can be merged into one, so an efficient way to represent the accepting set of such an automaton in memory is to just store the number of the accepting state. If you check a (non-blocking) system against a terminal automaton, you can stop the verification as soon as you hit an accepting state.

It's not clear to me whether we should also have inherently terminal: i.e., a weak automaton in which all accepting SCCs are "complete" (in the sense that they accept any word from any state). This would be consistent with weak/inherently weak, but is there any point?

examples / test cases

In #32 @kleinj wrote:

I would then as a next step extract the example automata in an examples subdirectory, so we can use them for testing parser implementations. Ok?

Sounds nice.

Maybe we can have a script that reads README.md and populates examples/ to keep them in sync?

grammar ambiguity

Jan Strejcek said:

I think that the format is ambiguous. If I write State: 2 (0|1) 3, is
the label a state-label or an edge-label?

I believe we discussed this in the bus in Hanoi. We have to fix the definition of state-name. Instead of

state-name       ::= "State:" INT DSTRING? label? acc-sig?

we could go with:

state-name       ::= "State:" label? INT DSTRING? acc-sig?

structural vs. language-dependent properties

Another suggestion from Joachim.

Most of the properties we list are structural (they depend on the structure of the automaton, e.g., deterministic), while others might be inherent to the language represented by the automaton (e.g., stutter-invariant). It could be useful to partition the set of properties in two, to make it clear what is structural and what is language-dependent. This way, a tool processing an automaton without changing its language could propagate all the language-dependent properties that were already specified on input.

We discussed either using a prefix for property names, or (my preference) using two kinds of properties: lines.

positive integers starting with 0

The current definition allows writing integers like 01 or 007. Do we care about the leading zeros or not? For example, does "Start: 007" have the same meaning as "Start: 7"? We should make a comment about this anyway, I guess.

multiple transitions with same source/label/destination, but different acceptance

If we ignore alternation, the semantics we give for ω-automata with transition-based acceptance assumes that the set of transitions and the acceptance sets are subsets of Q×B(AP)×Q, where B(AP) is the set of Boolean functions over AP. However, our syntax allows automata that cannot easily fit this constraint.

Consider this:

/* ... */
Start: 0
AP: 1 "a"
Acceptance: 2 Inf(0) & Inf(1)
--BODY--
State: 0
[0] 0 {0}
[0] 0 {1}
--END--

This declares two transitions (0)-a->(0), each of them in a different acceptance set. But our semantics does not support distinguishing these transitions. Of course we could interpret this situation as a single transition that belongs to both sets: indeed, if one of these transitions appears infinitely often in a run, the other transition can be used infinitely often as well. So this automaton is equivalent to the following:

/* ... */
Start: 0
AP: 1 "a"
Acceptance: 2 Inf(0) & Inf(1)
--BODY--
State: 0
[0] 0 {0 1}
--END--

(Shameless plug: Spot's autfilt --merge-transitions will do this simplification if you need it.)
However this reduction is only correct because we are using Inf acceptance.
For instance consider again the first automaton, but negating the acceptance condition:

/* ... */
Start: 0
AP: 1 "a"
Acceptance: 2 Fin(0) | Fin(1)
--BODY--
State: 0
[0] 0 {0}
[0] 0 {1}
--END--

The above non-deterministic automaton is accepting, since we can decide to stop using one of the two transitions at some point. We cannot consider the two transitions as being a single transition that belongs to both sets, otherwise the automaton would be non-accepting. In fact, because the sets are Fin-accepting, we could declare the (0)-a->(0) transition in their intersection if we want to avoid duplicate transitions.

Using the intersection for Fin sets and the union for Inf sets fails when the same set number is used with both Fin and Inf. Consider for instance the same automaton with:

Acceptance: 2 (Fin(0)&Inf(1)) | (Inf(0)&Fin(1))    /* = Inf(0) xor Inf(1) */

I believe we have to change the semantics to support duplicate transitions in different acceptance sets to ensure that they work for any automaton we can represent with HOAF.

Here are two ideas for a possible fix:

  • Change Q×B(AP)×Q to something like Q×B(AP)×2^F×Q, where F is a set of /acceptance numbers/. Then change the acceptance of a run to extract the set of acceptance numbers seen infinitely often, and make sure this set is compatible with the acceptance formula.
  • Change Q×B(AP)×Q to something like Q×B(AP)×N×Q, where N are just natural numbers that allow us to duplicate transitions, but are not used for any other purpose. In this case F can remain a set of subsets of transitions.

I like the second option better because it still makes it clear we think in terms of acceptance sets of transitions.
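For reference, here is a minimal sketch of the merge step mentioned above (in the spirit of autfilt --merge-transitions; the code and names are mine, and as discussed, the operation is only sound under Inf-only acceptance):

```python
def merge_transitions(edges):
    """Merge duplicate (src, label, dst) transitions by uniting their
    acceptance sets.  Only sound when every set is Inf-accepting."""
    merged = {}
    for src, label, dst, acc in edges:
        key = (src, label, dst)
        merged[key] = merged.get(key, frozenset()) | frozenset(acc)
    return [(s, l, d, a) for (s, l, d), a in merged.items()]

# The first example above collapses to a single transition:
# [0] 0 {0} and [0] 0 {1}  ->  [0] 0 {0 1}
```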

Lexical analysis / tokenizer

I stumbled upon some small potential issues for the lexical analysis. In the current version of the grammar specification, the following would be a valid acceptance condition:
Acceptance: 3 F0 & F 1 & F!2 & I2

The syntax without spaces (F0), however, would normally be tokenized as an identifier by the lexer. We could either

  1. require a space between F or I and the integer, making F and I reserved keywords (priority over generic identifier rule)
  2. use a context-sensitive lexer (different tokenization after an Acceptance: token until the next token)
  3. use some unique symbols, e.g., <>[] / []<>
  4. allow identifiers as atoms in the acceptance condition, such that the identifier "F0" will be interpreted as F 0 (ugly in the grammar)
  5. ...?

I would like to avoid requiring a context-sensitive lexer. If we make F and I keywords (option 1), then we would have to allow them in the miscellaneous headers. The same applies to 't' and 'f', which would also be considered keywords by the lexer.
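A minimal sketch of option 1 in Python (a toy lexer, not a reference implementation): keywords only win when they stand alone, so F0 remains a single identifier token and a space is required before the integer.

```python
import re

TOKEN = re.compile(r"\d+|[A-Za-z_][0-9A-Za-z_-]*|[!&|()]")
KEYWORDS = {"F", "I", "t", "f"}

def tokenize(text):
    toks = []
    for m in TOKEN.finditer(text):
        t = m.group()
        kind = ("KEYWORD" if t in KEYWORDS else
                "INT" if t[0].isdigit() else
                "IDENT" if t[0].isalpha() or t[0] == "_" else
                "OP")
        toks.append((kind, t))
    return toks

# "F 1" lexes as KEYWORD + INT, while "F0" stays one identifier.
```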

generalized Streett?

I'm currently implementing a conversion from arbitrary acceptance to generalized Rabin and generalized Streett (with the goal of implementing an emptiness check of the latter). However I just realized that HOA defines generalized-Rabin but not generalized-Streett.

It turns out that the definition of generalized-Streett is not necessarily natural, because of what I (now) think is a design mistake in the format: in generalized-Rabin, Rabin, and Streett, Fin sets always come before Inf sets in clauses. As a consequence, complementing Rabin 1 acceptance (Fin(0)&Inf(1)) gives Inf(0)|Fin(1), which is different from Streett 1 (= Fin(0)|Inf(1)): one additionally needs to rename the acceptance sets to obtain a Streett automaton when complementing a Rabin automaton.

So for generalized-Streett, one option is to propagate this flaw, and use

acc-name: generalized-Streett 3 2 1 0
Acceptance: 6 (Fin(0)|Fin(1)|Inf(2))&(Fin(3)|Inf(4))&Inf(5)

or we can use a proper dual of generalized Rabin:

acc-name: generalized-Streett 3 2 1 0
Acceptance: 6 (Inf(0)|Fin(1)|Fin(2))&(Inf(3)|Fin(4))&Inf(5)

This second option makes the code more elegant. For instance with this option I could implement to_generalized_streett(aut) as complement_acceptance(to_generalized_rabin(complement_acceptance(aut))).

However with this second option, it hurts that generalized-Streett 3 0 0 0 is not equal to Streett 3 :-(
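To illustrate the duality argument, here is a toy Python rewrite (my own sketch, operating on strings rather than a parsed formula) that complements an acceptance formula by swapping Fin/Inf, & with |, and t with f:

```python
import re

def complement_acc(formula):
    """De Morgan complement of an acceptance formula, as a naive
    string rewrite; a real tool would transform the syntax tree."""
    swap = {"Fin": "Inf", "Inf": "Fin", "&": "|", "|": "&", "t": "f", "f": "t"}
    return re.sub(r"Fin|Inf|[&|tf]", lambda m: swap[m.group()], formula)

# Complementing Rabin 1 does not directly give Streett 1:
assert complement_acc("Fin(0)&Inf(1)") == "Inf(0)|Fin(1)"  # not Fin(0)|Inf(1)
```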

on the semantics of parity acceptance

(This is my attempt to summarize an issue that Joachim reported to me by email.)

First, in a parity automaton every state is labeled by exactly one number, and what max odd or min even means is unambiguous: the maximum (or minimum) of the numbers that occur infinitely often along a path has to be odd (or even). The same definition could be used with transition-based parity automata; that isn't an issue either.

The problem is that in HOA we do not define parity automata. We only define parity acceptance, and in our context, the numbers are set numbers. So we have to give max odd or min even some semantics that works even if a state belongs to multiple sets or none. In particular, the case where no set is visited infinitely often is troublesome, because the max and min do not exist, so it is hard to claim they are odd or even.

We could for instance decide that max odd means "some acceptance set is visited infinitely often, and the maximum set visited infinitely often is odd". This is the view currently reflected in the specifications (but only implicitly, because of the examples given), and it gives the following encodings:

parity max odd 0: f
parity max odd 1: f
parity max odd 2: Inf(1)
parity max odd 3: Fin(2) & Inf(1)
parity max odd 4: Inf(3) | (Fin(2) & Inf(1))
parity max odd 5: Fin(4) & (Inf(3) | (Fin(2) & Inf(1)))
parity max odd 6: Inf(5) | (Fin(4) & (Inf(3) | (Fin(2) & Inf(1))))
parity max odd 7: Fin(6) & (Inf(5) | (Fin(4) & (Inf(3) | (Fin(2) & Inf(1)))))

But (this was Joachim's natural interpretation) we could also decide that it means "no acceptance set is visited infinitely often, or the maximum set visited infinitely often is odd". This gives the following encodings:

parity max odd 0: t
parity max odd 1: Fin(0) 
parity max odd 2: Inf(1) | Fin(0)
parity max odd 3: Fin(2) & (Inf(1) | Fin(0))
parity max odd 4: Inf(3) | (Fin(2) & (Inf(1) | Fin(0)))
parity max odd 5: Fin(4) & (Inf(3) | (Fin(2) & (Inf(1) | Fin(0))))
parity max odd 6: Inf(5) | (Fin(4) & (Inf(3) | (Fin(2) & (Inf(1) | Fin(0)))))
parity max odd 7: Fin(6) & (Inf(5) | (Fin(4) & (Inf(3) | (Fin(2) & (Inf(1) | Fin(0))))))

I believe these two semantics are equally valid, or maybe equally invalid...

In the case of parity automata (i.e., with the guarantee that each state belongs to exactly one set), the two views are equivalent (even for case 0, because the only parity automaton we can build without using any acceptance set is the empty automaton). So we basically have to choose what we want for the case where parity acceptance is used without that extra constraint.
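The two readings can be stated side by side as predicates over the set of acceptance-set numbers visited infinitely often (a sketch; the function names are mine):

```python
def max_odd_strict(inf_sets):
    """First reading: some set is visited infinitely often, and the
    maximum such set number is odd."""
    return bool(inf_sets) and max(inf_sets) % 2 == 1

def max_odd_lenient(inf_sets):
    """Second reading: no set is visited infinitely often, or the
    maximum such set number is odd."""
    return not inf_sets or max(inf_sets) % 2 == 1

# The readings disagree only when no set is visited infinitely often,
# which cannot happen in a colored automaton with at least one set.
assert max_odd_strict(set()) != max_odd_lenient(set())
assert all(max_odd_strict(s) == max_odd_lenient(s)
           for s in [{0}, {1}, {0, 3}, {2, 4}])
```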

What I do not like in these two semantics is the following. If we assume that some acceptance set has to be visited (first semantics), then for all combinations of max/min and odd/even we get

parity max odd 0: f
parity max even 0: f
parity min odd 0: f
parity min even 0: f

If we take the other semantics, we get

parity max odd 0: t
parity max even 0: t
parity min odd 0: t
parity min even 0: t

I find both unsatisfactory, because they show no duality between odd and even. Switching odd into even in a deterministic parity automaton gives the complement automaton, and I would like it if switching odd into even in a deterministic automaton with parity acceptance (I'm not writing "parity automaton" here!) gave the complement automaton as well. The problem is very obvious in case 0 shown above, but it also exists in other cases, as pointed out in the last paragraph added by b524fba.

For this to work, we could decide for instance that max odd means "some acceptance set is visited infinitely often, and the maximum set visited infinitely often is odd", while max even means "no acceptance set is visited infinitely often, or the maximum set visited infinitely often is even". (Or we can switch the first part of these semantics if it matters.) What do you think?

Joachim also suggested adding a property: colored to denote the fact that a state (or transition) belongs to exactly one acceptance set. So a /parity automaton/ would be an automaton with parity acceptance and property: colored. I like this name.

the definition of F as a set of sets sounds incorrect

In the formal semantics we have something like this

$F={S_0,S_1,…,S_{m−1}}$ is a finite set of acceptance sets.

where each S_i is a subset of transitions.

I think declaring F as a set is a mistake, because it implies that all the S_i are different, which the format does not guarantee.

I suggest to rewrite this as

$F=(S_0,S_1,…,S_{m−1})$ is a tuple of $m$ acceptance sets.

Do you agree?

constraints between acc-name and Acceptance

Excerpt from the Acceptance Specifications section (emphasis is mine):

Even if acc-name: is used, the Acceptance: line is mandatory and should match exactly the patterns given in the following examples. This way the tools that ignore the acc-name: line have no surprises.
[...]
A generalized [Büchi] automaton with three acceptance sets can be defined with:

acc-name: generalized-Buchi 3
Acceptance: 3 Inf(0)&Inf(1)&Inf(2)

I don't remember why we decided to require an exact match. I don't see what could go wrong if the Acceptance: line uses a different, but logically equivalent formula:

acc-name: generalized-Buchi 3
Acceptance: 3 Inf(1)&Inf(2)&Inf(0)

I would like to suggest that we change "should match exactly" to "should be logically equivalent to".

A side-effect of this change is that we would accept any of those

acc-name: parity min even 5
Acceptance: 5 Inf(0) | (Fin(0)&Fin(1)&Inf(2)) | (Fin(0)&Fin(1)&Fin(2)&Fin(3)&Inf(4))

acc-name: parity min even 5
Acceptance: 5 Inf(0) | (Fin(1)&Inf(2)) | (Fin(1)&Fin(3)&Inf(4))

acc-name: parity min even 5
Acceptance: 5 Inf(0) | Fin(1)&(Inf(2) | Fin(3)&Inf(4))

While currently only one of these three should be allowed to have acc-name: parity min even 5.
(I would leave the last two examples in the specification, and not show the first one.)

The only drawback I see here is that it makes any validation of the correspondence between acc-name: and Acceptance: slightly harder. One way to do it is to convert acc-name: and Acceptance: into two separate BDDs, and then check them for equality. I'm currently implementing such a BDD-based check, not for validation (currently Spot always ignores the acc-name: line) but to automatically name acceptance conditions in autfilt. If I detect that some Acceptance: line is equivalent to parity min even 5, then I can add the corresponding acc-name: line, and depending on what we decide here, I can either leave the Acceptance: line as-is or standardize it so it becomes an exact match.

Not clear whether acceptance sets can be specified both on a state and some of its exit arcs

The documentation says:

The INT* used in acc-sig represent the acceptance sets the state or edge belongs to. Since we use transition-based acceptance, when acc-sig is used on a state to declare membership to some acceptance sets, it is syntactic sugar for the membership of all the outgoing transitions to these sets. For instance, State: 0 {1 3} would state that all transitions leaving state 0 are in acceptance sets 1 and 3.

It is not clear whether acc-sig can be specified on both the state and some of its exit arcs. There are several possibilities:

  • disallow this (and explicitly tell in the documentation)
  • allow this, and then the acc-sig on the state is united with those on the arcs; additionally, one may require the union to be disjoint.

using more acceptance sets than needed by the acceptance condition

Tomáš and Fanda recently found that when autfilt is fed an automaton with Acceptance: 1 t, it interprets it as Acceptance: 1 Inf(0). This looks bogus to me. I think the original acceptance condition states "the automaton will use one acceptance set, but this set is not used to decide acceptance of runs: all runs are accepting".

We could also imagine something like Acceptance: 3 Inf(0)&Fin(2) not using the acceptance set 1 for the purpose of acceptance. But maybe {1} can be used for another purpose that does not affect the semantics, e.g., as a state or transition marker for debugging an algorithm.

So far, there is no discussion about this kind of acceptance in the specification of the format. I think we should either explicitly allow it or explicitly disallow it. Something like

An acceptance condition declaring $m$ sets need not actually use all of these sets.  
In this case the unused sets can be ignored if they appear in the body of the automaton.

or

An acceptance condition declaring $m$ sets must use all of these sets.

What do you think? I have a slight preference for allowing unused sets, but I've no problem with the other (and simpler) solution.

state numbers

Jan asked this in a mail:

Is it necessary to declare the states in the order 0..n-1?
The number is just a name. And do we really need to specify them all?
Even non-accepting states without transitions?

I don't have any strong feelings about this.

I don't see any reason to force an order on the output.

States without outgoing transitions need to be discussed also for #5, but I'd rather list them explicitly: those should be seldom used in practice, so I'd prefer to be able to notice when they are used.

Edges or transitions?

In the format description, we always use the term "transition", except in the part describing the automaton body, where the term "edge" is used consistently (probably because it is also used in the formal grammar defining the syntax). Do we want to unify the terminology, or is it OK?

else-based edges

(Joachim's suggestion)

For the boolean-edge-label format one idea I had a while ago but never
researched in practice is the following. For a deterministic
automaton, the edge definitions are read in order. However, the
boolean formulas are interpreted with an implicit "successor state for
label is not yet defined". E.g,

State: 0
[1 & 2] 1
[t] 2

The second edge definition would be interpreted as
"t & !(1 & 2)". When the edge information is internally stored in a BDD I imagine
such a scheme could potentially result in a smaller encoding, as the
latter edge defs can use more "don't cares". So, this is a more
speculative idea :)

Jan S suggests that we implement some e (short for "else") constant that can be used as:

State: 0
[1 & 2] 1
[e]     2

and the semantics of e would be either the complement of everything we have seen so far (my view), or the complement of all other transitions (his).

In all cases, this can be useful to complete automata, and it does not require changing the semantics of t.
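Modelling labels extensionally as sets of satisfying assignments, the first interpretation of e (complement of everything seen so far on the state) can be sketched as follows (names are mine):

```python
def else_label(seen_labels, all_assignments):
    """Assignments matched by [e]: those not matched by any label
    already listed on the current state.  Labels and assignments are
    modelled as sets / frozensets of true atomic propositions."""
    covered = set().union(*seen_labels) if seen_labels else set()
    return all_assignments - covered

# Two APs 0 and 1; after [0 & 1], [e] matches the three remaining
# assignments.
assignments = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}
label_0_and_1 = {frozenset({0, 1})}
assert else_label([label_0_and_1], assignments) == assignments - label_0_and_1
```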

unambiguous automata

I suggest changing the description of unambiguous automata to follow the formal definition (I don't like the word "recognized" in the current version). I suggest this:

unambiguous hints that, for each word, there is at most one accepting run of the automaton.

better typesetting

We need to prepare a better version of these specifications, with better rendering of maths, and pictures of automata to illustrate some example and the semantics. I guess we want both a document that can be printed, and that can be browsed on the web.

Should we use LaTeX? or a better flavor of MarkDown with support for maths?

Note that the repository currently has a Makefile that will compile README.md using pandoc to produce some HTML or PDF. In the current configuration, it is possible to use \(...\) or \[...\] for inline or displayed maths using TeX syntax. It's just the Markdown flavor used at GitHub that does not support that. We could continue using the current Markdown file and just host the resulting HTML on GitHub Pages.

transition-based acceptance marks in alternating automata

I wonder if it was a mistake to consider that a transition with universal branching has a single set of acceptance marks: I believe it would make more sense to have one such set per destination of the universal transition.

Let me justify this statement by considering the dualization of an automaton, an operation that will be in the next major release of Spot.

A complete alternating automaton with state-based acceptance can be complemented by simply complementing the acceptance condition and dualizing the transition structure (i.e., existential choices become universal and vice versa).

ltl2tgba -C FGa

[figure: the resulting automaton]

ltl2tgba -C FGa | autfilt --dualize

[figure: the dualized automaton]

The transformation is quite easy to implement, and works for any acceptance. It does not add new states, and the total number of destinations of transitions does not change.

Now let's transpose this algorithm to alternating automata with transition-based acceptance.
For instance here is a handcrafted automaton recognizing GFa:

[figure: handcrafted alternating automaton for GFa]

Dualizing the above automaton as follows would be clearly incorrect, since it would accept any word.
[figure: incorrect dualization]

What I'd really want is the following, which HOA cannot represent:
[figure: desired dualization]

Note that in this case, pushing the acceptance marks to states 1 and 2 is possible without adding more states, so we could use the state-based dualization. However, we can make this case more complex by ensuring that states 1 or 2 are destinations of other transitions with different marks.

Formalize the semantics of the version number

If we are going to release another version of the format, we should decide and specify how the version number is used.

I would suggest that the syntax for version X.Z should be a superset of the syntax for version X.Y whenever Z ≥ Y. This means a parser for v1.1 should be able to parse anything written for v1 (which I interpret as v1.0). But the opposite direction would not have to hold. The introduction of negated properties discussed in #55 can be done with the ! syntax if we bump the version number, and we can document that no-univ-branch is an obsolete name for !univ-branch.

I would increment X for changes that invalidate the actual syntax.

In a parser for version X.Y, I would try to parse even files written with version X.Z for any Z, in case the file is in the subset of the syntax I understand. But when the parser finds a syntax error in a file with Z > Y, I would additionally complain about the unsupported version number.

Add an end marker to support streams of automata?

I like pipes, so I would like to write tools that read streams of automata on input for batch processing. For instance, imagine a filtering tool that reads a stream of HOA from its standard input, and outputs only the automata that respect some criteria.

For this it would be convenient to have a simple way to detect the end of an automaton. While the number of states is announced in the header, the number of transitions is not always known in advance. So currently, if we had a stream of HOA inputs, the only way to detect the end of an automaton would be to reach EOF or the HOA: token that marks the start of the next automaton. Unfortunately, this means that when we detect the end of some automaton, we have already started consuming tokens from the next automaton. This makes it inconvenient to write a parser that returns the first automaton and that should be called again to parse the next automaton (because the same HOA: token would have to be used by the two calls).

An end marker would make things easier. I'm thinking something like

automaton ::= header "---" body "%%%"

but cleaner suggestions are most welcome.
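With such a marker, a streaming reader never has to look into the next automaton. A minimal Python sketch, using the hypothetical "%%%" marker from the grammar above (not part of the released format):

```python
def split_stream(lines, end_marker="%%%"):
    """Yield one automaton (as a list of lines) per end marker,
    without consuming any token of the following automaton."""
    current = []
    for line in lines:
        current.append(line)
        if line.strip() == end_marker:
            yield current
            current = []
    if current:  # trailing input without an end marker
        yield current

stream = ["HOA: v1", "---", "State: 0", "%%%", "HOA: v1", "---", "%%%"]
assert len(list(split_stream(stream))) == 2
```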

ε-labelled transitions

For my construction I need to output an ω-automaton with ε-transitions, and I want to use the HOA format. Unfortunately the specification does not mention ε-transitions explicitly, and now I'm wondering how to properly do it. I talked to @JanKretinsky in person, but he was also unsure how to do it. It would be great if the specification either explained why ε-transitions are not supported or showed how to express them.

DSTRING

DSTRING occurs in the definition of the body of the automaton, but it's not defined in the document. Is it something standard, or is it a typo that should just be STRING (defined at the beginning of the document)?

HOA logo

Joachim, Jan S. and I have been discussing logos over the past month. This is the current runner-up. It's based on an original idea by Jan, rendered in TikZ by me, and includes some adjustments from Joachim. I've just recently added a "horizontal" bar to the H, and prepared a small version that can be used as an icon (for file organizers).

Full version:
[image: hoa-full]

Icon version
[image: hoa-icon]

What do you think?

fixing the semantic of the deterministic property

Here is a picture of the class of alternating automata, containing the dual subclasses of existential automata (a.k.a. non-deterministic automata) and universal automata.
[figure: class diagram of alternating automata]

Currently, HOA v1 uses no-univ-branch to indicate existential automata, and deterministic to indicate universal automata. This is unfortunate, because it makes no-univ-branch the dual of deterministic.

Since we have already renamed no-univ-branch into !univ-branch, I would like to propose that we rename the current deterministic property as !exist-branch, and that we define deterministic as syntactic sugar for !univ-branch !exist-branch.

For tools that only work with automata without universal branching, this does not change the semantics of deterministic. The change of semantics of deterministic only matters to tools working with alternating automata, but I don't think any of them used deterministic (at least ltl3ba -H1 and ltl3dra -H1 never seem to output deterministic).

Spot 2.3 uses deterministic in the specified sense for alternating automata, but I'd really like to change that as suggested, to get in line with existing literature. In particular when complementing an automaton by dualization (See for instance Section 1.6 of Löding's Diploma Thesis), I'd really like that an exist-branch !univ-branch automaton becomes a !exist-branch univ-branch automaton.

weak alternating

Joachim asked whether property weak (#22) made sense with universal branching.

The definition of SCC might not be very clear in the context of universal branching.

I believe that if we consider a branching transition as several arcs when defining SCCs, our definition of weak matches the established definition of weak alternating automata.

It might even make sense to add a very-weak property in case all SCCs have size 1.

Note that when defined over words (our case), weak alternating Büchi automata are as expressive as alternating Büchi automata, and hence as Büchi automata. However, weak non-alternating Büchi automata are less expressive. So weak really has to be understood as a structural property of the automaton (and I will have to make an effort not to think obligation whenever I read weak).
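To make the suggested definition concrete, here is a sketch of how a tool could check weak and very-weak (the data representation is hypothetical; as suggested above, a universal branch is assumed to have been split into one arc per destination before computing SCCs):

```python
def sccs(num_states, edges):
    """Tarjan's SCC algorithm.  `edges` maps a state to a list of
    (destination, acceptance_sets) pairs."""
    index, low = {}, {}
    stack, on_stack, result = [], set(), []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w, _ in edges.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            result.append(comp)

    for v in range(num_states):
        if v not in index:
            visit(v)
    return result

def is_weak(num_states, edges):
    """Weak: all transitions inside one SCC belong to the same acceptance sets."""
    for comp in sccs(num_states, edges):
        members, seen = set(comp), None
        for v in comp:
            for w, acc in edges.get(v, []):
                if w in members:
                    if seen is None:
                        seen = acc
                    elif acc != seen:
                        return False
    return True

def is_very_weak(num_states, edges):
    """Very weak: additionally, every SCC consists of a single state."""
    return is_weak(num_states, edges) and \
        all(len(c) == 1 for c in sccs(num_states, edges))
```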

Add Alphabet header for specifying an alphabet not based on atomic propositions

To support use cases where the alphabet of the automaton is not defined over a set of atomic propositions, it would make sense to allow an alternative way of specifying the alphabet:

header-item ::= ...
             | "Alphabet:" INT STRING*

Example:

Alphabet: 3 "a" "b" "c"

This header would be mutually exclusive to the AP header.

In such an automaton, the integer indices would refer to the individual letters:

[0]     successor // letter a
[1 | 2] successor // letter b or c

t would mean "any letter"; f, as well as conjunctions of distinct letters (which are unsatisfiable), would lead to transitions that are ignored.

Areas that would need adaptation:

  • Trace semantics
  • Mapping of the transition labels (explicit/implicit) to the alphabet
  • Properties (perhaps?), if there are subtle dependencies on the AP-based alphabet
  • more?

Adding such an Alphabet header would not cause existing tools to misinterpret the automaton: its capitalized A marks it as a header with semantic implications, which tools must reject when unknown.
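For interoperability, a tool supporting only AP-based automata could translate the proposed Alphabet header into the existing encoding. Here is a sketch of one possible translation (the helper name is hypothetical), using a binary encoding of letter indices into atomic propositions:

```python
from math import ceil, log2

def letters_to_aps(alphabet):
    """Encode a plain alphabet into atomic propositions.

    Returns the number of APs needed and, for each letter index, a
    Boolean formula (as a string over AP indices) representing it,
    following the current AP-based HOA label syntax.
    """
    n = len(alphabet)
    k = max(1, ceil(log2(n)))  # number of atomic propositions
    formulas = []
    for i in range(n):
        terms = []
        for bit in range(k):
            if (i >> bit) & 1:
                terms.append(str(bit))
            else:
                terms.append("!" + str(bit))
        formulas.append(" & ".join(terms))
    return k, formulas

# Example: Alphabet: 3 "a" "b" "c"
k, f = letters_to_aps(["a", "b", "c"])
# k == 2; letter "a" (index 0) becomes "!0 & !1", etc.
```

Note that when the alphabet size is not a power of two, some valuations of the APs (here 0 & 1) correspond to no letter, which the translation would have to keep unreachable.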

support for additional state/transition properties

This was partly discussed in an email, and is also related to #3.

The question here is whether we want to support extra and optional properties for states or transitions. For instance, it could be useful for a GUI to attach coordinates to states. Other tools might want to attach different kinds of properties to states or transitions: for instance to partition the states into different SCCs, to add weights (cf. #3), to preserve the number of the original state in algorithms that transform automata, to add rendering information (like "draw in bold"), etc.

Currently, we only support such extra information with /* comments */. While this can be helpful for debugging, it is not a good way to carry information between tools.

We discussed a few options with Jan S and Fanda today. It seems clear that if this has to be extensible, the additional properties should be named somehow. E.g., if we need to attach a coordinate, we could write <x: 10, y: 20>. So the properties would be a list of key-value pairs. I suggest that the same rule that we have applied to headers be used here: properties that carry semantic information should be capitalized, so that tools may ignore only properties starting with a lowercase letter.

If we have such a key-value syntax for attaching properties to states and transitions, a natural question is whether we should not regard the guards [0&1] and membership to acceptance sets {0 1} as properties too, and use a uniform syntax for everything? We would have something like Label: 0&1, Acc: 0 1, etc. But then we have to decide how to properly represent properties that are not simple values (like 0&1, or 0 1), and repeating Label and Acc for every transition appears to be bloat. We would rather keep the current structure, and add an optional <key:value...> if needed.

We suggest getting a first working version of the format without the additional properties, and then seeing what more we need, if anything. So far we have lived very well with less than we currently plan to support, and over-designing the format would just delay its implementation.

better wording in case States: is not given

Consider this (IMHO incorrect) input:

HOA: v1
Start: 6
Acceptance: 0 t
--BODY--
--END--

The specifications currently say:

States should be numbered from 0 to n−1 (where n is the value given by the States: header item if present)

and then

States should be numbered from 0 to n−1, may be listed in any order, but should all be listed (i.e., if the header has States: 10 then the body should have ten State: INT statements, with all numbers from 0 to 9).

but that does not really specify what is expected if the optional States: header is missing, since n is then undefined. My suggestion would be to fix those sentences so that n denotes the value of the States: header if present, or otherwise one more than the highest state number used as an initial state, in a State: statement, or as a destination of any transition.
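As a sketch of the suggested rule (function and parameter names are hypothetical, not from the specification):

```python
def infer_state_count(start_states, declared_states, transitions):
    """Infer n when the 'States:' header is absent: one more than the
    highest state number occurring as an initial state, in a 'State:'
    statement, or as a transition destination.

    `transitions` maps a source state to a list of destination states.
    """
    highest = max(list(start_states) + list(declared_states) + [-1])
    for src, dests in transitions.items():
        highest = max([highest, src] + list(dests))
    return highest + 1

# The example automaton above, with 'Start: 6' and an empty body,
# would then have states 0..6:
assert infer_state_count([6], [], {}) == 7
```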

Personal notes

This issue is just a reminder that the document contains two personal notes (one starting with "Note:" and the other with "I (Alexandre)") that should be removed or reformulated before publication of the document.

Properties: Add syntactic support (or convention) for negation of properties

When specifying properties, it would be convenient to be able to specify the negation of properties (e.g. not deterministic, not stutter-invariant,...).

One option would be syntactic support, e.g.,

properties: !stutter-invariant

However, this would break backward compatibility, as such an automaton would no longer be a valid 'HOA: v1' automaton: the ! is not in the current grammar. I don't know whether that would be a big deal at the current level of tool support.

An alternative would be to consistently use a prefix such as currently used for no-univ-branch. There, not-... would probably be the most neutral from a language perspective, even though no-... or non-... might sound nicer for some of the properties.
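Parsing the proposed ! syntax would be a trivial extension; a sketch (helper name hypothetical, and note that the ! prefix is only a proposal, not valid 'HOA: v1' syntax):

```python
def parse_property(token):
    """Split an optional leading '!' from a property token.

    Returns (name, negated).  A v1 parser would reject the '!' form,
    which is why the change is not backward compatible.
    """
    if token.startswith("!"):
        return token[1:], True
    return token, False
```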

more binary operators

(suggested by Joachim)

Supporting more binary operators (like → or ↔) could make some labels or acceptance conditions easier to write. If labels are represented as BDDs, it might also make sense to have some kind of if-then-else (ITE) operator.
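None of these operators would add expressive power, since each reduces to the existing ones; a minimal evaluator sketch (the tuple-based AST encoding is hypothetical) makes the reductions explicit:

```python
def ev(node, val):
    """Evaluate a label formula over a truth assignment `val`
    (a dict mapping AP indices to booleans).  Formulas are nested
    tuples, e.g. ("impl", ("ap", 0), ("ap", 1)) for '0 -> 1'."""
    op = node[0]
    if op == "t":
        return True
    if op == "f":
        return False
    if op == "ap":
        return val[node[1]]
    if op == "not":
        return not ev(node[1], val)
    if op == "and":
        return ev(node[1], val) and ev(node[2], val)
    if op == "or":
        return ev(node[1], val) or ev(node[2], val)
    # The proposed additions, each expressible with the operators above:
    if op == "impl":
        return (not ev(node[1], val)) or ev(node[2], val)
    if op == "iff":
        return ev(node[1], val) == ev(node[2], val)
    if op == "ite":
        return ev(node[2], val) if ev(node[1], val) else ev(node[3], val)
    raise ValueError("unknown operator: " + op)
```

So the question is purely one of convenience and conciseness of the syntax, not of expressiveness.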
