nwf / dyna Goto Github PK
View Code? Open in Web Editor NEWDyna2 compiler and REPL
License: GNU Affero General Public License v3.0
Dyna2 compiler and REPL
License: GNU Affero General Public License v3.0
Ultimately by type checking, of course, but for now you could even do it by adding a conjunct of the form is_bool(X).
Speaking of scary and unhelpful error messages like the ones from #30, I wonder if it could be made a little less scary? Words like FATAL
and panic
and illegal
may trigger traumatic memories, and long hex strings have been known to trigger seizures.
DynaCompilerError:
FATAL: Encountered error in input program:
Unable to plan initializers for rule(s):
/home/jasoneis/.dyna/tmp/15766/f8207d93ff9ba46002ae41fd3e0d02c0840e84e5.dyna:1:1-/home/jasoneis/.dyna/tmp/15766/f8207d93ff9ba46002ae41fd3e0d02c0840e84e5.dyna:1:14
Also, some planner failures say "This is almost assuredly not your fault" and others don't: see the examples on #30. How about saying this in both cases?
This rule is syntactically valid, but this version of the
Dyna compiler doesn't yet know how to handle it, sorry.
(Perhaps it's not what you meant to write anyway?)
Right now
foo.
appears to translate as
foo |= true.
I think the aggregator should be :-
rather than |=
since this is part of the Prolog fragment.
Also, I'd like to be able to write
foo(I) for in_range(I).
to mean
foo(I) :- true for in_range(I).
Better yet, use Java interface to Dynasty to allow a live link as the chart changes.
But this may be a problem over ssh ...
Support query, vquery, and trace commands at the repl:
query foo(_,Y)
3 =* foo(...,"this")
4 =* foo(...,"that)
vquery foo(_,Y)
3 where Y="this"
4 where Y="that"
vvquery foo(length(X)+1,Y)
3 =* foo(9,"this") where X="sent", length(X)==8, length(X)+1==9, Y="this" % shows intermediate expressions (temp variables) bottom to top, left to right, uniqued. Except that the Result variable comes first and is formatted specially. Make sure to quote RHS of == when needed.
trace foo(1+bar(X),square(X)) % trace usually does a full trace, but it should take an optional integer argument that limits the depth of tracing. We also allow vvvquery, vvvvquery, etc., as synonyms for trace 1, trace 2, etc. trace 0 is just vvquery in a different format.
3 =* foo(8,16) where X=4
8 =* 1+bar(4)
7 =* bar(4)
bar(4) += baz(4,X')*bing(X').
5 =* baz(4,0)*bing(0) where X'=0
5 =* baz(4,0)
baz(4,0) += 3.
baz(4,0) += 2.
1 =* bing(0)
bing(0) := 1.
2 =* baz(4,123)*bing(123) where X'=123
...
16 =* square(4)
square(4) = 4*4
16 =* 4*4
Right now, mode failures produce nothing, not even an error message -- that's odd.
By the way, IIRC, writing foo(X) := 1 or foo(X) := X at the prompt doesn't do anything.
Why isn't that rule ok? I thought we allowed backward chaining for such rules (although we might only be able to run the query in mode +).
Should I create a "Dyna" organization? Is it then easy to transfer ownership of the repo/issues/etc.?
Maggie Eismeier writes on Piazza post number 30:
I'm still having trouble getting my grades file to open in Dyna. The last line it gives me when I try to open it is "Saving crash as eismeier:2013-07-03:23:14:59:8361.log ... cp: cannot stat `/home/eismeier/.dyna/crash.log': No such file or directory"
This looks like there's a bug in the crash handling?
You might also want to get in touch with her to find out why she can't open grades.dyna .
Some of the dynabases that are loaded from the dyna command line (or later, opened at the repl or obtained by the built-in $load function) might be filtered versions of other files such as .txt files. We need to be able to specify the filter.
Possibly we should have a default mapping from filename extension to filter?
Before implementing, can someone (Tim?) make a proposal for the command-line syntax and for what the filters should do?
Running "dyna cky.dyna" crashes. It says to report the error by emailing [email protected], and to attach /home/jason/.dyna/crash.log . (I'll do so.)
Unfortunately, I don't think our LSA students will know how to do this. They are probably using a webmail client, and they don't know how to use scp to get a file off the CLSP network to the local machine. Can we have our local Dyna installation just auto-report errors?
BTW, you are correct that if an attachment is needed, they should use [email protected], because RT allows attachments and git issue tracker doesn't. But I don't know whether you really want a separate issue tracker for user bugs or not.
Right now,
a=3.
gives a syntax error. It should at least be an "aggregator not supported" error.
But do we want to support the "=" aggregator?
We have some notes somewhere about how to do this while still being forgiving of overrides that happen at the command line.
Would like to allow "destructuring definitions" like the first line below:
[baz,[bing,quux]] += temp.
temp = [5,[6,7]].
or
pair[baz,pair[bing,quux]] += temp.
temp = pair[5,pair[6,7]].
The details and corner cases are currently written up on an email thread starting on 7/2/13, which should possibly be copied here.
I wrote there:
I like destructuring definitions even if we find a way to get away without them for argmax= and argmin=. They seem like a very natural parallel to unification. But they might not be as high-priority a feature if we don't need them for argmax= and argmin=.
See #29 for discussion of the argmax= and argmin= stuff.
I would prefer that the prompt not be :-
because that looks too much like part of the language itself. This has been bugging me for a while and is bad for several reasons.
.dyna
file really do start with :-
while others don't, and we don't want to get people into the mindset of thinking that all lines should start with :-
.:- :- mypragma
.:-
operator is also an infix aggregator and looks a lot like the very common :=
aggregator.:-
into a .dyna
file, it is hard to spot the error visually or, perhaps with the parser. It's also hard to correct with search-and-replace.I always imagined that >
would be a good choice:
This is a backward chaining example on Peano integers.
value(&z) := 0.
value(&s(X)) := 1+value(X).
I wasn't sure whether the planner could handle it yet, but here's what I got after a long backtrace:
RuntimeError: dictionary changed size during iteration
FATAL ERROR (RuntimeError): dictionary changed size during iteration
|
and &
are strict boolean operators. (No, not bitwise integer operators, I don't think.)
Like other normal operators, they return null if they have any null argument, and otherwise $error if they have any $error argument.
But we also need lenient versions that will treat null arguments sort of like false (three-valued logic).
I just fell into the trap myself by using a strict operator in
contains(Sentence, brown(Sentence,Position)).
query contains(S,"who") | contains(S,"what")
This returned only sentences S that contained both "who" and "what", because the |
expression fails if either argument is null. I should have written the query as
query contains(S,"who") || contains(S,"what")
so that null
values would not cause the expression to fail.
||
and ||=
behave like this:
And &&
and &&=
are the same but upside-down:
However, nulls are never passed to aggregators (an aggregator receives a set of non-null values). Hence the test "if any arguments are null" cannot succeed with &&=
, which only returns null if all of its arguments are null.
Thus, ||=
is just like |=
except that true
beats $error
rather than vice-versa. And &&=
is just like &=
except that false
beats $error
rather than vice-versa. The fact that errors can be masked is similar to the way that lazy ||
and &&
are often used in C and its descendants, e.g.,
if (eof() || getchar()==' ') ...
if (!eof() && getchar==' ') ...
except that our version doesn't care about the order of the arguments.
Note that &&
is a normal operator in that null arguments cause it to fail and return null. However, ||
is crucially not normal in this way! The right way to implement ||
is to first define ||=
and then say that foo || bar
is syntactic sugar for the temp item defined by
temp ||= foo.
temp ||= bar.
[5/29/17: Or I think we could just say that foo || bar
is the same as (||= foo; bar)
, couldn't we? Note again that foo && bar
can't be written in this way. These are however the same as (|= foo; bar)
and foo & bar
except that those have stricter error handling in this design, and it's not totally clear that we ever really want that strict error handling.]
Even on a14, Dyna sometimes takes a few seconds to start up, and at one point this morning it took about 10 seconds. At other times it's fast to start. What accounts for the startup time?
(Obviously, putting .dyna files on the command line can slow things down further while the agenda runs, but I'm not talking about that part.)
Or allow both.
I really find myself wanting prefix aggregators already, so that I don't have to introduce too many new names. This is particuarly true for queries, where they're just throwaways anyway:
query max= count(Word). % most frequent word
Here's a fancy example involving a nested pair of aggregators:
query argmax= [(+= 1 for brown(Sentence,Position)), Sentence].
% length and ID of longest sentence
This assumes that the prefix form of argmax=
returns a pair, as in the design suggested in this #29 (comment) comment.
Should work with "for X < 100" guard (and this fix should be shown in the documentation).
Startup message should warn: "Dyna will eventually be efficient again. This version X.Y.Z is an inefficient prototype intended to test a new version of the language and interface."
Repl commands should go into a log script, by default called log.dyna.
Such a script can be reexecuted by calling /usr/bin/dyna on it later.
If the user edits the log file during the repl session that is writing it, then we notice this at the start of the next command. Prior to executing the next command, we cold-start by executing the new version of the script from a blank slate (but suppressing any output).
When a new dyna session starts and its logfile already exists, probably rename it to a numbered backup, e.g., the old log.dyna --> log.3.dyna.
A student writes:
I accidentally put a period after attempting to retract a rule and it crashed. Doesn't seem like an error that should make it crash.
She gives a screenshot at https://piazza.com/class/hgfxeb95g9z4vi?cid=30 .
true
and false
should be parsed specially as literals, just like 123
would be.
They are not 0-ary functors.
Of course, you can make any string into a functor (this is convenient when inventing functors from the header line of a TSV file, for example). So you can do so with these strings too, but you'd have to quote them.
'true'(foo,bar)
'false'(foo,bar)
'123'(foo,bar)
At the moment, they seem to be functors. This means that the boolean true
unifies with the 0-ary structure 'true'
, which it shouldn't, just as the integer 123
shouldn't unify with the 0-ary structure '123'
. It also means that trace
sometimes prints &true
(which is as weird as printing &123
) -- see an example on #1.
See also #48, but this is a different issue.
As nwf said today:
Right now, the crux of a hyperedge is shown as a rectangle with a mysterious rule index. Shrink this down, perhaps using shape="invis".
Hovering over any part of the hyperedge (the crux or the arms, i.e., arrows) should bring up a tooltip that displays the text of the actual rule. Ideally, if you hover over one of the input arms, the corresponding term in the rule should be highlighted.
Also, "draw foo(X)" should produce a smaller hypergraph that shows only the nodes that match foo(X) plus their transecedents.
Right now results are sorted by an item's print name, giving the out-of-order results below. It would be nice to sort instead by a recursive comparison function on terms that understands how to compare numbers (and which compares on functor before arguments even if that functor is written as infix).
> query brown(0,Position)
"It" =* brown(0,0)
"was" =* brown(0,1)
"Barco" =* brown(0,10)
"!" =* brown(0,11)
"!" =* brown(0,12)
"among" =* brown(0,2)
"these" =* brown(0,3)
"that" =* brown(0,4)
"Hinkle" =* brown(0,5)
"identified" =* brown(0,6)
"a" =* brown(0,7)
"photograph" =* brown(0,8)
"of" =* brown(0,9)
This program should leave b
in an error state, but currently leaves it at 2
for some reason:
a(1) = 1.
a(2) = 2.
b := 0.
b := a(X).
It should be an error because the last rule to contribute any aggregands to b
(namely rule 4) contributes multiple aggregands, and there's no way to choose among them.
Until recently, I would have said that the following should also leave b
in an error state. But now possibly we should allow b
to be 2 in this case for consistency with the new version of =
that we're trying on #23. Thoughts?
a(1) = 2.
a(2) = 2.
b := 0.
b := a(X).
jasoneis@a14:~$ ls *5.dyna
anagram5.dyna subst5.dyna swap5.dyna swapadj5.dyna
jasoneis@a14:~$ dyna -i *5.dyna
usage: interpreter.py [-h] [-i] [--plan] [-o OUTPUT]
[--post-process [POST_PROCESS [POST_PROCESS ...]]]
[--load [LOAD [LOAD ...]]]
[source]
interpreter.py: error: unrecognized arguments: subst5.dyna swap5.dyna swapadj5.dyna
Saving crash as jasoneis:2013-07-03:17:38:09:32508.log ...
How should we support backpointers for extracting derivations?
> last_time max= Time for word(Time) != &eos.
DynaCompilerError:
Could not parse:
<repl> error: expected: "%",
"&", "*", "**", "+", ",", "-",
"->", ".", "/", "<", "<=", "=",
"==", ">", ">=", "^", "for",
"in", "is", "whenever",
"with_key", "|"
last_time max= Time for word(Time) != &eos. % repl line 0<EOF>
^
but it parses fine and works correctly if I swap the arguments to !=
.
Parenthesizing the !=
expression doesn't help.
At the moment we use FIFO.
For acyclic programs there is probably little advantage to straying away from topological order. Computing topological order in a Dyna program isn't totally obvious (to me at least) because we are unrolling the graph as we go.
Should we require integer priorities for faster agenda maintenance? (Integer priorities speed things up asymptotically and the constant factors are better, if I'm not mistaken).
Eventually we'll need tools for diagnosing bad prioritization.
It would be nice if Dyna had some kind of randomization, so students could play around with generating random ngram sentences.
Should convert the Penn Treebank to Dyna rules (by running an external script manually or as a filter, see #7). Then count and normalize to get the grammar. Finally, try parsing some of the training sentences or some held-out sentences.
The forward-backward ice cream spreadsheet example seems to be exposing bugs in the Dyna implementation. Can you guys try to make it work right? I wrote this code last night to give to the LI students -- I taught them from the spreadsheet yesterday. I figured I'd also give them some POS tagging data so they could play with the same program on a bigger example.
I'm getting a lot of division by 0 errors when running with a concatenation of icecream.dyna
and forward-backward.dyna
(checking those now into the examples/
directory).
Several odd things:
num_iterations := 0
, and then gradually increase num_iterations
one step at a time at the repl, I get no errors and I exactly match the HMM ice cream spreadsheet. But if I start num_iterations
above 0, or increase it by more than 1 at a time, I get errors.$error
values would feed through many iterations, but they don't seem to. IIRC, I only get values at all as far as the first iteration that has errors. And there seem to be some uncomputed things, e.g., count_emission_denom
is defined as a sum over count_emission
but sometimes a query says that it's 0 even though the summands are all positive.sol
include error messages for items that are only $error
because they have an antecedent that is $error
? Not sure, but it probably shouldn't as those can clutter things up and make it hard to see where the errors actually start.*
wasn't defined on Float * Term, but the Term in this case seemed to be nan
, which I would think would be a kind of Float.num_iterations
or lambda
doesn't seem to automatically print all the changes as I'd expect (or the errors). Sometimes it prints some of the changes, and sometimes none at all. I can still type sol
to see what I missed.Note that all documentation will go in this repo.
(The dyna2-docs repo has some of our older notes, and may continue to be a good location for writing papers.)
Related to trace commands on #1, we might look at AILog2's tracing.
http://www.cs.ubc.ca/~poole/aibook/code/ailog/ailog_man_7.html
It's interactive, which I think would probably better be handled in a GUI, although they do it in the terminal.
An interesting point is that they support interactive recursive tracing of queries that failed -- what they call "whynot" questions. I think in our setting, this might mean a GUI that in general, optionally shows some of the "interesting" null aggregands. More precisely, make it possible to see show groundings or partial groundings of the rule body that didn't complete but "came close" because some of the subgoals succeeded (or came close in turn).
Some heuristics (or user control) are needed to determinine which bindings are worth showing. E.g., if the rule body is foo(X)*bar(X+1)
, we don't want to report X=5
as an interesting binding simply because Y is X+1
succeeded there, but we might be interested in that binding if foo(5)
or bar(6)
succeeded, particularly if that is surprising because those are selective predicates or at least finite predicates (support mode -
).
The rule of thumb might be that we show the partial bindings from our actual query plan or update plan, which will tend to start with the selective subgoals. I think that's in fact what AILog2 does, although it's a bit easier there because it only does backward chaining.
Strings should be double-quoted.
I'm not sure what the current single-quoted strings mean: in Prolog, that's just a way of writing functors that may contain special characters (if we decide to continue to support such functors)
The edge weights can be determined by the Euclidean distance formula, if each city has a position given by an (x,y) pair and you are lucky enough to be a crow.
eom.
I have a preliminary version of a crash handler up and running. Right now it only writes a very verbose stack trace to ~/.dyna/crash.log
and tells the user to email a nonexistent(?) email address and attach the file.
... [tons of color stacktrace dumped to the terminal]
Please report this error by emailing [email protected].
Please attach the following file /home/timv/.dyna/crash.log
Open questions:
[email protected]
resolve thersync
,@nwf: Students who innocently try to define an innocent-looking function will be met with a scary and unhelpful error message (see below). I'd rather not have to explain planning and the backchain declaration to them.
Eventually we want the determination of chaining to be automatic. So for now, when ordinary planning fails, couldn't the planner see if it helps to add a backchain declaration and try again? This beats making the user do it.
:- foo(X) = X+1.
DynaCompilerError:
FATAL: Encountered error in input program:
Unable to plan initializers for rule(s):
/home/jasoneis/.dyna/tmp/15766/f8207d93ff9ba46002ae41fd3e0d02c0840e84e5.dyna:1:1-/home/jasoneis/.dyna/tmp/15766/f8207d93ff9ba46002ae41fd3e0d02c0840e84e5.dyna:1:14
> new rule(s) were not added to program.
Python's bool is a subtype of int. Hence, 0==False and 1==True, hash(0)==hash(False) and hash(1)==hash(True).
This is going to be a difficult fix. I propose waiting until Dyna has type inference to truly solve this problem. Until then, this is a "known bug".
Once we have path tracing or argmax, add that to the tutorial.
Until then, maybe explain how to use draw to see the answer.
If an item is aggregated with :-
, then its only possible value should be true
. That means we can print it cleanly as phrase("np",3,5).
rather than the cruftier phrase("np",3,5) = true
. This is great for printing out lists of facts and printing them in the form in which they were probably read. Can we handle this special case?
Also, :-
should type-fail on any aggregands other than true
. Right now it is apparently a synonym for |=, which is a bitwise operator, leading to the confusing trace below.
Also let's make |=
and &=
be boolean rather than bitwise operators, and have them type-fail on any aggregands other than true and false.
> e :- 10.
Changes
=======
e = 10.
> e :- 12.
Changes
=======
e = 14.
> e :- true.
Changes
=======
e = 15.
> e :- "seven".
Changes
=======
e = $error.
Cut-and-paste from .dyna files into the repl currently fails because the repl doesn't quite know about comments. A trailing comment gives an error that the line doesn't end in a period ... unless the comment ends in a period!
I think you just need to do the period check after stripping comments, not before.
As per longstanding design, the unary symbol *
is a gensym.
Basic gensyms should be easy enough to implement for now (the design only gets tricky once we add stuff like persistence/saving, multiple dynabases especially with extension, and prefix aggregators #39).
The main thing you need is a little syntactic sugar. The rule
p(W,X) min= q(X,Y) + r(Y,*).
just needs to be read as if it were
p(W,X) min= q(X,Y) + r(Y, $g5(W,X,Y)).
where the functor $g5
means that this is the 5th gensym *
that has been encountered by this Dyna process (there's a global counter), and the arguments are all of the variables from the rule (if any). The $g5
functor should be declared to have a dispos that says it doesn't evaluate, so that it doesn't need to be quoted when it is printed or read.
One question is whether the user should be able to type literal gensym names like $g5
. That is certainly bad form in a program since it's dangerously implementation-dependent: they can't really guess what the name will be. Worse, it could result in clashes: when loading a .dyna file that contains $g5
when $g5
was already defined.
On the other hand, it can be useful in queries (imagine tracing back some pointers implemented with gensyms). I think a good compromise is to allow it in queries, but to reject it in user rules -- here's a suggested message:
Error: You can't refer to the gensym $g5 directly in a rule.
Use `*` to get a new gensym. If you want to refer to the
same gensym across rules, you must give it a reusable
name by assigning it to an item.
wget http://cs.jhu.edu/~jason/tmp/edit.dyna
wget http://cs.jhu.edu/~jason/tmp/swapadj5.dyna
wget http://cs.jhu.edu/~jason/tmp/swap5.dyna
wget http://cs.jhu.edu/~jason/tmp/anagram5.dyna
wget http://cs.jhu.edu/~jason/tmp/subst5.dyna
cat edit.dyna *5.dyna > wordgraph.dyna
I seem to be able to load these files individually with dyna -i
, but when I try to load their concatenation I get this crash:
jasoneis@a14:~$ dyna -i wordgraph.dyna
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/nwf/src/dyna2/src/Dyna/Backend/Python/interpreter.py in <module>()
688
689 if __name__ == '__main__':
--> 690 main()
/home/nwf/src/dyna2/src/Dyna/Backend/Python/interpreter.py in main()
668
669 try:
--> 670 interp.do(plan)
671 except DynaInitializerException as e:
672 print e
/home/nwf/src/dyna2/src/Dyna/Backend/Python/interpreter.py in do(self, filename, initialize)
512
513 # TODO: this should be a transaction.
--> 514 for k, v in env.agg_decl.items():
515 self.new_fn(k, v)
516
AttributeError: 'module' object has no attribute 'agg_decl'
FATAL ERROR (AttributeError): 'module' object has no attribute 'agg_decl'
Crash log available /home/jasoneis/.dyna/crash.log
Saving crash as jasoneis:2013-07-03:18:01:13:19675.log ...
This demonstrates a current bug:
:- foo(X) += bar(X).
:- :- backchain bar/1.
:- bar(X) = X+1.
:- query foo(3)
No results.
You can't even belatedly declare backchain foo/1.
(well, you can, but the query still finds no results because it's already been planned).
I think the solution is that when the original planner assumes that foo/1
and bar/1
are to be forward chained, it should record that they are FC. Then subsequent declarations saying that they are BC (possibly automatic ones: see #30) should fail, unless they trigger a successful round of replanning.
(Hmm, seems like some reactive programming language might help with this kind of thing. :-)
User should (as it can't yet, but should) be able to interrupt execution of the solver, inspect intermediate results with debugger, retract troublesome rules, etc.. and resume execution with out side-effects!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.