mhulden / foma
Automatically exported from code.google.com/p/foma
What steps will reproduce the problem?
1. A Makefile runs: foma -l blabla.foma <savestackblabla.sh
2. savestackblabla.sh is a file with the contents: save stack blabla.fst
The Makefile repeats the above several times.
While running, foma prints numerous lines like:
....
defined Lexiconnum2: 16.3 kB. 572 states, 901 arcs, Cyclic.
defined Grammarnum2: 21.5 kB. 379 states, 1151 arcs, Cyclic.
">>> read in enhunum3 <<<"
Root...7, NumPref2...58, NumPref3...30, NumPref12...6, NumPref13...5,
NumPref14...2, Num...22, Num1...2, Fel...1, AddNum...3, Poss...18, Plur...2,
Fam...2, Gen...3, Case...20, Szor...3, Sub...4
Building lexicon...
Determinizing...
Minimizing...
Done!
16.4 kB. 576 states, 906 arcs, Cyclic.
defined Lexiconnum3: 16.4 kB. 576 states, 906 arcs, Cyclic.
defined Grammarnum3: 14.8 kB. 263 states, 720 arcs, Cyclic.
...
and so on
The -q flag does not change this; it only suppresses the prompt display.
A --silence flag would be good: it would suppress all of the above
information and display only messages that indicate an error or warning
caused by problems in the user-created .foma and .lexc scripts themselves.
User-written echos (in .foma scripts) should still be displayed then; maybe
they could be suppressed with an additional flag.
At present, if the user edits and recompiles, he gets thousands of lines;
he has to save the output and then search the saved information for lines
indicating errors or warnings. That burden could easily be removed with a
--silence flag.
I have attached the Makefile, the list of .lexc files for
Hungarian (about 110 files, plus 6 .foma files), and the make output, which
contains 1846 lines.
Original issue reported on code.google.com by [email protected]
on 10 Jan 2013 at 7:17
Attachments:
What steps will reproduce the problem?
regex a <- [. .] ;
What is the expected output? What do you see instead?
I am getting this error:
1.11-1.11: error: ***syntax error at ';'.
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha, Ubuntu
Original issue reported on code.google.com by [email protected]
on 21 Dec 2012 at 1:49
I solved the problem of also recognizing uppercase words by using an upcase
converter like:
define ToUpcase a -> A || .#. _ ,,
á -> Á || .#. _ ,,
b -> B || .#. _ ,,
c -> C || .#. _ ,,
d -> D || .#. _ ,,
e -> E || .#. _ ,,
é -> É || .#. _ ,,
f -> F || .#. _ ,,
g -> G || .#. _ ,,
h -> H || .#. _ ,,
i -> I || .#. _ ,,
í -> Í || .#. _ ,,
j -> J || .#. _ ,,
k -> K || .#. _ ,,
l -> L || .#. _ ,,
m -> M || .#. _ ,,
n -> N || .#. _ ,,
o -> O || .#. _ ,,
ó -> Ó || .#. _ ,,
ö -> Ö || .#. _ ,,
ő -> Ő || .#. _ ,,
p -> P || .#. _ ,,
q -> Q || .#. _ ,,
r -> R || .#. _ ,,
s -> S || .#. _ ,,
t -> T || .#. _ ,,
u -> U || .#. _ ,,
ú -> Ú || .#. _ ,,
ü -> Ü || .#. _ ,,
ű -> Ű || .#. _ ,,
v -> V || .#. _ ,,
w -> W || .#. _ ,,
x -> X || .#. _ ,,
y -> Y || .#. _ ,,
z -> Z || .#. _ ;
and by doubling all grammars, so that each has a normal and an uppercase version:
define Grammar Lexicon .o.
ConsonantDoubling .o.
EDeletion .o.
EInsertion .o.
YReplacement .o.
KInsertion .o.
Cleanup;
define Grammarup Lexicon .o.
ToUpcase .o.
ConsonantDoubling .o.
EDeletion .o.
EInsertion .o.
YReplacement .o.
KInsertion .o.
Cleanup;
regex Grammar | Grammarup;
Attached the complete project.
The approach has two disadvantages:
1. I have to double all grammars
2. using down:
foma[1]: down
apply down> cat+N+Sg
cat
Cat
apply down> Peter+N+Sg
Peter
For cat+N+Sg I also get Cat, which is obvious but in fact unnecessary.
Is there a more elegant way to handle upper/lower case, or is mine already the
optimal one?
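One way to avoid doubling every grammar (a sketch, not tested against the attached project) is to make the upcasing optional with foma's optional-replace operator (->) and compose it once after the shared cascade; the rule names below are taken from the grammar above:

```foma
! Sketch: optional upcasing composed once after the shared cascade.
! (->) is foma's optional replace, so both cat and Cat survive on the
! lower side; this removes disadvantage 1 (grammar doubling) but not
! disadvantage 2 (the extra Cat output for cat+N+Sg).
define OptUpcase a (->) A || .#. _ ,,
                 b (->) B || .#. _ ,,
                 z (->) Z || .#. _ ; ! ...one pair per letter, as above
define Grammar Lexicon .o.
    ConsonantDoubling .o.
    EDeletion .o.
    EInsertion .o.
    YReplacement .o.
    KInsertion .o.
    Cleanup .o.
    OptUpcase;
regex Grammar;
```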
Thanks in advance.
Original issue reported on code.google.com by [email protected]
on 29 Jul 2012 at 12:28
Attachments:
What steps will reproduce the problem?
1. read regex a => b _ c;
2. read regex [a]+ => b _ c;
3. read regex [\a]+ => b _ c;
4. read regex [\a]+ => .#. _;
What is the expected output? What do you see instead?
1. works
2. works
3. accepts a* ?!
4. Error message Symbol '@#@' not found in network!
What version of the product are you using? On what operating system?
0.9.17alpha
Please provide any additional information below.
I want to do something like this in lexc (i.e., swallow +Pref+):
LEXICON Root
+Pref+:0 Second;
<[[\{+}]+ => .#. _] @-> ... {/}> Pos;
LEXICON Second
<[[\{+}]+ => .#. _] @-> ... {/}> Pos;
Original issue reported on code.google.com by [email protected]
on 27 Jan 2013 at 2:50
Lines longer than 2048 characters are broken into multiple lines by flookup
(and then processed line-by-line).
Is this intentional?
I'm using flookup 1.0 (foma library version 0.9.14alpha)
on Debian GNU/Linux kernel 2.6.32-bpo.5-xen-amd64
Original issue reported on code.google.com by [email protected]
on 17 Jul 2012 at 9:12
What steps will reproduce the problem?
Erroneous:
define ConsAeJaje [ g | k | l | m | n | p | r | t | v | gy | n y | t y ];
Correct:
define ConsAeJaje [ g | k | l | m | n | p | r | t | v | gy | ny | ty ];
The rule that uses it looks like:
define HarmRuleJ §J§ -> j || .#. \"^"* [ Vowel | ConsJaje ] %^ ?* _ .o.
§J§ -> j || .#. \"^"* ConsAeJaje %^ ?* _ ,,
§J§ -> 0 || .#. \"^"* ConsAeJaje %^ ?* _ .o.
§J§ -> 0 || .#. \"^"* ConsAe %^ ?* _ ;
With the erroneous definition, foma behaves like this:
foma[1]: down
apply down> nagy+Noun+Nom
???
What is the expected output? What do you see instead?
It would be nicer if foma said at compile time that the define line is
not OK, or somehow indicated at execution time
why the searched word (nagy) is not found.
Original issue reported on code.google.com by [email protected]
on 27 Jan 2012 at 5:40
Attachments:
What steps will reproduce the problem?
See the following foma script:
echo BUGCOMMENT: Everything works fine for ASCII networks and ASCII input
echo
echo read regex n ;
read regex n ;
echo
echo BUGCOMMENT: Computing med n
med n
echo BUGCOMMENT: BUG with non-ASCII input and non-ASCII network
echo BUGCOMMENT: ñ is not recognized by med ñ
echo
echo read regex ñ ;
read regex ñ ;
echo
echo BUGCOMMENT: Computing med ñ
med ñ
What is the expected output? What do you see instead?
I would expect to see the same costs for n and ñ.
BUGCOMMENT: Everything works fine for ASCII networks and ASCII input
read regex n ;
194 bytes. 2 states, 1 arcs, 1 path.
BUGCOMMENT: Computing med n
Calculating heuristic [h]
Using Levenshtein distance.
n
n
Cost[f]: 0
n*
*n
Cost[f]: 2
*n
n*
Cost[f]: 2
BUGCOMMENT: BUG with non-ASCII input and non-ASCII network
BUGCOMMENT: Cost[f]: 0 is missing
read regex ñ ;
195 bytes. 2 states, 1 arcs, 1 path.
BUGCOMMENT: Computing med ñ
Calculating heuristic [h]
Using Levenshtein distance.
*ñ
ñ
Cost[f]: 2
ñ*
ñ
Cost[f]: 2
*ñ*
###
Cost[f]: 3
What version of the product are you using? On what operating system?
0.9.15 on linux and macos
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 23 Nov 2011 at 12:22
Attachments:
What steps will reproduce the problem?
1. In line 4412 of a lexc file there is an erroneous line:
bős-nagymarosi AddNoun;;
(There are accidentally two ;-s at the end, at column 24.)
2. Foma reports:
***Syntax error on line 4403 column 26 at ';'
which is 9 lines away from the actual error
What is the expected output? What do you see instead?
It would be nicer if foma reported:
***Syntax error on line 4412 column 24 at ';'
Complete test environment attached.
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 1 Feb 2012 at 1:04
Attachments:
In the fsmbook there is a so-called compounding exercise (p. 248), but it is not
worked out; in other words, it does not show how to compound words and how
to filter out unlikely compounds. Also, not a single word is said in the fsmbook
about German compounding specifically, where nouns are capitalized. It would be
very good if the foma documentation gave some working examples for compounding.
Original issue reported on code.google.com by [email protected]
on 8 Jan 2012 at 9:48
What steps will reproduce the problem?
in .lexc file:
LEXICON Case
+Abl:^+cAbl^t§oooe20§l #; ! tól, től
...
The .foma file sees this as §oooe2§; it does not see the 0 (null).
This is a minor issue. If this is a feature (hopefully not), it should
be documented.
Original issue reported on code.google.com by [email protected]
on 12 Jan 2012 at 3:32
What steps will reproduce the problem?
1. re "a";
2. zero-plus
3. write att
What is the expected output? What do you see instead?
Expected:
0 1 a a
1 1 a a
0
1
Seen:
0 1 a a
1 1 a a
1
What version of the product are you using? On what operating system?
Foma, version 0.9.14alpha; tested on Ubuntu Linux 10.04 and Mac OS 10.7.
Please provide any additional information below.
Also tested with several other automata. Explicit Kleene star in the regex
works.
Original issue reported on code.google.com by [email protected]
on 27 Jan 2012 at 9:31
Let me start by expressing my gratitude for making this neat tool available.
What steps will reproduce the problem?
1. execute "foma"
2. foma[0]: "read lexc xhosa-utf8.lex" (attached)
What is the expected output? What do you see instead?
=> foma outputs: "152 bytes. 1 states, 0 arcs, 0 paths" (along with "defined
but not used" warnings)
=> xfst reads the same file and outputs: "38.2 Kb. 382 states, 824 arcs,
4070784 paths"
=> While the definitions in this file are not used in the same file, they are used
in other scripts read afterwards.
What version of the product are you using? On what operating system?
=> 0.9.15alpha for linux x86_64
Please provide any additional information below.
=> A small nuisance I experienced while porting an xfst script to foma: definitions
like "define Syll[Combos Vowels];" give an error because foma expects a space after
"Syll". But this is not a big deal.
Thanks,
waleed
Original issue reported on code.google.com by [email protected]
on 14 Nov 2011 at 10:14
Attachments:
See the description of issue #49.
Original issue reported on code.google.com by [email protected]
on 17 Jul 2013 at 12:10
If there is a character that can take multiple values, it would be useful if
there existed an "otherwise" rule.
Example: §aoe§ set up in the .lexc file can be a, o or e.
in .foma file:
define Ruleforaoe §aoe§ -> a // Rule1 ,,
§aoe§ -> o // Rule2 | Rule3 ,,
§aoe§ -> e // _Otherwise_ ;
The advantage for the user is that he does not have to think out any special rule
for assigning the value 'e' to the character; he can simply say _Otherwise_.
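An "elsewhere" effect can already be approximated by composing the rules sequentially with .o. instead of applying them in parallel with ,,: whatever the earlier rules rewrite is gone by the time the last, unconditional rule fires. A sketch, where Rule1/Rule2/Rule3 stand for the contexts from the example above:

```foma
! Sketch of an "otherwise" via sequential composition (.o.):
! the rules compose top to bottom, so any §aoe§ not consumed by the
! first two rewrites falls through to the unconditional last one.
define Ruleforaoe
    §aoe§ -> a // Rule1         .o.
    §aoe§ -> o // Rule2 | Rule3 .o.
    §aoe§ -> e ;                ! the "otherwise" case
```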
Original issue reported on code.google.com by [email protected]
on 20 Feb 2012 at 3:46
What steps will reproduce the problem?
I have an empty continuation class named CC00 which I use to terminate words.
For example:
#clase de continuación para terminar palabras:
LEXICON CC00
# ;
#Sufijos para sustantivos (la ruta más general)
LEXICON CC_sust
[Diminuitivo]:+itu CC_sust_bif;
0:0 CC_sust_bif;
CC00;
What is the expected output? What do you see instead?
When I run the code above, I see that the last word ("palabras") of the
comment
#clase de continuación para terminar palabras:
is inserted into the output of the "print upper" command in place of the empty
continuation class CC00.
For example:
vato@debamos:~/corrector/foma-training/qu3$ foma -l simikuna.foma
Root...4, Sustantivos...7, Verbos...5, Adjetivos...3, Adverbios...2, CC00...1,
CC_sust...3, CC_sust_diminuitivos_2_generos...3, CC_sust_bif...2,
CC_sust_numero_primero...2, CC_sust_numero_primero_bif...2,
CC_sust_limitativo_a_pos...2, CC_sust_posesivos_a_caso...9,
CC_sust_posesivos_bif...2, CC_sust_posesivos_primero...9,
CC_sust_numero_a_lim...2, CC_sust_numero_bif...2,
CC_sust_limitativo_a_caso...2, CC_sust_caso...11, CC_sust_continuativos...3,
CC_sust_puni...2, CC_sust_caso_y_limitativo...3, CC_sust_cliticos...15,
CC_verbalizadores...6, CC_verb01...3, CC_verb02...6, CC_verb_rqu...2,
CC_verb_ri...2, CC_verb_chi...2, CC_verb_tata...2, CC_verb_rpari...2,
CC_verb_kipa...2, CC_verb_ysi...2, CC_verb_infijos_generales...9,
CC_verb_limitativo...2, CC_verb_progresivo...2, CC_verb_tiempo_persona...12,
CC_ind_pres...26, CC_ind_fut...26, CC_ind_perf...25, CC_ind_plusc...25,
CC_poten...28, CC_subj...19, CC_oblig...25, CC_inf...3, CC_imp...14,
CC_part_pres...3, CC_part_pas...26, CC_agentivo...3, CC_verb_raq...3,
CC_verb_puni...2, CC_verb_sina...2, CC_verb_taq...1, CC_verb_cliticos...12,
CC_adv_cliticos...12, CC_adv_limitativo...2, CC_adv_acusativo...3, CC_adv...1,
CC_adj...1
Building lexicon...
*Warning: lexicon 'CC_agentivo' defined but not used
***Warning: lexicon 'CC_part_act' used but never defined
*Warning: lexicon 'CC_verbalizadores' defined but not used
***Warning: lexicon 'CC_sust_raq' used but never defined
Determinizing...
Minimizing...
Done!
34.2 kB. 1449 states, 2086 arcs, 13079858235 paths.
defined LEX: 34.2 kB. 1449 states, 2086 arcs, 13079858235 paths.
defined Cons: 1.2 kB. 5 states, 26 arcs, 32 paths.
defined Vocales: 671 bytes. 12 states, 11 arcs, 1 path.
defined Niyuq: 3.9 kB. 11 states, 193 arcs, Cyclic.
defined Rclean: 332 bytes. 1 state, 2 arcs, Cyclic.
defined Morfo: 35.4 kB. 1458 states, 2158 arcs, 13079858235 paths.
35.4 kB. 1458 states, 2158 arcs, 13079858235 paths.
Foma, version 0.9.16alpha
Copyright © 2008-2011 Mans Hulden
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; for details, type "help license"
Type "help" to list all commands available.
Type "help <topic>" or help "<operator>" for further help.
foma[1]: print upper
tullu[Adj]palabras
tullu[Sust][Acusativo][Aditativo]palabras
tullu[Sust][Acusativo][Responsivo]palabras
tullu[Sust][Acusativo][Interrogativo]palabras
tullu[Sust][Acusativo][Topico]palabras
tullu[Sust][Acusativo][Vacilativo]palabras
tullu[Sust][Acusativo][ReportativoVocal]palabras
tullu[Sust][Acusativo][ReportativoCons]palabras
tullu[Sust][Acusativo][TestimonialVocal]palabras
tullu[Sust][Acusativo][TestimonialCons]palabras
tullu[Sust][Acusativo][Impresivo]palabras
tullu[Sust][Acusativo][Dubitativo]palabras
tullu[Sust][Acusativo][Definitivo][Aditativo]palabras
tullu[Sust][Acusativo][Definitivo][Responsivo]palabras
tullu[Sust][Acusativo][Definitivo][Interrogativo]palabras
tullu[Sust][Acusativo][Definitivo][Topico]palabras
tullu[Sust][Acusativo][Definitivo][Vacilativo]palabras
tullu[Sust][Acusativo][Definitivo][ReportativoVocal]palabras
tullu[Sust][Acusativo][Definitivo][ReportativoCons]palabras
tullu[Sust][Acusativo][Definitivo][TestimonialVocal]palabras
tullu[Sust][Acusativo][Definitivo][TestimonialCons]palabras
tullu[Sust][Acusativo][Definitivo][Impresivo]palabras
tullu[Sust][Acusativo][Definitivo][Dubitativo]palabras
tullu[Sust][Acusativo][Definitivo][Contrastivo]palabras
tullu[Sust][Acusativo][Definitivo][Contrastivo][Topico]palabras
tullu[Sust][Acusativo][Definitivo][Contrastivo][Responsivo]palabras
tullu[Sust][Acusativo][Definitivo]palabras
tullu[Sust][Acusativo][Continuativo][Aditativo]palabras
tullu[Sust][Acusativo][Continuativo][Responsivo]palabras
tullu[Sust][Acusativo][Continuativo][Interrogativo]palabras
tullu[Sust][Acusativo][Continuativo][Topico]palabras
tullu[Sust][Acusativo][Continuativo][Vacilativo]palabras
tullu[Sust][Acusativo][Continuativo][ReportativoVocal]palabras
tullu[Sust][Acusativo][Continuativo][ReportativoCons]palabras
tullu[Sust][Acusativo][Continuativo][TestimonialVocal]palabras
tullu[Sust][Acusativo][Continuativo][TestimonialCons]palabras
tullu[Sust][Acusativo][Continuativo][Impresivo]palabras
tullu[Sust][Acusativo][Continuativo][Dubitativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Aditativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Responsivo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Interrogativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Topico]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Vacilativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][ReportativoVocal]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][ReportativoCons]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][TestimonialVocal]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][TestimonialCons]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Impresivo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Dubitativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Contrastivo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Contrastivo][Topico]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Contrastivo][Responsivo]palabras
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha in Debian Testing Linux 3.1.0-1-amd64
Attached are the two Foma files I am using.
Original issue reported on code.google.com by [email protected]
on 19 Sep 2012 at 12:01
Attachments:
What steps will reproduce the problem?
LEXICON Start
+Whatever:0 Gen2
Gen2
LEXICON Gen2
+Gens:^+cGens^é Case;
+Genpl:^+cGenpl^éi Case;
Case2;
....
LEXICON Gen2 ! This is in fact a typo, should be Gen3
+Gens:^+cGens^é Case3;
+Genpl:^+cGenpl^éi Case3;
Case4;
The compiler does not say a word, and the user has no idea
where the program went after Start/Whatever.
An error message would be nice in such cases.
Original issue reported on code.google.com by [email protected]
on 17 Jan 2012 at 5:47
I have test.lexc and test.foma files:
$ cat test.lexc
Multichar_Symbols
+A +M +F
LEXICON Root
almizcler AdjGenderO;
LEXICON AdjGenderO
o+A+M:o #;
o+A+F:a #;
$ cat test.foma
read lexc test.lexc
define Lexicon;
define AO a | á | o;
define CtoZC c -> zc || _ "^E" AO;
regex Lexicon .o.
CtoZC ;
So, I've defined two relations:
almizclero+A+M <--> almizclero
almizclero+A+F <--> almizclera
The rule CtoZC should not have any impact on these relations.
So, when I do the following steps...
$ foma -l test.foma
....
foma[1]: words
almizclero:a+A:0+F:0
almizclero+A:0+M:0
foma[1]: apply down
apply down> almizclero+A+F
???
This is strange and looks like incorrect behavior.
But when I comment out the line with the CtoZC rule and change the main regex
simply to "regex Lexicon;", it is OK:
$ foma -l test.foma
....
foma[1]: words
almizclero:a+A:0+F:0
almizclero+A:0+M:0
foma[1]: apply down
apply down> almizclero+A+F
almizclera
Foma version: foma 0.9.16alpha
System version: Debian wheezy (testing), Linux 3.5 x86_64
Original issue reported on code.google.com by [email protected]
on 19 Oct 2012 at 8:31
Attachments:
What steps will reproduce the problem?
define Etoee e -> é || _ "^" [ \0 ] ;
...
define Grammar Lexicon .o.
HarmRule .o.
E_to_ee .o.
Cleanup;
This causes a segmentation fault.
Changing the name E_to_ee to Etoee solves the problem.
Would not an error message be more elegant, if the name is not accepted?
Original issue reported on code.google.com by [email protected]
on 28 Dec 2011 at 11:32
What steps will reproduce the problem?
Use in a foma file:
read lexc /home/en/program/foma/tktest/lexc/verb/enhuige7ik.lexc
define Lexicon7ik
define Grammar7ik Filter1 .o.
Lexicon7ik .o.
CleanupEndings .o.
HarmRuleAAA .o.
...
HarmRuleszorelolmely .o.
Cica .o. <--- this rule does not exist
Cleanup;
What is the expected output? What do you see instead?
An error message or warning.
Instead, all words of enhuige7ik.lexc are unknown:
foma[1]: up
apply up> aggik
???
apply up>
Newest foma version, linux debian.
Original issue reported on code.google.com by [email protected]
on 14 Mar 2012 at 2:12
I have written a consonant assimilation rule that works:
define HarmRuleV V -> v || Vowel %^ _ ,,
V -> b || b %^ _ ,,
V -> c || c %^ _ ,,
V -> d || d %^ _ ,,
V -> f || f %^ _ ,,
V -> g || g %^ _ ,,
V -> h || h %^ _ ,,
V -> j || j %^ _ ,,
V -> k || k %^ _ ,,
V -> l || l %^ _ ,,
V -> m || m %^ _ ,,
V -> n || n %^ _ ,,
V -> p || p %^ _ ,,
V -> q || q %^ _ ,,
V -> r || r %^ _ ,,
V -> s || s %^ _ ,,
V -> t || t %^ _ ,,
V -> v || v %^ _ ,,
V -> z || z %^ _ ;
What it does is:
If the word ends with a vowel, V becomes v.
If the word ends with a consonant, V becomes that final consonant.
For example: rém - rémm
rab - rabb
hajó - hajóv
In sfst there is a special variable, a so-called agreement variable, that takes
the value of the matching character; that would simplify the above rule, since
one can express it in one line: if a consonant is matched, take its value.
Is there nothing similar in foma to simplify the above rule?
Original issue reported on code.google.com by [email protected]
on 4 Jan 2012 at 6:00
1. foma doesn't compile under recent versions of GCC. The problem is caused by
the stricter policy followed by GCC as to when standard headers are included.
The current workaround is to add
#include <stdbool.h>
to the file that includes fomalib.h. The solution would be to include this line
in fomalib.h itself.
2. The parameter of fsm_read_binary_file(char*) should be a const char*. The same
goes for all other functions that accept a filename
(fsm_read_binary_file_multiple_init, io_get_file_size, etc.).
Original issue reported on code.google.com by [email protected]
on 15 Jul 2013 at 1:20
What steps will reproduce the problem?
1. read regex %"test%"%)%.;
2. write prolog test.prolog
3. read prolog test.prolog
What is the expected output? What do you see instead?
The expected output is a simple FSA: (0) --"test").--> ((1)).
What I see instead are:
1. with print net:
Ss0: \"test\ -> fs1.
2. with view net:
three states: ((];)), ((1)), (0), with no transitions between them.
What version of the product are you using? On what operating system?
Latest svn (or 0.9.17alpha).
Please provide any additional information below.
The behavior is caused by fsm_read_prolog() not handling quotation marks (") in
symbols: when parsing the arc() clause in the .prolog file, it assumes that it
ends with "symbol"). or ("in":"out"). However, quotation marks can be part of a
symbol in foma, in which case they are escaped by a backslash (\). Two problems
arise:
1. If the symbol contains ": or ")., as in the test above, those are
erroneously parsed as the end of the input symbol and the clause, respectively.
2. Even if it doesn't, the parser treats the symbol string as it is, and
doesn't remove the backslashes from before the quotation marks.
I have attached a patch that fixes this bug in io.c.
Original issue reported on code.google.com by [email protected]
on 17 Jul 2013 at 12:08
Attachments:
It would be very nice if foma had a command #include "file.name", similar to C.
That would help avoid keeping long word lists in lexc files.
I imagine its usage like this:
----------------------------------
LEXICON Noun
#include "words/std.lex"
LEXICON AddNoun
...
----------------------------------
But #include would also be quite useful for repeated code parts.
Original issue reported on code.google.com by [email protected]
on 17 Jan 2012 at 1:44
What steps will reproduce the problem?
The erroneous rule looks like:
define HarmRuleszelol §szelol§ -> o l // .#. \"^"* BackVowel \Vowel* [ s | z
] "^" _ ,,
§szelol§ -> e l // .#. \"^"* FrontVowel \Vowel* [ s | z ] "^" _ ,,
§szelol§ -> sz // .#. \"^"* "^" [ \"^"* "^"]* _ ;
The error is in the last line; it should be §szelol§ -> s z // ...
There is no error message whatsoever at compile time.
This erroneous rule causes all words containing sz to become unknown.
An error message at compile time would help the user a lot in finding out
what the problem is.
Original issue reported on code.google.com by [email protected]
on 24 Feb 2012 at 4:09
Attachments:
What steps will reproduce the problem?
1. In a lexc file I add by mistake two LEXICON entries with the same name, for
example:
LEXICON AddVerbrag11mely
+Verb:0 Caserag1mely; ! should be Caserag11mely, but forgot to change
LEXICON Caserag1mely ! should be Caserag11mely, but forgot to change
+IndefSg1:^+cIndefSg1^§igeomely§k #; ! ek/ok
and
LEXICON AddVerbrag1mely
+Verb:0 Caserag1mely;
LEXICON Caserag1mely
+IndefSg1:^+cIndefSg1^§igeomely§k #; ! ek/ok
What is the expected output? What do you see instead?
Expected is an error message or at least a warning.
Instead, foma takes one of the identical entries and uses it,
which is hard-to-grasp behaviour.
Original issue reported on code.google.com by [email protected]
on 12 Mar 2012 at 10:36
What steps will reproduce the problem?
1. If in a rule a ,, or .o. is forgotten, like:
define HarmRuleIgejatokitek §igejatokitek§ -> i t e k // .#. \"^"* FrontVowel
\Vowel* "^" ?* _ <---- here forgotten ,,
§igejatokitek§ -> s á t o k // .#. \"^"* BackVowel s "^" ?* _ ,,
...
then the compiler crashes:
defined HarmRuleIgeija: 14.0 kB. 14 states, 818 arcs, Cyclic.
*** glibc detected *** foma: double free or corruption (!prev): 0x097eac70 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0x4ebff1]
It would be nicer if the compiler detected the syntax error and gave a warning or
error message.
Original issue reported on code.google.com by [email protected]
on 21 Feb 2012 at 1:55
Attachments:
I have a question. The cases For and Tem are special cases where a word-final e
should not turn into é.
# rege+For regéként <-- should be regeként
# rege+Tem regékor <-- should be regekor
# rege+Posss3+For regéjéként <-- should be regéjeként
I tried
# define Etoee e -> é || _ "^" [ \0 & \{+Tem} & \{+For} ] ; # \0: not zero
and also
# define Etoee e -> é || _ "^" [ \0 & \{ként} & \{kor} ] ; # \0: not zero
and similarly
# define HarmRuleC C -> á // BackVowel \Vowel* _ %^ [ \0 ] .o.
C -> é // FrontVowel \Vowel* _ %^ [ \0 ] .o.
C -> a // BackVowel \Vowel* _ %^ [ 0 | {+For} ] .o.
C -> e // FrontVowel \Vowel* _ %^ [ 0 | {+For} ] ;
and also
# define HarmRuleC C -> á // BackVowel \Vowel* _ %^ [ \0 ] .o.
C -> é // FrontVowel \Vowel* _ %^ [ \0 ] .o.
C -> a // BackVowel \Vowel* _ %^ [ 0 | {ként} ] .o.
C -> e // FrontVowel \Vowel* _ %^ [ 0 | {ként} ] ;
But this does not work. I cannot find any solution; please help.
Attached the lexc/foma pair that is a little test program especially for this.
Thanks in advance.
Original issue reported on code.google.com by [email protected]
on 3 Jan 2012 at 5:57
Attachments:
What steps will reproduce the problem?
1. in Lexc file:
LEXICON AddVerb
...
LEXICON AddVerb
...
What is the expected output? What do you see instead?
A warning that LEXICON AddVerb is doubly defined.
Instead: no warning; both LEXICONs are happily used.
Original issue reported on code.google.com by [email protected]
on 3 Apr 2012 at 10:42
What steps will reproduce the problem?
1. I use a UTF-8 source file with the BOM symbol at the beginning
2. I apply "source" or "read" command to the file
What is the expected output? What do you see instead?
I expect foma to ignore BOM. However, it is regarded as a normal symbol,
resulting in errors (like "unknown character").
What version of the product are you using? On what operating system?
I am using 0.9.17 on Windows 7.
Actually, it's a really minor issue, but new users may have trouble, as the
reason is not very clear at first. Perhaps it should be specified in the
tutorial that one has to use UTF-8 files without a BOM.
Original issue reported on code.google.com by [email protected]
on 3 Mar 2013 at 3:36
What steps will reproduce the problem?
1. Use the Hungarian foma's result hunfnnum.fst; the Hungarian foma is
downloadable from
https://gitorious.org/hunmorph-foma/hunmorph-foma/trees/master
2. Run do_testup.sh (attached) and watch the flookup size using ps; it is 39.6 MB.
3. perl x.pl <x >x1 (both attached)
4. perl /home/en/program/foma/tktest/szokincsteszt/szeged/chkwdlistup.pl x1 >
x2
This will take about 5-10 minutes; after the test the flookup size increases
to > 54 MB. (chkwdlistup.pl attached)
What is the expected output? What do you see instead?
The problem is that this code in apply.c
----------------------------------------------------------------
int apply_check_flag(struct apply_handle *h, int type, char *name, char *value)
{
    struct flag_list *flist, *flist2;
    for (flist = h->flag_list; flist != NULL; flist = flist->next) {
        if (strcmp(flist->name, name) == 0) {
            break;
        }
    }
    h->oldflagvalue = flist->value;
    h->oldflagneg = flist->neg;
    if (type == FLAG_UNIFY) {
        if (flist->value == NULL) {
            flist->value = xxstrdup(value); /* this causes the hog */
            return SUCCEED;
        }
--------------------------------------------------
duplicates a string and never frees it. I found a solution that fixes
the problem:
in flookup.c:
at the declarations:
extern void apply_clean();
extern void apply_clean_start();
.....
void handle_line(char *s) {
    char *result, *tempstr;
    apply_clean_start();
    ....
    }
    apply_clean();
}
In apply.c:
In declarations:
static int apply_clean_variable;
#define MAX_SAVED 10
static struct flag_list *saved_flag_list[MAX_SAVED];
static char *saved_values[MAX_SAVED];
static int clean_ix;
static void apply_add_clean_list(struct flag_list *flist, char *value);
void apply_clean();
void apply_clean_start();
....
int apply_check_flag(struct apply_handle *h, int type, char *name, char *value)
{
    struct flag_list *flist, *flist2;
    for (flist = h->flag_list; flist != NULL; flist = flist->next) {
        if (strcmp(flist->name, name) == 0) {
            break;
        }
    }
    h->oldflagvalue = flist->value;
    h->oldflagneg = flist->neg;
    if (type == FLAG_UNIFY) {
        if (flist->value == NULL) {
            flist->value = xxstrdup(value);
            apply_add_clean_list(flist, flist->value);
            return SUCCEED;
        }
....
void apply_clean_start()
{
    apply_clean_variable = 1;
}

void apply_add_clean_list(struct flag_list *flist, char *value)
{
    if (apply_clean_variable) {
        saved_flag_list[clean_ix] = flist;
        saved_values[clean_ix] = value;
        if (++clean_ix >= MAX_SAVED) {
            clean_ix = 0;
        }
    }
}

void apply_clean() {
    if (apply_clean_variable) {
        int i;
        for (i = 0; i < clean_ix; i++) {
            xxfree(saved_values[i]);
            saved_flag_list[i]->value = NULL;
            saved_values[i] = NULL;
            saved_flag_list[i] = NULL;
        }
        clean_ix = 0;
        apply_clean_variable = 0;
    }
}
What version of the product are you using? On what operating system?
Newest from svn, linux debian
Please provide any additional information below.
The solution's description:
1. Signal that we are entering flookup:
set apply_clean_variable = 1;
2. While apply_clean_variable == 1, remember all strdups in a list, at most 10
of them.
3. When leaving flookup, free all strings in the list, put NULL into their
pointers in struct flag_list *flist, and set apply_clean_variable = 0;
----------------------------------
I have also tried to eliminate the strdup in apply_check_flag and pass back
FAIL; however, in that case lots of words were not found, that is, the
functionality of foma fails.
Original issue reported on code.google.com by [email protected]
on 16 Jan 2013 at 2:26
Attachments:
We typically have the word lists in lexc files, like:
LEXICON Noun
cat Ninf;
city Ninf;
fox Ninf;
panic Ninf;
try Ninf;
watch Ninf;
There is an interface command, read text; however, that works only from foma,
not from lexc.
I'd like to keep some hundred words in an external file and read them
into the lexc file at compile time. Is that possible?
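A possible workaround (a sketch, untested; the file name and the defined names other than those in the example above are hypothetical) is to read the word list on the foma side with read text and splice the resulting automaton into the grammar in place of the lexc word list:

```foma
! Sketch: load an external word list as an automaton and combine it
! with a lexc-built lexicon. "words/std.lex", NounSuffixes and
! RestOfLexicon are hypothetical names.
read text words/std.lex
define NounStems;        ! names the network just read off the stack
read lexc grammar.lexc
define RestOfLexicon;
define Lexicon [NounStems NounSuffixes] | RestOfLexicon;
```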
Original issue reported on code.google.com by [email protected]
on 11 Jan 2012 at 6:52
What steps will reproduce the problem?
1. Create a binary for regex
regex [?* a] @-> d;
2. Load it into foma and apply down the following words:
bf
aaaaaaaf
faaaaf
dddaaaf
3. Send the words through flookup -ix.
What is the expected output? What do you see instead?
The expected output is `bf` for the first word and `df` for all the others.
The foma interpreter returns the correct results but flookup does not:
bf -> bf
aaaaaaaf -> df
faaaaf -> +?
dddaaaf -> df (!)
What version of the product are you using? On what operating system?
foma 0.9.16
Ubuntu 11.10, 12.04
Please provide any additional information below.
Looking at the transducer (see attachment), we can see that the paths the above computations should take are
(0) -> (3) // bf
(0) -> (1) -> (3) // aaaaaaaf
(0) -> 2 -> (1) -> (3) // faaaaf
(0) -> (1) -> 2 -> (1) -> (3) // dddaaaf
The last sequence is successful again; I assumed it was because of the arc (0)
-d-> (1), but I was not sure how. So I inserted a line that prints the current
state at apply.c:910. The results are:
flookup:
-------
$ echo "aaa" | ./flookup -ix ../foo.bin
State no 0
State no 1
State no 1
State no 1
d
$ echo "ad" | ./flookup -ix ../foo.bin
State no 0
State no 1
State no 3
dd
State no 2
$ echo "caa" | ./flookup -ix ../foo.bin
State no 0
State no 3
+?
foma:
----
foma[1]: down caa
State no 0
State no 3
State no 2
State no 1
State no 1
d
foma[1]: down aadd
State no 0
State no 1
State no 1
State no 3
State no 3
ddd
State no 2
State no 2
Apparently, in case of `caa`, flookup does not backtrack from state 3, while
the interpreter does.
Another interesting bit is the analysis of `ad` and `aadd`, where the
interpreter moves from state 3 to state 2 after the results have been printed.
This seems completely superfluous.
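The discrepancy described above comes down to whether application backtracks to alternative arcs after a dead end. A minimal sketch of backtracking apply-down over a toy transducer (illustrative only; this is not foma's apply.c, and the arc list is made up):

```python
def apply_down(word, arcs, start, finals):
    """Return all outputs; arcs is a list of (src, insym, outsym, dst)."""
    results = set()

    def walk(state, pos, out):
        if pos == len(word) and state in finals:
            results.add(out)
        for src, isym, osym, dst in arcs:
            if src != state:
                continue
            if pos < len(word) and isym == word[pos]:
                # try this arc; when the recursion dead-ends we simply
                # fall through to the next arc, i.e. we backtrack
                walk(dst, pos + 1, out + osym)

    walk(start, 0, "")
    return results

# toy net with two competing arcs from state 0 over 'a'
arcs = [(0, "a", "x", 1),   # dead end unless followed by 'b'
        (0, "a", "y", 2),   # leads straight to a final state
        (1, "b", "b", 2)]
print(apply_down("a", arcs, 0, {2}))   # only the second arc succeeds
print(apply_down("ab", arcs, 0, {2}))  # only the first arc succeeds
```

Without the fall-through over competing arcs, the first dead end would produce +? exactly as in the flookup trace.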
Original issue reported on code.google.com by [email protected]
on 26 Jul 2012 at 3:55
Attachments:
This is not a problem report, but a lexc file syntax check tool in Perl. Maybe
others will also find it useful.
It checks for:
1. doubly defined LEXICONs
2. Unused LEXICONs
3. Undefined but used LEXICONs
It is handy, because it quickly checks a set of files in a directory.
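For reference, the three checks can be sketched in a few lines. This is a simplified Python rendering of the same idea, not the attached Perl tool; it assumes one entry per line and ignores multi-line entries:

```python
import re

def check_lexc(text):
    """Report doubly defined, unused, and undefined LEXICONs."""
    defined, used = [], set()
    for line in text.splitlines():
        line = line.split('!')[0].strip()      # strip lexc comments
        m = re.match(r'LEXICON\s+(\S+)', line)
        if m:
            defined.append(m.group(1))
        elif line.endswith(';'):
            cont = line[:-1].split()[-1]       # continuation class of the entry
            if cont != '#':
                used.add(cont)
    defset = set(defined)
    return {
        'doubly_defined': sorted({n for n in defset if defined.count(n) > 1}),
        'unused': sorted(defset - used - {'Root'}),
        'undefined': sorted(used - defset),
    }

sample = """LEXICON Root
Noun ;
LEXICON Noun
cat Ninf;
LEXICON Noun
dog Ninf;
LEXICON Orphan
x #;
"""
print(check_lexc(sample))
```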
Original issue reported on code.google.com by [email protected]
on 2 Jul 2012 at 9:07
Lexc file contains:
+Posss1p:^+cPosss1p^§JPnull§§AE§im Gen; ! aim
citromlé:citromlev AddNoun1;
Foma file contains:
define FrontVowel [ e | é | i | í | ü | ű | ö | ő | E | É | I | Í | Ü
| Ű | Ö | Ő ];
define BackVowel [ a | á | o | ó | u | ú | A | Á | O | Ó | U | Ú ];
define HarmRuleai §AE§ -> 0 || .#. \"^"* Vowelwithy %^ \%^* _ .o.
§AE§ -> a // .#. \"^"* BackVowel Cons+ %^ \%^* _ .o.
§AE§ -> e // .#. \"^"* FrontVowel Cons+ %^ \%^* _ ;
The word has mixed vowels: front vowel, back vowel, front vowel.
I get this
foma[1]: down
apply down> citromlé+Noun+Posss1p+Gens+Dat
citromlé^im^é^n§AA§k
citromlev^aim^é^n§AA§k
when I stop immediately after replacement of §AE§.
This is wrong! The rule says:
§AE§ -> a // .#. \"^"* BackVowel Cons+ %^ \%^* _ .o.
Start from the beginning of the word; if you see a back vowel, then only
consonant(s) and then ^, replace §AE§ after this with 'a'.
However, there is no such sequence in citromlev^, since before ^ there is a
front vowel and not a back vowel.
If I use:
define HarmRuleai §AE§ -> 0 || .#. \"^"* Vowelwithy %^ \%^* _ .o.
§AE§ -> e // .#. \"^"* FrontVowel Cons+ %^ \%^* _ .o.
§AE§ -> a // .#. \"^"* BackVowel Cons+ %^ \%^* _ ;
That is, if I swap the checks for front and back vowels, I get correct
behaviour, but I do not understand why the first version delivers an incorrect
replacement.
Any ideas?
Original issue reported on code.google.com by [email protected]
on 1 Feb 2012 at 4:28
Attachments:
What steps will reproduce the problem?
My last 2 defines are:
define AccSingle [ j | l | n | r | s | z ]
define Nodupy [ Vowel | b | c | d | e | f | h | j | k | m | n | p | q | r | s |
v | w | x | y | z ]
(then follow rules)
I forgot to write ';' at the end of the defines.
Foma compiles, reports no error and no warning, and then lower-words answers
with an empty list:
foma[1]: lower-words
foma[1]:
What is the expected output? What do you see instead?
It would be much nicer if foma reported an error or at least a warning during
compilation, so that the user knows where to search.
Original issue reported on code.google.com by [email protected]
on 27 Jan 2012 at 2:05
Attachments:
What steps will reproduce the problem?
Create a Foma file with comments which contain > (greater than) or " (double
quotation mark) characters. For example:
# Reglas para diminuitivos (para implementar con expresiones regulares):
#1. Si termina en U, añade -itu, sin importar el genero.
# Ejs: allqu > allqitu,
# "perrito" o "perrita"
LEXICON CC_sust_diminuitivos
[Diminuitivo]:+itu CC_sust_bif;
The above example generates the following error messages:
***Syntax error on line 58 column 18 at '>'
***Syntax error on line 59 column 12 at '"'
What version of the product are you using? On what operating system?
Foma version 0.9.16alpha, running in Debian Testing Linux 3.1.0-1-amd64
Attached are the two Foma files to test this.
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 19 Sep 2012 at 11:44
Attachments:
What steps will reproduce the problem?
1.
!!!hun41.lexc!!!
Multichar_Symbols +N +V +Nom +Pl
!Poss
+Posss1 +Posss3 +Posss3
!Genitiv
+Gen
+Genpl
!Cases
+Abl +Acc +Ade +All
+Cau +Dat +Del
+Ela +Fac +For
+Ill +Ine +Ins
+Sub +Sup +Ter
!Special cases
+Dis +Ess +Fam +Soc +Tem
LEXICON Root
Noun ;
LEXICON Noun
!+N:kar Plur;
!+N:kéz Plur;
!+N:kör Plur;
!+N:hajó Plur;
rege Scase;
rege Poss;
LEXICON Poss
+Posss1:^Bm Plur;
+Posss2:^Bd Plur;
+Posss3:^JC Plur;
Plur;
LEXICON Plur
+Plur:^Ok Fam;
Fam;
LEXICON Fam
+Fam:^ék Gen;
Gen;
LEXICON Gen
+Gen:^é Case;
+Genpl:^éi Case;
Case;
LEXICON Case
+Abl:^tUl #;
!+Acc:^Gk #;
!+Ade:^nHl #;
!+All:^hIz #;
!+Cau:^ért #;
!+Dat:^nKk #;
!+Del:^rUl #;
!+Ela:^bUl #;
+Fac:^VD #;
!+For:^ként #;
!+Ill:^nHk #;
!+Ine:^bHn #;
+Ins:^VFl #;
!+Sub:^rK #;
!+Sup:^Pn #;
!+Ter:^ig; #;
LEXICON Scase
!+Dis:^Lnként #;
+Ess:^Zl #;
!+Soc:^NstZl #;
+Tem:^kor #;
#;
### hun4.foma ###
# Vowels
define Vowel [ a | á | e | é | i | í | o | ó | u | ú | ü | ű | ö | ő ];
define BackVowel [ a | á | o | ó | u | ú ];
define FrontUnroundedVowel [ e | é | i | í | ü | ű ];
define FrontRoundedVowel [ ö | ő ];
define FrontVowel [e | é | i | í | ü | ű | ö | ő ];
# E to é: if any ending e-> é
define Etoee e -> é || _ "^" [ \0 ] ;
# Cleanup: remove morpheme boundaries
define Cleanup "^" -> 0;
#define DelRule O -> 0 || Vowel %^ _ ;
define HarmRuleO O -> 0 // Vowel %^ _ .o.
O -> o // BackVowel \Vowel+ _ ,,
O -> e // FrontUnroundedVowel \Vowel+ _ ,,
O -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleB B -> 0 // Vowel %^ _ .o.
B -> o // BackVowel \Vowel+ _ ,,
B -> e // FrontUnroundedVowel \Vowel+ _ ,,
B -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleA A -> 0 // Vowel %^ _ .o.
A -> a // BackVowel \Vowel+ _ ,,
A -> e // FrontVowel \Vowel+ _ ;
define HarmRuleC C -> a // BackVowel \Vowel+ _ .#. .o.
C -> e // FrontVowel \Vowel+ _ .#. .o.
C -> á // BackVowel \Vowel+ _ .o.
C -> é // FrontVowel \Vowel+ _ ;
define HarmRuleJ J -> j || Vowel %^ _ .o.
J -> 0 // \Vowel+ _ ;
define HarmRuleU U -> ó // BackVowel \Vowel+ _ ,,
U -> ő // FrontVowel \Vowel+ _ ;
define HarmRuleZ Z -> u // BackVowel \Vowel+ _ ,,
Z -> ü // FrontVowel \Vowel+ _ ;
define HarmRuleD D -> á // BackVowel \Vowel+ _ ,,
D -> é // FrontVowel \Vowel+ _ ;
define HarmRuleF F -> a // BackVowel \Vowel+ _ ,,
F -> e // FrontVowel \Vowel+ _ ;
define HarmRuleV V -> v || Vowel %^ _ ,,
V -> k || k %^ _ ,,
V -> m || m %^ _ ,,
V -> d || d %^ _ ,,
V -> r || r %^ _ ;
define Ablaut é -> e || _ z "^" [ \0 ] ;
read lexc hun41.lexc
define Lexicon
define Grammar Lexicon .o.
HarmRuleO .o.
HarmRuleB .o.
HarmRuleA .o.
HarmRuleJ .o.
HarmRuleU .o.
HarmRuleC .o.
HarmRuleZ .o.
HarmRuleD .o.
HarmRuleF .o.
HarmRuleV .o.
Ablaut .o.
Etoee .o.
Cleanup;
regex Grammar;
foma[1]: upper-words
rege
rege+Ins
rege+Fac
rege+Abl
rege+Genpl+Ins
rege+Genpl+Fac
rege+Genpl+Abl
rege+Gen+Ins
rege+Gen+Fac
rege+Gen+Abl
rege+Fam+Ins
rege+Fam+Fac
rege+Fam+Abl
rege+Fam+Genpl+Ins
rege+Fam+Genpl+Fac
rege+Fam+Genpl+Abl
rege+Fam+Gen+Ins
rege+Fam+Gen+Fac
rege+Fam+Gen+Abl
rege+Plur+Ins
rege+Plur+Fac
rege+Plur+Abl
rege+Plur+Genpl+Ins
rege+Plur+Genpl+Fac
rege+Plur+Genpl+Abl
rege+Plur+Gen+Ins
rege+Plur+Gen+Fac
rege+Plur+Gen+Abl
rege+Plur+Fam+Ins
rege+Plur+Fam+Fac
rege+Plur+Fam+Abl
rege+Plur+Fam+Genpl+Ins
rege+Plur+Fam+Genpl+Fac
rege+Plur+Fam+Genpl+Abl
rege+Plur+Fam+Gen+Ins
rege+Plur+Fam+Gen+Fac
rege+Plur+Fam+Gen+Abl
rege+Posss3+Ins
rege+Posss3+Fac
rege+Posss3+Abl
rege+Posss3+Genpl+Ins
rege+Posss3+Genpl+Fac
rege+Posss3+Genpl+Abl
rege+Posss3+Gen+Ins
rege+Posss3+Gen+Fac
rege+Posss3+Gen+Abl
rege+Posss3+Fam+Ins
rege+Posss3+Fam+Fac
rege+Posss3+Fam+Abl
rege+Posss3+Fam+Genpl+Ins
rege+Posss3+Fam+Genpl+Fac
rege+Posss3+Fam+Genpl+Abl
rege+Posss3+Fam+Gen+Ins
rege+Posss3+Fam+Gen+Fac
rege+Posss3+Fam+Gen+Abl
rege+Posss3+Plur+Ins
rege+Posss3+Plur+Fac
rege+Posss3+Plur+Abl
rege+Posss3+Plur+Genpl+Ins
rege+Posss3+Plur+Genpl+Fac
rege+Posss3+Plur+Genpl+Abl
rege+Posss3+Plur+Gen+Ins
rege+Posss3+Plur+Gen+Fac
rege+Posss3+Plur+Gen+Abl
rege+Posss3+Plur+Fam+Ins
rege+Posss3+Plur+Fam+Fac
rege+Posss3+Plur+Fam+Abl
rege+Posss3+Plur+Fam+Genpl+Ins
rege+Posss3+Plur+Fam+Genpl+Fac
rege+Posss3+Plur+Fam+Genpl+Abl
rege+Posss3+Plur+Fam+Gen+Ins
rege+Posss3+Plur+Fam+Gen+Fac
rege+Posss3+Plur+Fam+Gen+Abl
rege+Posss2+Ins
rege+Posss2+Fac
rege+Posss2+Abl
rege+Posss2+Genpl+Ins
rege+Posss2+Genpl+Fac
rege+Posss2+Genpl+Abl
rege+Posss2+Gen+Ins
rege+Posss2+Gen+Fac
rege+Posss2+Gen+Abl
rege+Posss2+Fam+Ins
rege+Posss2+Fam+Fac
rege+Posss2+Fam+Abl
rege+Posss2+Fam+Genpl+Ins
rege+Posss2+Fam+Genpl+Fac
rege+Posss2+Fam+Genpl+Abl
rege+Posss2+Fam+Gen+Ins
rege+Posss2+Fam+Gen+Fac
rege+Posss2+Fam+Gen+Abl
rege+Posss2+Plur+Ins
rege+Posss2+Plur+Fac
rege+Posss2+Plur+Abl
rege+Posss2+Plur+Genpl+Ins
rege+Posss2+Plur+Genpl+Fac
rege+Posss2+Plur+Genpl+Abl
rege+Posss2+Plur+Gen+Ins
rege+Posss2+Plur+Gen+Fac
rege+Posss2+Plur+Gen+Abl
Output stops here.
Using down, I can see that it also knows the rest:
foma[1]: down
apply down> rege+Posss2+Plur+Gen+Abl
regédekétől
apply down> rege+Posss1+Plur+Gen+Abl
regémekétől
apply down> rege+Posss1+Fam+Abl
regéméktől
However, Hungarian has 769 forms for each noun in a minimal test. We also have
at least 30 noun classes that need to be tested individually. It is impossible
to test that many forms using up and down. I would suggest a command that shows
all valid forms and valid words, which would make testing possible.
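Enumerating every analysis of an acyclic network is a plain depth-first traversal, so such a testing command is feasible in principle; a sketch over a toy arc list (illustrative only, not foma's internal representation — the arcs loosely mirror the rege paradigm above):

```python
def all_paths(arcs, start, finals):
    """Return every (upper, lower) string pair of an acyclic transducer."""
    def walk(state, upper, lower):
        if state in finals:
            yield upper, lower
        for src, up, low, dst in arcs:
            if src == state:
                # recurse; acyclicity guarantees termination
                for pair in walk(dst, upper + up, lower + low):
                    yield pair
    return sorted(walk(start, "", ""))

# rege (+Plur) (+Abl), loosely after the lexc fragment above
arcs = [(0, "rege", "rege", 1),
        (1, "+Plur", "^Ok", 2),
        (1, "", "", 2),
        (2, "+Abl", "^tUl", 3),
        (2, "", "", 3)]
for upper, lower in all_paths(arcs, 0, {3}):
    print(upper, "->", lower)
```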
Thanks in advance for help or support.
Original issue reported on code.google.com by [email protected]
on 1 Jan 2012 at 4:21
What steps will reproduce the problem?
1.
LEXICON Ninf
+Noun+Sg:0 #;
+Noun+Det:det #;
+Noun+Det:acc #;
+Noun+Pl:^s CaseN;
2.
We lose the closing of case Det:
LEXICON Ninf
+Noun+Sg:0 #;
+Noun+Det:det
+Noun+Det:acc #;
+Noun+Pl:^s CaseN;
3.
What is the expected output?
Compiler should report an error
What do you see instead?
defined Cleanup: 268 bytes. 1 state, 2 arcs, Cyclic.
Root...3, Noun...7, Verb...6, Misc...1, Ninf...3, CaseN...1, Vinf...5, Nmisc...1
Building lexicon...
Determinizing...
Minimizing...
Done!
1.9 kB. 57 states, 75 arcs, 52 paths.
defined Lexicon: 1.9 kB. 57 states, 75 arcs, 52 paths.
defined Grammar: 2.4 kB. 72 states, 102 arcs, 52 paths.
defined Grammarup: 3.0 kB. 72 states, 102 arcs, 52 paths.
3.1 kB. 72 states, 110 arcs, 101 paths.
Foma, version 0.9.16alpha
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha
Linux, debian
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 6 Sep 2012 at 3:05
Attachments:
What steps will reproduce the problem?
1. Run the following code in foma:
regex [ ? -> x ];
save stack test.foma
2. Create a text file test.txt:
a
ab
abc
3. Run the text file through flookup:
cat test.txt | flookup -i -x test.foma
What is the expected output? What do you see instead?
Expected:
x
xx
xxx
What I see:
+?
+?
+?
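For comparison, binary application normally distinguishes symbols inside and outside the stored alphabet: out-of-alphabet input must be matched by the network's unknown/identity symbols. A toy sketch of that mapping (an assumption about the general mechanism, not flookup's actual code):

```python
UNKNOWN = "@_UNKNOWN_@"

def apply_unknown(word, sigma, trans, start, finals):
    """Map out-of-alphabet symbols to UNKNOWN, then run a deterministic walk.

    trans: {(state, insym): (nextstate, outsym)}
    """
    state, out = start, []
    for ch in word:
        sym = ch if ch in sigma else UNKNOWN
        if (state, sym) not in trans:
            return "+?"
        state, osym = trans[(state, sym)]
        out.append(osym)
    return "".join(out) if state in finals else "+?"

# a net behaving like [? -> x]: any symbol, known or not, maps to 'x'
sigma = {"x"}
trans = {(0, "x"): (0, "x"), (0, UNKNOWN): (0, "x")}
print(apply_unknown("abc", sigma, trans, 0, {0}))
```

If the unknown-symbol arcs were skipped during lookup, every input would fall through to +?, which matches the behaviour reported above.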
What version of the product are you using? On what operating system?
flookup 1.02 (foma library version 0.9.16alpha) on OpenSUSE 12.1.
Original issue reported on code.google.com by [email protected]
on 9 Oct 2012 at 12:58
What steps will reproduce the problem?
1. define IitoiAcc3 í -> i || .#. \%^* _ [ DCons | Cons ] "^" ?* [ "+"
{Acc} ] ?* ;
2. ...
3. define Grammar Lexicon .o.
IitoiAcd3; # mistyped Acc as Acd
Foma loads things nicely, no warning or error,
but there is no output:
apply down> ég+Noun+Nom
???
It would be nicer if foma gave a warning or error message in such cases
(mistyped rule names).
Original issue reported on code.google.com by [email protected]
on 13 Jan 2012 at 4:46
What steps will reproduce the problem?
I define 25 output combinations in lexc:
LEXICON Ninf
+N+Nom:0 #;
+N+Pl:^Ok #;
+N+Abl:t^Ul #;
+N+Gen:é #;
+N+Fam:ék #;
+N+Fam+Abl:ékt^Ul #;
+N+Fam+Gen+Abl:ékét^Ul #;
+N+Ess:^Zl #;
+N+Posss1:^Bm #;
+N+Posss2:^Bd #;
+N+Posss3:^J^A #;
+N+Posss1+Abl:^Bmt^Ul #;
+N+Posss2+Abl:^Bdt^Ul #;
+N+Posss3+Abl:^J^Ct^Ul #;
+N+Pl+Gen:^Oké #;
+N+Fam+Gen:éké #;
+N+Pl+Abl:^Okt^Ul #;
+N+Gen+Abl:ét^Ul #;
+N+Pl+Gen+Abl:^Okét^Ul #;
+N+Posss1+Gen:^Bmé #;
+N+Posss2+Gen:^Bdé #;
+N+Posss3+Gen:^J^Cé #;
+N+Posss1+Gen+Abl:^Bmét^Ul #;
+N+Posss2+Gen+Abl:^Bdét^Ul #;
+N+Posss3+Gen+Abl:^J^Cét^Ul #;
What is the expected output? What do you see instead?
I would expect 25 output combinations, but I get 37:
foma[28]: upper-words
rege+N+Nom
rege+N+Gen
rege+N+Gen+Abl
rege+N+Fam
rege+N+Fam+Abl
rege+N+Fam+Gen
rege+N+Fam+Gen+Abl
rege+N+Abl
rege+N+Posss2+Gen
rege+N+Posss2+Gen+Abl
rege+N+Posss2+Abl
rege+N+Posss2
rege+N+Posss1+Gen
rege+N+Posss1+Gen+Abl
rege+N+Posss1+Abl
rege+N+Posss1
rege+N+Ess
rege+N+Pl+Abl
rege+N+Pl+Gen
rege+N+Pl+Gen+Abl
rege+N+Pl
rege+N+Posss3+Gen
rege+N+Posss3+Gen+Abl
rege+N+Posss3+Abl
rege+N+Posss3
rege+N+Posss2+Gen
rege+N+Posss2+Gen+Abl
rege+N+Posss2+Abl
rege+N+Posss2
rege+N+Posss1+Gen
rege+N+Posss1+Gen+Abl
rege+N+Posss1+Abl
rege+N+Posss1
rege+N+Pl+Abl
rege+N+Pl+Gen
rege+N+Pl+Gen+Abl
rege+N+Pl
(12 get duplicated, the last 12)
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha
Linux
Please provide any additional information below.
foma and lex files attached
Original issue reported on code.google.com by [email protected]
on 29 Dec 2011 at 4:53
Attachments:
What steps will reproduce the problem?
1. foma -l enhu1.foma
What is the expected output? What do you see instead?
Expected compilation
Instead I see:
Fatal error: out of memory
: Cannot allocate memory
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha
Linux en-desktop 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 2009
i686 GNU/Linux
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 6 Jan 2012 at 5:20
Attachments:
What steps will reproduce the problem?
1. Write the attached code (filename: bug_report.cpp)
2. Compile using "g++ -ggdb3 -o bug_report bug_report.cpp -lfoma"
3. Run "valgrind --leak-check=full ./bug_report"
Expected result: no leaks and no errors.
Present result: Leak (25 allocs, 20 frees) and error (invalid write)
What version of the product are you using? On what operating system?
Foma API (0.9.16alpha (20111213), according to the first line in the
changelog). I changed the Makefile to compile with the flag -ggdb3 for
debugging purposes. I also changed the prefix to "/usr".
Arch Linux, quite recently updated.
Please provide any additional information below.
The attached files are the C++ code and valgrind output.
Original issue reported on code.google.com by [email protected]
on 8 May 2012 at 6:25
Attachments:
What steps will reproduce the problem?
1. in Foma file there is:
define Grammarfnige Lexiconfnige .o.
Etoee .o.
Atoaa .o.
Changez2 ,o, <---- here ,o, instead of .o.
...
Cleanup .o.
ToUpCase;
Compiler says:
defined Lexiconfnige: 376.0 kB. 12797 states, 23924 arcs, 2520580612 paths.
2299.34-2299.34: error: ***syntax error at ','.
">>> read in enhufnnum <<<"
No indication of which file or which line contains the error.
What is the expected output? What do you see instead?
Compiler should say:
Error in file enhu2.foma
line 2277: , instead of .
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha
debian linux
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 24 Aug 2012 at 11:13
What steps will reproduce the problem?
The verb fst is called hun42.fst. It works fine alone:
en@en-desktop ~/program/foma/tktest $ kill_flookup.sh[1]+ Killed
en@en-desktop ~/program/foma/tktest $ flookup -i -S -A 127.0.0.1
/home/en/program/foma/tktest/hun42.fst &
[1] 4770
en@en-desktop ~/program/foma/tktest $ Started flookup server on 127.0.0.1 port
6062
en@en-desktop ~/program/foma/tktest $ echo "abnormalitásoz+Verb+CondDefPl3" |
nc -w 1 -u 127.0.0.1 6062
abnormalitásoz+Verb+CondDefPl3 Abnormalitásoznák
abnormalitásoz+Verb+CondDefPl3 abnormalitásoznák
-------------------------------------
Here I combine verbs with the other word types:
cat hfnnum.foma
read regex @"hun41.fst" | @"hun42.fst" | @"hunnum.fst" | @"hunadj.fst" |
@"hunfxpp.fst" | @"hunmisc.fst";
en@en-desktop ~/program/foma/tktest $ cat crfnnum.sh
foma -l hfnnum.foma <savestackfnnum.sh
cat savestackfnnum.sh
save stack hunfnnum.fst
----------------------------------------------
Here I try the combined fst file:
kill_flookup.sh
[1]+ Killed
flookup -i -S -A 127.0.0.1 /home/en/program/foma/tktest/hun42.fst
$ cat do_test.sh
kill_flookup.sh
flookup -i -S -A 127.0.0.1 /home/en/program/foma/tktest/hunfnnum.fst &
en@en-desktop ~/program/foma/tktest/tools/fomaallchk $ sh do_test.sh
en@en-desktop ~/program/foma/tktest/tools/fomaallchk $ Started flookup server
on 127.0.0.1 port 6062
en@en-desktop ~/program/foma/tktest $ echo "abnormalitásoz+Verb+CondDefPl3" |
nc -w 1 -u 127.0.0.1 6062
abnormalitásoz+Verb+CondDefPl3 ?+
It works for lots of cases, but not for ConjIndef... , ConjDef...,
CondIndef..., CondDef...
Also just using foma -l ....foma shows the same results; the problem is not
flookup related.
What is the expected output? What do you see instead?
expected:
abnormalitásoz+Verb+CondDefPl3 Abnormalitásoznák
I see instead:
abnormalitásoz+Verb+CondDefPl3 ?+
What version of the product are you using? On what operating system?
foma 0.9.16alpha (from svn)
linux debian
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 16 Aug 2012 at 2:58
This is a request for future versions.
There are often grammatical variants that are used more often, and others that
are used seldom. For example, in Hungarian the 3rd-person possessive is
expressed with a/e or with ja/je. I can say tor-a, but also tor-ja for 'his
tor'. For translation applications it would be helpful if the more frequently
used variant were weighted; the program would then generate the more frequent
variant, but it would still understand the less frequent one.
Which variant is more or less frequent is individual, and must be set up for
each word (in some cases for each word group) individually.
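The requested behaviour amounts to a weighted transducer: generation picks the lowest-weight variant, while analysis accepts every variant. A toy sketch of that policy (foma itself is unweighted; the weights and forms here are invented for illustration):

```python
# possible surface forms per (lemma, tag), with per-word weights
variants = {
    ("tor", "+Posss3"): [("tora", 0.0), ("torja", 1.0)],
}

def generate(lemma, tag):
    """Pick the preferred (lowest-weight) surface form."""
    forms = variants[(lemma, tag)]
    return min(forms, key=lambda fw: fw[1])[0]

def analyze(surface):
    """Accept any variant, frequent or not."""
    return [(lemma, tag) for (lemma, tag), forms in variants.items()
            for form, _ in forms if form == surface]

print(generate("tor", "+Posss3"))   # the preferred form
print(analyze("torja"))             # the rarer form is still understood
```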
Original issue reported on code.google.com by [email protected]
on 6 Feb 2012 at 10:27
What steps will reproduce the problem?
echo regék | flookup -x hun41.foma
File format error foma!
: Success
File error: hun41.foma
Hun41 lexc/foma look:
!!!hun41.lexc!!!
Multichar_Symbols +N +V +Nom +Pl
!Poss
+Posss1 +Posss2 +Posss3 +Possp1 +Possp2 +Possp3
+Posss1p +Posss2p +Posss3p +Possp1p +Possp2p +Possp3p
!Genitiv
+Gen
+Genpl
!Cases
+Abl +Acc +Ade +All
+Cau +Dat +Del
+Ela +Fac +For
+Ill +Ine +Ins
+Sub +Sup +Ter
!Special cases
+Dis +Ess +Fam +Soc +Tem
LEXICON Root
Noun ;
LEXICON Noun
!+N:kar Plur;
!+N:kéz Plur;
!+N:kör Plur;
!+N:hajó Plur;
rege Poss;
LEXICON Poss
+Dis:^Lnként #;
+Ess:^Zl #;
+Soc:^NstZl #;
+Tem:^kor #;
+Posss1:^Bm Plur;
+Posss2:^Bd Plur;
+Posss3:^JC Plur;
+Possp1:^Hnk Plur;
+Possp2:^KtQk Plur;
+Possp3:^JRk Plur;
+Posss1p:^STim Plur;
+Posss2p:^STid Plur;
+Posss3p:^STi Plur;
+Possp1p:^STink Plur;
+Possp2p:^STitWk Plur;
+Possp3p:^STik Plur;
Plur;
LEXICON Plur
+Plur:^Ok Fam;
Fam;
LEXICON Fam
+Fam:^ék Gen;
Gen;
LEXICON Gen
+Gen:^é Case;
+Genpl:^éi Case;
Case;
! H, K, unused
LEXICON Case
+Abl:^tUl #;
+Acc:^Gt #;
+Ade:^nDl #;
+All:^hIz #;
+Cau:^ért #;
+Dat:^nFk #;
+Del:^rUl #;
+Ela:^bUl #;
+Fac:^VD #;
+For:^ként #;
+Ill:^bF #;
+Ine:^bFn #;
+Ins:^VFl #;
+Sub:^rF #;
+Sup:^Pn #;
+Ter:^ig; #;
### hun4.foma ###
# Vowels
define Vowel [ a | á | e | é | i | í | o | ó | u | ú | ü | ű | ö | ő ];
define BackVowel [ a | á | o | ó | u | ú ];
define FrontUnroundedVowel [ e | é | i | í | ü | ű ];
define FrontRoundedVowel [ ö | ő ];
define FrontVowel [e | é | i | í | ü | ű | ö | ő ];
# E to é: if any ending e-> é
define Etoee e -> é || _ "^" [ \0 ] ;
# Cleanup: remove morpheme boundaries
define Cleanup "^" -> 0;
#define DelRule O -> 0 || Vowel %^ _ ;
define HarmRuleO O -> 0 // Vowel %^ _ .o.
O -> o // BackVowel \Vowel+ _ ,,
O -> e // FrontUnroundedVowel \Vowel+ _ ,,
O -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleB B -> 0 // Vowel %^ _ .o.
B -> o // BackVowel \Vowel+ _ ,,
B -> e // FrontUnroundedVowel \Vowel+ _ ,,
B -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleA A -> 0 // Vowel %^ _ .o.
A -> a // BackVowel \Vowel+ _ ,,
A -> e // FrontVowel \Vowel+ _ ;
define HarmRuleC C -> a // BackVowel \Vowel+ _ .#. .o.
C -> e // FrontVowel \Vowel+ _ .#. .o.
C -> á // BackVowel \Vowel+ _ .o.
C -> é // FrontVowel \Vowel+ _ ;
define HarmRuleJ J -> j || Vowel %^ _ .o.
J -> 0 // \Vowel+ _ ;
define HarmRuleU U -> ó // BackVowel \Vowel+ _ ,,
U -> ő // FrontVowel \Vowel+ _ ;
define HarmRuleZ Z -> u // BackVowel \Vowel+ _ ,,
Z -> ü // FrontVowel \Vowel+ _ ;
define HarmRuleD D -> á // BackVowel \Vowel+ _ ,,
D -> é // FrontVowel \Vowel+ _ ;
define HarmRuleF F -> a // BackVowel \Vowel+ _ ,,
F -> e // FrontVowel \Vowel+ _ ;
define HarmRuleV V -> v || Vowel %^ _ ,,
V -> k || k %^ _ ,,
V -> m || m %^ _ ,,
V -> d || d %^ _ ,,
V -> r || r %^ _ ;
define HarmRuleG G -> 0 // Vowel %^ _ .o.
G -> o // BackVowel \Vowel+ _ ,,
G -> e // FrontUnroundedVowel \Vowel+ _ ,,
G -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleI I -> o // BackVowel \Vowel+ _ ,,
I -> e // FrontUnroundedVowel \Vowel+ _ ,,
I -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleP P -> 0 // Vowel %^ _ .o.
P -> o // BackVowel \Vowel+ _ ,,
P -> e // FrontUnroundedVowel \Vowel+ _ ,,
P -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleL L -> 0 // Vowel %^ _ .o.
L -> o // BackVowel \Vowel+ _ ,,
L -> e // FrontUnroundedVowel \Vowel+ _ ,,
L -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleN N -> 0 // Vowel %^ _ .o.
N -> o // BackVowel \Vowel+ _ ,,
N -> e // FrontUnroundedVowel \Vowel+ _ ,,
N -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleH H -> 0 // Vowel %^ _ .o.
H -> u // BackVowel \Vowel+ _ ,,
H -> ü // FrontVowel \Vowel+ _ ;
define HarmRuleK K -> 0 // Vowel %^ _ .o.
K -> o // BackVowel \Vowel+ _ ,,
K -> e // FrontUnroundedVowel \Vowel+ _ ,,
K -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleQ Q -> o // BackVowel \Vowel+ _ ,,
Q -> e // FrontUnroundedVowel \Vowel+ _ ,,
Q -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleR R -> u // BackVowel \Vowel+ _ ,,
R -> ü // FrontVowel \Vowel+ _ ;
define HarmRuleS S -> j || Vowel %^ _ .o.
S -> 0 // \Vowel+ _ ;
define HarmRuleT T -> a // BackVowel \Vowel+ _ ,,
T -> e // FrontVowel \Vowel+ _ ;
define HarmRuleW W -> o // BackVowel \Vowel+ _ ,,
W -> e // FrontVowel \Vowel+ _ ;
define Ablaut é -> e || _ z "^" [ \0 ] ;
read lexc hun41.lexc
define Lexicon
define Grammar Lexicon .o.
HarmRuleO .o.
HarmRuleB .o.
HarmRuleA .o.
HarmRuleJ .o.
HarmRuleU .o.
HarmRuleC .o.
HarmRuleZ .o.
HarmRuleD .o.
HarmRuleF .o.
HarmRuleV .o.
HarmRuleG .o.
HarmRuleI .o.
HarmRuleP .o.
HarmRuleL .o.
HarmRuleN .o.
HarmRuleH .o.
HarmRuleK .o.
HarmRuleQ .o.
HarmRuleR .o.
HarmRuleS .o.
HarmRuleT .o.
HarmRuleW .o.
Ablaut .o.
Etoee .o.
Cleanup;
regex Grammar;
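The HarmRule* definitions above all follow one pattern: an archiphoneme is resolved by the backness of the last full vowel before it, or deleted directly after a vowel plus boundary. A simplified Python rendering of HarmRuleO, for illustration only (the real rules also require at least one non-vowel between the trigger vowel and the target, which this sketch glosses over):

```python
BACK = set("aáoóuú")
FRONT_UNROUNDED = set("eéiíüű")
FRONT_ROUNDED = set("öő")
VOWELS = BACK | FRONT_UNROUNDED | FRONT_ROUNDED

def resolve_O(word):
    """Resolve the archiphoneme 'O' as in HarmRuleO, then drop '^'."""
    out = []
    for i, ch in enumerate(word):
        if ch != 'O':
            out.append(ch)
            continue
        # O -> 0 // Vowel %^ _  : delete directly after vowel + boundary
        if i >= 2 and word[i-1] == '^' and word[i-2] in VOWELS:
            continue
        # otherwise harmonize with the last preceding full vowel
        last = next((c for c in reversed(word[:i]) if c in VOWELS), None)
        if last in BACK:
            out.append('o')
        elif last in FRONT_UNROUNDED:
            out.append('e')
        elif last in FRONT_ROUNDED:
            out.append('ö')
    return ''.join(out).replace('^', '')

print(resolve_O("kar^Ok"))   # back-vowel stem
print(resolve_O("kör^Ok"))   # front-rounded stem
print(resolve_O("rege^Ok"))  # vowel-final stem: O deleted
```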
Original issue reported on code.google.com by [email protected]
on 1 Jan 2012 at 7:03
What steps will reproduce the problem?
If I open foma and run 'source phonology.foma' and then 'save stack
phon_bin.foma', I get a binary file that works as expected, i.e., I can run
'echo "word" | flookup -x phon_bin.foma' and I get a correct parse.
However, if I try to repeat this procedure using Python's subprocess module,
the binary file generated is not as expected. The Python script is basically
as follows (with absolute paths replacing the filenames).
import subprocess
process = subprocess.Popen(['foma'], shell=False, stdin=subprocess.PIPE,
stdout=subprocess.PIPE)
process.stdin.write('source phonology.foma')
process.stdin.write('save stack phon_bin.foma')
What is the expected output? What do you see instead?
I expect to get the same output (i.e., binary file) from the command line as I
do from the Python script. However, the Python strategy results in an FSM with
2 states (678 bytes. 2 states, 2 arcs, 2 paths) while the command line strategy
correctly results in one with 20 (1.0 kB. 20 states, 24 arcs, 10 paths).
What version of the product are you using? On what operating system?
Foma, version 0.9.14alpha
Mac OS X 10.6.8
Python 2.5/2.6
Please provide any additional information below.
Is my approach incorrect? The goal is to be able to generate FSTs using foma
from within a python application.
Is it possible to use foma to convert a foma script into a binary
representation right from the command line?
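Note that the two writes above contain no trailing newline, so foma most likely receives both commands glued together on one line, and the pipe is neither flushed nor closed. A hedged sketch of one way to drive foma from Python — the helper names are mine, and it assumes a foma binary on the PATH:

```python
import subprocess

def build_script(commands):
    """Join interactive foma commands, one per line, newline-terminated."""
    return "".join(cmd + "\n" for cmd in commands)

def run_foma(commands, foma="foma"):
    """Feed a command script to foma's stdin and return foma's output."""
    script = build_script(commands)
    proc = subprocess.Popen([foma], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate(script.encode("utf-8"))  # writes and closes stdin
    return out.decode("utf-8")

# e.g. run_foma(["source phonology.foma", "save stack phon_bin.foma"])
```

communicate() both writes the whole script and closes stdin, so foma sees end-of-input and exits cleanly; with plain stdin.write the commands may still sit unflushed in the pipe when the Python process ends.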
Original issue reported on code.google.com by [email protected]
on 17 Feb 2012 at 4:16
What steps will reproduce the problem?
1. Create a parallel regular expression for two replacement rules,
2. like: regex p -> "pl" ,, "p2" -> "p4" ;
My question: is there any tool or library to visualize the compiled regular
expression as an FST network, plotted visually?
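As far as I know, foma can export networks textually (e.g. the tabular AT&T format), and any arc list in that format can be rendered with Graphviz. A hedged sketch that converts AT&T-style lines ('src dst in out', final states on lines of their own) to DOT — the sample arcs are invented, not the compiled form of the regex above:

```python
def att_to_dot(att_text):
    """Convert AT&T-format transducer text to a Graphviz DOT digraph."""
    edges, finals = [], []
    for line in att_text.strip().splitlines():
        parts = line.split()
        if len(parts) >= 4:
            src, dst, isym, osym = parts[:4]
            edges.append('  %s -> %s [label="%s:%s"];' % (src, dst, isym, osym))
        elif len(parts) == 1:
            finals.append('  %s [shape=doublecircle];' % parts[0])
    return "digraph fst {\n  rankdir=LR;\n" + "\n".join(finals + edges) + "\n}\n"

sample = """0 1 p pl
1 2 p2 p4
2
"""
print(att_to_dot(sample))
# the result can be rendered with Graphviz: dot -Tpng fst.dot -o fst.png
```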
Thanks
Original issue reported on code.google.com by [email protected]
on 16 Jan 2013 at 11:23
What steps will reproduce the problem?
foma file reads in lexc files like:
define Grammar7 Filter1 .o.
Lexicon7 .o.
CleanupEndings .o.
HarmRuleAAA .o. <--- if instead of .o. there is .o
...
HarmRuleszorelolmely .o.
Cleanup;
one after the other.
If in one such set there is .o or o. or o instead of .o., then the effect is:
No error or warning, but every word of this lexc file is reported unknown, like:
apply up> ballag
???
apply up>
What is the expected output? What do you see instead?
It would be very nice if this kind of error were reported at compilation time.
It took me several hours to find the source of the problem.
Original issue reported on code.google.com by [email protected]
on 13 Mar 2012 at 6:02