mhulden / foma
Automatically exported from code.google.com/p/foma
What steps will reproduce the problem?
1. A Makefile runs: foma -l blabla.foma <savestackblabla.sh
2. savestackblabla.sh is a file with the contents: save stack blabla.fst
The Makefile repeats the above several times.
While running, foma prints numerous lines like:
....
defined Lexiconnum2: 16.3 kB. 572 states, 901 arcs, Cyclic.
defined Grammarnum2: 21.5 kB. 379 states, 1151 arcs, Cyclic.
">>> read in enhunum3 <<<"
Root...7, NumPref2...58, NumPref3...30, NumPref12...6, NumPref13...5,
NumPref14...2, Num...22, Num1...2, Fel...1, AddNum...3, Poss...18, Plur...2,
Fam...2, Gen...3, Case...20, Szor...3, Sub...4
Building lexicon...
Determinizing...
Minimizing...
Done!
16.4 kB. 576 states, 906 arcs, Cyclic.
defined Lexiconnum3: 16.4 kB. 576 states, 906 arcs, Cyclic.
defined Grammarnum3: 14.8 kB. 263 states, 720 arcs, Cyclic.
...
and so on
The -q flag does not change this; it only suppresses the prompt display.
A --silence flag would be good: it would suppress all of the above
information and display only messages that indicate an error or warning
caused by problems in the user-created .foma and .lexc scripts themselves.
User-written echos (in .foma scripts) should still be displayed then; maybe
they could be suppressed with an additional flag.
At present, if the user edits and recompiles, he gets thousands of lines;
he has to save the output and then search the saved information for lines
indicating errors or warnings. That burden could easily be removed with a
--silence flag.
I have attached the Makefile, the list of .lexc files for
Hungarian (about 110 files, plus 6 .foma files), and the make output, which
contains 1846 lines.
Original issue reported on code.google.com by [email protected]
on 10 Jan 2013 at 7:17
Attachments:
What steps will reproduce the problem?
regex a <- [. .] ;
What is the expected output? What do you see instead?
I am getting this error:
1.11-1.11: error: ***syntax error at ';'.
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha, Ubuntu
Original issue reported on code.google.com by [email protected]
on 21 Dec 2012 at 1:49
I solved the problem of also recognizing uppercase words by using an upcase
converter like:
define ToUpcase a -> A || .#. _ ,,
á -> Á || .#. _ ,,
b -> B || .#. _ ,,
c -> C || .#. _ ,,
d -> D || .#. _ ,,
e -> E || .#. _ ,,
é -> É || .#. _ ,,
f -> F || .#. _ ,,
g -> G || .#. _ ,,
h -> H || .#. _ ,,
i -> I || .#. _ ,,
í -> Í || .#. _ ,,
j -> J || .#. _ ,,
k -> K || .#. _ ,,
l -> L || .#. _ ,,
m -> M || .#. _ ,,
n -> N || .#. _ ,,
o -> O || .#. _ ,,
ó -> Ó || .#. _ ,,
ö -> Ö || .#. _ ,,
ő -> Ő || .#. _ ,,
p -> P || .#. _ ,,
q -> Q || .#. _ ,,
r -> R || .#. _ ,,
s -> S || .#. _ ,,
t -> T || .#. _ ,,
u -> U || .#. _ ,,
ú -> Ú || .#. _ ,,
ü -> Ü || .#. _ ,,
ű -> Ű || .#. _ ,,
v -> V || .#. _ ,,
w -> W || .#. _ ,,
x -> X || .#. _ ,,
y -> Y || .#. _ ,,
z -> Z || .#. _ ;
and by doubling all grammars, so that each has a normal and an uppercase version:
define Grammar Lexicon .o.
ConsonantDoubling .o.
EDeletion .o.
EInsertion .o.
YReplacement .o.
KInsertion .o.
Cleanup;
define Grammarup Lexicon .o.
ToUpcase .o.
ConsonantDoubling .o.
EDeletion .o.
EInsertion .o.
YReplacement .o.
KInsertion .o.
Cleanup;
regex Grammar | Grammarup;
Attached the complete project.
The approach has two disadvantages:
1. I have to double all grammars
2. using down:
foma[1]: down
apply down> cat+N+Sg
cat
Cat
apply down> Peter+N+Sg
Peter
For cat+N+Sg I also get Cat, which is obvious but in fact unnecessary.
Is there a more elegant way to handle upper/lower case, or is mine already the
optimal one?
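One way to avoid doubling every grammar (a sketch, not tested against the attached project) is to make the upcasing optional with foma's optional-replace operator (->) and compose it once after the shared cascade; the rule names below are taken from the grammar above:

```foma
! Sketch: optional upcasing composed once after the shared cascade.
! (->) is foma's optional replace, so both cat and Cat survive on the
! lower side; this removes disadvantage 1 (grammar doubling) but not
! disadvantage 2 (the extra Cat output for cat+N+Sg).
define OptUpcase a (->) A || .#. _ ,,
                 b (->) B || .#. _ ,,
                 z (->) Z || .#. _ ; ! ...one pair per letter, as above
define Grammar Lexicon .o.
    ConsonantDoubling .o.
    EDeletion .o.
    EInsertion .o.
    YReplacement .o.
    KInsertion .o.
    Cleanup .o.
    OptUpcase;
regex Grammar;
```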
Thanks in advance.
Original issue reported on code.google.com by [email protected]
on 29 Jul 2012 at 12:28
Attachments:
What steps will reproduce the problem?
1. read regex a => b _ c;
2. read regex [a]+ => b _ c;
3. read regex [\a]+ => b _ c;
4. read regex [\a]+ => .#. _;
What is the expected output? What do you see instead?
1. works
2. works
3. accepts a* ?!
4. Error message Symbol '@#@' not found in network!
What version of the product are you using? On what operating system?
0.9.17alpha
Please provide any additional information below.
I want to do something like this in lexc (i.e., swallow +Pref+):
LEXICON Root
+Pref+:0 Second;
<[[\{+}]+ => .#. _] @-> ... {/}> Pos;
LEXICON Second
<[[\{+}]+ => .#. _] @-> ... {/}> Pos;
Original issue reported on code.google.com by [email protected]
on 27 Jan 2013 at 2:50
Lines longer than 2048 characters are broken into multiple lines by flookup
(and then processed line-by-line).
Is this intentional?
I'm using flookup 1.0 (foma library version 0.9.14alpha)
on Debian GNU/Linux kernel 2.6.32-bpo.5-xen-amd64
Original issue reported on code.google.com by [email protected]
on 17 Jul 2012 at 9:12
What steps will reproduce the problem?
Erroneous:
define ConsAeJaje [ g | k | l | m | n | p | r | t | v | gy | n y | t y ];
Correct:
define ConsAeJaje [ g | k | l | m | n | p | r | t | v | gy | ny | ty ];
The rule that uses it looks like:
define HarmRuleJ §J§ -> j || .#. \"^"* [ Vowel | ConsJaje ] %^ ?* _ .o.
§J§ -> j || .#. \"^"* ConsAeJaje %^ ?* _ ,,
§J§ -> 0 || .#. \"^"* ConsAeJaje %^ ?* _ .o.
§J§ -> 0 || .#. \"^"* ConsAe %^ ?* _ ;
With the erroneous definition, foma behaves like this:
foma[1]: down
apply down> nagy+Noun+Nom
???
What is the expected output? What do you see instead?
It would be nicer if foma said at compile time that the define line is
not OK, or somehow indicated at execution time
why the searched word (nagy) is not found.
Original issue reported on code.google.com by [email protected]
on 27 Jan 2012 at 5:40
Attachments:
What steps will reproduce the problem?
See the following foma script:
echo BUGCOMMENT: Everything works fine for ASCII networks and ASCII input
echo
echo read regex n ;
read regex n ;
echo
echo BUGCOMMENT: Computing med n
med n
echo BUGCOMMENT: BUG with non-ASCII input and non-ASCII network
echo BUGCOMMENT: ñ is not recognized by med ñ
echo
echo read regex ñ ;
read regex ñ ;
echo
echo BUGCOMMENT: Computing med ñ
med ñ
What is the expected output? What do you see instead?
I would expect to see the same costs for n and ñ.
BUGCOMMENT: Everything works fine for ASCII networks and ASCII input
read regex n ;
194 bytes. 2 states, 1 arcs, 1 path.
BUGCOMMENT: Computing med n
Calculating heuristic [h]
Using Levenshtein distance.
n
n
Cost[f]: 0
n*
*n
Cost[f]: 2
*n
n*
Cost[f]: 2
BUGCOMMENT: BUG with non-ASCII input and non-ASCII network
BUGCOMMENT: Cost[f]: 0 is missing
read regex ñ ;
195 bytes. 2 states, 1 arcs, 1 path.
BUGCOMMENT: Computing med ñ
Calculating heuristic [h]
Using Levenshtein distance.
*ñ
ñ
Cost[f]: 2
ñ*
ñ
Cost[f]: 2
*ñ*
###
Cost[f]: 3
What version of the product are you using? On what operating system?
0.9.15 on linux and macos
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 23 Nov 2011 at 12:22
Attachments:
What steps will reproduce the problem?
1. In line 4412 of a lexc file there is an erroneous line:
bős-nagymarosi AddNoun;;
(There are accidentally two ;-s at the end, at column 24.)
2. Foma reports:
***Syntax error on line 4403 column 26 at ';'
which is 9 lines away from the actual error
What is the expected output? What do you see instead?
It would be nicer if foma reported:
***Syntax error on line 4412 column 24 at ';'
Complete test environment attached.
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 1 Feb 2012 at 1:04
Attachments:
In the fsmbook there is a so-called compounding exercise (p. 248), but it is not
worked out; in other words, it does not show how to compound words and how
to filter out unlikely compounds. Also, not a single word is said in the fsmbook
about German compounding specifically, where nouns are capitalized. It would be
very good if the foma documentation gave some working examples for compounding.
Original issue reported on code.google.com by [email protected]
on 8 Jan 2012 at 9:48
What steps will reproduce the problem?
in .lexc file:
LEXICON Case
+Abl:^+cAbl^t§oooe20§l #; ! tól, től
...
The .foma file sees this as §oooe2§; it does not see the 0 (null).
This is a minor issue. If this is a feature (hopefully not), it should
be documented.
Original issue reported on code.google.com by [email protected]
on 12 Jan 2012 at 3:32
What steps will reproduce the problem?
1. re "a";
2. zero-plus
3. write att
What is the expected output? What do you see instead?
Expected:
0 1 a a
1 1 a a
0
1
Seen:
0 1 a a
1 1 a a
1
What version of the product are you using? On what operating system?
Foma, version 0.9.14alpha; tested on Ubuntu Linux 10.04 and Mac OS 10.7.
Please provide any additional information below.
Also tested with several other automata. Explicit Kleene star in the regex
works.
Original issue reported on code.google.com by [email protected]
on 27 Jan 2012 at 9:31
Let me start by expressing my gratitude for making this neat tool available.
What steps will reproduce the problem?
1. execute "foma"
2. foma[0]: "read lexc xhosa-utf8.lex" (attached)
What is the expected output? What do you see instead?
=> foma outputs: "152 bytes. 1 states, 0 arcs, 0 paths" (along with "defined
but not used" warnings)
=> xfst reads the same file and outputs: "38.2 Kb. 382 states, 824 arcs,
4070784 paths"
=> While the definitions in this file are not used in the same file, they are used
in other scripts read afterwards.
What version of the product are you using? On what operating system?
=> 0.9.15alpha for linux x86_64
Please provide any additional information below.
=> A small nuisance I experienced while porting an xfst script to foma: definitions
like "define Syll[Combos Vowels];" give an error because foma expects a space after
"Syll". But this is not a big deal.
Thanks,
waleed
Original issue reported on code.google.com by [email protected]
on 14 Nov 2011 at 10:14
Attachments:
See the description of issue #49.
Original issue reported on code.google.com by [email protected]
on 17 Jul 2013 at 12:10
If there is a character that can take multiple values, it would be useful if
there existed an "otherwise" rule.
Example: §aoe§ set up in the .lexc file can be a, o or e.
in .foma file:
define Ruleforaoe §aoe§ -> a // Rule1 ,,
§aoe§ -> o // Rule2 | Rule3 ,,
§aoe§ -> e // _Otherwise_ ;
The advantage for the user is that he does not have to think out any special rule
for assigning the value 'e' to the character; he can simply say _Otherwise_.
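An "elsewhere" effect can already be approximated by composing the rules sequentially with .o. instead of applying them in parallel with ,,: whatever the earlier rules rewrite is gone by the time the last, unconditional rule fires. A sketch, where Rule1/Rule2/Rule3 stand for the contexts from the example above:

```foma
! Sketch of an "otherwise" via sequential composition (.o.):
! the rules compose top to bottom, so any §aoe§ not consumed by the
! first two rewrites falls through to the unconditional last one.
define Ruleforaoe
    §aoe§ -> a // Rule1         .o.
    §aoe§ -> o // Rule2 | Rule3 .o.
    §aoe§ -> e ;                ! the "otherwise" case
```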
Original issue reported on code.google.com by [email protected]
on 20 Feb 2012 at 3:46
What steps will reproduce the problem?
I have an empty continuation class named CC00 which I use to terminate words.
For example:
#clase de continuación para terminar palabras:
LEXICON CC00
# ;
#Sufijos para sustantivos (la ruta más general)
LEXICON CC_sust
[Diminuitivo]:+itu CC_sust_bif;
0:0 CC_sust_bif;
CC00;
What is the expected output? What do you see instead?
When I run the code above, I see that the last word ("palabras") of the
comment
#clase de continuación para terminar palabras:
is inserted into the output of the "print upper" command in place of the empty
continuation class CC00.
For example:
vato@debamos:~/corrector/foma-training/qu3$ foma -l simikuna.foma
Root...4, Sustantivos...7, Verbos...5, Adjetivos...3, Adverbios...2, CC00...1,
CC_sust...3, CC_sust_diminuitivos_2_generos...3, CC_sust_bif...2,
CC_sust_numero_primero...2, CC_sust_numero_primero_bif...2,
CC_sust_limitativo_a_pos...2, CC_sust_posesivos_a_caso...9,
CC_sust_posesivos_bif...2, CC_sust_posesivos_primero...9,
CC_sust_numero_a_lim...2, CC_sust_numero_bif...2,
CC_sust_limitativo_a_caso...2, CC_sust_caso...11, CC_sust_continuativos...3,
CC_sust_puni...2, CC_sust_caso_y_limitativo...3, CC_sust_cliticos...15,
CC_verbalizadores...6, CC_verb01...3, CC_verb02...6, CC_verb_rqu...2,
CC_verb_ri...2, CC_verb_chi...2, CC_verb_tata...2, CC_verb_rpari...2,
CC_verb_kipa...2, CC_verb_ysi...2, CC_verb_infijos_generales...9,
CC_verb_limitativo...2, CC_verb_progresivo...2, CC_verb_tiempo_persona...12,
CC_ind_pres...26, CC_ind_fut...26, CC_ind_perf...25, CC_ind_plusc...25,
CC_poten...28, CC_subj...19, CC_oblig...25, CC_inf...3, CC_imp...14,
CC_part_pres...3, CC_part_pas...26, CC_agentivo...3, CC_verb_raq...3,
CC_verb_puni...2, CC_verb_sina...2, CC_verb_taq...1, CC_verb_cliticos...12,
CC_adv_cliticos...12, CC_adv_limitativo...2, CC_adv_acusativo...3, CC_adv...1,
CC_adj...1
Building lexicon...
*Warning: lexicon 'CC_agentivo' defined but not used
***Warning: lexicon 'CC_part_act' used but never defined
*Warning: lexicon 'CC_verbalizadores' defined but not used
***Warning: lexicon 'CC_sust_raq' used but never defined
Determinizing...
Minimizing...
Done!
34.2 kB. 1449 states, 2086 arcs, 13079858235 paths.
defined LEX: 34.2 kB. 1449 states, 2086 arcs, 13079858235 paths.
defined Cons: 1.2 kB. 5 states, 26 arcs, 32 paths.
defined Vocales: 671 bytes. 12 states, 11 arcs, 1 path.
defined Niyuq: 3.9 kB. 11 states, 193 arcs, Cyclic.
defined Rclean: 332 bytes. 1 state, 2 arcs, Cyclic.
defined Morfo: 35.4 kB. 1458 states, 2158 arcs, 13079858235 paths.
35.4 kB. 1458 states, 2158 arcs, 13079858235 paths.
Foma, version 0.9.16alpha
Copyright © 2008-2011 Mans Hulden
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; for details, type "help license"
Type "help" to list all commands available.
Type "help <topic>" or help "<operator>" for further help.
foma[1]: print upper
tullu[Adj]palabras
tullu[Sust][Acusativo][Aditativo]palabras
tullu[Sust][Acusativo][Responsivo]palabras
tullu[Sust][Acusativo][Interrogativo]palabras
tullu[Sust][Acusativo][Topico]palabras
tullu[Sust][Acusativo][Vacilativo]palabras
tullu[Sust][Acusativo][ReportativoVocal]palabras
tullu[Sust][Acusativo][ReportativoCons]palabras
tullu[Sust][Acusativo][TestimonialVocal]palabras
tullu[Sust][Acusativo][TestimonialCons]palabras
tullu[Sust][Acusativo][Impresivo]palabras
tullu[Sust][Acusativo][Dubitativo]palabras
tullu[Sust][Acusativo][Definitivo][Aditativo]palabras
tullu[Sust][Acusativo][Definitivo][Responsivo]palabras
tullu[Sust][Acusativo][Definitivo][Interrogativo]palabras
tullu[Sust][Acusativo][Definitivo][Topico]palabras
tullu[Sust][Acusativo][Definitivo][Vacilativo]palabras
tullu[Sust][Acusativo][Definitivo][ReportativoVocal]palabras
tullu[Sust][Acusativo][Definitivo][ReportativoCons]palabras
tullu[Sust][Acusativo][Definitivo][TestimonialVocal]palabras
tullu[Sust][Acusativo][Definitivo][TestimonialCons]palabras
tullu[Sust][Acusativo][Definitivo][Impresivo]palabras
tullu[Sust][Acusativo][Definitivo][Dubitativo]palabras
tullu[Sust][Acusativo][Definitivo][Contrastivo]palabras
tullu[Sust][Acusativo][Definitivo][Contrastivo][Topico]palabras
tullu[Sust][Acusativo][Definitivo][Contrastivo][Responsivo]palabras
tullu[Sust][Acusativo][Definitivo]palabras
tullu[Sust][Acusativo][Continuativo][Aditativo]palabras
tullu[Sust][Acusativo][Continuativo][Responsivo]palabras
tullu[Sust][Acusativo][Continuativo][Interrogativo]palabras
tullu[Sust][Acusativo][Continuativo][Topico]palabras
tullu[Sust][Acusativo][Continuativo][Vacilativo]palabras
tullu[Sust][Acusativo][Continuativo][ReportativoVocal]palabras
tullu[Sust][Acusativo][Continuativo][ReportativoCons]palabras
tullu[Sust][Acusativo][Continuativo][TestimonialVocal]palabras
tullu[Sust][Acusativo][Continuativo][TestimonialCons]palabras
tullu[Sust][Acusativo][Continuativo][Impresivo]palabras
tullu[Sust][Acusativo][Continuativo][Dubitativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Aditativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Responsivo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Interrogativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Topico]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Vacilativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][ReportativoVocal]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][ReportativoCons]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][TestimonialVocal]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][TestimonialCons]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Impresivo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Dubitativo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Contrastivo]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Contrastivo][Topico]palabras
tullu[Sust][Acusativo][Continuativo][Definitivo][Contrastivo][Responsivo]palabras
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha in Debian Testing Linux 3.1.0-1-amd64
Attached are the two Foma files I am using.
Original issue reported on code.google.com by [email protected]
on 19 Sep 2012 at 12:01
Attachments:
What steps will reproduce the problem?
LEXICON Start
+Whatever:0 Gen2
Gen2
LEXICON Gen2
+Gens:^+cGens^é Case;
+Genpl:^+cGenpl^éi Case;
Case2;
....
LEXICON Gen2 ! This is in fact a typo, should be Gen3
+Gens:^+cGens^é Case3;
+Genpl:^+cGenpl^éi Case3;
Case4;
The compiler does not say a word, and the user has no idea
where the program went after Start/Whatever.
An error message would be nice in such cases.
Original issue reported on code.google.com by [email protected]
on 17 Jan 2012 at 5:47
I have test.lexc and test.foma files:
$ cat test.lexc
Multichar_Symbols
+A +M +F
LEXICON Root
almizcler AdjGenderO;
LEXICON AdjGenderO
o+A+M:o #;
o+A+F:a #;
$ cat test.foma
read lexc test.lexc
define Lexicon;
define AO a | á | o;
define CtoZC c -> zc || _ "^E" AO;
regex Lexicon .o.
CtoZC ;
So, I've defined two relations:
almizclero+A+M <--> almizclero
almizclero+A+F <--> almizclera
The rule CtoZC should not have any impact on these relations.
So, when I do the following steps...
$ foma -l test.foma
....
foma[1]: words
almizclero:a+A:0+F:0
almizclero+A:0+M:0
foma[1]: apply down
apply down> almizclero+A+F
???
This is strange and looks like incorrect behavior.
But when I comment out the line with the CtoZC rule and change the main regex
simply to "regex Lexicon;", it is OK:
$ foma -l test.foma
....
foma[1]: words
almizclero:a+A:0+F:0
almizclero+A:0+M:0
foma[1]: apply down
apply down> almizclero+A+F
almizclera
Foma version: foma 0.9.16alpha
System version: Debian wheezy (testing), Linux 3.5 x86_64
Original issue reported on code.google.com by [email protected]
on 19 Oct 2012 at 8:31
Attachments:
What steps will reproduce the problem?
define Etoee e -> é || _ "^" [ \0 ] ;
...
define Grammar Lexicon .o.
HarmRule .o.
E_to_ee .o.
Cleanup;
This causes a segmentation fault.
Changing the name E_to_ee to Etoee solves the problem.
Would not an error message be more elegant, if the name is not accepted?
Original issue reported on code.google.com by [email protected]
on 28 Dec 2011 at 11:32
What steps will reproduce the problem?
Use in a foma file:
read lexc /home/en/program/foma/tktest/lexc/verb/enhuige7ik.lexc
define Lexicon7ik
define Grammar7ik Filter1 .o.
Lexicon7ik .o.
CleanupEndings .o.
HarmRuleAAA .o.
...
HarmRuleszorelolmely .o.
Cica .o. <--- this rule does not exist
Cleanup;
What is the expected output? What do you see instead?
An error message or warning.
Instead, all words of enhuige7ik.lexc are unknown:
foma[1]: up
apply up> aggik
???
apply up>
Newest foma version, linux debian.
Original issue reported on code.google.com by [email protected]
on 14 Mar 2012 at 2:12
I have written a consonant assimilation rule that works:
define HarmRuleV V -> v || Vowel %^ _ ,,
V -> b || b %^ _ ,,
V -> c || c %^ _ ,,
V -> d || d %^ _ ,,
V -> f || f %^ _ ,,
V -> g || g %^ _ ,,
V -> h || h %^ _ ,,
V -> j || j %^ _ ,,
V -> k || k %^ _ ,,
V -> l || l %^ _ ,,
V -> m || m %^ _ ,,
V -> n || n %^ _ ,,
V -> p || p %^ _ ,,
V -> q || q %^ _ ,,
V -> r || r %^ _ ,,
V -> s || s %^ _ ,,
V -> t || t %^ _ ,,
V -> v || v %^ _ ,,
V -> z || z %^ _ ;
What it does is:
If the word ends with a vowel, V becomes v.
If the word ends with a consonant, V becomes that final consonant.
For example: rém - rémm
rab - rabb
hajó - hajóv
In sfst there is a special variable, a so-called agreement variable, that takes
the value of the matching character; that would simplify the above rule, since
one can express it in one line: if a consonant is matched, take its value.
Is there nothing similar in foma to simplify the above rule?
Original issue reported on code.google.com by [email protected]
on 4 Jan 2012 at 6:00
1. foma doesn't compile under recent versions of GCC. The problem is caused by
the stricter policy followed by GCC as to when standard headers are included.
The current workaround is to add
#include <stdbool.h>
to the file that includes fomalib.h. The solution would be to include this line
in fomalib.h itself.
2. The parameter of fsm_read_binary_file(char*) should be a const char*. The same
goes for all other functions that accept a filename
(fsm_read_binary_file_multiple_init, io_get_file_size, etc.).
Original issue reported on code.google.com by [email protected]
on 15 Jul 2013 at 1:20
What steps will reproduce the problem?
1. read regex %"test%"%)%.;
2. write prolog test.prolog
3. read prolog test.prolog
What is the expected output? What do you see instead?
The expected output is a simple FSA: (0) --"test").--> ((1)).
What I see instead are:
1. with print net:
Ss0: \"test\ -> fs1.
2. with view net:
three states: ((];)), ((1)), (0), with no transitions between them.
What version of the product are you using? On what operating system?
Latest svn (or 0.9.17alpha).
Please provide any additional information below.
The behavior is caused by fsm_read_prolog() not handling quotation marks (") in
symbols: when parsing the arc() clause in the .prolog file, it assumes that it
ends with "symbol"). or ("in":"out"). However, quotation marks can be part of a
symbol in foma, in which case they are escaped by a backslash (\). Two problems
arise:
1. If the symbol contains ": or ")., as in the test above, those are
erroneously parsed as the end of the input symbol and the clause, respectively.
2. Even if it doesn't, the parser treats the symbol string as it is, and
doesn't remove the backslashes from before the quotation marks.
I have attached a patch that fixes this bug in io.c.
Original issue reported on code.google.com by [email protected]
on 17 Jul 2013 at 12:08
Attachments:
It would be very nice if foma had a command #include "file.name", similar to C.
That would help avoid keeping long word lists in lexc files.
I imagine its usage like this:
----------------------------------
LEXICON Noun
#include "words/std.lex"
LEXICON AddNoun
...
----------------------------------
But #include would also be quite useful for repeated code parts.
Original issue reported on code.google.com by [email protected]
on 17 Jan 2012 at 1:44
What steps will reproduce the problem?
The erroneous rule looks like:
define HarmRuleszelol §szelol§ -> o l // .#. \"^"* BackVowel \Vowel* [ s | z
] "^" _ ,,
§szelol§ -> e l // .#. \"^"* FrontVowel \Vowel* [ s | z ] "^" _ ,,
§szelol§ -> sz // .#. \"^"* "^" [ \"^"* "^"]* _ ;
The error is in the last line; it should be §szelol§ -> s z // ...
There is no error message whatsoever at compile time.
This erroneous rule causes all words containing sz to become unknown.
An error message at compile time would help the user a lot in finding out
what the problem is.
Original issue reported on code.google.com by [email protected]
on 24 Feb 2012 at 4:09
Attachments:
What steps will reproduce the problem?
1. In a lexc file I add by mistake two LEXICON entries with the same name, for
example:
LEXICON AddVerbrag11mely
+Verb:0 Caserag1mely; ! should be Caserag11mely, but forgot to change
LEXICON Caserag1mely ! should be Caserag11mely, but forgot to change
+IndefSg1:^+cIndefSg1^§igeomely§k #; ! ek/ok
and
LEXICON AddVerbrag1mely
+Verb:0 Caserag1mely;
LEXICON Caserag1mely
+IndefSg1:^+cIndefSg1^§igeomely§k #; ! ek/ok
What is the expected output? What do you see instead?
Expected is an error message or at least a warning.
Instead, foma takes one of the identical entries and uses it,
which is hard-to-grasp behaviour.
Original issue reported on code.google.com by [email protected]
on 12 Mar 2012 at 10:36
What steps will reproduce the problem?
1. If in a rule a ,, or .o. is forgotten, like:
define HarmRuleIgejatokitek §igejatokitek§ -> i t e k // .#. \"^"* FrontVowel
\Vowel* "^" ?* _ <---- here forgotten ,,
§igejatokitek§ -> s á t o k // .#. \"^"* BackVowel s "^" ?* _ ,,
...
then the compiler crashes:
defined HarmRuleIgeija: 14.0 kB. 14 states, 818 arcs, Cyclic.
*** glibc detected *** foma: double free or corruption (!prev): 0x097eac70 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0x4ebff1]
It would be nicer if the compiler detected the syntax error and gave a warning or
error message.
Original issue reported on code.google.com by [email protected]
on 21 Feb 2012 at 1:55
Attachments:
I have a question. The cases For and Tem are special cases where a word-final e
should not turn into é.
# rege+For regéként <-- should be regeként
# rege+Tem regékor <-- should be regekor
# rege+Posss3+For regéjéként <-- should be regéjeként
I tried
# define Etoee e -> é || _ "^" [ \0 & \{+Tem} & \{+For} ] ; # \0: not zero
and also
# define Etoee e -> é || _ "^" [ \0 & \{ként} & \{kor} ] ; # \0: not zero
and similarly
# define HarmRuleC C -> á // BackVowel \Vowel* _ %^ [ \0 ] .o.
C -> é // FrontVowel \Vowel* _ %^ [ \0 ] .o.
C -> a // BackVowel \Vowel* _ %^ [ 0 | {+For} ] .o.
C -> e // FrontVowel \Vowel* _ %^ [ 0 | {+For} ] ;
and also
# define HarmRuleC C -> á // BackVowel \Vowel* _ %^ [ \0 ] .o.
C -> é // FrontVowel \Vowel* _ %^ [ \0 ] .o.
C -> a // BackVowel \Vowel* _ %^ [ 0 | {ként} ] .o.
C -> e // FrontVowel \Vowel* _ %^ [ 0 | {ként} ] ;
But this does not work. I cannot find any solution; please help.
Attached the lexc/foma pair that is a little test program especially for this.
Thanks in advance.
Original issue reported on code.google.com by [email protected]
on 3 Jan 2012 at 5:57
Attachments:
What steps will reproduce the problem?
1. in Lexc file:
LEXICON AddVerb
...
LEXICON AddVerb
...
What is the expected output? What do you see instead?
A warning that LEXICON AddVerb is doubly defined.
Instead: no warning; both LEXICONs are happily used.
Original issue reported on code.google.com by [email protected]
on 3 Apr 2012 at 10:42
What steps will reproduce the problem?
1. I use a UTF-8 source file with the BOM symbol at the beginning
2. I apply "source" or "read" command to the file
What is the expected output? What do you see instead?
I expect foma to ignore BOM. However, it is regarded as a normal symbol,
resulting in errors (like "unknown character").
What version of the product are you using? On what operating system?
I am using 0.9.17 on Windows 7.
Actually, it's a really minor issue, but new users may have trouble, as the
reason is not very clear at first. Perhaps it should be specified in the
tutorial that one has to use UTF-8 files without a BOM.
Original issue reported on code.google.com by [email protected]
on 3 Mar 2013 at 3:36
What steps will reproduce the problem?
1. Use the Hungarian foma's result hunfnnum.fst; the Hungarian foma is
downloadable from
https://gitorious.org/hunmorph-foma/hunmorph-foma/trees/master
2. Run do_testup.sh (attached) and watch the flookup size using ps; it is 39.6 MB.
3. perl x.pl <x >x1 (both attached)
4. perl /home/en/program/foma/tktest/szokincsteszt/szeged/chkwdlistup.pl x1 >
x2
This will take about 5-10 minutes; after the test the flookup size increases
to > 54 MB. (chkwdlistup.pl attached)
What is the expected output? What do you see instead?
The problem is that this code in apply.c
----------------------------------------------------------------
int apply_check_flag(struct apply_handle *h, int type, char *name, char *value)
{
    struct flag_list *flist, *flist2;
    for (flist = h->flag_list; flist != NULL; flist = flist->next) {
        if (strcmp(flist->name, name) == 0) {
            break;
        }
    }
    h->oldflagvalue = flist->value;
    h->oldflagneg = flist->neg;
    if (type == FLAG_UNIFY) {
        if (flist->value == NULL) {
            flist->value = xxstrdup(value); /* this causes the hog */
            return SUCCEED;
        }
--------------------------------------------------
duplicates a string and never frees it. I found a solution that fixes
the problem:
in flookup.c:
at the declarations:
extern void apply_clean();
extern void apply_clean_start();
.....
void handle_line(char *s) {
    char *result, *tempstr;
    apply_clean_start();
    ....
    }
    apply_clean();
}
In apply.c:
In declarations:
static int apply_clean_variable;
#define MAX_SAVED 10
static struct flag_list *saved_flag_list[MAX_SAVED];
static char *saved_values[MAX_SAVED];
static int clean_ix;
static void apply_add_clean_list(struct flag_list *flist, char *value);
void apply_clean();
void apply_clean_start();
....
int apply_check_flag(struct apply_handle *h, int type, char *name, char *value)
{
    struct flag_list *flist, *flist2;
    for (flist = h->flag_list; flist != NULL; flist = flist->next) {
        if (strcmp(flist->name, name) == 0) {
            break;
        }
    }
    h->oldflagvalue = flist->value;
    h->oldflagneg = flist->neg;
    if (type == FLAG_UNIFY) {
        if (flist->value == NULL) {
            flist->value = xxstrdup(value);
            apply_add_clean_list(flist, flist->value);
            return SUCCEED;
        }
....
void apply_clean_start()
{
    apply_clean_variable = 1;
}

void apply_add_clean_list(struct flag_list *flist, char *value)
{
    if (apply_clean_variable) {
        saved_flag_list[clean_ix] = flist;
        saved_values[clean_ix] = value;
        if (++clean_ix >= MAX_SAVED) {
            clean_ix = 0;
        }
    }
}

void apply_clean() {
    if (apply_clean_variable) {
        int i;
        for (i = 0; i < clean_ix; i++) {
            xxfree(saved_values[i]);
            saved_flag_list[i]->value = NULL;
            saved_values[i] = NULL;
            saved_flag_list[i] = NULL;
        }
        clean_ix = 0;
        apply_clean_variable = 0;
    }
}
What version of the product are you using? On what operating system?
Newest from svn, linux debian
Please provide any additional information below.
The solution's description:
1. Signal that we are entering flookup:
set apply_clean_variable = 1;
2. While apply_clean_variable == 1, remember all strdups in a list, at most 10
of them.
3. When leaving flookup, free all strings in the list, put NULL into their
pointers in struct flag_list *flist, and set apply_clean_variable = 0;
----------------------------------
I have also tried to eliminate the strdup in apply_check_flag and pass back
FAIL; however, in that case lots of words were not found, that is, the
functionality of foma fails.
Original issue reported on code.google.com by [email protected]
on 16 Jan 2013 at 2:26
Attachments:
We typically have the word lists in lexc files, like:
LEXICON Noun
cat Ninf;
city Ninf;
fox Ninf;
panic Ninf;
try Ninf;
watch Ninf;
There is an interface command, read text; however, that works only from foma,
not from lexc.
I'd like to keep some hundred words in an external file and read them
into the lexc file at compile time. Is that possible?
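A possible workaround (a sketch, untested; the file name and the defined names other than those in the example above are hypothetical) is to read the word list on the foma side with read text and splice the resulting automaton into the grammar in place of the lexc word list:

```foma
! Sketch: load an external word list as an automaton and combine it
! with a lexc-built lexicon. "words/std.lex", NounSuffixes and
! RestOfLexicon are hypothetical names.
read text words/std.lex
define NounStems;        ! names the network just read off the stack
read lexc grammar.lexc
define RestOfLexicon;
define Lexicon [NounStems NounSuffixes] | RestOfLexicon;
```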
Original issue reported on code.google.com by [email protected]
on 11 Jan 2012 at 6:52
What steps will reproduce the problem?
1. Create a binary for regex
regex [?* a] @-> d;
2. Load it into foma and apply down the following words:
bf
aaaaaaaf
faaaaf
dddaaaf
3. Send the words through flookup -ix.
What is the expected output? What do you see instead?
The expected output is `bf` for the first word and `df` for all the others.
The foma interpreter returns the correct results but flookup does not:
bf -> bf
aaaaaaaf -> df
faaaaf -> +?
dddaaaf -> df (!)
What version of the product are you using? On what operating system?
foma 0.9.16
Ubuntu 11.10, 12.04
Please provide any additional information below.
Looking at the transducer (see attachment), we can see that the paths the above computations should take are
(0) -> (3) // bf
(0) -> (1) -> (3) // aaaaaaaf
(0) -> 2 -> (1) -> (3) // faaaaf
(0) -> (1) -> 2 -> (1) -> (3) // dddaaaf
The last sequence is successful again; I assumed it was because of the arc (0)
-d-> (1), but I was not sure how. So I inserted a line that prints the current
state at apply.c:910. The results are:
flookup:
-------
$ echo "aaa" | ./flookup -ix ../foo.bin
State no 0
State no 1
State no 1
State no 1
d
$ echo "ad" | ./flookup -ix ../foo.bin
State no 0
State no 1
State no 3
dd
State no 2
$ echo "caa" | ./flookup -ix ../foo.bin
State no 0
State no 3
+?
foma:
----
foma[1]: down caa
State no 0
State no 3
State no 2
State no 1
State no 1
d
foma[1]: down aadd
State no 0
State no 1
State no 1
State no 3
State no 3
ddd
State no 2
State no 2
Apparently, in case of `caa`, flookup does not backtrack from state 3, while
the interpreter does.
Another interesting bit is the analysis of `ad` and `aadd`, where the
interpreter moves from state 3 to state 2 after the results have been printed.
This seems completely superfluous.
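The discrepancy described above comes down to whether application backtracks to alternative arcs after a dead end. A minimal sketch of backtracking apply-down over a toy transducer (illustrative only; this is not foma's apply.c, and the arc list is made up):

```python
def apply_down(word, arcs, start, finals):
    """Return all outputs; arcs is a list of (src, insym, outsym, dst)."""
    results = set()

    def walk(state, pos, out):
        if pos == len(word) and state in finals:
            results.add(out)
        for src, isym, osym, dst in arcs:
            if src != state:
                continue
            if pos < len(word) and isym == word[pos]:
                # try this arc; when the recursion dead-ends we simply
                # fall through to the next arc, i.e. we backtrack
                walk(dst, pos + 1, out + osym)

    walk(start, 0, "")
    return results

# toy net with two competing arcs from state 0 over 'a'
arcs = [(0, "a", "x", 1),   # dead end unless followed by 'b'
        (0, "a", "y", 2),   # leads straight to a final state
        (1, "b", "b", 2)]
print(apply_down("a", arcs, 0, {2}))   # only the second arc succeeds
print(apply_down("ab", arcs, 0, {2}))  # only the first arc succeeds
```

Without the fall-through over competing arcs, the first dead end would produce +? exactly as in the flookup trace.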
Original issue reported on code.google.com by [email protected]
on 26 Jul 2012 at 3:55
Attachments:
This is not a problem report, but a lexc file syntax check tool in Perl. Maybe
others will also find it useful.
It checks for:
1. doubly defined LEXICONs
2. Unused LEXICONs
3. Undefined but used LEXICONs
It is handy, because it quickly checks a set of files in a directory.
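For reference, the three checks can be sketched in a few lines. This is a simplified Python rendering of the same idea, not the attached Perl tool; it assumes one entry per line and ignores multi-line entries:

```python
import re

def check_lexc(text):
    """Report doubly defined, unused, and undefined LEXICONs."""
    defined, used = [], set()
    for line in text.splitlines():
        line = line.split('!')[0].strip()      # strip lexc comments
        m = re.match(r'LEXICON\s+(\S+)', line)
        if m:
            defined.append(m.group(1))
        elif line.endswith(';'):
            cont = line[:-1].split()[-1]       # continuation class of the entry
            if cont != '#':
                used.add(cont)
    defset = set(defined)
    return {
        'doubly_defined': sorted({n for n in defset if defined.count(n) > 1}),
        'unused': sorted(defset - used - {'Root'}),
        'undefined': sorted(used - defset),
    }

sample = """LEXICON Root
Noun ;
LEXICON Noun
cat Ninf;
LEXICON Noun
dog Ninf;
LEXICON Orphan
x #;
"""
print(check_lexc(sample))
```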
Original issue reported on code.google.com by [email protected]
on 2 Jul 2012 at 9:07
Lexc file contains:
+Posss1p:^+cPosss1p^§JPnull§§AE§im Gen; ! aim
citromlé:citromlev AddNoun1;
Foma file contains:
define FrontVowel [ e | é | i | í | ü | ű | ö | ő | E | É | I | Í | Ü
| Ű | Ö | Ő ];
define BackVowel [ a | á | o | ó | u | ú | A | Á | O | Ó | U | Ú ];
define HarmRuleai §AE§ -> 0 || .#. \"^"* Vowelwithy %^ \%^* _ .o.
§AE§ -> a // .#. \"^"* BackVowel Cons+ %^ \%^* _ .o.
§AE§ -> e // .#. \"^"* FrontVowel Cons+ %^ \%^* _ ;
The word has mixed vowels: front vowel, back vowel, front vowel.
I get this
foma[1]: down
apply down> citromlé+Noun+Posss1p+Gens+Dat
citromlé^im^é^n§AA§k
citromlev^aim^é^n§AA§k
when I stop immediately after replacement of §AE§.
This is wrong! The rule says:
§AE§ -> a // .#. \"^"* BackVowel Cons+ %^ \%^* _ .o.
Start from the beginning of the word; if you see a back vowel, then only
consonant(s) and then ^, replace §AE§ after this with 'a'.
However, there is no such sequence in citromlev^, since before ^ there is a
front vowel and not a back vowel.
If I use:
define HarmRuleai §AE§ -> 0 || .#. \"^"* Vowelwithy %^ \%^* _ .o.
§AE§ -> e // .#. \"^"* FrontVowel Cons+ %^ \%^* _ .o.
§AE§ -> a // .#. \"^"* BackVowel Cons+ %^ \%^* _ ;
That is, if I swap the checks for front and back vowels, I get correct
behaviour, but I do not understand why the first version delivers an incorrect
replacement.
Any ideas?
Original issue reported on code.google.com by [email protected]
on 1 Feb 2012 at 4:28
Attachments:
What steps will reproduce the problem?
My last 2 defines are:
define AccSingle [ j | l | n | r | s | z ]
define Nodupy [ Vowel | b | c | d | e | f | h | j | k | m | n | p | q | r | s |
v | w | x | y | z ]
(then follow rules)
I forgot to write ';' at the end of the defines.
Foma compiles, reports no error and no warning, and then lower-words answers
with an empty list:
foma[1]: lower-words
foma[1]:
What is the expected output? What do you see instead?
It would be much nicer if foma reported an error or at least a warning during
compilation, so that the user knows where to search.
Original issue reported on code.google.com by [email protected]
on 27 Jan 2012 at 2:05
Attachments:
What steps will reproduce the problem?
Create a Foma file with comments which contain > (greater than) or " (double
quotation mark) characters. For example:
# Reglas para diminuitivos (para implementar con expresiones regulares):
#1. Si termina en U, añade -itu, sin importar el genero.
# Ejs: allqu > allqitu,
# "perrito" o "perrita"
LEXICON CC_sust_diminuitivos
[Diminuitivo]:+itu CC_sust_bif;
The above example generates the following error messages:
***Syntax error on line 58 column 18 at '>'
***Syntax error on line 59 column 12 at '"'
What version of the product are you using? On what operating system?
Foma version 0.9.16alpha, running in Debian Testing Linux 3.1.0-1-amd64
Attached are the two Foma files to test this.
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 19 Sep 2012 at 11:44
Attachments:
What steps will reproduce the problem?
1.
!!!hun41.lexc!!!
Multichar_Symbols +N +V +Nom +Pl
!Poss
+Posss1 +Posss3 +Posss3
!Genitiv
+Gen
+Genpl
!Cases
+Abl +Acc +Ade +All
+Cau +Dat +Del
+Ela +Fac +For
+Ill +Ine +Ins
+Sub +Sup +Ter
!Special cases
+Dis +Ess +Fam +Soc +Tem
LEXICON Root
Noun ;
LEXICON Noun
!+N:kar Plur;
!+N:kéz Plur;
!+N:kör Plur;
!+N:hajó Plur;
rege Scase;
rege Poss;
LEXICON Poss
+Posss1:^Bm Plur;
+Posss2:^Bd Plur;
+Posss3:^JC Plur;
Plur;
LEXICON Plur
+Plur:^Ok Fam;
Fam;
LEXICON Fam
+Fam:^ék Gen;
Gen;
LEXICON Gen
+Gen:^é Case;
+Genpl:^éi Case;
Case;
LEXICON Case
+Abl:^tUl #;
!+Acc:^Gk #;
!+Ade:^nHl #;
!+All:^hIz #;
!+Cau:^ért #;
!+Dat:^nKk #;
!+Del:^rUl #;
!+Ela:^bUl #;
+Fac:^VD #;
!+For:^ként #;
!+Ill:^nHk #;
!+Ine:^bHn #;
+Ins:^VFl #;
!+Sub:^rK #;
!+Sup:^Pn #;
!+Ter:^ig; #;
LEXICON Scase
!+Dis:^Lnként #;
+Ess:^Zl #;
!+Soc:^NstZl #;
+Tem:^kor #;
#;
### hun4.foma ###
# Vowels
define Vowel [ a | á | e | é | i | í | o | ó | u | ú | ü | ű | ö | ő ];
define BackVowel [ a | á | o | ó | u | ú ];
define FrontUnroundedVowel [ e | é | i | í | ü | ű ];
define FrontRoundedVowel [ ö | ő ];
define FrontVowel [e | é | i | í | ü | ű | ö | ő ];
# E to é: if any ending e-> é
define Etoee e -> é || _ "^" [ \0 ] ;
# Cleanup: remove morpheme boundaries
define Cleanup "^" -> 0;
#define DelRule O -> 0 || Vowel %^ _ ;
define HarmRuleO O -> 0 // Vowel %^ _ .o.
O -> o // BackVowel \Vowel+ _ ,,
O -> e // FrontUnroundedVowel \Vowel+ _ ,,
O -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleB B -> 0 // Vowel %^ _ .o.
B -> o // BackVowel \Vowel+ _ ,,
B -> e // FrontUnroundedVowel \Vowel+ _ ,,
B -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleA A -> 0 // Vowel %^ _ .o.
A -> a // BackVowel \Vowel+ _ ,,
A -> e // FrontVowel \Vowel+ _ ;
define HarmRuleC C -> a // BackVowel \Vowel+ _ .#. .o.
C -> e // FrontVowel \Vowel+ _ .#. .o.
C -> á // BackVowel \Vowel+ _ .o.
C -> é // FrontVowel \Vowel+ _ ;
define HarmRuleJ J -> j || Vowel %^ _ .o.
J -> 0 // \Vowel+ _ ;
define HarmRuleU U -> ó // BackVowel \Vowel+ _ ,,
U -> ő // FrontVowel \Vowel+ _ ;
define HarmRuleZ Z -> u // BackVowel \Vowel+ _ ,,
Z -> ü // FrontVowel \Vowel+ _ ;
define HarmRuleD D -> á // BackVowel \Vowel+ _ ,,
D -> é // FrontVowel \Vowel+ _ ;
define HarmRuleF F -> a // BackVowel \Vowel+ _ ,,
F -> e // FrontVowel \Vowel+ _ ;
define HarmRuleV V -> v || Vowel %^ _ ,,
V -> k || k %^ _ ,,
V -> m || m %^ _ ,,
V -> d || d %^ _ ,,
V -> r || r %^ _ ;
define Ablaut é -> e || _ z "^" [ \0 ] ;
read lexc hun41.lexc
define Lexicon
define Grammar Lexicon .o.
HarmRuleO .o.
HarmRuleB .o.
HarmRuleA .o.
HarmRuleJ .o.
HarmRuleU .o.
HarmRuleC .o.
HarmRuleZ .o.
HarmRuleD .o.
HarmRuleF .o.
HarmRuleV .o.
Ablaut .o.
Etoee .o.
Cleanup;
regex Grammar;
foma[1]: upper-words
rege
rege+Ins
rege+Fac
rege+Abl
rege+Genpl+Ins
rege+Genpl+Fac
rege+Genpl+Abl
rege+Gen+Ins
rege+Gen+Fac
rege+Gen+Abl
rege+Fam+Ins
rege+Fam+Fac
rege+Fam+Abl
rege+Fam+Genpl+Ins
rege+Fam+Genpl+Fac
rege+Fam+Genpl+Abl
rege+Fam+Gen+Ins
rege+Fam+Gen+Fac
rege+Fam+Gen+Abl
rege+Plur+Ins
rege+Plur+Fac
rege+Plur+Abl
rege+Plur+Genpl+Ins
rege+Plur+Genpl+Fac
rege+Plur+Genpl+Abl
rege+Plur+Gen+Ins
rege+Plur+Gen+Fac
rege+Plur+Gen+Abl
rege+Plur+Fam+Ins
rege+Plur+Fam+Fac
rege+Plur+Fam+Abl
rege+Plur+Fam+Genpl+Ins
rege+Plur+Fam+Genpl+Fac
rege+Plur+Fam+Genpl+Abl
rege+Plur+Fam+Gen+Ins
rege+Plur+Fam+Gen+Fac
rege+Plur+Fam+Gen+Abl
rege+Posss3+Ins
rege+Posss3+Fac
rege+Posss3+Abl
rege+Posss3+Genpl+Ins
rege+Posss3+Genpl+Fac
rege+Posss3+Genpl+Abl
rege+Posss3+Gen+Ins
rege+Posss3+Gen+Fac
rege+Posss3+Gen+Abl
rege+Posss3+Fam+Ins
rege+Posss3+Fam+Fac
rege+Posss3+Fam+Abl
rege+Posss3+Fam+Genpl+Ins
rege+Posss3+Fam+Genpl+Fac
rege+Posss3+Fam+Genpl+Abl
rege+Posss3+Fam+Gen+Ins
rege+Posss3+Fam+Gen+Fac
rege+Posss3+Fam+Gen+Abl
rege+Posss3+Plur+Ins
rege+Posss3+Plur+Fac
rege+Posss3+Plur+Abl
rege+Posss3+Plur+Genpl+Ins
rege+Posss3+Plur+Genpl+Fac
rege+Posss3+Plur+Genpl+Abl
rege+Posss3+Plur+Gen+Ins
rege+Posss3+Plur+Gen+Fac
rege+Posss3+Plur+Gen+Abl
rege+Posss3+Plur+Fam+Ins
rege+Posss3+Plur+Fam+Fac
rege+Posss3+Plur+Fam+Abl
rege+Posss3+Plur+Fam+Genpl+Ins
rege+Posss3+Plur+Fam+Genpl+Fac
rege+Posss3+Plur+Fam+Genpl+Abl
rege+Posss3+Plur+Fam+Gen+Ins
rege+Posss3+Plur+Fam+Gen+Fac
rege+Posss3+Plur+Fam+Gen+Abl
rege+Posss2+Ins
rege+Posss2+Fac
rege+Posss2+Abl
rege+Posss2+Genpl+Ins
rege+Posss2+Genpl+Fac
rege+Posss2+Genpl+Abl
rege+Posss2+Gen+Ins
rege+Posss2+Gen+Fac
rege+Posss2+Gen+Abl
rege+Posss2+Fam+Ins
rege+Posss2+Fam+Fac
rege+Posss2+Fam+Abl
rege+Posss2+Fam+Genpl+Ins
rege+Posss2+Fam+Genpl+Fac
rege+Posss2+Fam+Genpl+Abl
rege+Posss2+Fam+Gen+Ins
rege+Posss2+Fam+Gen+Fac
rege+Posss2+Fam+Gen+Abl
rege+Posss2+Plur+Ins
rege+Posss2+Plur+Fac
rege+Posss2+Plur+Abl
rege+Posss2+Plur+Genpl+Ins
rege+Posss2+Plur+Genpl+Fac
rege+Posss2+Plur+Genpl+Abl
rege+Posss2+Plur+Gen+Ins
rege+Posss2+Plur+Gen+Fac
rege+Posss2+Plur+Gen+Abl
Output stops here.
Using down, I can see that it also knows the rest:
foma[1]: down
apply down> rege+Posss2+Plur+Gen+Abl
regédekétől
apply down> rege+Posss1+Plur+Gen+Abl
regémekétől
apply down> rege+Posss1+Fam+Abl
regéméktől
However, Hungarian has 769 forms for each noun in a minimal test. We also have
at least 30 noun classes that need to be tested individually. It is impossible
to test that many forms using up and down. I would suggest a command that shows
all valid forms and valid words, which would make testing possible.
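Enumerating every analysis of an acyclic network is a plain depth-first traversal, so such a testing command is feasible in principle; a sketch over a toy arc list (illustrative only, not foma's internal representation — the arcs loosely mirror the rege paradigm above):

```python
def all_paths(arcs, start, finals):
    """Return every (upper, lower) string pair of an acyclic transducer."""
    def walk(state, upper, lower):
        if state in finals:
            yield upper, lower
        for src, up, low, dst in arcs:
            if src == state:
                # recurse; acyclicity guarantees termination
                for pair in walk(dst, upper + up, lower + low):
                    yield pair
    return sorted(walk(start, "", ""))

# rege (+Plur) (+Abl), loosely after the lexc fragment above
arcs = [(0, "rege", "rege", 1),
        (1, "+Plur", "^Ok", 2),
        (1, "", "", 2),
        (2, "+Abl", "^tUl", 3),
        (2, "", "", 3)]
for upper, lower in all_paths(arcs, 0, {3}):
    print(upper, "->", lower)
```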
Thanks in advance for help or support.
Original issue reported on code.google.com by [email protected]
on 1 Jan 2012 at 4:21
What steps will reproduce the problem?
1.
LEXICON Ninf
+Noun+Sg:0 #;
+Noun+Det:det #;
+Noun+Det:acc #;
+Noun+Pl:^s CaseN;
2.
We lose the closing of case Det:
LEXICON Ninf
+Noun+Sg:0 #;
+Noun+Det:det
+Noun+Det:acc #;
+Noun+Pl:^s CaseN;
3.
What is the expected output?
Compiler should report an error
What do you see instead?
defined Cleanup: 268 bytes. 1 state, 2 arcs, Cyclic.
Root...3, Noun...7, Verb...6, Misc...1, Ninf...3, CaseN...1, Vinf...5, Nmisc...1
Building lexicon...
Determinizing...
Minimizing...
Done!
1.9 kB. 57 states, 75 arcs, 52 paths.
defined Lexicon: 1.9 kB. 57 states, 75 arcs, 52 paths.
defined Grammar: 2.4 kB. 72 states, 102 arcs, 52 paths.
defined Grammarup: 3.0 kB. 72 states, 102 arcs, 52 paths.
3.1 kB. 72 states, 110 arcs, 101 paths.
Foma, version 0.9.16alpha
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha
Linux, debian
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 6 Sep 2012 at 3:05
Attachments:
What steps will reproduce the problem?
1. Run the following code in foma:
regex [ ? -> x ];
save stack test.foma
2. Create a text file test.txt:
a
ab
abc
3. Run the text file through flookup:
cat test.txt | flookup -i -x test.foma
What is the expected output? What do you see instead?
Expected:
x
xx
xxx
What I see:
+?
+?
+?
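For comparison, binary application normally distinguishes symbols inside and outside the stored alphabet: out-of-alphabet input must be matched by the network's unknown/identity symbols. A toy sketch of that mapping (an assumption about the general mechanism, not flookup's actual code):

```python
UNKNOWN = "@_UNKNOWN_@"

def apply_unknown(word, sigma, trans, start, finals):
    """Map out-of-alphabet symbols to UNKNOWN, then run a deterministic walk.

    trans: {(state, insym): (nextstate, outsym)}
    """
    state, out = start, []
    for ch in word:
        sym = ch if ch in sigma else UNKNOWN
        if (state, sym) not in trans:
            return "+?"
        state, osym = trans[(state, sym)]
        out.append(osym)
    return "".join(out) if state in finals else "+?"

# a net behaving like [? -> x]: any symbol, known or not, maps to 'x'
sigma = {"x"}
trans = {(0, "x"): (0, "x"), (0, UNKNOWN): (0, "x")}
print(apply_unknown("abc", sigma, trans, 0, {0}))
```

If the unknown-symbol arcs were skipped during lookup, every input would fall through to +?, which matches the behaviour reported above.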
What version of the product are you using? On what operating system?
flookup 1.02 (foma library version 0.9.16alpha) on OpenSUSE 12.1.
Original issue reported on code.google.com by [email protected]
on 9 Oct 2012 at 12:58
What steps will reproduce the problem?
1. define IitoiAcc3 í -> i || .#. \%^* _ [ DCons | Cons ] "^" ?* [ "+"
{Acc} ] ?* ;
2. ...
3. define Grammar Lexicon .o.
IitoiAcd3; # mistyped Acc as Acd
Foma loads things nicely, no warning or error,
but there is no output:
apply down> ég+Noun+Nom
???
It would be nicer if foma gave a warning or error message in such cases
(mistyped rule names).
Original issue reported on code.google.com by [email protected]
on 13 Jan 2012 at 4:46
What steps will reproduce the problem?
I define 25 output combinations in lexc:
LEXICON Ninf
+N+Nom:0 #;
+N+Pl:^Ok #;
+N+Abl:t^Ul #;
+N+Gen:é #;
+N+Fam:ék #;
+N+Fam+Abl:ékt^Ul #;
+N+Fam+Gen+Abl:ékét^Ul #;
+N+Ess:^Zl #;
+N+Posss1:^Bm #;
+N+Posss2:^Bd #;
+N+Posss3:^J^A #;
+N+Posss1+Abl:^Bmt^Ul #;
+N+Posss2+Abl:^Bdt^Ul #;
+N+Posss3+Abl:^J^Ct^Ul #;
+N+Pl+Gen:^Oké #;
+N+Fam+Gen:éké #;
+N+Pl+Abl:^Okt^Ul #;
+N+Gen+Abl:ét^Ul #;
+N+Pl+Gen+Abl:^Okét^Ul #;
+N+Posss1+Gen:^Bmé #;
+N+Posss2+Gen:^Bdé #;
+N+Posss3+Gen:^J^Cé #;
+N+Posss1+Gen+Abl:^Bmét^Ul #;
+N+Posss2+Gen+Abl:^Bdét^Ul #;
+N+Posss3+Gen+Abl:^J^Cét^Ul #;
What is the expected output? What do you see instead?
I would expect 25 output combinations, but I get 37:
foma[28]: upper-words
rege+N+Nom
rege+N+Gen
rege+N+Gen+Abl
rege+N+Fam
rege+N+Fam+Abl
rege+N+Fam+Gen
rege+N+Fam+Gen+Abl
rege+N+Abl
rege+N+Posss2+Gen
rege+N+Posss2+Gen+Abl
rege+N+Posss2+Abl
rege+N+Posss2
rege+N+Posss1+Gen
rege+N+Posss1+Gen+Abl
rege+N+Posss1+Abl
rege+N+Posss1
rege+N+Ess
rege+N+Pl+Abl
rege+N+Pl+Gen
rege+N+Pl+Gen+Abl
rege+N+Pl
rege+N+Posss3+Gen
rege+N+Posss3+Gen+Abl
rege+N+Posss3+Abl
rege+N+Posss3
rege+N+Posss2+Gen
rege+N+Posss2+Gen+Abl
rege+N+Posss2+Abl
rege+N+Posss2
rege+N+Posss1+Gen
rege+N+Posss1+Gen+Abl
rege+N+Posss1+Abl
rege+N+Posss1
rege+N+Pl+Abl
rege+N+Pl+Gen
rege+N+Pl+Gen+Abl
rege+N+Pl
(12 get duplicated, the last 12)
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha
Linux
Please provide any additional information below.
foma and lex files attached
Original issue reported on code.google.com by [email protected]
on 29 Dec 2011 at 4:53
Attachments:
What steps will reproduce the problem?
1. foma -l enhu1.foma
What is the expected output? What do you see instead?
Expected compilation
Instead I see:
Fatal error: out of memory
: Cannot allocate memory
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha
Linux en-desktop 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 2009
i686 GNU/Linux
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 6 Jan 2012 at 5:20
Attachments:
What steps will reproduce the problem?
1. Write the attached code (filename: bug_report.cpp)
2. Compile using "g++ -ggdb3 -o bug_report bug_report.cpp -lfoma"
3. Run "valgrind --leak-check=full ./bug_report"
Expected result: no leaks and no errors.
Present result: Leak (25 allocs, 20 frees) and error (invalid write)
What version of the product are you using? On what operating system?
Foma API (0.9.16alpha (20111213), according to the first line in the
changelog). I changed the Makefile to compile with the flag -ggdb3 for
debugging purposes. I also changed the prefix to "/usr".
Arch Linux, quite recently updated.
Please provide any additional information below.
The attached files are the C++ code and valgrind output.
Original issue reported on code.google.com by [email protected]
on 8 May 2012 at 6:25
Attachments:
What steps will reproduce the problem?
1. in Foma file there is:
define Grammarfnige Lexiconfnige .o.
Etoee .o.
Atoaa .o.
Changez2 ,o, <---- here ,o, instead of .o.
...
Cleanup .o.
ToUpCase;
Compiler says:
defined Lexiconfnige: 376.0 kB. 12797 states, 23924 arcs, 2520580612 paths.
2299.34-2299.34: error: ***syntax error at ','.
">>> read in enhufnnum <<<"
No indication of which file or which line contains the error.
What is the expected output? What do you see instead?
Compiler should say:
Error in file enhu2.foma
line 2277: , instead of .
What version of the product are you using? On what operating system?
Foma, version 0.9.16alpha
debian linux
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 24 Aug 2012 at 11:13
What steps will reproduce the problem?
The verb fst is called hun42.fst. It works fine alone:
en@en-desktop ~/program/foma/tktest $ kill_flookup.sh[1]+ Killed
en@en-desktop ~/program/foma/tktest $ flookup -i -S -A 127.0.0.1
/home/en/program/foma/tktest/hun42.fst &
[1] 4770
en@en-desktop ~/program/foma/tktest $ Started flookup server on 127.0.0.1 port
6062
en@en-desktop ~/program/foma/tktest $ echo "abnormalitásoz+Verb+CondDefPl3" |
nc -w 1 -u 127.0.0.1 6062
abnormalitásoz+Verb+CondDefPl3 Abnormalitásoznák
abnormalitásoz+Verb+CondDefPl3 abnormalitásoznák
-------------------------------------
Here I combine verbs with the other word types:
cat hfnnum.foma
read regex @"hun41.fst" | @"hun42.fst" | @"hunnum.fst" | @"hunadj.fst" |
@"hunfxpp.fst" | @"hunmisc.fst";
en@en-desktop ~/program/foma/tktest $ cat crfnnum.sh
foma -l hfnnum.foma <savestackfnnum.sh
cat savestackfnnum.sh
save stack hunfnnum.fst
----------------------------------------------
Here I try the combined fst file:
kill_flookup.sh
[1]+ Killed
flookup -i -S -A 127.0.0.1 /home/en/program/foma/tktest/hun42.fst
$ cat do_test.sh
kill_flookup.sh
flookup -i -S -A 127.0.0.1 /home/en/program/foma/tktest/hunfnnum.fst &
en@en-desktop ~/program/foma/tktest/tools/fomaallchk $ sh do_test.sh
en@en-desktop ~/program/foma/tktest/tools/fomaallchk $ Started flookup server
on 127.0.0.1 port 6062
en@en-desktop ~/program/foma/tktest $ echo "abnormalitásoz+Verb+CondDefPl3" |
nc -w 1 -u 127.0.0.1 6062
abnormalitásoz+Verb+CondDefPl3 ?+
It works for lots of cases, but not for ConjIndef... , ConjDef...,
CondIndef..., CondDef...
Also just using foma -l ....foma shows the same results; the problem is not
flookup related.
What is the expected output? What do you see instead?
expected:
abnormalitásoz+Verb+CondDefPl3 Abnormalitásoznák
I see instead:
abnormalitásoz+Verb+CondDefPl3 ?+
What version of the product are you using? On what operating system?
foma 0.9.16alpha (from svn)
linux debian
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 16 Aug 2012 at 2:58
This is a request for future versions.
There are often grammatical variants that are used more often, and others that
are used seldom. For example, in Hungarian the 3rd-person possessive is
expressed with a/e or with ja/je. I can say tor-a, but also tor-ja for 'his
tor'. For translation applications it would be helpful if the more frequently
used variant were weighted; the program would then generate the more frequent
variant, but it would still understand the less frequent one.
Which variant is more or less frequent is individual, and must be set up for
each word (in some cases for each word group) individually.
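The requested behaviour amounts to a weighted transducer: generation picks the lowest-weight variant, while analysis accepts every variant. A toy sketch of that policy (foma itself is unweighted; the weights and forms here are invented for illustration):

```python
# possible surface forms per (lemma, tag), with per-word weights
variants = {
    ("tor", "+Posss3"): [("tora", 0.0), ("torja", 1.0)],
}

def generate(lemma, tag):
    """Pick the preferred (lowest-weight) surface form."""
    forms = variants[(lemma, tag)]
    return min(forms, key=lambda fw: fw[1])[0]

def analyze(surface):
    """Accept any variant, frequent or not."""
    return [(lemma, tag) for (lemma, tag), forms in variants.items()
            for form, _ in forms if form == surface]

print(generate("tor", "+Posss3"))   # the preferred form
print(analyze("torja"))             # the rarer form is still understood
```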
Original issue reported on code.google.com by [email protected]
on 6 Feb 2012 at 10:27
What steps will reproduce the problem?
echo regék | flookup -x hun41.foma
File format error foma!
: Success
File error: hun41.foma
Hun41 lexc/foma look:
!!!hun41.lexc!!!
Multichar_Symbols +N +V +Nom +Pl
!Poss
+Posss1 +Posss2 +Posss3 +Possp1 +Possp2 +Possp3
+Posss1p +Posss2p +Posss3p +Possp1p +Possp2p +Possp3p
!Genitiv
+Gen
+Genpl
!Cases
+Abl +Acc +Ade +All
+Cau +Dat +Del
+Ela +Fac +For
+Ill +Ine +Ins
+Sub +Sup +Ter
!Special cases
+Dis +Ess +Fam +Soc +Tem
LEXICON Root
Noun ;
LEXICON Noun
!+N:kar Plur;
!+N:kéz Plur;
!+N:kör Plur;
!+N:hajó Plur;
rege Poss;
LEXICON Poss
+Dis:^Lnként #;
+Ess:^Zl #;
+Soc:^NstZl #;
+Tem:^kor #;
+Posss1:^Bm Plur;
+Posss2:^Bd Plur;
+Posss3:^JC Plur;
+Possp1:^Hnk Plur;
+Possp2:^KtQk Plur;
+Possp3:^JRk Plur;
+Posss1p:^STim Plur;
+Posss2p:^STid Plur;
+Posss3p:^STi Plur;
+Possp1p:^STink Plur;
+Possp2p:^STitWk Plur;
+Possp3p:^STik Plur;
Plur;
LEXICON Plur
+Plur:^Ok Fam;
Fam;
LEXICON Fam
+Fam:^ék Gen;
Gen;
LEXICON Gen
+Gen:^é Case;
+Genpl:^éi Case;
Case;
! H, K, unused
LEXICON Case
+Abl:^tUl #;
+Acc:^Gt #;
+Ade:^nDl #;
+All:^hIz #;
+Cau:^ért #;
+Dat:^nFk #;
+Del:^rUl #;
+Ela:^bUl #;
+Fac:^VD #;
+For:^ként #;
+Ill:^bF #;
+Ine:^bFn #;
+Ins:^VFl #;
+Sub:^rF #;
+Sup:^Pn #;
+Ter:^ig; #;
### hun4.foma ###
# Vowels
define Vowel [ a | á | e | é | i | í | o | ó | u | ú | ü | ű | ö | ő ];
define BackVowel [ a | á | o | ó | u | ú ];
define FrontUnroundedVowel [ e | é | i | í | ü | ű ];
define FrontRoundedVowel [ ö | ő ];
define FrontVowel [e | é | i | í | ü | ű | ö | ő ];
# E to é: if any ending e-> é
define Etoee e -> é || _ "^" [ \0 ] ;
# Cleanup: remove morpheme boundaries
define Cleanup "^" -> 0;
#define DelRule O -> 0 || Vowel %^ _ ;
define HarmRuleO O -> 0 // Vowel %^ _ .o.
O -> o // BackVowel \Vowel+ _ ,,
O -> e // FrontUnroundedVowel \Vowel+ _ ,,
O -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleB B -> 0 // Vowel %^ _ .o.
B -> o // BackVowel \Vowel+ _ ,,
B -> e // FrontUnroundedVowel \Vowel+ _ ,,
B -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleA A -> 0 // Vowel %^ _ .o.
A -> a // BackVowel \Vowel+ _ ,,
A -> e // FrontVowel \Vowel+ _ ;
define HarmRuleC C -> a // BackVowel \Vowel+ _ .#. .o.
C -> e // FrontVowel \Vowel+ _ .#. .o.
C -> á // BackVowel \Vowel+ _ .o.
C -> é // FrontVowel \Vowel+ _ ;
define HarmRuleJ J -> j || Vowel %^ _ .o.
J -> 0 // \Vowel+ _ ;
define HarmRuleU U -> ó // BackVowel \Vowel+ _ ,,
U -> ő // FrontVowel \Vowel+ _ ;
define HarmRuleZ Z -> u // BackVowel \Vowel+ _ ,,
Z -> ü // FrontVowel \Vowel+ _ ;
define HarmRuleD D -> á // BackVowel \Vowel+ _ ,,
D -> é // FrontVowel \Vowel+ _ ;
define HarmRuleF F -> a // BackVowel \Vowel+ _ ,,
F -> e // FrontVowel \Vowel+ _ ;
define HarmRuleV V -> v || Vowel %^ _ ,,
V -> k || k %^ _ ,,
V -> m || m %^ _ ,,
V -> d || d %^ _ ,,
V -> r || r %^ _ ;
define HarmRuleG G -> 0 // Vowel %^ _ .o.
G -> o // BackVowel \Vowel+ _ ,,
G -> e // FrontUnroundedVowel \Vowel+ _ ,,
G -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleI I -> o // BackVowel \Vowel+ _ ,,
I -> e // FrontUnroundedVowel \Vowel+ _ ,,
I -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleP P -> 0 // Vowel %^ _ .o.
P -> o // BackVowel \Vowel+ _ ,,
P -> e // FrontUnroundedVowel \Vowel+ _ ,,
P -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleL L -> 0 // Vowel %^ _ .o.
L -> o // BackVowel \Vowel+ _ ,,
L -> e // FrontUnroundedVowel \Vowel+ _ ,,
L -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleN N -> 0 // Vowel %^ _ .o.
N -> o // BackVowel \Vowel+ _ ,,
N -> e // FrontUnroundedVowel \Vowel+ _ ,,
N -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleH H -> 0 // Vowel %^ _ .o.
H -> u // BackVowel \Vowel+ _ ,,
H -> ü // FrontVowel \Vowel+ _ ;
define HarmRuleK K -> 0 // Vowel %^ _ .o.
K -> o // BackVowel \Vowel+ _ ,,
K -> e // FrontUnroundedVowel \Vowel+ _ ,,
K -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleQ Q -> o // BackVowel \Vowel+ _ ,,
Q -> e // FrontUnroundedVowel \Vowel+ _ ,,
Q -> ö // FrontRoundedVowel \Vowel+ _ ;
define HarmRuleR R -> u // BackVowel \Vowel+ _ ,,
R -> ü // FrontVowel \Vowel+ _ ;
define HarmRuleS S -> j || Vowel %^ _ .o.
S -> 0 // \Vowel+ _ ;
define HarmRuleT T -> a // BackVowel \Vowel+ _ ,,
T -> e // FrontVowel \Vowel+ _ ;
define HarmRuleW W -> o // BackVowel \Vowel+ _ ,,
W -> e // FrontVowel \Vowel+ _ ;
define Ablaut é -> e || _ z "^" [ \0 ] ;
read lexc hun41.lexc
define Lexicon
define Grammar Lexicon .o.
HarmRuleO .o.
HarmRuleB .o.
HarmRuleA .o.
HarmRuleJ .o.
HarmRuleU .o.
HarmRuleC .o.
HarmRuleZ .o.
HarmRuleD .o.
HarmRuleF .o.
HarmRuleV .o.
HarmRuleG .o.
HarmRuleI .o.
HarmRuleP .o.
HarmRuleL .o.
HarmRuleN .o.
HarmRuleH .o.
HarmRuleK .o.
HarmRuleQ .o.
HarmRuleR .o.
HarmRuleS .o.
HarmRuleT .o.
HarmRuleW .o.
Ablaut .o.
Etoee .o.
Cleanup;
regex Grammar;
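The HarmRule* definitions above all follow one pattern: an archiphoneme is resolved by the backness of the last full vowel before it, or deleted directly after a vowel plus boundary. A simplified Python rendering of HarmRuleO, for illustration only (the real rules also require at least one non-vowel between the trigger vowel and the target, which this sketch glosses over):

```python
BACK = set("aáoóuú")
FRONT_UNROUNDED = set("eéiíüű")
FRONT_ROUNDED = set("öő")
VOWELS = BACK | FRONT_UNROUNDED | FRONT_ROUNDED

def resolve_O(word):
    """Resolve the archiphoneme 'O' as in HarmRuleO, then drop '^'."""
    out = []
    for i, ch in enumerate(word):
        if ch != 'O':
            out.append(ch)
            continue
        # O -> 0 // Vowel %^ _  : delete directly after vowel + boundary
        if i >= 2 and word[i-1] == '^' and word[i-2] in VOWELS:
            continue
        # otherwise harmonize with the last preceding full vowel
        last = next((c for c in reversed(word[:i]) if c in VOWELS), None)
        if last in BACK:
            out.append('o')
        elif last in FRONT_UNROUNDED:
            out.append('e')
        elif last in FRONT_ROUNDED:
            out.append('ö')
    return ''.join(out).replace('^', '')

print(resolve_O("kar^Ok"))   # back-vowel stem
print(resolve_O("kör^Ok"))   # front-rounded stem
print(resolve_O("rege^Ok"))  # vowel-final stem: O deleted
```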
Original issue reported on code.google.com by [email protected]
on 1 Jan 2012 at 7:03
What steps will reproduce the problem?
If I open foma and run 'source phonology.foma' and then 'save stack
phon_bin.foma', I get a binary file that works as expected, i.e., I can run
'echo "word" | flookup -x phon_bin.foma' and I get a correct parse.
However, if I try to repeat this procedure using Python's subprocess module,
the binary file generated is not as expected. The Python script is basically
as follows (with absolute paths replacing the filenames).
import subprocess
process = subprocess.Popen(['foma'], shell=False, stdin=subprocess.PIPE,
stdout=subprocess.PIPE)
process.stdin.write('source phonology.foma')
process.stdin.write('save stack phon_bin.foma')
What is the expected output? What do you see instead?
I expect to get the same output (i.e., binary file) from the command line as I
do from the Python script. However, the Python strategy results in an FSM with
2 states (678 bytes. 2 states, 2 arcs, 2 paths) while the command line strategy
correctly results in one with 20 (1.0 kB. 20 states, 24 arcs, 10 paths).
What version of the product are you using? On what operating system?
Foma, version 0.9.14alpha
Mac OS X 10.6.8
Python 2.5/2.6
Please provide any additional information below.
Is my approach incorrect? The goal is to be able to generate FSTs using foma
from within a python application.
Is it possible to use foma to convert a foma script into a binary
representation right from the command line?
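Note that the two writes above contain no trailing newline, so foma most likely receives both commands glued together on one line, and the pipe is neither flushed nor closed. A hedged sketch of one way to drive foma from Python — the helper names are mine, and it assumes a foma binary on the PATH:

```python
import subprocess

def build_script(commands):
    """Join interactive foma commands, one per line, newline-terminated."""
    return "".join(cmd + "\n" for cmd in commands)

def run_foma(commands, foma="foma"):
    """Feed a command script to foma's stdin and return foma's output."""
    script = build_script(commands)
    proc = subprocess.Popen([foma], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate(script.encode("utf-8"))  # writes and closes stdin
    return out.decode("utf-8")

# e.g. run_foma(["source phonology.foma", "save stack phon_bin.foma"])
```

communicate() both writes the whole script and closes stdin, so foma sees end-of-input and exits cleanly; with plain stdin.write the commands may still sit unflushed in the pipe when the Python process ends.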
Original issue reported on code.google.com by [email protected]
on 17 Feb 2012 at 4:16
What steps will reproduce the problem?
1. Create a parallel regular expression for two replacement rules,
2. like: regex p -> "pl" ,, "p2" -> "p4" ;
My question: is there any tool or library to visualize the compiled regular
expression as an FST network, plotted visually?
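As far as I know, foma can export networks textually (e.g. the tabular AT&T format), and any arc list in that format can be rendered with Graphviz. A hedged sketch that converts AT&T-style lines ('src dst in out', final states on lines of their own) to DOT — the sample arcs are invented, not the compiled form of the regex above:

```python
def att_to_dot(att_text):
    """Convert AT&T-format transducer text to a Graphviz DOT digraph."""
    edges, finals = [], []
    for line in att_text.strip().splitlines():
        parts = line.split()
        if len(parts) >= 4:
            src, dst, isym, osym = parts[:4]
            edges.append('  %s -> %s [label="%s:%s"];' % (src, dst, isym, osym))
        elif len(parts) == 1:
            finals.append('  %s [shape=doublecircle];' % parts[0])
    return "digraph fst {\n  rankdir=LR;\n" + "\n".join(finals + edges) + "\n}\n"

sample = """0 1 p pl
1 2 p2 p4
2
"""
print(att_to_dot(sample))
# the result can be rendered with Graphviz: dot -Tpng fst.dot -o fst.png
```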
Thanks
Original issue reported on code.google.com by [email protected]
on 16 Jan 2013 at 11:23
What steps will reproduce the problem?
foma file reads in lexc files like:
define Grammar7 Filter1 .o.
Lexicon7 .o.
CleanupEndings .o.
HarmRuleAAA .o. <--- if instead of .o. there is .o
...
HarmRuleszorelolmely .o.
Cleanup;
one after the other.
If in one such set there is .o or o. or o instead of .o., then the effect is:
No error or warning, but every word of this lexc file is reported unknown, like:
apply up> ballag
???
apply up>
What is the expected output? What do you see instead?
It would be very nice if this kind of error were reported at compilation time.
It took me several hours to find the source of the problem.
Original issue reported on code.google.com by [email protected]
on 13 Mar 2012 at 6:02