aunsiels / pyformlang Goto Github PK
View Code? Open in Web Editor NEWA python library to manipulate formal languages and various automata
Home Page: https://pypi.org/project/pyformlang/
License: MIT License
A python library to manipulate formal languages and various automata
Home Page: https://pypi.org/project/pyformlang/
License: MIT License
Hi, @Aunsiels!
First of all, thanks a lot for the wonderful library!
I was translating the context-free grammar into the Chomsky normal form by using to_normal_form()
function and noticed this strange bug.
>>> from pyformlang.cfg import CFG
>>> cfg = CFG.from_text("S -> a S b S")
>>> cnf = cfg.to_normal_form()
>>> cnf.productions
set()
However, I was expecting to see something like this.
S -> S0 S
S0 -> S2 S1
S1 -> b
S2 -> S3 S
S3 -> a
Ubuntu 20.04
3.8.5
0.1.24
, 0.1.23
, 0.1.22
, 0.1.21
, 0.1.20
from pyformlang.finite_automaton import EpsilonNFA, State, Symbol, Epsilon
enfa = EpsilonNFA()
state0 = State(0)
state1 = State(1)
symb_a = Symbol("0")
symb_b = Symbol("1")
enfa.add_start_state(state0)
enfa.add_final_state(state1)
enfa.add_transition(state0, symb_a, state0)
enfa.add_transition(state1, symb_b, state0)
enfa.add_transition(state1, symb_b, state1)
# Turn a finite automaton into a regex...
regex = enfa.to_regex()
# And turn it back into an epsilon non deterministic automaton
enfa2 = regex.to_epsilon_nfa()
Source: Here
This code generates AttributeError: module 'pyformlang' has no attribute 'regular_expression
But importing Regex from pyformlang.regular_expression fixes that issue.
Why a method call to a library requires explicit import?
Importing Regex from pyformlang.regular_expression at epsilon_nfa.py causes circular import.
So importing Regex at runtime from to_regex() may resolve this. Like
def to_regex(self) -> "Regex":
from pyformlang import regular_expression
...
...
...
return regular_expression.Regex(res)
at epsilon_nfa.py
Hello,
When I was using regular expressions in my program, I found some unusual behavior.
For example, I can write this:
Python 3.9.0 (default, Nov 18 2020, 13:28:38)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyformlang.regular_expression import Regex
>>> Regex('a a | a')
((a.a)|a)
>>> Regex('a . a | a')
((a.a)|a)
>>> Regex('a* a | a')
(((a)*.a)|a)
So, I realise that priority of operator .
more than priority of operator |
. But when I use *
just before |
, priority of this operations changes:
>>> Regex('a a* | a')
(a.((a)*|a))
>>> Regex('a . a* | a')
(a.((a)*|a))
>>> Regex('a . a* | (a)')
(a.((a)*|a))
>>> Regex('a* a* | a')
((a)*.((a)*|a))
I can define expression ((a.(a)*)|a)
only through using parenthesis:
>>> Regex('(a a*) | a')
((a.(a)*)|a)
And error isn't in printing:
>>> Regex('b a* | a').accepts('b')
True
>>> Regex('b a* | a').accepts('a')
False
>>> Regex('(b a*) | a').accepts('a')
True
>>>
Hi,
I read your doc on readthedocs and browsed through the code, but I'm still not sure when which type of PDA is used.
When adding the transitions step by step to the PDA, should this be a PDA accepting on final state or on empty stack? Is this determined by checking whether final_state
contains some states?
I noticed that when calling to_final_state()
and to_empty_stack()
in either calls the (same) automata changed. Is it assumed that the automata is in accept by state
mode when calling to_empty_stack()
and analog for accept by empty stack
?
(the thought of the functions assuming a mode of the PDA comes to me by the comments like Turns the current PDA final state to empty stack
)
Then when converting to CFG, is it also assumed that the automata was in accept by empty stack
(that's how I'd interpret the comment Turns this PDA on empty stack accepting into a CFG
)?
Please elaborate on what is set how and what is assumed.
Hello,
I would like to use standard regular expressions along with the core functions of your library, such as building an automaton from a regular expression. How can i do this?
Thanks for pyformlang!
Is there algorith to convert epsNFA to NFA («remove epsilon transitions»)?
I could not find it, but may be I missed something.
Or should it be here? (PR with this algorithms will be welcomed here?)
Hello! I found some problem with CNF. After translating my CNF grammar to text I tried to read it from text again, but got another grammar...
cfg = CFG.from_text("S -> a S b | a b")
cnf = cfg.to_normal_form()
cnf.contains([Terminal("a"),Terminal("b")])
True
new_text = cnf.to_text()
new_cnf = CFG.from_text(new_text)
new_cnf.contains([Terminal("a"), Terminal("b")])
False
Can you fix this please?
P.S. I think it might be happening because of wrong translation of variables (creating variables with lowercase first letter from terminals (a#CNF# -> Terminal(a))
Hi, @Aunsiels!
Thank you for the awesome library! 👏
Unfortunately, while using your library, I got the error 🐛 mentioned in the title. 😞
from pyformlang.cfg import CFG
from pyformlang.rsa import RecursiveAutomaton as RSA
rsa = RSA.from_cfg(CFG.from_text("S -> ((a|((b|c))))"))
I think this is due to the fact that instead of a regular expression on the right side, CFG
breaks it down into products(using |
) of the form like this:
S -> c))))
S -> ((a
S -> ((b
Which will be converted to text like this
S -> c))))|((a|((b
I can suggest replacing the from_cfg()
function with the from_text()
function, since regex in production body is not supported in CFG
in the expected way.
But these will be backward-incompatible changes and I want to know what you think about this?
Character ranges work incorrectly when braces or brackets are used inside them. Starting with braces, the expressions r"[{}]"
and r"[\{\}]"
work correctly, e.g.
from pyformlang.regular_expression import PythonRegex
PythonRegex(r"[{}]").accepts(["{"])
returns True
. However, using r"[{-}]"
raises an error and r"[\{-\}]"
works incorrectly as the expression does not accept |
.
Using brackets does not really work at all. The expressions r"[\[]"
, r"[Z-\[]"
, r"[\[-a]"
and r"[\[-\]]"
do not accept anything and r"[\]]"
, r"[Z-\]]"
and r"[\]-a]"
raise errors.
$ python3
Python 3.7.5 (default, Apr 19 2020, 20:18:17)
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyformlang.cfg import CFG
>>> gr = CFG.from_text('S -> a S b S\nS -> ')
>>> gr.generate_epsilon()
True
>>> new_gr = gr.to_normal_form()
>>> new_gr.generate_epsilon()
False
But I think that language generated by new_gr should be equal to language generated by gr. Including an epsilon.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.