ceedubs / irrec
composable regular expressions based on Kleene algebras and recursion schemes
Home Page: https://ceedubs.github.io/irrec/
License: Apache License 2.0
Turn the README into a website and add a page to play around with regexes (test match candidates, generate matches, etc).
Consider using discrete interval trees for ranges. I think that this would make it a lot more straightforward to support character class intersection, union, etc. It might also improve the scalacheck generator for negative character classes.
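As a rough illustration of why range-based sets make these operations simple, here is a minimal sketch that stands in for a discrete interval tree with a sorted list of disjoint inclusive ranges. `CharRange`, `CharSet`, and every method on them are hypothetical names for this example only; they are not irrec's API, and a real Diet would be more efficient.

```scala
object CharSetSketch {
  // One inclusive range of characters, e.g. CharRange('a', 'z').
  final case class CharRange(lo: Char, hi: Char)

  // Invariant (assumed, maintained by `normalize`): ranges are sorted,
  // disjoint, and non-adjacent. Build values via CharSet.of / normalize.
  final case class CharSet(ranges: List[CharRange]) {
    def contains(c: Char): Boolean = ranges.exists(r => r.lo <= c && c <= r.hi)

    def union(other: CharSet): CharSet = CharSet.normalize(ranges ++ other.ranges)

    // Every Char not covered by this set: walk the gaps between ranges.
    def complement: CharSet = {
      var next: Int = Char.MinValue.toInt
      val gaps = List.newBuilder[CharRange]
      ranges.foreach { r =>
        if (r.lo.toInt > next) gaps += CharRange(next.toChar, (r.lo - 1).toChar)
        next = r.hi.toInt + 1
      }
      if (next <= Char.MaxValue.toInt) gaps += CharRange(next.toChar, Char.MaxValue)
      CharSet(gaps.result())
    }

    // De Morgan: the intersection is the complement of the union of complements.
    def intersect(other: CharSet): CharSet =
      complement.union(other.complement).complement
  }

  object CharSet {
    // Sort the ranges and merge any that overlap or touch.
    def normalize(rs: List[CharRange]): CharSet = {
      val merged = rs.sortBy(_.lo.toInt).foldLeft(List.empty[CharRange]) {
        case (prev :: earlier, r) if r.lo.toInt <= prev.hi.toInt + 1 =>
          CharRange(prev.lo, if (r.hi > prev.hi) r.hi else prev.hi) :: earlier
        case (acc, r) => r :: acc
      }
      CharSet(merged.reverse)
    }

    def of(rs: (Char, Char)*): CharSet =
      normalize(rs.toList.map { case (lo, hi) => CharRange(lo, hi) })
  }
}
```

With this shape, a negated class such as `[^a-z]` is just `CharSet.of('a' -> 'z').complement`, and intersection falls out of union and complement.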
Currently when you write regex("a{1,3}"), irrec represents this as a|aa|aaa. This gets really out of hand if you use a large number as the upper bound.
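To make the growth concrete, the sketch below compares the size of the alternation expansion with one possible alternative, a prefix followed by nested optionals (`a(a(a)?)?` for `a{1,3}`). The `Rx` type and all function names are hypothetical and deliberately tiny; this is not irrec's representation, and the nested-optional encoding is only one assumption about how the blow-up could be avoided.

```scala
object RepeatSizeSketch {
  sealed trait Rx
  final case class Lit(c: Char) extends Rx
  final case class Cat(l: Rx, r: Rx) extends Rx
  final case class Alt(l: Rx, r: Rx) extends Rx
  case object Empty extends Rx // matches only the empty string

  // Number of nodes in the expression tree.
  def nodeCount(rx: Rx): Int = rx match {
    case Lit(_) | Empty => 1
    case Cat(l, r)      => 1 + nodeCount(l) + nodeCount(r)
    case Alt(l, r)      => 1 + nodeCount(l) + nodeCount(r)
  }

  // r{min,max} as r^min | r^(min+1) | ... | r^max (the expansion described above).
  def asAlternatives(r: Rx, min: Int, max: Int): Rx = {
    require(0 <= min && min <= max, "need 0 <= min <= max")
    def power(n: Int): Rx =
      if (n == 0) Empty else List.fill(n)(r).reduceLeft(Cat(_, _))
    (min to max).map(power).reduceLeft(Alt(_, _))
  }

  // r{min,max} as r^min followed by (max - min) nested optional copies of r.
  def asNestedOptionals(r: Rx, min: Int, max: Int): Rx = {
    require(0 <= min && min <= max, "need 0 <= min <= max")
    val optionalTail =
      (1 to (max - min)).foldLeft(Empty: Rx)((tail, _) => Alt(Cat(r, tail), Empty))
    List.fill(min)(r).foldRight(optionalTail)(Cat(_, _))
  }
}
```

The first encoding grows quadratically in the upper bound, while the second grows linearly; a dedicated repetition node in the expression type would be smaller still.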
Currently capture groups aren't respected when parsing regular expression strings. The method could be changed to return a List of captured groups.
I haven't encountered any issues yet, but should matching be changed to use trampolining for stack safety?
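For reference, this is roughly what trampolining looks like with the standard library's `scala.util.control.TailCalls`. The two-state matcher for `(ab)*` below is a hypothetical toy, not irrec's matcher; it uses mutual recursion because that is a case the compiler cannot turn into a loop on its own.

```scala
object TrampolineSketch {
  import scala.util.control.TailCalls._

  // State "expecting an 'a' or end of input".
  def expectA(in: List[Char]): TailRec[Boolean] = in match {
    case Nil         => done(true)
    case 'a' :: rest => tailcall(expectB(rest)) // suspended, not a real stack frame
    case _           => done(false)
  }

  // State "expecting a 'b'".
  def expectB(in: List[Char]): TailRec[Boolean] = in match {
    case 'b' :: rest => tailcall(expectA(rest))
    case _           => done(false)
  }

  // `.result` runs the trampoline in a loop, so stack depth stays constant.
  def matchesAbStar(s: String): Boolean = expectA(s.toList).result
}
```

Whether irrec's matching needs this depends on how deep its recursion actually goes for long inputs or deeply nested expressions, which the issue leaves open.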
Like \d, \w, etc.
There are two hiccups here:
Rig from algebra. This means introducing an algebra dependency, but that's probably fine.
a|b is equivalent to b|a. They are effectively equivalent as they will return consistent answers for all input, but irrec doesn't yet have a good way to check this equality.
Currently the pretty printer will print a character class containing a colon as [:]. However, the parser will reject this, because when it sees [: it expects a POSIX character class.
It may not make sense to accept POSIX character classes, since we aren't actually doing anything sensitive to the locale. Maybe \p expressions would make more sense.
And any adjustments that make sense to make this easier.
Character classes have fewer special characters than other parts of regexes. Irrec should allow characters like * to go unescaped within character classes. Escaping these characters should still be accepted when parsing.
It would probably be cleaner to have the pretty-printer not escape these characters within character classes.
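A minimal sketch of context-sensitive escaping in a pretty-printer might look like the following. The two sets of special characters are assumptions made for the example, and `escape`, `printLiteral`, and `printClass` are hypothetical helpers rather than irrec functions.

```scala
object EscapeSketch {
  // Assumed: characters that need escaping outside a character class.
  val specialOutside: Set[Char] = "\\.[]{}()*+?^$|".toSet

  // Assumed: the smaller set that still needs escaping inside a class.
  val specialInside: Set[Char] = "\\[]^-".toSet

  // Prefix the character with a backslash only if it is special in this context.
  def escape(c: Char, special: Set[Char]): String =
    if (special(c)) "\\" + c else c.toString

  // Print a literal string as a regex, escaping for the top-level context.
  def printLiteral(s: String): String =
    s.map(c => escape(c, specialOutside)).mkString

  // Print a character class, escaping only what is special inside brackets.
  def printClass(members: Seq[Char]): String =
    members.map(c => escape(c, specialInside)).mkString("[", "", "]")
}
```

Here `printLiteral("a*b")` escapes the star, while `printClass(Seq('*', '+'))` leaves both characters bare. A parser that accepts both escaped and unescaped forms inside classes would stay compatible with output from either style.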
Ideally via a single configuration as opposed to making each Elem check for both upper and lower case.
NFA.runNFA is currently implemented with a foldLeft. After any step, if we have no available states left, we could terminate the fold. This could be done with foldLeftM and for long inputs could be more efficient.
This probably shouldn't be done until #2 is complete to assess whether or not it actually helps.
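The early-exit idea can be sketched without any dependencies by hand-rolling a left fold in `Option` that stops at the first `None`. Everything here is hypothetical: `Nfa` is a toy representation, `foldLeftOption` is a stand-in for a monadic fold, and neither reflects how `NFA.runNFA` is actually written.

```scala
object EarlyExitSketch {
  import scala.annotation.tailrec

  // Left fold whose step can fail; stops consuming input at the first None.
  @tailrec
  def foldLeftOption[A, B](as: List[A], z: B)(f: (B, A) => Option[B]): Option[B] =
    as match {
      case Nil => Some(z)
      case a :: rest =>
        f(z, a) match {
          case Some(b) => foldLeftOption(rest, b)(f)
          case None    => None
        }
    }

  // Toy NFA: integer states and a transition function returning next states.
  final case class Nfa(
    start: Set[Int],
    accepting: Set[Int],
    step: (Int, Char) => Set[Int]
  )

  def run(nfa: Nfa, input: List[Char]): Boolean =
    foldLeftOption(input, nfa.start) { (states, c) =>
      val next = states.flatMap(s => nfa.step(s, c))
      // No live states left: nothing later in the input can lead to a match.
      if (next.isEmpty) None else Some(next)
    }.exists(_.exists(nfa.accepting))
}
```

For an input that dies early, the transition function is only called until the state set becomes empty. Whether that saves measurable time in practice is exactly what the benchmark mentioned above would need to show.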
It should be fairly straightforward (though perhaps not particularly efficient) to add a helper method that matches the behavior of {3,5} in a regex.
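One way such a helper could behave is sketched below over a toy matcher type that returns every possible remainder of the input. `Matcher`, `lit`, `andThen`, `repeatBetween`, and `matches` are all hypothetical names for this example; irrec's own combinators would look different.

```scala
object RepeatHelperSketch {
  // A matcher maps the input to every remainder left after a successful match.
  type Matcher = List[Char] => List[List[Char]]

  def lit(c: Char): Matcher = {
    case head :: rest if head == c => List(rest)
    case _                         => Nil
  }

  def andThen(a: Matcher, b: Matcher): Matcher = in => a(in).flatMap(b)

  // Matches between `min` and `max` consecutive repetitions of `m`, like {min,max}.
  def repeatBetween(m: Matcher, min: Int, max: Int): Matcher = {
    require(0 <= min && min <= max, "need 0 <= min <= max")
    in => {
      // Element k holds the remainders after exactly k repetitions, for k = 0..max.
      val afterK =
        Iterator.iterate(List(in))(rems => rems.flatMap(m).distinct).take(max + 1).toList
      afterK.drop(min).flatten.distinct
    }
  }

  // A full match is one that leaves no input behind.
  def matches(m: Matcher, s: String): Boolean = m(s.toList).exists(_.isEmpty)
}
```

For example, `repeatBetween(lit('a'), 3, 5)` accepts three to five `a`s, and composes with `andThen` like any other matcher.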
I just realized that I created a logo that's almost identical to the one used for regexr. I thought that when I created the logo it was original, but maybe I had seen this one in the past and subconsciously remembered it 🤷‍♂️.
Ex: capturing isn't (currently?) supported; backreferences probably never will be.
Currently the Scalacheck generators aren't generating regular expressions with negative character classes, because there's not a good way to create strings that match them (we have to use Gen.filter). Using discrete interval trees along with a Diet that provides all allowed characters should help to resolve this.
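To show why a set of allowed ranges removes the need for filtering, the sketch below picks a character directly by index from sorted, disjoint ranges. It uses `scala.util.Random` instead of Scalacheck to stay dependency-free, and `Allowed`, `nth`, and `pick` are hypothetical names; with Scalacheck the same indexing could presumably sit behind a generator of indices.

```scala
object NegatedClassSketch {
  import scala.annotation.tailrec
  import scala.util.Random

  // Allowed characters as sorted, disjoint inclusive ranges
  // (assumed to be what a Diet of allowed characters would provide).
  final case class Allowed(ranges: Vector[(Char, Char)]) {
    // Total number of allowed characters.
    val size: Int = ranges.map { case (lo, hi) => hi - lo + 1 }.sum

    // The k-th allowed character (0-based), found by walking the ranges.
    def nth(k: Int): Char = {
      @tailrec
      def go(rs: List[(Char, Char)], remaining: Int): Char = rs match {
        case (lo, hi) :: rest =>
          val width = hi - lo + 1
          if (remaining < width) (lo + remaining).toChar
          else go(rest, remaining - width)
        case Nil =>
          throw new IndexOutOfBoundsException(k.toString)
      }
      require(0 <= k && k < size, "index out of range")
      go(ranges.toList, k)
    }

    // Every draw is a valid character, so no generate-and-filter loop is needed.
    def pick(rnd: Random): Char = nth(rnd.nextInt(size))
  }
}
```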
It would be cool to be able to play with regular expressions in the browser via scala.js. It looks like all of irrec's dependencies are already built for scala.js, so in theory this should be straightforward to support.