moar-regex / moar
Deterministic Regular Expressions with Backreferences
License: MIT License
Inside a character class, [+-] means either a literal + or a literal -, not the meta characters.
If we recognize such classes while checking for determinism, should we merge them?
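A small sketch of what recognizing such classes could look like: expand the class body into the set of literal characters it matches, treating - as a range only when it sits between two characters. Roughly speaking, a determinism check complains when two alternatives can start with the same character, so overlapping sets are what it would compare. CharClass, expand and overlaps are hypothetical names, not moar's parser, and escapes and negated classes are ignored.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of moar. Escapes and negation ([^...]) are not handled.
final class CharClass {
    // Expands a class body such as "+-" (from "[+-]") or "a-c" into its literal characters.
    static Set<Character> expand(String body) {
        Set<Character> chars = new LinkedHashSet<>();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            // '-' between two characters denotes a range such as "a-c".
            if (i + 2 < body.length() && body.charAt(i + 1) == '-') {
                char end = body.charAt(i + 2);
                for (char x = c; x <= end; x++) {
                    chars.add(x);
                }
                i += 3;
            } else {
                // A leading or trailing '-', and '+', are plain literals inside a class.
                chars.add(c);
                i++;
            }
        }
        return chars;
    }

    // True if some character is matched by both classes.
    static boolean overlaps(Set<Character> a, Set<Character> b) {
        return a.stream().anyMatch(b::contains);
    }
}
```

For example, [+-]|- would be flagged because both alternatives can start with -, while [+-]|[a-c] would not.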
Some examples were incorrect and have to be fixed.
Dominik sent these:
()|()
(a+)+
(a|())+
These are all wrongly recognized as non-deterministic. This is probably because the EdgeGraph currently uses Lists to store its edges, so switching back to Sets may be a good idea.
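A minimal sketch of why Lists could cause this, assuming the check flags any state that has two outgoing edges with the same label. In (a+)+ both the inner and the outer + add a loop on the a position; a List keeps both copies and the check fires, while a Set of value-equal edges collapses them. Edge and NaiveCheck are hypothetical stand-ins, not moar's EdgeGraph.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical edge type. A record has value-based equals/hashCode,
// so two identical edges collapse into one inside a Set.
record Edge(int from, char label, int to) {}

final class NaiveCheck {
    // Non-deterministic as soon as some state has two outgoing edges with the same label.
    static boolean isDeterministic(Collection<Edge> edges) {
        Set<String> seen = new HashSet<>();
        for (Edge e : edges) {
            String key = e.from() + ":" + e.label();
            if (!seen.add(key)) {
                return false;
            }
        }
        return true;
    }
}
```

A genuine conflict, such as edges 1 -a-> 1 and 1 -a-> 2, is still reported after switching to a Set; only exact duplicates disappear.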
Currently, only (?:(?:a*)*)* is allowed (and it is still non-deterministic).
\r\n is not equal to \n.
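The issue does not say where the mismatch shows up. Assuming it comes from input that was written with Windows line endings, one common workaround is to normalize line endings before matching or comparing. LineEndings.normalize is a hypothetical helper, not part of moar.

```java
// Hypothetical helper: maps \r\n (Windows) and lone \r (classic Mac) to \n.
final class LineEndings {
    static String normalize(String s) {
        // Replace \r\n first; replacing lone \r first would turn \r\n into \n\n.
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```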
Note to anyone coming across this: this is not a bug, merely a flaw in how determinism is currently defined (the definition is too restrictive at the moment).
For example, < and > are only needed for capturing groups. Escaping them outside of capturing groups is a bit odd, but the rule can stay for the sake of a simpler parser.
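If the rule stays, code that builds patterns from literal text has to escape < and > as well. A sketch, assuming a backslash escape works for every special character; the set below (the usual regex meta characters plus < and >) is an assumption, not moar's actual list, and Quote.quote is a hypothetical name.

```java
import java.util.Set;

// Hypothetical helper, not moar's API. SPECIAL is an assumed set of characters
// that must be escaped everywhere, including '<' and '>'.
final class Quote {
    private static final Set<Character> SPECIAL = Set.of(
            '\\', '(', ')', '[', ']', '{', '}', '|', '*', '+', '?', '.', '^', '$', '<', '>');

    // Prefixes every special character with a backslash so it is matched literally.
    static String quote(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (SPECIAL.contains(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```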
Since MOAs are more expressive than deterministic regexes, a GUI tool for building MOAs would be useful.
Features of such a tool would be:
Behaviour:
Starting with an empty MOA that doesn't accept any input (L(A) = {}), the user can add states and edges to the graph. Edges are added by click-and-dragging from the start to the end of the edge (a state). These edges can then be annotated with o(x), r(x), c(x) for variables. If the drag ends on blank space, a state is created automatically. (It should also be possible to create states on their own.) States are labeled with the same representation as in the JSON format. (To see the format, use the CLI tool to export a regex containing all features and as much syntactic sugar as possible to a JSON file.)
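The editing behaviour above could be backed by a small model like the following sketch. Every name here (MoaDraft, Kind, Annotation, dragEdge) is hypothetical; only the three annotation kinds come from the proposal, and their meaning is not modelled, just stored.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical editor model for the proposed GUI; none of these names exist in moar.
final class MoaDraft {
    // The three edge annotations named in the proposal: o(x), r(x), c(x).
    enum Kind { O, R, C }

    record Annotation(Kind kind, String variable) {
        @Override
        public String toString() {
            return kind.name().toLowerCase(Locale.ROOT) + "(" + variable + ")";
        }
    }

    // Other edge labels (e.g. input symbols) are left out of this sketch.
    record Edge(String from, String to, List<Annotation> annotations) {}

    // A fresh draft has no states and no edges, so it accepts nothing: L(A) = {}.
    private final Set<String> states = new LinkedHashSet<>();
    private final List<Edge> edges = new ArrayList<>();
    private int nextId = 0;

    // States can be created on their own.
    String addState() {
        String id = "s" + nextId++;
        states.add(id);
        return id;
    }

    // Click-drag from 'from'. 'target' is the state the drag ended on,
    // or null if it ended on blank space, in which case a new state is created.
    String dragEdge(String from, String target, List<Annotation> annotations) {
        if (!states.contains(from)) {
            throw new IllegalArgumentException("unknown state: " + from);
        }
        if (target != null && !states.contains(target)) {
            throw new IllegalArgumentException("unknown state: " + target);
        }
        String to = (target == null) ? addState() : target;
        edges.add(new Edge(from, to, List.copyOf(annotations)));
        return to;
    }

    List<String> states() { return List.copyOf(states); }
    List<Edge> edges() { return List.copyOf(edges); }
}
```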
Translation into MOA:
The easiest way to translate this into a MOA would be either to create the JSON string and build the MOA from it, or to construct the MOA directly by filling the internal data structures by hand. The JSON route would couple the tool less tightly to the MOA's internal API.
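A sketch of the JSON route: the editor only has to emit a string, which the existing JSON import would then turn into a MOA. The field names ("states", "edges", "from", "to", "actions") are placeholders and NOT moar's real format; the real one should be taken from a CLI export as described above.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical serializer. The JSON layout is a placeholder, not moar's format.
final class DraftJson {
    record Edge(String from, String to, List<String> actions) {}

    // Minimal JSON string escaping for backslashes and quotes (enough for simple names).
    static String str(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String toJson(List<String> states, List<Edge> edges) {
        String stateList = states.stream().map(DraftJson::str).collect(Collectors.joining(","));
        String edgeList = edges.stream()
                .map(e -> "{\"from\":" + str(e.from()) + ",\"to\":" + str(e.to())
                        + ",\"actions\":["
                        + e.actions().stream().map(DraftJson::str).collect(Collectors.joining(","))
                        + "]}")
                .collect(Collectors.joining(","));
        return "{\"states\":[" + stateList + "],\"edges\":[" + edgeList + "]}";
    }
}
```

Because the editor never touches MOA classes, changes to the internal API only affect the JSON loader, not the GUI.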
Possible APIs to use:
A cool use case for these regexes would be a Lucene full-text index. This could be achieved by adapting Lucene's RegexQuery class into an alternative version that uses this library instead of Java Patterns.
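The port mostly needs a seam where the regex engine plugs in: the query enumerates index terms and keeps those the pattern accepts as a whole. The sketch below shows that seam without any Lucene dependency; TermMatcher, JavaPatternMatcher and TermFilter are hypothetical names, not Lucene's or moar's API, and a moar-backed matcher would implement the same interface.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical seam: the query only needs "does this whole term match?".
interface TermMatcher {
    boolean matches(String term);
}

// Baseline engine using java.util.regex (which also supports backreferences).
final class JavaPatternMatcher implements TermMatcher {
    private final Pattern pattern;

    JavaPatternMatcher(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean matches(String term) {
        return pattern.matcher(term).matches();
    }
}

final class TermFilter {
    // Stands in for the term enumeration a regex query performs over the term dictionary.
    static List<String> accept(List<String> terms, TermMatcher matcher) {
        return terms.stream().filter(matcher::matches).collect(Collectors.toList());
    }
}
```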
I have written a regex benchmark comparing different regex engines for Java. Recently I came across your approach and would be curious how it performs compared to the other alternatives.
However, I restricted the benchmark to projects that are available as Maven library artifacts. I am not certain whether your project can be distributed that way. If so, I would offer to integrate a new benchmark into regexbench (a pull request is also welcome).