moar-regex / moar

Deterministic Regular Expressions with Backreferences

License: MIT License

Java 90.07% ANTLR 1.04% TeX 8.89%
deterministic-regular-expressions java-patterns antlr moar regex regexp regex-pattern regex-engine regex-util regular-expression

moar's People

Contributors

s4ke

moar's Issues

Fix too harsh determinism checks

Dominik sent these:
()|()
(a+)+
(a|())+

All of these are wrongly reported as non-deterministic. This is probably because the EdgeGraph currently stores its edges in Lists. Switching back to Sets may be a good idea.
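
To illustrate the suspected cause in isolation, here is a minimal sketch. It assumes a determinism check that treats any second stored edge with the same source state and label as a conflict. The `Edge` class and the `looksDeterministic` method are hypothetical and are not taken from moar's actual EdgeGraph. Under that assumption, a List that holds the same edge twice is flagged, while a Set collapses the duplicate so that only genuinely conflicting edges are reported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class EdgeStorageSketch {

    /** Hypothetical edge (source state, label, target state); not moar's real edge type. */
    public static final class Edge {
        final int from;
        final String label;
        final int to;

        public Edge(int from, String label, int to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Edge)) {
                return false;
            }
            Edge other = (Edge) o;
            return from == other.from && to == other.to && label.equals(other.label);
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, label, to);
        }
    }

    /**
     * Assumed naive check: a second stored edge with the same
     * (source, label) pair counts as non-determinism.
     */
    public static boolean looksDeterministic(Collection<Edge> edges) {
        Map<Integer, Set<String>> labelsBySource = new HashMap<>();
        for (Edge e : edges) {
            Set<String> labels = labelsBySource.computeIfAbsent(e.from, k -> new HashSet<>());
            if (!labels.add(e.label)) {
                return false; // this (source, label) pair was already seen
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The same edge stored twice, as can happen with List-based storage.
        List<Edge> asList = new ArrayList<>(Arrays.asList(new Edge(0, "a", 1), new Edge(0, "a", 1)));
        // A Set drops the exact duplicate because Edge defines equals/hashCode.
        Set<Edge> asSet = new HashSet<>(asList);

        System.out.println(looksDeterministic(asList));
        System.out.println(looksDeterministic(asSet));

        // A real conflict (same source and label, different target) is still detected.
        asSet.add(new Edge(0, "a", 2));
        System.out.println(looksDeterministic(asSet));
    }
}
```

Whether this matches the real bug depends on how the EdgeGraph check is implemented. The sketch only shows why duplicate entries in a List can look like a conflict.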

GUI tool to create MOAs

As MOAs are more expressive than deterministic regexes, a GUI tool for building MOAs would be nice.

Features of such a tool would be:

  • Loading a Regex/JSON representation
  • Creation of new MOAs
  • Manipulation of loaded Regexes/JSON-MOAs
  • Storing the JSON representation of the modeled Graph
  • Testing of the MOA against input strings (no need to define the input alphabet; ours is always the whole UTF-16 range)
  • Obligatory Determinism Check upon Export to JSON/Testing
  • Optional removal of unneeded States (if the graph is not connected)
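
The optional removal of unneeded states could be sketched as a reachability pass. This is only an illustration under two assumptions: "unneeded" is read as "not reachable from the start state", and the graph is held as a plain map from state id to successor ids. The class and method names are hypothetical and do not come from the moar code base.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class UnreachableStatesSketch {

    /** Breadth-first search: collects every state reachable from the start state. */
    public static Set<Integer> reachable(Map<Integer, Set<Integer>> successors, int start) {
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int state = queue.poll();
            for (int next : successors.getOrDefault(state, Collections.emptySet())) {
                if (visited.add(next)) {
                    queue.add(next); // first visit, explore its successors later
                }
            }
        }
        return visited;
    }

    /** Returns a copy of the graph that contains only the reachable states. */
    public static Map<Integer, Set<Integer>> prune(Map<Integer, Set<Integer>> successors, int start) {
        Map<Integer, Set<Integer>> pruned = new HashMap<>();
        for (int state : reachable(successors, start)) {
            // Successors of a reachable state are reachable too, so no filtering is needed.
            pruned.put(state, new HashSet<>(successors.getOrDefault(state, Collections.emptySet())));
        }
        return pruned;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        graph.put(0, new HashSet<>(Arrays.asList(1)));
        graph.put(1, new HashSet<>(Arrays.asList(0)));
        graph.put(2, new HashSet<>(Arrays.asList(1))); // state 2 is never entered from state 0

        System.out.println(prune(graph, 0).keySet());
    }
}
```

A real tool would also have to carry over edge labels and variable annotations. The sketch keeps only the state ids to show the pruning step.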

Behaviour:

Starting with an empty MOA that accepts no input (L(A) = {}), the user can add states and edges to the graph. An edge is added by click-dragging from its start state to its end state and can then be annotated with o(x), r(x), c(x) for variables. If the drag ends on blank space, a state is created there automatically; it should also be possible to create states on their own. States are labeled with a representation equivalent to the one used in the JSON format. (To see the format, use the CLI tool to export a regex that contains all features and as much syntactic sugar as possible to a JSON file.)

Translation into MOA:

The easiest way to translate the modeled graph into a MOA would be either to create the JSON string and build the MOA from that, or to construct the MOA by hand by filling its internal data structures directly. The JSON route would couple the tool less tightly to the MOA's internal API.

Possible APIs to use:

  • JGraphX for visualization (or a hand-written visualization, as we don't need all the features and JGraphX's API seems outdated)
  • JGraphT for representation of the graph (otherwise, the graph can easily be represented in a way similar to the internal EdgeGraph of the MOAs; there is no need for a fully fledged mathematical graph API)

Lucene Regexp Query

A cool use case for these regexes would be to use them in a Lucene full-text index. This could be achieved by modifying the RegexQuery class from the original Lucene into an alternative version that works with this library instead of Java Patterns.

Planned as Maven Artifact?

I have written a regex benchmark that compares different regex engines for Java. I recently found your approach and am curious how it performs compared to the alternatives.

However, I have restricted the benchmark to projects that are available as Maven library artifacts, and I am not sure whether your project can be distributed that way. If it can, I would offer to integrate a new benchmark into regexbench (a pull request is also welcome).
