malelabts / regexgenerator

This project contains the source code of a tool for generating regular expressions for text extraction: (1) automatically, (2) based only on examples of the desired behavior, and (3) without any external hint about what the target regex should look like.

License: GNU General Public License v3.0

Languages: Shell 0.11%, Java 99.89%

regexgenerator's People

Contributors

adelorenz, ftarlao, malelabts

regexgenerator's Issues

Tests offered are insufficient to evaluate the effectiveness of the software

The goals described in the published academic paper were VERY interesting. They got my attention, and I thought the project was worth a look.

But there are problems with what is provided.

I do not see any extensive test data in this GitHub repo, nor any results of generating regular expressions from such data. The only test I found, DataSetTest.java, is very, very simple. That is not adequate, even by the standards of software produced by an academic institution.

The test does not even properly illustrate how one would use the software.

So here is the issue:

The kind of software described in the academic paper could be quite useful, but to evaluate it there should be tests that run it on various datasets and report the accuracy of the discovered regular expressions. Those tests should also make it clear to an outsider how to use the software.
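As an illustration of the kind of accuracy test the issue asks for, here is a minimal sketch in Java, the project's language. The class SpanEvaluation, its methods and the tiny date dataset are hypothetical and not part of RegexGenerator. The sketch scores a candidate regex by span-level precision, recall and F-measure, counting an extracted span as correct only if it exactly equals a desired span. That is one common way to report extraction accuracy; it is not necessarily the metric used in the paper.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical test harness (not part of RegexGenerator): scores a candidate regex
 * by comparing the spans it extracts with the spans marked as desired.
 */
public class SpanEvaluation {

    /** One labeled example: a text and the [start, end) spans that should be extracted. */
    public static final class Example {
        final String text;
        final Set<List<Integer>> desiredSpans = new HashSet<>();

        public Example(String text, int[]... spans) {
            this.text = text;
            for (int[] span : spans) {
                desiredSpans.add(Arrays.asList(span[0], span[1]));
            }
        }
    }

    /** Returns {precision, recall, F-measure}, counting only exact span matches as correct. */
    public static double[] evaluate(String regex, List<Example> dataset) {
        Pattern pattern = Pattern.compile(regex);
        int correct = 0, extracted = 0, desired = 0;
        for (Example ex : dataset) {
            Matcher m = pattern.matcher(ex.text);
            while (m.find()) {
                extracted++;
                if (ex.desiredSpans.contains(Arrays.asList(m.start(), m.end()))) {
                    correct++;
                }
            }
            desired += ex.desiredSpans.size();
        }
        double precision = extracted == 0 ? 0.0 : (double) correct / extracted;
        double recall = desired == 0 ? 0.0 : (double) correct / desired;
        double f = precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f};
    }

    /** Tiny illustrative dataset (dates to extract); a real test would load many examples. */
    public static List<Example> sampleDataset() {
        return Arrays.asList(
                new Example("Meeting on 2014-05-12 at noon", new int[] {11, 21}),
                new Example("Due 2015-01-30, paid 2015-02-02", new int[] {4, 14}, new int[] {21, 31}));
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"\\d{4}-\\d{2}-\\d{2}", "2014-\\d{2}-\\d{2}", "\\d+"}) {
            System.out.println(regex + " -> " + Arrays.toString(evaluate(regex, sampleDataset())));
        }
    }
}
```

A real evaluation would run the generated regexes for each shipped dataset through a harness like this and publish the resulting numbers alongside the code.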

If you want your software to be used, you should make it practical to evaluate and adopt.

golf tool should offer import/export

I am sorry to post this here, but I can't find a repo for the regex golf tool. I am trying to find a regex that matches all ~420 valid Mandarin syllables (those actually used in the language, as in http://pinyin.info/rules/initials_finals.html) and nothing else out of the 22*35=770 possible combinations. This task has historically led to monster regexes, and I am quite interested in how RG performs on it.

From its description, the golf tool seems ideal for this use case. I could construct a "dataset" manually, but that feels awkward.
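The import requested here essentially needs two lists: strings to match and strings not to match. A hedged Java sketch of producing them: the hypothetical class SyllableGolf builds both lists from an initials × finals grid and checks a candidate regex under the golf rule (fully match every string in the first list, none in the second). The initials, finals and valid set are a small illustrative subset, not the full 22 × 35 table, and RegexGenerator's own dataset file format is not assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of RegexGenerator): turns an initials x finals grid
 * into golf-style "match" and "do not match" lists and checks a candidate regex.
 */
public class SyllableGolf {

    public final List<String> matchList = new ArrayList<>();
    public final List<String> unmatchList = new ArrayList<>();

    public SyllableGolf(List<String> initials, List<String> finals, Set<String> valid) {
        // Every initial+final combination lands in exactly one of the two lists.
        for (String initial : initials) {
            for (String fin : finals) {
                String syllable = initial + fin;
                if (valid.contains(syllable)) {
                    matchList.add(syllable);
                } else {
                    unmatchList.add(syllable);
                }
            }
        }
    }

    /** Golf rule: the regex must fully match every valid syllable and none of the others. */
    public boolean solves(String regex) {
        Pattern p = Pattern.compile(regex);
        for (String s : matchList) {
            if (!p.matcher(s).matches()) return false;
        }
        for (String s : unmatchList) {
            if (p.matcher(s).matches()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative subset only; the real task uses 22 initials and 35 finals.
        List<String> initials = Arrays.asList("b", "p", "zh");
        List<String> finals = Arrays.asList("a", "o", "e", "ong");
        Set<String> valid = new HashSet<>(Arrays.asList("ba", "bo", "pa", "po", "zha", "zhe", "zhong"));
        SyllableGolf golf = new SyllableGolf(initials, finals, valid);
        System.out.println("match:   " + golf.matchList);
        System.out.println("unmatch: " + golf.unmatchList);
        System.out.println(golf.solves("[bp][ao]|zh(?:a|e|ong)"));
    }
}
```

With the full tables plugged in, the two lists have ~420 and ~350 entries, which is the kind of input an import feature would accept.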

How does it work?

Hi!
I need something similar to parse JavaScript stack strings in different environments and extract the error type, the message and the stack frames (called function, location). Almost every browser has its own stack string format, and I can test only a few environments. For example:

Old Opera:

Statement on line 44: Type mismatch (usually a non-object value used where an object is required)
Backtrace:
  Line 44 of linked script file://localhost/G:/js/stacktrace.js
    this.undef();
  Line 31 of linked script file://localhost/G:/js/stacktrace.js
    ex = ex || this.createException();

V8 (Chrome, Node):

ReferenceError: x is not defined
    at repl:1:5
    at REPLServer.self.eval (repl.js:110:21)
    at repl.js:249:20

It would be nice to write a parser which is adaptive and learns the actual environment on the fly. Can you tell me more about what algorithm you use to generate the regex or how your lib works in general?

Edit:
I just read in a different issue that there is no JavaScript port because of a missing feature in the JavaScript regex library. Does that mean it is not possible to port this library at all? If so, is it possible to write something more specific, with a different algorithm, to solve my problem?
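One simpler alternative to a learned parser is a fixed list of hand-written, per-environment frame patterns tried in turn. The sketch below is in Java to match the project, and its class and patterns are hypothetical: they were written by hand for the two formats quoted above, not generated by RegexGenerator, and the "adaptive" part is reduced to "use whichever known pattern matches". The patterns avoid possessive quantifiers and lookbehind, so they should carry over to JavaScript's regex syntax, but that has not been checked across browsers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical per-environment frame parser built from hand-written patterns. */
public class StackFrameParser {

    /** A parsed stack frame; function is null when the format does not report it. */
    public static final class Frame {
        public final String function;
        public final String location;
        public final int line;

        Frame(String function, String location, int line) {
            this.function = function;
            this.location = location;
            this.line = line;
        }

        @Override
        public String toString() {
            return (function == null ? "<anonymous>" : function) + " @ " + location + ":" + line;
        }
    }

    // V8 (Chrome, Node): "    at fn (file:line:col)" or "    at file:line:col".
    private static final Pattern V8_FRAME =
            Pattern.compile("^\\s*at (?:(.+?) \\()?(.+?):(\\d+):(\\d+)\\)?$");

    // Old Opera: "  Line 44 of linked script file://...".
    private static final Pattern OPERA_FRAME =
            Pattern.compile("^\\s*Line (\\d+) of linked script (\\S+)$");

    /** Tries each known frame format on every line; message and source lines are skipped. */
    public static List<Frame> parse(String stack) {
        List<Frame> frames = new ArrayList<>();
        for (String line : stack.split("\\r?\\n")) {
            Matcher v8 = V8_FRAME.matcher(line);
            if (v8.matches()) {
                frames.add(new Frame(v8.group(1), v8.group(2), Integer.parseInt(v8.group(3))));
                continue;
            }
            Matcher opera = OPERA_FRAME.matcher(line);
            if (opera.matches()) {
                frames.add(new Frame(null, opera.group(2), Integer.parseInt(opera.group(1))));
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        String v8 = "ReferenceError: x is not defined\n"
                + "    at repl:1:5\n"
                + "    at REPLServer.self.eval (repl.js:110:21)\n"
                + "    at repl.js:249:20";
        parse(v8).forEach(System.out::println);
    }
}
```

Supporting a new environment then means adding one pattern, which is less general than learning the format but easy to test against captured stack strings.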

It does not seem to support multiple languages

I used a simple dataset with only 5 examples:

王安齊:你好,草擬馬。[0,3]
豬豬:我是河馬王ㄎㄎㄎ[0,2]
孫紫晴:不會 麻煩你了 謝謝[0,3]
王儀婷:安卓是真的都蠻好用的就是[0,3]
羅思慶:哪條不爽 註解掉就好了[0,3]

but it cannot generate a result.
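One possible factor, offered only as a guess: the generated regex quoted in the "Regex not working" issue uses \w, and in Java \w matches only ASCII word characters ([a-zA-Z_0-9]) unless the pattern is compiled with the Pattern.UNICODE_CHARACTER_CLASS flag. The hypothetical helper below shows the difference on two of the examples above; whether RegexGenerator sets that flag, or builds CJK-aware character classes some other way, is not stated on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check of how Java's \w treats Chinese text, with and without Unicode classes. */
public class CjkClassCheck {

    /** Returns the [start, end) span of the first match of regex in text, or null if none. */
    public static int[] firstSpan(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        String example = "王安齊:你好,草擬馬。";
        // Default \w is ASCII-only, so no name is found at the start of the line.
        System.out.println(firstSpan("^\\w+", 0, example) == null);
        // With UNICODE_CHARACTER_CLASS, \w also covers Han ideographs, giving the [0,3] span.
        int[] span = firstSpan("^\\w+", Pattern.UNICODE_CHARACTER_CLASS, example);
        System.out.println(span[0] + "," + span[1]);
    }
}
```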

Online version not working

The online version of RegexGenerator++ shows the following error when trying to run the example dataset:

The application generates the following error:

No further error information is provided.

unwanted match cannot be removed

Entering examples on the website (http://regex.inginf.units.it/) went smoothly until my 9th example, 1c==guess_count. Why can't I enter that as a matchless example? I cannot remove the unwanted, automatic match that appears (on the entire string).

(Also, I was not able to submit the above via the website's Feedback button: there was no response to clicking Send.)

Regex not working

I am currently doing research to understand the software. The generated regexes do not seem to work when used natively in Java. For example, with the available dataset References/Lead-Author, the regex "(?<=\. )[^,]++, (?:\w\.)++" does not work.
Is there something that can be done?

Thanks
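The issue does not say how the regex fails, so the following is only a guess at two frequent pitfalls when pasting a generated pattern into Java code: every backslash must be doubled inside a Java string literal, and an extraction regex should be applied with Matcher.find(), which scans for matching substrings, rather than String.matches() or Matcher.matches(), which require the whole input to match. The class LeadAuthorExtraction and the reference line are hypothetical; the line is made up and not taken from the References/Lead-Author dataset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical usage sketch for a generated extraction regex in plain Java. */
public class LeadAuthorExtraction {

    // The generated regex (?<=\. )[^,]++, (?:\w\.)++ with every backslash doubled,
    // as a Java string literal requires.
    public static final Pattern LEAD_AUTHOR = Pattern.compile("(?<=\\. )[^,]++, (?:\\w\\.)++");

    /** Collects every substring the regex extracts, scanning the text with find(). */
    public static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = LEAD_AUTHOR.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up reference line, not taken from the References/Lead-Author dataset.
        String ref = "12. Bartoli, A. Evolutionary regex synthesis.";
        // matches() demands that the whole line match, which an extraction regex
        // like this one is not designed to do.
        System.out.println(LEAD_AUTHOR.matcher(ref).matches()); // false
        System.out.println(extract(ref)); // [Bartoli, A.]
    }
}
```

Java supports both the possessive quantifiers (++) and the fixed-length lookbehind used here, so the pattern itself compiles as-is once escaped.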

Evolve button is not active with my own example dataset

Hi team,

I am trying to test the tool with my own sample dataset, but the "Evolve" button is not active. I followed the video tutorial, and it was actually working fine a few weeks ago. Could you please double-check and fix it?

If I click "Try an example", the button becomes enabled. So I tried editing the examples and replacing their text with my own, but I got the following error:

The application generates the following error: Dataset doesn't respect the imposed size limits

Thanks,
Moahemd

example config file and input

I am trying to run the software (without the GUI) as follows:

$ git clone git@github.com:MaLeLabTs/RegexGenerator.git
$ cd RegexGenerator
$ cd "Random Regex Turtle"
$ ant -Dplatforms.JDK_1.7.home=$JAVA_HOME jar
$ java -jar ./dist/Random_Regex_Turtle.jar 
Usage: java -jar "Random_Regex_Turtle.jar" configFileName [startGui]

Can you please provide an example config file and input (without using the GUI)?
