irregex's People

Contributors

ak-1, ashinn, chaw, codemac, evhan, graywolf, lemonboy, sjamaan, snan

irregex's Issues

Extracting nonexistent submatches should be consistent

Extracting a named submatch that doesn't exist gives an error, but
extracting a numbered submatch that doesn't exist gives #f. Either the
former should return #f or the latter should raise a similar error.

#;1> (load "irregex.scm")
; loading irregex.scm ...
#;2> (define r (irregex '(seq "foo")))
#;3> (define m (irregex-match r "foo"))
#;4> (irregex-match-substring m 'x)

Error: unknown match name: x
....
#;5> (irregex-match-substring m 5)
#f
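
Until the two cases agree, a caller could get uniform behaviour with a small wrapper. This is only a sketch of a workaround, not part of irregex: the name irregex-match-substring/safe is hypothetical, and it assumes irregex-match-names returns an association list of (name . index) pairs.

```scheme
;; Hypothetical wrapper (not part of irregex): returns #f for an unknown
;; submatch name instead of signalling an error. Numeric indexes are passed
;; through to irregex-match-substring unchanged.
(define (irregex-match-substring/safe m index-or-name)
  (if (or (integer? index-or-name)
          (assq index-or-name (irregex-match-names m)))
      (irregex-match-substring m index-or-name)
      #f))
```

With the match object m from the transcript above, (irregex-match-substring/safe m 'x) would yield #f rather than the "unknown match name" error.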

Original issue reported on code.google.com by [email protected] on 7 Apr 2010 at 6:57

Convert embedded posix strings in a preprocessing step

Currently, posix-string sub-regexes are only handled in sre->procedure, which means they are only handled by the backtracking engine, even if the strings are strictly regular expressions.

We could simply call string->sre in sre->nfa and recur on the result, but if the NFA compilation is then aborted and we fall back to a backtracking matcher, the same conversion would have to be done again. Perhaps it would be better to "normalize" the expression beforehand so that the work is not done twice? We already do something like this in string->irregex and irregex, but only on the "main" input.
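
The normalization idea could look roughly like the sketch below. It assumes embedded POSIX strings appear as (posix-string "...") forms and that string->sre is available to convert them. The name normalize-sre is hypothetical, and the traversal is deliberately naive: it assumes every compound SRE is a proper list.

```scheme
;; Hypothetical preprocessing pass: expand embedded POSIX strings once,
;; before choosing between the NFA compiler and the backtracking matcher.
(define (normalize-sre sre)
  (cond
   ;; (posix-string "...") is assumed to be the embedding form.
   ((and (pair? sre)
         (eq? (car sre) 'posix-string)
         (pair? (cdr sre))
         (string? (cadr sre))
         (null? (cddr sre)))
    (normalize-sre (string->sre (cadr sre))))
   ;; Any other compound form: keep the operator, normalize the operands.
   ((pair? sre)
    (cons (car sre) (map normalize-sre (cdr sre))))
   ;; Atoms (strings, chars, symbols, numbers) pass through unchanged.
   (else sre)))
```

Both compilation paths could then consume the normalized expression, so the string conversion would happen at most once per pattern.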

Weird bol with /all

(string=
 (irregex-replace/all '(: bol (* " ")) "## my bloody valentine" "")
 (irregex-replace '(: bol (* " ")) "## my bloody valentine" ""))

irregex-r6rs refers to undefined function unicode-range->utf8-pattern

I generated the R6RS version with:

make irregex-r6rs.scm

The resulting file includes a call to an undefined function called unicode-range->utf8-pattern. The calling function seems to be unused and so I commented it out.

Also, I see test suites, but I'm not sure how to test this generated R6RS version. Is there a predefined way?

Thanks!

viable way to track development

I have got the latest revision 0.9.2 directly from the Synthcode site; the home 
page of irregex still references revision 0.9.0 and the last tarball available 
from Google Code is 0.8.3.

I want to track development of this package (I am including it in the 
distribution of Vicare Scheme); being that Google Code is going West, what 
about pushing the code to another site?  (G*cough*hub*cough*)

Original issue reported on code.google.com by [email protected] on 6 Sep 2013 at 9:22

irregex-match-substring using subpattern under Kleene star can return extra chars

In the following example, which uses SRE syntax, a named sub-pattern, and a Kleene star (I think the or might be necessary too), the string returned by irregex-match-substring appears to run from the beginning of the first match to the end of the last match, including all characters in between.

pajaro2:/tmp/irregex (git)-[master]- clements> scheme
Chez Scheme Version 9.5.4
Copyright 1984-2020 Cisco Systems, Inc.

> (import (irregex))
> (define subpat2 '(=> subby (: "@" alphanum)))
> (define pat2 `(* (or alphanum ,subpat2)))
> (define match2 (irregex-match pat2 "oeh@2tu@2n342"))
> (irregex-match-names match2)
((subby . 1))
> (irregex-match-num-submatches match2)
1
> (irregex-match-substring match2 1)
"@2tu@2"

Empty concat causes vector error

Using an empty concatenation as part of an SRE pattern can cause an out-of-range vector reference.

(irregex-search '(or (:) "was") "abc")

pajaro2:/tmp/irregex (git)-[master]- clements> scheme
Chez Scheme Version 9.5.4
Copyright 1984-2020 Cisco Systems, Inc.

> (import (irregex))
> (irregex-search '(or (:) "was") "abc")
Exception in vector-ref: 3 is not a valid index for #(0 0)
Type (debug) to enter the debugger.
>

Predicate procedure for existence of submatches

I'm not so sure about this one anymore, but here goes:

It would be useful to have a procedure to check whether a given submatch
exists at all. For example:

  (irregex-submatch-exists? match <index-or-name>) => boolean

The naming could be better and it might be good to have a similar procedure
for irregex objects.

On the other hand, one could also extract the names and check whether the
name occurs in the list (but an existence check could possibly be
implemented more efficiently)
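
The name-list approach mentioned above could be sketched as follows. This is a hypothetical definition of the proposed predicate, built only on irregex-match-names and irregex-match-num-submatches, and it assumes the names come back as an association list of (name . index) pairs. It reports whether a submatch is defined by the pattern, not whether it took part in a particular match.

```scheme
;; Hypothetical implementation of the proposed predicate.
(define (irregex-submatch-exists? m index-or-name)
  (if (integer? index-or-name)
      ;; Index 0 is the whole match; 1..n are the numbered submatches.
      (<= 0 index-or-name (irregex-match-num-submatches m))
      ;; Names are looked up in the (name . index) association list.
      (and (assq index-or-name (irregex-match-names m)) #t)))
```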

Original issue reported on code.google.com by [email protected] on 7 Apr 2010 at 7:07

Unused variable

The variable num-states in find-reorder-commands-internal is unused. As the initializing call is free from side-effects, the variable could be eliminated.

Issues with or, eow/bow and any

This code works as expected:

(irregex-extract '(: bow "foo" eow) "foo foo bar foo") ; => '("foo" "foo" "foo")

This code doesn't:

(irregex-extract '(or (: bow "foo" eow) any) "foo foo bar foo") ; => '("foo" " " "f" "o" "o" " " "b" "a" "r" " " "f" "o" "o")
;; Expected: '("foo" " " "foo" " " "b" "a" "r" " " "foo")

If I can help out with fixing this (if it is a bug, that is) please let me know.

bol overlaps with newline

This is either an irregex pro tip or a bug:

(: bol "blablabla" newline) will create overlaps if there are sequential matches. Capturing the newline there borks up the subsequent bol. It only works if the matches are non-sequential.

(: newline "blablabla" eol) doesn't have that problem at all.

(Now, the reason to capture newline instead of just using bol and eol is so that we can replace with "" to remove the line entirely.)

Note that the (: newline "blablabla" eol) can't match the first line of the text. So this pro tip only works when you know you don't need to do it on the first line of text in a multiline string.

Not sure if there is an even better pro tip / best practice for this particular use case.

Installation instructions for Chicken

What steps will reproduce the problem?

See http://synthcode.com/scheme/irregex/#SECTION_2


What is the expected output? What do you see instead?

I see

    "If you are using Chicken Scheme, you can just run chicken-setup, or wait for it to show up in the eggs repository."

but I expected something like

   "irregex is built in CHICKEN as a core unit, so no need to install it.  To use it, you just need (use irregex)."

Original issue reported on code.google.com by [email protected] on 21 May 2013 at 8:56

Replacements of positive lookbehinds only replace first match

Reported on chicken-users:

irregex-replace/all only replaces the first match when positive lookbehinds are involved

example:

the following regexp should replace any letter a preceded by x, y, or z

problematic code

(irregex-replace/all "(?<=[xyz])a"     "xa ya za"  "-")
; or 
(irregex-replace/all '(: (look-behind (or "x" "y" "z")) "a")   "xa ya za"  "-")

should return "x- y- z-" BUT actually returns "x- ya za"

Some compilation results of `irregex` cannot be cached (write/read)

I have a growing base of 50 complex irregex strings. I precompile all of them, store them in a file, and use the result. This works very nicely, but 1 of the irregex results contains a procedure, so that precompiling cannot work here.

Which expressions lead to procedures in the irregex result?

Can this behavior be influenced by options?
(The test is run on version 0.9.10 under bigloo; if it helps, I can repeat it under Chicken or Gauche or ...)

Thanks for this excellent tool for the Scheme community!

problems with utf8

Hi,

I've compiled irregex 0.9.4 with Chez Scheme 9.4. I ran the test suite and all tests pass except the ones in test-irregex-utf8.scm.

I'm just learning SRE, but, if I understand correctly, the following example should match:
(irregex-search (irregex `(: "fede" (~ "Λ") "!") 'utf8) "fedeλ!")
=> #f

Is this wrong? Chez Scheme seems to handle utf8 pretty well.

Thank you for sharing your code!

Empty matches on replace/all

(irregex-replace/all
 (irregex '(: bos (* whitespace)))
 "any gosh darn string" "")

⇒ "aany gosh darn string"

With a + instead of a Kleene star it works.
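
For comparison, the + variant described as working would presumably be written like this (a sketch derived from the call above, not taken from the report itself):

```scheme
;; Same call as in the report, with (+ whitespace) in place of
;; (* whitespace), so the pattern can no longer match the empty string
;; at the start of the input.
(irregex-replace/all
 (irregex '(: bos (+ whitespace)))
 "any gosh darn string" "")
```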

Named submatches should "stack"

When creating irregex-based parsers, it is *extremely* useful to be able to
use named submatches inside alternatives. Real-life example from a URI parser:

    `(or (seq ,scheme ":" ,hostname (submatch-named path ,slashed-segments))
         (submatch-named path ,path-noscheme) ... etc)

It would rock if irregex would be smart enough to let the name "path" refer
to the second alternative when the first fails.

You said this would be hard to make consistent in case more than one of
these could match.  However, I think it would be justified to simply
document the semantics. The user could then decide for himself whether he
wants to use it or not.

The behaviour as it is right now is simply to return a "random" submatch (I
think it returns the last one defined?), which practically makes it only
usable if you ensure all submatches are uniquely named.

I propose to change the definition to return the first (or last,
whichever is more useful/efficient/consistent) submatch that has a non-#f
value (and, of course, #f if all are #f).  That would be enough for these
situations.  It would work the same as it already does for the other cases,
but it would improve usability for matches with alternatives.

I'm not sure how to handle numbered submatches.  I suppose that would just
stay the same; the first alternative has number 1, the second number 2, in
all cases. In other words, only the semantics of named submatches would change.
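
The proposed lookup could be approximated in user code along these lines. This is a sketch under assumptions: irregex-match-named-substring is a hypothetical name, and it assumes irregex-match-names lists one (name . index) pair for every occurrence of a duplicated name. "First" here means first in that list, which may not be the order of definition in the pattern.

```scheme
;; Hypothetical helper sketching the proposed semantics: among all
;; submatches sharing NAME, return the first one whose value is not #f,
;; or #f if none of them matched (or the name is unknown).
(define (irregex-match-named-substring m name)
  (let loop ((names (irregex-match-names m)))
    (cond ((null? names) #f)
          ;; A test-only cond clause returns the test value when it is true,
          ;; i.e. the substring of the first matching entry.
          ((and (eq? (caar names) name)
                (irregex-match-substring m (cdar names))))
          (else (loop (cdr names))))))
```

Numbered submatches are untouched by this helper, in line with the suggestion that only the semantics of named submatches would change.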

Original issue reported on code.google.com by [email protected] on 7 Apr 2010 at 7:18
