ando-php's People

Contributors

aercolino

ando-php's Issues

[Regex] Support lexical backreferences, numbered (absolute and relative) and named.

I call a lexical backreference one that looks like

  • (?1) (numbered absolute)
  • (?-1) (numbered relative)
  • (?P>name) (named)
  • (?&name) (named)

Defined at: http://php.net/manual/en/regexp.reference.recursive.php

If the syntax for a recursive subpattern reference (either by number or by name) is used outside the parentheses to which it refers, it operates like a subroutine in a programming language. An earlier example pointed out that the pattern (sens|respons)e and \1ibility matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If instead the pattern (sens|respons)e and (?1)ibility is used, it does match "sense and responsibility" as well as the other two strings. Such references must, however, follow the subpattern to which they refer.

I'm not sure about my name for this syntax (lexical backreference). I came up with it while looking at the above example of non-recursive usage. Lexical means that such a backreference stands for a previous group as it was defined, not as it was matched. In fact, the following regular expressions match exactly the same strings:

@(sens|respons)e and (?1)ibility@
@(sens|respons)e and (?:sens|respons)ibility@

Another interesting consequence of this naming convention is that recursive backreferences are simply self lexical backreferences.

For the time being, I'll stick to it.
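The difference between a plain backreference and a lexical one can be checked directly with preg_match(). The following is only a small sketch: the helper name match_each() is made up for this illustration and is not part of ando-php.

```php
<?php
// Hypothetical helper (not part of ando-php): runs one pattern against
// several subjects and returns the preg_match() result for each.
function match_each(string $pattern, array $subjects): array
{
    return array_map(function (string $subject) use ($pattern) {
        return preg_match($pattern, $subject);
    }, $subjects);
}

$subjects = [
    'sense and sensibility',
    'response and responsibility',
    'sense and responsibility',
];

// \1 must repeat the text matched by group 1.
$by_backreference = match_each('@^(sens|respons)e and \1ibility$@', $subjects);

// (?1) re-runs the definition of group 1, so it may match different text.
$by_lexical = match_each('@^(sens|respons)e and (?1)ibility$@', $subjects);

// The same pattern with the group definition written out by hand.
$by_expansion = match_each('@^(sens|respons)e and (?:sens|respons)ibility$@', $subjects);

// Relative and named forms of the same reference.
$by_relative = match_each('@^(sens|respons)e and (?-1)ibility$@', $subjects);
$by_name     = match_each('@^(?<stem>sens|respons)e and (?&stem)ibility$@', $subjects);

print_r($by_backreference); // [1, 1, 0]
print_r($by_lexical);       // [1, 1, 1]
```

Only the plain backreference rejects "sense and responsibility"; all the lexical forms accept it, just like the hand-expanded pattern.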

[Regex] Support all named groups.

Named groups are like

  • (?P<name>pattern)
  • (?<name>pattern)
  • (?'name'pattern)

Currently only the angle brackets type is supported.

Defined at: http://php.net/manual/en/regexp.reference.subpatterns.php

It is possible to name a subpattern using the syntax (?P<name>pattern). This subpattern will then be indexed in the matches array by its normal numeric position and also by name. PHP 5.2.2 introduced two alternative syntaxes (?<name>pattern) and (?'name'pattern).
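As a quick check of what PCRE itself accepts (not of what the Regex class currently supports), the three syntaxes can be compared side by side. The group names year and month are arbitrary choices for this sketch.

```php
<?php
$subject = '2014-07';

$patterns = [
    '/(?P<year>\d{4})-(?P<month>\d{2})/',
    '/(?<year>\d{4})-(?<month>\d{2})/',
    // Double quotes here only to avoid escaping the single quotes.
    "/(?'year'\\d{4})-(?'month'\\d{2})/",
];

$results = [];
foreach ($patterns as $pattern) {
    preg_match($pattern, $subject, $matches);
    $results[] = $matches;
}

// Each group is available both by name and by numeric position.
echo $results[0]['year'], ' ', $results[0][1], "\n";  // 2014 2014
echo $results[0]['month'], ' ', $results[0][2], "\n"; // 07 07
```

All three patterns fill the matches array in the same way, so supporting the two missing syntaxes should not change what callers get back.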

[Regex] Support numbered forward backreferences.

I call a forward backreference one that appears before the group it refers to.

Defined at: http://php.net/manual/en/regexp.reference.back-references.php

However, if the decimal number following the backslash is less than 10, it is always taken as a back reference, and causes an error only if there are not that many capturing left parentheses in the entire pattern. In other words, the parentheses that are referenced need not be to the left of the reference for numbers less than 10. A "forward back reference" can make sense when a repetition is involved and the subpattern to the right has participated in an earlier iteration. See the section entitled "Backslash" above for further details of the handling of digits following a backslash.
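A small sketch of the repetition case described in the manual. The pattern below is my own example, not one taken from the manual: \2 appears to the left of group 2, so it can only succeed once group 2 has matched in an earlier iteration.

```php
<?php
// \2 refers to (one), which is defined further to the right.
$pattern = '/^(\2two|(one))+$/';

// First iteration: \2 is unset, so the (one) branch matches.
// Second iteration: \2 now holds "one", so \2two matches "onetwo".
$with_earlier_iteration = preg_match($pattern, 'oneonetwo');

// Here the second iteration would need "onetwo", but only "two" is left.
$without_enough_text = preg_match($pattern, 'onetwo');

var_dump($with_earlier_iteration);
var_dump($without_enough_text);
```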

[Regex] Support self backreferences.

I call a self backreference one that refers to the same group it appears in.

Defined at: http://php.net/manual/en/regexp.reference.back-references.php

A back reference that occurs inside the parentheses to which it refers fails when the subpattern is first used, so, for example, (a\1) never matches. However, such references can be useful inside repeated subpatterns. For example, the pattern (a|b\1)+ matches any number of "a"s and also "aba", "ababba" etc. At each iteration of the subpattern, the back reference matches the character string corresponding to the previous iteration. In order for this to work, the pattern must be such that the first iteration does not need to match the back reference. This can be done using alternation, as in the example above, or by a quantifier with a minimum of zero.
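The behaviour quoted above can be reproduced with a few calls to preg_match(). This is just a sketch; the subject strings other than those named in the manual are my own.

```php
<?php
// The backreference is needed on the very first use of the group,
// when group 1 has not matched anything yet, so this never matches.
$never = preg_match('/(a\1)/', 'aaaa');

// With alternation, the first iteration can take the "a" branch and
// later iterations can use b\1 against the previous iteration's text.
$pattern = '/^(a|b\1)+$/';
$results = [];
foreach (['aaa', 'aba', 'ababba', 'abba'] as $subject) {
    $results[$subject] = preg_match($pattern, $subject);
}

var_dump($never);   // int(0)
print_r($results);  // aaa => 1, aba => 1, ababba => 1, abba => 0
```

In "ababba" the iterations match "a", then "ba", then "bba". In "abba" the second iteration would need "ba" but finds "bb", so the whole match fails.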

[Regex] Support recursive interpolations.

I call recursive interpolations those where some variables of a template may themselves be templates containing more variables, and so on. The caller provides a single selection of variables that applies at every level (all variables with the same name are the same variable), and the substitution is carried on until all provided variables have been interpolated.

This feature will make it straightforward to interpolate variables starting from high-level templates and providing a flat list of variable definitions.

I wonder if it makes sense to preserve in the Regex all previously interpolated variables, so that when new variables are defined using templates with old variables, interpolations could take place immediately and automatically.
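To make the idea concrete, here is a minimal standalone sketch. Everything in it is an assumption made for illustration: the function interpolate_recursively(), the {{name}} placeholder syntax and the pass limit are not part of the current Regex class.

```php
<?php
// Hypothetical helper (not the Regex API): expands {{name}} placeholders
// from a flat list of definitions until nothing changes any more.
function interpolate_recursively(string $template, array $vars, int $max_passes = 20): string
{
    for ($pass = 0; $pass < $max_passes; $pass++) {
        $expanded = preg_replace_callback(
            '/\{\{(\w+)\}\}/',
            function (array $m) use ($vars) {
                // Placeholders without a definition are left untouched.
                return array_key_exists($m[1], $vars) ? $vars[$m[1]] : $m[0];
            },
            $template
        );
        if ($expanded === $template) {
            return $expanded; // nothing left to interpolate
        }
        $template = $expanded;
    }
    // Guard against definitions that refer to each other in a cycle.
    throw new RuntimeException('Too many passes, the variables are probably cyclic.');
}

// A flat list of definitions; 'date' is itself a template.
$vars = [
    'date'  => '{{year}}-{{month}}',
    'year'  => '\d{4}',
    'month' => '\d{2}',
];

$expression = interpolate_recursively('{{date}}', $vars);
$pattern    = '/^' . $expression . '$/';

echo $expression, "\n";                     // \d{4}-\d{2}
var_dump(preg_match($pattern, '2014-07'));  // int(1)
```

The pass limit is one simple way to avoid looping forever on cyclic definitions; a real implementation might prefer to detect the cycle and report which variables are involved.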

[Regex] Document the fact that expressions must have balanced parentheses.

It is a requirement of Regex::count_matches(). Exceptions are currently thrown if the pattern to count contains duplicate numbers and those parentheses are not balanced. This means that every expression must have balanced parentheses, whether it contains variables or is itself contained in a variable.

I don't think it's worth going through the pain of relaxing this requirement.
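For the documentation, a simplified check like the following could illustrate what "balanced" means here. It is only a sketch under stated assumptions: has_balanced_parentheses() is a hypothetical function, it is not how Regex::count_matches() works internally, and it deliberately ignores character classes such as [(].

```php
<?php
// Hypothetical illustration (not taken from ando-php): reports whether
// the unescaped parentheses of an expression are balanced.
// Simplification: parentheses inside character classes are not special-cased.
function has_balanced_parentheses(string $expression): bool
{
    $depth  = 0;
    $length = strlen($expression);
    for ($i = 0; $i < $length; $i++) {
        $char = $expression[$i];
        if ($char === '\\') {
            $i++; // skip the escaped character
            continue;
        }
        if ($char === '(') {
            $depth++;
        } elseif ($char === ')') {
            $depth--;
            if ($depth < 0) {
                return false; // closed before being opened
            }
        }
    }
    return $depth === 0;
}

var_dump(has_balanced_parentheses('(a(b))c')); // bool(true)
var_dump(has_balanced_parentheses('(a(b)c'));  // bool(false)
var_dump(has_balanced_parentheses('a\(b'));    // bool(true)
```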

[Regex] Support all comments.

Comments are like

  • (?# ... )
  • # ... \n (only when PCRE_EXTENDED is set, and the '#' is neither escaped nor inside [])

Defined at: http://php.net/manual/en/regexp.reference.comments.php

The sequence (?# marks the start of a comment which continues up to the next closing parenthesis. Nested parentheses are not permitted. The characters that make up a comment play no part in the pattern matching at all.

If the PCRE_EXTENDED option is set, an unescaped # character outside a character class introduces a comment that continues up to the next newline character in the pattern.
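Both kinds of comment, and the two cases where '#' does not start one, can be tried out as follows. This is a sketch against plain preg_match() with example patterns of my own; PCRE_EXTENDED corresponds to the x modifier.

```php
<?php
// Inline comment: ignored with or without the x modifier.
$inline = preg_match('/^a(?# the first letter )b$/', 'ab');

// With the x modifier, whitespace is ignored and an unescaped #
// starts a comment that runs up to the next newline.
$extended = preg_match("/^a  # the first letter\n  b  # the second letter\n\$/x", 'ab');

// An escaped # is a literal character, even with the x modifier.
$escaped = preg_match('/^a\#b$/x', 'a#b');

// A # inside a character class is a literal character as well.
$in_class = preg_match('/^a[#]b$/x', 'a#b');

var_dump($inline, $extended, $escaped, $in_class); // int(1) four times
```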

[Regex] Support all non-capturing groups.

Non-capturing groups are like

  • (?:a)
  • (?i)
  • (?i:a)

Defined at: http://php.net/manual/en/regexp.reference.subpatterns.php

The fact that plain parentheses fulfill two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern the ((?:red|white) (king|queen)) the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of captured substrings is 65535.

As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":". Thus the two patterns

(?i:saturday|sunday)
(?:(?i)saturday|sunday)

match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".
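The examples from the manual can be checked with preg_match(), as in this short sketch. The variable names are my own.

```php
<?php
// (?:red|white) groups without capturing, so only two groups are numbered.
preg_match('/the ((?:red|white) (king|queen))/', 'the white queen', $matches);
print_r($matches); // [0] => the white queen, [1] => white queen, [2] => queen

// Both spellings of the case-insensitive group accept the same strings.
$shorthand = '/^(?i:saturday|sunday)$/';
$longhand  = '/^(?:(?i)saturday|sunday)$/';

$results = [];
foreach (['SUNDAY', 'Saturday'] as $subject) {
    $results[$subject] = [
        preg_match($shorthand, $subject),
        preg_match($longhand, $subject),
    ];
}
print_r($results); // every entry is 1
```

Note that (?i) in the second pattern sits in the first branch only, yet "SUNDAY" still matches, which is the point made in the quoted paragraph.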

Document features and limitations.

As it stands, the Regex class in particular (but the other classes too) is still pretty immature and supports only a very limited set of features. It's a good time to start documenting organically what is allowed and what is not, also taking into account the many TODOs and issues.

[Regex] Support g-backreferences, absolute and relative.

I call a g-backreference one like \g1 to \g99, or \g{1} to \g{99} (absolute), and \g-1 to \g-99, or \g{-1} to \g{-99} (relative).

Defined at: http://php.net/manual/en/regexp.reference.back-references.php

As of PHP 5.2.2, the \g escape sequence can be used for absolute and relative referencing of subpatterns. This escape sequence must be followed by an unsigned number or a negative number, optionally enclosed in braces. The sequences \1, \g1 and \g{1} are synonymous with one another. The use of this pattern with an unsigned number can help remove the ambiguity inherent when using digits following a backslash. The sequence helps to distinguish back references from octal characters and also makes it easier to have a back reference followed by a literal number, e.g. \g{2}1.

The use of the \g sequence with a negative number signifies a relative reference. For example, (foo)(bar)\g{-1} would match the sequence "foobarbar" and (foo)(bar)\g{-2} matches "foobarfoo". This can be useful in long patterns as an alternative to keeping track of the number of subpatterns in order to reference a specific previous subpattern.
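A short sketch of the absolute and relative forms, using the manual's examples plus one pattern of my own for the "back reference followed by a literal number" case.

```php
<?php
// \1, \g1 and \g{1} are synonymous.
$synonyms = [
    preg_match('/^(foo)\1$/', 'foofoo'),
    preg_match('/^(foo)\g1$/', 'foofoo'),
    preg_match('/^(foo)\g{1}$/', 'foofoo'),
];

// Relative references count backwards from the reference itself.
$last_group     = preg_match('/^(foo)(bar)\g{-1}$/', 'foobarbar');
$previous_group = preg_match('/^(foo)(bar)\g{-2}$/', 'foobarfoo');

// Braces make it possible to put a literal digit right after the reference.
$followed_by_digit = preg_match('/^(a)(b)\g{2}1$/', 'abb1');

print_r($synonyms); // [1, 1, 1]
var_dump($last_group, $previous_group, $followed_by_digit); // int(1) three times
```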
