js-regex

What is it?

js-regex is a fluent regex builder for JavaScript. Its aim is to make the writing and maintenance of complicated regexes less taxing and error-prone.

Why?

Let's suppose that you've been asked to figure out why the following regex isn't working:

(SH|RE|MF)-((?:197[1-9]|19[89]\d|[2-9]\d{3})-(?:0[1-9]|1[012])-(?:0[1-9]|[12]\d|3[01]))-((?!0{5})\d{5})

If you're experienced with regexes, you can certainly work out what it does, but it takes longer than it should.

This regex is one example that has been built with this library; see below for its js-regex equivalent, or simply read on to go through most of the API before jumping into the complex examples.

Tests

The usage documented below has a matching test suite, and there are a fair number of other test cases besides.

Although there are only a few test case files right now, they cover the core of the library and the combinations of methods you can invoke pretty well; please check them out if you're at all interested.

Usage

Simple usage with peek()

regex()
    .literals('abc')
    .peek();        // Will return 'abc'

Never stop chaining!

regex()
    .literals('abc')
    .call(function (curNode) {
        console.log(this === curNode); // Will print true
        console.log(curNode.peek());   // Will print 'abc'
    })
    .literals('def')
    .call(function (curNode) {
        console.log(curNode.peek());   // Will print 'abcdef'
    });

Special Flags

regex()
    .f.digit()
    .f.whitespace()
    .peek();       // Will return '\d\s'

Capture Groups

regex()
    .literals('aaa')
      .capture()
    .peek();        // Will return '(aaa)'

Repeating

regex()
    .literals('aaa')
      .repeat()
    .peek();        // Will return '(?:aaa)*'

regex()
    .literals('aaa')
    .call(function (curNode) {
        console.log(curNode.peek()); // Will print 'aaa'
    })
      .repeat(1, 3)
    .peek();                         // Will return '(?:aaa){1,3}'
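
A quick aside on why the output above wraps 'aaa' in a non-capturing group before applying the quantifier. This is a small sketch using native JavaScript RegExp only, not js-regex itself; the ^ and $ anchors and the variable names are my own additions for the demonstration.

```javascript
// With the group, the quantifier applies to the whole 'aaa' sequence.
const grouped = /^(?:aaa){1,3}$/;

// Without it, {1,3} would only apply to the final 'a'.
const ungrouped = /^aaa{1,3}$/;

const groupedSix = grouped.test('aaaaaa');      // two repetitions of 'aaa'
const ungroupedSix = ungrouped.test('aaaaaa');  // 'aa' plus at most three more 'a's
const groupedFour = grouped.test('aaaa');       // not a multiple of three
const ungroupedFour = ungrouped.test('aaaa');   // 'aa' plus two more 'a's

console.log(groupedSix, ungroupedSix, groupedFour, ungroupedFour);
// prints: true false false true
```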

Simple Grouping

regex()
    .sequence()
        .literals('aaa')
        .f.digit()
        .literals('bbb')
    .endSequence()
      .repeat()
    .peek();            // Will return '(?:aaa\dbbb)*'

regex().sequence('aaa', regex.flags.digit(), 'bbb')
    .repeat()
    .peek();            // Will return '(?:aaa\dbbb)*'

Character Sets

regex()
    .any('abcdefg')
    .peek();       // Will return '[abcdefg]'

regex()
    .any()
        .literals('abc')
        .f.digit()
    .endAny()
    .peek();            // Will return '[abc\d]'

regex()
    .none()
        .literals('abc')
        .f.whitespace()
    .endNone()
    .peek();            // Will return '[^abc\s]'

Or

regex()
    .either()
        .literals('abc')
        .literals('def')
    .endEither()
    .peek();             // Will return 'abc|def'

regex()
    .either('abc', regex.any('def'))
    .peek();             // Will return 'abc|[def]'

Macros

regex.create(); // Alternate form of regex()

regex
    .addMacro('any-quote') // Adding a global macro for single or double quote
        .any('\'"')
    .endMacro()
    .create()
        .macro('any-quote')
        .f.dot()
          .repeat()
        .macro('any-quote')
        .peek();           // Will return '['"].*['"]'

regex
    .addMacro('quote')
        .any('\'"')
    .endMacro()
    .create()
        .addMacro('quote') // Local macros override global ones
            .literal('"')  //  Here, restricting to double quote only
        .endMacro()
        .macro('quote')
        .f.dot()
          .repeat()
        .macro('quote')
        .peek();           // Will return '".*"'

Followed By

regex()
    .literals('aaa')
      .followedBy('bbb')
    .peek();            // Will return 'aaa(?=bbb)'

regex()
    .literals('ccc')
      .notFollowedBy('ddd')
    .peek();               // Will return 'ccc(?!ddd)'
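
If lookaheads are unfamiliar, here is roughly what the two patterns above do once they become real regexes. This sketch uses native JavaScript RegExp literals rather than js-regex, and the sample strings are made up for illustration.

```javascript
const followed = /aaa(?=bbb)/;     // 'aaa' only when 'bbb' comes next
const notFollowed = /ccc(?!ddd)/;  // 'ccc' only when 'ddd' does not come next

// The lookahead is checked but not consumed, so the match itself is just 'aaa'.
const m1 = followed.exec('aaabbb');
const m2 = followed.exec('aaaccc');     // no 'bbb' after 'aaa'
const m3 = notFollowed.exec('cccddd');  // 'ddd' follows, so the match is rejected
const m4 = notFollowed.exec('ccceee');

console.log(m1[0], m2, m3, m4[0]); // prints: aaa null null ccc
```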

Complicated Regexes

Example 1

How quickly can you figure out what this is supposed to represent?

regex()
    .addMacro('0-255')
        .either()
            .sequence()
                .literals('25')
                .anyFrom('0', '5')
            .endSequence()
            .sequence()
                .literal('2')
                .anyFrom('0', '4')
                .anyFrom('0', '9')
            .endSequence()
            .sequence()
                .any('01').optional()
                .anyFrom('0', '9')
                .anyFrom('0', '9').optional()
            .endSequence()
        .endEither()
    .endMacro()
    .macro('0-255').capture()
    .literal('.')
    .macro('0-255').capture()
    .literal('.')
    .macro('0-255').capture()
    .literal('.')
    .macro('0-255').capture()
    .peek();

(Hint: it's described here, in the fourth section on the page.)

(Also note: this example uses the 'verbose' usage form, always closing portions with endXXX(); the Readme tests cover the same example using an alternate form.)
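
For comparison, here is a hand-written native regex that follows the same three branches as the '0-255' macro, which reads like one octet of a dotted-quad address. Treat it as a sketch: the exact string that peek() returns for the example above isn't shown here and may differ, and the ^ and $ anchors and the variable names are assumptions added for the demo.

```javascript
// One octet, 0-255, mirroring the three sequences inside the either():
//   25[0-5]  |  2[0-4][0-9]  |  [01]?[0-9][0-9]?
const octet = '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)';

// Four captured octets joined by literal dots, anchored for this demo only.
const dottedQuad = new RegExp('^' + [octet, octet, octet, octet].join('\\.') + '$');

const parts = dottedQuad.exec('192.168.0.1');
const rejected = dottedQuad.test('256.1.1.1');

console.log(parts.slice(1)); // the four captured octets as strings
console.log(rejected);       // false, since 256 is out of range
```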

Business Logic Regex

So our 'business logic' regex looks like this:

(SH|RE|MF)-((?:197[1-9]|19[89]\d|[2-9]\d{3})-(?:0[1-9]|1[012])-(?:0[1-9]|[12]\d|3[01]))-((?!0{5})\d{5})

Written in human terms, that would be: one of three department codes, a dash, a YYYY-MM-DD date (on or after Jan 1, 1971), a dash, then a five-digit number other than 00000.

In converting this regex to use js-regex, we make use of macros to define the department code, the date, and the trailing number. Note that most of this example is spent setting up the date regex. If your application used many dates, this most complicated portion of the regex would only need to be set up once, after which it could be reused elsewhere with no code changes and with far greater readability.
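
To make that description concrete, here's a small check of the raw regex against a few sample strings. It uses a native JavaScript RegExp literal, not js-regex; parseCode and the sample inputs are hypothetical names and values made up for this sketch.

```javascript
// The pattern from above as a native RegExp literal (unanchored, as in the source).
const businessRegex = /(SH|RE|MF)-((?:197[1-9]|19[89]\d|[2-9]\d{3})-(?:0[1-9]|1[012])-(?:0[1-9]|[12]\d|3[01]))-((?!0{5})\d{5})/;

// Hypothetical helper: returns the three captured parts, or null when nothing matches.
function parseCode(input) {
    const m = businessRegex.exec(input);
    return m === null ? null : { dept: m[1], date: m[2], issue: m[3] };
}

const ok = parseCode('SH-1999-12-31-00042');        // all three parts valid
const firstDay = parseCode('MF-1971-01-01-00001');  // earliest accepted date
const tooEarly = parseCode('RE-1970-12-31-12345');  // year before 1971
const allZeros = parseCode('MF-2001-06-15-00000');  // trailing number is 00000

console.log(ok);                 // { dept: 'SH', date: '1999-12-31', issue: '00042' }
console.log(tooEarly, allZeros); // null null
```

Note that the pattern only checks the shape of the date: the day may be 01 through 31 regardless of the month.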

Anyway, let's take a look:

regex
    // Setting up our macros...
    .addMacro('dept-prefix', regex.either('SH', 'RE', 'MF'))
    .addMacro('date',
        regex.either(
            regex.sequence(
                '197',
                regex.anyFrom('1', '9')),
            regex.sequence(
                '19',
                regex.any('89'),
                regex.flags.digit()),
            regex.sequence(
                regex.anyFrom('2', '9'),
                regex.flags.digit().repeat(3, 3))),
        '-',
        regex.either(
            regex.sequence(
                '0',
                regex.anyFrom('1', '9')),
            regex.sequence(
                '1',
                regex.any('012'))),
        '-',
        regex.either(
            regex.sequence(
                '0',
                regex.anyFrom('1', '9')),
            regex.sequence(
                regex.any('12'),
                regex.flags.digit()),
            regex.sequence(
                '3',
                regex.any('01'))))
    .addMacro('issuenum',
        regex.notFollowedBy()
            .literal('0')
                .repeat(5, 5),
        regex.flags.digit()
            .repeat(5, 5))
    // Macros are set up, let's create our actual regex now:
    .create()
        .macro('dept-prefix').capture()
        .literal('-')
        .macro('date').capture()
        .literal('-')
        .macro('issuenum').capture()
        .peek(); // Returns the string shown above this code example

Conclusion

Perhaps this library piques your interest. If so, cool! Let me know! Just know that there are a few things that I'd like to clean up before really releasing this library; see the issues page for details. I'd also like more tests: it's probably too easy to step on a landmine of invalid or senseless regexes right now, so negative coverage (showing that these invalid things cannot be done with js-regex) and more positive coverage are always helpful.

Really, Really Experimental Methods

Simple Testing

test() is still kinda pointless.

regex()
    .literal('a')
    .test('a');   // Will return true

Simple Replacing

replace() is probably pretty buggy, especially with multiple named capture groups.

regex()
    .literals('abc')
    .replace('abc', function () {
        return 'def';
    });              // Will return 'def'

Named Capture Groups

Probably buggy.

regex()
    .literals('bbb')
    .literals('aaa')
      .capture('named')
    .literals('bbb')
    .replace('aaa', function (groups) {
        console.log(groups.named);     // Will print 'aaa'
        return 'ccc' + groups.named + 'ccc';
    });                                // Will return 'cccaaaccc'
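
For reference, native JavaScript (ES2018 and later) has its own named capture groups, and a replace callback receives them as its last argument. The sketch below shows that native behaviour only, as a point of comparison; it makes no claim about how js-regex implements capture('named'), and the input string is my own choice.

```javascript
// Native named group: (?<named>...) instead of capture('named').
const namedRegex = /bbb(?<named>aaa)bbb/;

const replaced = 'bbbaaabbb'.replace(namedRegex, (...args) => {
    // When the regex has named groups, the groups object is the final argument.
    const groups = args[args.length - 1];
    return 'ccc' + groups.named + 'ccc';
});

console.log(replaced); // prints: cccaaaccc
```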

Contributors

wyantb
