Coder Social home page Coder Social logo

ctxppc / patternkit Goto Github PK

View Code? Open in Web Editor NEW
9.0 3.0 0.0 324 KB

A Swift library for writing & manipulating patterns for matching on collections.

License: MIT License

Swift 100.00%
swift regular-expression-engine regular-languages subsequence capture

patternkit's Introduction

The PatternKit library

PatternKit is a Swift-based nondeterministic automaton parser and regular expression engine.

  • Fully designed for and written in Swift
  • Supports arbitrary Sequences with Equatable elements (String, [Int], and so on)
  • Batteries included — including full support for matching over Unicode extended grapheme clusters
  • Type-safe — syntax checked at build time by the compiler
  • No parsing/building overhead
  • Value semantics
  • Extensible architecture
  • Backreferences
  • Backtracking
  • Shorthand and longhand notation

PatternKit is still a pre-1.0 project — please use with care. :-)

Usage

(Author's note: Some examples in this document might become outdated as PatternKit evolves.)

The following matches very simple e-mail addresses.

import PatternKit

let alphanumeric = ("A"..."Z") | ("a"..."z") | ("0"..."9")
let part = alphanumeric+ • "." • alphanumeric+

let userToken = Token(part)
let hostToken = Token(part)

let emailAddress = userToken • ("@" as Literal) • hostToken

if let match = emailAddress.matches(over: "[email protected]").first {
    let userPart = match.captures(for: userToken)[0]
    let hostPart = match.captures(for: hostToken)[0]
    print("User \(userPart) at \(hostPart)")		// Prints "User johnny.appleseed at example.com"
}

Note that the Match.captures(for:) method returns an array of captured strings instead of a single string. Like in other regex engines, a token can be used within a repeating pattern. However, unlike most engines, every new capture by that token is appended to the array instead of discarded.

The following matches the request line of a simple HTTP request.

import PatternKit

let verb = "GET" | "POST" | "PUT" | "HEAD" | "DELETE" | "OPTIONS" | "TRACE"
let version = "0"..."9" • ("." • "0"..."9")+

let pathComponent = "A"..."Z" | "a"..."z" | "0"..."9" | "_" | "+"
let path = ("/" • pathComponent*)+
let pathToken = Token(path)

let requestLine = Anchor.leading • "GET" • CharacterSet.whitespace+ • pathToken • "HTTP/" • version • "\x10\x13"

The following matches subsequences of three or more 6s, followed by two rolls greater than or equal to 3, and followed by three or more 6s again.

let snakes = 6.repeated(min: 3)
let pattern = snakes • (3...6).repeated(exactly: 2) • snakes
let diceRolls = [1, 5, 2, 6, 6, 6, 4, 4, 6, 6, 6, 4, 3, 6, 6, 6, 3, 5, 6, 6, 6, 6]

for match in pattern.matches(in: diceRolls) {
    let diceRollsNumber = match.startIndex + 1
    let foundPattern = match.subsequence.joined(separator: " ,")
    print("Pattern \(foundPattern) found at roll (diceRollsNumber)")
}

// Prints "Pattern 6, 6, 6, 4, 4, 6, 6, 6 found at roll 4" and "Pattern 6, 6, 6, 3, 5, 6, 6, 6, 6 found at roll 14"

The following matches any subsequence of a's, followed by b's, and followed by the same number of a's.

let firstPart = Token("a"+)
let secondPart = "b"+
let thirdPart = Referencing(firstPart)
let pattern = firstPart • secondPart • thirdPart

print(pattern.hasMatches(in: "aaaabbbaaaa"))		// Prints "true"

PatternKit supports all target Sequences with Equatable elements. Non-Comparable elements which are Equatable are also supported, but without support of ranges (... and ..<).

Available patterns & extensibility

PatternKit is designed from the ground up to be extensible with more patterns. Patterns are values that conform to the Pattern protocol. PatternKit provides the following pattern types, and clients can implement more themselves as needed.

  • A literal pattern (Literal) matches an exact sequence. PatternKit includes literal-convertible conformances so that string, character, and integer literals can be used directly within pattern expressions, without having to resort to the Literal initialiser.

  • A repeating pattern (EagerlyRepeating and LazilyRepeating) matches a subpattern consecutively, with an optional minimum and maximum. PatternKit extends Equatable and Pattern with the repeated(min:), repeated(min:max:), and repeated(exactly:) methods as well as the *, +, and ¿ postfix operators which form a repeating pattern on the element, sequence, or subpattern it's called on.

  • An anchor (Anchor) matches a boundary or position, such as a word boundary or the beginning of the sequence. By default, pattern matching begins and ends anywhere in a sequence, so leading and trailing anchors are required if a pattern must match the whole sequence at once.

  • A token (Token) matches what its subpattern matches, but also captures it. The captured subsequences can be retrieved using match.captures(for: token).

  • A referencing pattern (Referencing) matches a subsequence previously captured by a token.

In addition, PatternKit adds conformance to Swift's range types and Foundation's CharacterSet so that they can be used readily in patterns.

patternkit's People

Contributors

ctxppc avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

patternkit's Issues

Only find one match per position

Hi, nice library, I especially like the syntax.

Is there any way of parsing only one match per position? Now when I run code like this:

let myany = LazilyRepeating(one() as Wildcard<String>)
let token = Token("."  myany  Literal(" ")  myany  " ")
let parser = myany  token
parser.forwardMatches(enteringFrom: Match<String>(over: #" .1  slkjdf.2 .3  "#)).forEach {
	print($0.captures(for: token))
}

it seems to find all possible matches for each position:

[".1  "]
[".1  slkjdf.2 "]
[".1  slkjdf.2 .3 "]
[".1  slkjdf.2 .3  "]
[".1  slkjdf.2 "]
[".1  slkjdf.2 .3 "]
[".1  slkjdf.2 .3  "]
[".1  slkjdf.2 .3 "]
[".1  slkjdf.2 .3  "]
[".1  slkjdf.2 .3  "]
[".2 .3 "]
[".2 .3  "]
[".2 .3  "]
[".3  "]

How do I tell LazilyRepeating to only find the shortest possible match?

Licence

What is the licence for this project?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.