propensive / kaleidoscope

Statically-checked inline matching on regular expressions in Scala

Home Page: https://propensive.com/kaleidoscope/

License: Apache License 2.0

Scala 94.71% Shell 5.29%
regular-expression regex pattern-matching scala

kaleidoscope's Introduction

Kaleidoscope

Statically-checked inline matching on regular expressions

Kaleidoscope is a small library which provides pattern matching using regular expressions, and extraction of capturing groups into values, which are typed according to the repetition of the group. Patterns can be written inline, directly in a case pattern, and do not need to be predefined.

Features

  • pattern match strings against regular expressions
  • regular expressions can be written inline in patterns
  • extraction of capturing groups in patterns
  • typed extraction (into Lists or Options) of variable-length capturing groups
  • static verification of regular expression syntax
  • simpler "glob" syntax is also provided

Availability Plan

Kaleidoscope has not yet been published. The medium-term plan is to build Kaleidoscope with Fury and to publish it as a source build on Vent. This will enable ordinary users to write and build software which depends on Kaleidoscope.

Subsequently, Kaleidoscope will also be made available as a binary in the Maven Central repository. This will enable users of other build tools to use it.

For the overeager, curious and impatient, see building.

Getting Started

To use Kaleidoscope, first import its package,

import kaleidoscope.*

and you can then use a Kaleidoscope regular expression—a string prefixed with the letter r—anywhere you can use a pattern in Scala. For example,

import anticipation.Text

def describe(path: Text): Unit =
  path match
    case r"/images/.*" => println("image")
    case r"/styles/.*" => println("stylesheet")
    case _             => println("something else")

or,

import vacuous.{Optional, Unset}

def validate(email: Text): Optional[Text] = email match
  case r"^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,6}$$" => email
  case _                                            => Unset

Such patterns will either match or not. Should they match, it is possible to extract parts of the matched string using capturing groups. The pattern syntax is exactly as described in the Java Standard Library, with the exception that a capturing group (enclosed within ( and )) may be bound to an identifier by placing it, like an interpolated string substitution, immediately prior to the capturing group, as $identifier or ${identifier}.

Here is an example:

enum FileType:
  case Image(text: Text)
  case Stylesheet(text: Text)

def identify(path: Text): FileType = path match
  case r"/images/${img}(.*)"  => FileType.Image(img)
  case r"/styles/$styles(.*)" => FileType.Stylesheet(styles)

Alternatively, this can be extracted directly in a val definition, like so:

val r"^[a-z0-9._%+-]+@$domain([a-z0-9.-]+\.$tld([a-z]{2,6}))$$" =
  "user@example.com": @unchecked

In the REPL, this would bind the following values:

> domain: Text = t"example.com"
> tld: Text = t"com"

In addition, the syntax of the regular expression will be checked at compile time, and any issues will be reported then.

Repeated and optional capture groups

A normal, unitary capturing group will extract into a Text value. But if a capturing group has a repetition suffix, such as * or +, then the extracted type will be a List[Text]. This also applies to repetition ranges, such as {3}, {2,} or {1,9}. Note that {1} will still extract a Text value.

A capture group may be marked as optional with the ? suffix, meaning it can appear either zero or one times. This will extract a value with the type Option[Text].

For example, see how init is extracted as a List[Text], below:

import gossamer.{drop, Rtl}

def parseList(): List[Text] = "parsley, sage, rosemary, and thyme" match
  case r"$only([a-z]+)"                      => List(only)
  case r"$first([a-z]+) and $second([a-z]+)" => List(first, second)
  case r"$init([a-z]+, )*and $last([a-z]+)"  => init.map(_.drop(2, Rtl)) :+ last
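The typing rules in this section can be summarised in a few lines of plain Scala. This is only an illustrative sketch, not Kaleidoscope's implementation: the names Extraction and extractionFor are hypothetical, and treating every suffix other than the empty suffix, {1} and ? as a list is an assumption for cases the text does not mention, such as {0,1}.

```scala
// Hypothetical model of the three extraction types described above.
enum Extraction:
  case Single   // extracted as Text
  case Optional // extracted as Option[Text]
  case Many     // extracted as List[Text]

// Maps the repetition suffix that follows a capturing group to its extraction type.
def extractionFor(suffix: String): Extraction = suffix match
  case "" | "{1}" => Extraction.Single   // unitary group, or exactly one repetition
  case "?"        => Extraction.Optional // zero or one occurrence
  case _          => Extraction.Many     // *, +, {3}, {2,}, {1,9} (assumed for anything else)
```

For instance, extractionFor("*") gives Extraction.Many, which corresponds to init being a List[Text] in the example above.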

Escaping

Note that inside an extractor pattern string, whether it is single-quoted (r"...") or triple-quoted (r"""..."""), special characters, notably \, do not need to be escaped, with the exception of $, which should be written as $$. It is still necessary, however, to follow the regular expression escaping rules. For example, an extractor matching a single opening parenthesis would be written as r"\(" or r"""\(""".
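The effect of these rules can be seen without Kaleidoscope, using java.util.regex and ordinary triple-quoted Scala strings, which also pass backslashes through unchanged. The sketch below is an analogy under that assumption, and the value names are illustrative only. It also shows why doubling a backslash changes the meaning of the pattern rather than being harmless.

```scala
import java.util.regex.Pattern

// The regex \( matches one literal opening parenthesis.
val matchesParen: Boolean = Pattern.matches("""\(""", "(")

// The regex \s matches one whitespace character.
val matchesSpace: Boolean = Pattern.matches("""\s""", " ")

// The regex \\s is a literal backslash followed by the letter s,
// so it does not match a space...
val doubledMatchesSpace: Boolean = Pattern.matches("""\\s""", " ")

// ...but it does match the two-character text \s.
val doubledMatchesText: Boolean = Pattern.matches("""\\s""", """\s""")
```

The third value is false while the others are true: a pattern written with \\s where \s was intended looks for a backslash in the input.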

Status

Kaleidoscope is classified as maturescent. For reference, Scala One projects are categorized into one of the following five stability levels:

  • embryonic: for experimental or demonstrative purposes only, without any guarantees of longevity
  • fledgling: of proven utility, seeking contributions, but liable to significant redesigns
  • maturescent: major design decisions broadly settled, seeking probatory adoption and refinement
  • dependable: production-ready, subject to controlled ongoing maintenance and enhancement; tagged as version 1.0.0 or later
  • adamantine: proven, reliable and production-ready, with no further breaking changes ever anticipated

Projects at any stability level, even embryonic projects, can still be used, as long as caution is taken to avoid a mismatch between the project's stability level and the required stability and maintainability of your own project.

Kaleidoscope is designed to be small. Its entire source code currently consists of 520 lines of code.

Building

Kaleidoscope will ultimately be built by Fury, when it is published. In the meantime, two possibilities are offered, though they are acknowledged to be fragile, inadequately tested, and unsuitable for anything more than experimentation. They are provided only out of the necessity of giving some answer to the question, "how can I try Kaleidoscope?".

  1. Copy the sources into your own project

    Read the fury file in the repository root to understand Kaleidoscope's build structure, dependencies and source location; the file format should be short and quite intuitive. Copy the sources into a source directory in your own project, then repeat (recursively) for each of the dependencies.

    The sources are compiled against the latest nightly release of Scala 3. There should be no problem compiling the project together with all of its dependencies in a single compilation.

  2. Build with Wrath

    Wrath is a bootstrapping script for building Kaleidoscope and other projects in the absence of a fully-featured build tool. It is designed to read the fury file in the project directory, and produce a collection of JAR files which can be added to a classpath, by compiling the project and all of its dependencies, including the Scala compiler itself.

    Download the latest version of wrath, make it executable, and add it to your path, for example by copying it to /usr/local/bin/.

    Clone this repository inside an empty directory, so that the build can safely make clones of repositories it depends on as peers of kaleidoscope. Run wrath -F in the repository root. This will download and compile the latest version of Scala, as well as all of Kaleidoscope's dependencies.

    If the build was successful, the compiled JAR files can be found in the .wrath/dist directory.

Contributing

Contributors to Kaleidoscope are welcome and encouraged. New contributors may like to look for issues marked beginner.

We suggest that all contributors read the Contributing Guide to make the process of contributing to Kaleidoscope easier.

Please do not contact project maintainers privately with questions unless there is a good reason to keep them private. While it can be tempting to respond to such questions, private answers cannot be shared with a wider audience, and it can result in duplication of effort.

Author

Kaleidoscope was designed and developed by Jon Pretty, and commercial support and training on all aspects of Scala 3 is available from Propensive OÜ.

Name

Kaleidoscope is named after the optical instrument which shows pretty patterns to its user, while the library also works closely with patterns.

In general, Scala One project names are always chosen with some rationale, though it is usually frivolous. Each name is chosen more for its uniqueness and intrigue than its concision or catchiness, and there is no bias towards names with positive or "nice" meanings—since many of the libraries perform some quite unpleasant tasks.

Names should be English words, though many are obscure or archaic, and it should be noted how willingly English adopts foreign words. Names are generally of Greek or Latin origin, and have often arrived in English via a romance language.

Logo

The logo is a loose allusion to a hexagonal pattern, which could appear in a kaleidoscope.

License

Kaleidoscope is copyright © 2024 Jon Pretty & Propensive OÜ, and is made available under the Apache 2.0 License.

kaleidoscope's People

Contributors

jbaileyashe, odisseus, omakhasoeva, propensive, regadas, xarvalus

kaleidoscope's Issues

Rename `rcut` to `cut`

It should be possible to pass a Regex to a method called cut instead of a Text, but the current implementation means that the extension methods which provide cut cannot be disambiguated based on their parameter types. So the temporary solution has been to name the version which takes a Regex as rcut.

It should be possible to reimplement the original cut method in Gossamer to delegate the implementation to a Separator typeclass.

Include offsets in error messages

Currently a pattern error applies to the entire pattern string, but we often know exactly where the problem lies, and we should point to it, both at compile time and at runtime.
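As a rough illustration of the information that is available, java.util.regex already reports an offset with each syntax error through PatternSyntaxException.getIndex. The sketch below renders that offset as a caret under the pattern. The helper name pointAt is hypothetical, and nothing here reflects how Kaleidoscope reports errors.

```scala
import java.util.regex.{Pattern, PatternSyntaxException}

// Hypothetical helper: returns None for a valid pattern, or a three-line
// message (description, pattern, caret) for an invalid one.
def pointAt(regex: String): Option[String] =
  try
    Pattern.compile(regex)
    None
  catch
    case e: PatternSyntaxException =>
      // getIndex is -1 when the position of the error is not known.
      val caret = if e.getIndex >= 0 then " " * e.getIndex + "^" else ""
      Some(s"${e.getDescription}\n$regex\n$caret")
```

Calling pointAt on a pattern with an unclosed group, such as "a(b", yields a message with a caret line, while a well-formed pattern yields None.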

Values bound in `match` have type Any instead of String

In val binding the val is of type String, which is great:

image

But in match, the vals have type Any unless you add a type hint:

image

Since no type other than String can be used, it seems illogical not to always generate the vals with type String.

`make test` fails after installing

Running make test after installing according to CONTRIBUTING.md produces the following output:

Compiling escritoire/core (1 Scala source)
Compiling magnolia/core (3 Scala sources)
Compiling contextual/data (10 Scala sources)
Compiled escritoire/core (1323ms)
Compiled contextual/data (2530ms)
[E] [E3] ext/magnolia/src/core/magnolia.scala:21:8
[E]      not found: object mercator
[E]      L21: import mercator._
[E]                  ^
[E] [E2] ext/magnolia/src/core/interface.scala:167:110
[E]      not found: type Monadic
[E]      L167:   def constructMonadic[Monad[_], PType](makeParam: Param[Typeclass, Type] => Monad[PType])(implicit monadic: Monadic[Monad]): Monad[Type]
[E]                                                                                                                         ^
[E] [E1] ext/magnolia/src/core/interface.scala:20:8
[E]      not found: object mercator
[E]      L20: import mercator._
[E]                  ^
[E] ext/magnolia/src/core/magnolia.scala: L21 [E3]
[E] ext/magnolia/src/core/interface.scala: L20 [E1], L167 [E2]
Compiled magnolia/core (3650ms)
[E] 'kaleidoscope/test' failed to compile.
[E] 'probation/cli' failed to compile.
[E] 'probation/core' failed to compile.
[E] 'probation/macros' failed to compile.
[E] 'magnolia/core' failed to compile.
Watching 9 directories... (press Ctrl-C to interrupt)

macOS 10.14.4, bloop 1.2.5, scala 2.12.6

I will try to resolve it myself.

Reinstate immutable arrays in place of mutable arrays

Before capture checking was fully working, opaque types like IArray were broken. So the code was modified to use Array instead in a couple of places. This should be reverted back to IArray once it can be confirmed that it's working.

Fails to parse JSL character classes

Kaleidoscope's documentation states that the syntax is exactly as described in the JSL with the obvious exception.

The following sample works:

import kaleidoscope._

"Janvier 2018" match {
  case r"^${mois:String}@([a-zA-Z]+) +${année}@([0-9]{4})$$" =>
    (mois, année)
}
    // ("Janvier", "2018")

the following don't:

"Janvier 2018" match {
  case r"^${mois:String}@([a-zA-Z]+)\\s+${année}@([0-9]{4})$$" =>
    (mois, année)
}
    // MatchError
"Janvier 2018" match {
  case r"^${mois:String}@(\\S+) +${année}@([0-9]{4})$$" =>
    (mois, année)
}
    // MatchError

Scala 2.13 release

Would you consider putting a 2.13-compatible release on Maven Central?

Support date and time literals

Date and time literals along the lines of,

val date = d"14-Mar-2018"
val time = t"12:15"
val time2 = t"12:15:06"

would be very useful.

Support pattern matching of byte arrays

It would be good to be able to match a byte array with,

array match { case hex"1b097392ac3b" => true }

As arrays are mutable, it may also be useful to provide a simple, immutable, value-class wrapper around byte arrays.

Support for regex flags

The macro seems to misinterpret flags at the beginning of a regex when capturing results. When regexTest1 matches below, you get an IndexOutOfBoundsException. When regexTest1 does not match, no error occurs. When regexTest2 matches, no error occurs.

I'm using kaleidoscope 0.1.0. LOVE this library - if possible could you please release an update to Maven Central?

import kaleidoscope._
object RegexTest {
  def regexTest1(s: String) = s match {
    case r"(?i)$c@(foo)" => Option(c: String)
    case _ => None
  }

  def regexTest2(s: String) = s match {
    case r"$c@(foo)" => Option(c: String)
    case _ => None
  }
}

scala> RegexTest.regexTest1("foo")
java.lang.IndexOutOfBoundsException: No group 2

scala> RegexTest.regexTest1("bar")
res3: Option[String] = None

scala> RegexTest.regexTest2("foo")
res2: Option[String] = Some(foo)
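For context, java.util.regex does not treat an inline flag such as (?i) as a capturing group, so both patterns above contain exactly one group. The sketch below checks this with Matcher.groupCount, using illustrative value names. The "No group 2" error would be consistent with the flag's opening parenthesis being counted as a group somewhere, but that is a guess and not a confirmed diagnosis.

```scala
import java.util.regex.Pattern

// Number of capturing groups java.util.regex sees in each pattern.
// The @ binding syntax from the report is omitted; only the regex parts remain.
val groupsWithFlag: Int = Pattern.compile("(?i)(foo)").matcher("").groupCount
val groupsWithoutFlag: Int = Pattern.compile("(foo)").matcher("").groupCount
```

Both values are 1, so a request for group 2 can never succeed for either pattern.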

provide a "glob" extractor

The regex pattern extractor (r"...") is useful for the most general kind of string pattern, but there are a few cases where we would like to match on parts of slashed paths, e.g. "/foo/bar/baz", and it seems worthwhile to special-case this.

We would want to be able to match on a path which looks something like this:

path match {
  case g"foo/$e/bar/*/baz/**/quux" => e: String
}

This could effectively be converted into a regular expression as follows:

  • every character should be taken as a literal, except
    • ** which should match .* (i.e. any number of path elements)
    • * which should match [^/]+ (i.e. a nonempty single path element)
  • variables should equate to a capturing group which matches [^/]*

Hence, the example above would be equivalent to the Kaleidoscope matcher,

path match {
  case r"foo\/$e@([^/]*)\/bar\/[^/]+\/baz\/.*\/quux" => e: String
}

I think the easiest way to implement this would be to copy the code from the r matcher, and adapt it by preprocessing it into the regex form used by the r matcher. But there may be neater ways. The macro code which produces the extractor is quite hairy, so it's going to be easiest to avoid changing anything there, and just reusing what's already there.
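The conversion rules listed above could be sketched as a plain string-to-string preprocessing step, as follows. This is a minimal sketch under stated assumptions: the function name globToRegex is hypothetical, variable bindings such as $e are left out because they belong to the interpolator rather than to the string, and literal text is escaped with Pattern.quote instead of backslashes.

```scala
import java.util.regex.Pattern

// Hypothetical helper: converts a glob into a java.util.regex pattern string.
def globToRegex(glob: String): String =
  val regex = new StringBuilder
  val literal = new StringBuilder

  // Emits any pending literal text as a quoted block.
  def flush(): Unit =
    if literal.nonEmpty then
      regex.append(Pattern.quote(literal.toString))
      literal.clear()

  var i = 0
  while i < glob.length do
    if glob.startsWith("**", i) then
      // ** matches any number of path elements
      flush()
      regex.append(".*")
      i += 2
    else if glob.charAt(i) == '*' then
      // * matches a single nonempty path element
      flush()
      regex.append("[^/]+")
      i += 1
    else
      // every other character is taken literally
      literal.append(glob.charAt(i))
      i += 1
  flush()
  regex.toString
```

For example, "foo/x/baz/a/b/quux".matches(globToRegex("foo/*/baz/**/quux")) holds, whereas "foo/x/y/baz/quux" does not match, because * cannot span two path elements.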

doesn't work in ammonite

Not sure whether anything can be done here other than renaming, but reporting it anyway.

When I tried it in the Ammonite REPL, it failed:

Welcome to the Ammonite Repl 1.0.5
(Scala 2.12.4 Java 1.8.0_161)
If you like Ammonite, please support our development at www.patreon.com/lihaoyi
> import $ivy.`com.propensive::kaleidoscope:0.1.0` 
import $ivy.$                                   

> import kaleidoscope._ 
import kaleidoscope._

> "o" match { case r"oops" =>  } 
cmd2.sc:1: type mismatch;
 found   : StringContext
 required: ?{def r: ?}
Note that implicit conversions are not applicable because they are ambiguous:
 both method RegexContextMaker in trait Extensions of type (s: StringContext)ammonite.ops.RegexContext
 and method RegexStringContext in package kaleidoscope of type (sc: StringContext)kaleidoscope.package.RegexStringContext
 are possible conversion functions from StringContext to ?{def r: ?}
val res2 = "o" match { case r"oops" =>  }
                            ^
Compilation Failed

>  

cc @lihaoyi

Support network-related literals

Support for compile-time checked network-related literals, such as

  • hostnames
  • IPv4 addresses
  • IPv6 addresses
  • subnet masks

For example,

val addr = ipv4"192.168.1.1"

would be useful.

These literals should support construction and pattern matching. A more advanced usage could support partial extraction, for example,

addr match {
  case ipv4"192.168.1.$a" => a // a is an Int
  case ipv4"10.$a.$b.$c" => (a, b, c)
}

Support repeated capture groups

Despite what I originally came to believe, accessing every element of a repeated capture group is theoretically possible, at least if some reasonable constraints are imposed on where they can occur: specifically, that bound capture groups (of any repetition) should be forbidden inside non-unitary capture groups.

With this constraint in place (which can be enforced by Kaleidoscope at compile time), it should be possible to construct lists of matches by repeatedly attempting to pattern match the same regular expression on different input strings, each time deleting the characters from the previous match, and checking that the regions are contiguous. The matches will be found in the reverse order, and if more than one repeated group appears, the groups should be extracted in reverse order.

This should be implemented firstly in a non-pattern-matching context so that it can be tested easily.
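A non-pattern-matching version of this idea might look like the sketch below, written against java.util.regex, where a repeated group only retains its last iteration. The helper name repeatedGroup is hypothetical, and the sketch assumes a single group repeated with *, whose iterations are nonempty, so that the pattern still matches once every iteration has been deleted.

```scala
import java.util.regex.Pattern

// Hypothetical helper: recovers every iteration of the repeated group `group`
// by matching, deleting the last captured iteration, and matching again.
def repeatedGroup(pattern: Pattern, input: String, group: Int): Option[List[String]] =
  @annotation.tailrec
  def loop(current: String, expectedEnd: Option[Int], acc: List[String]): Option[List[String]] =
    val m = pattern.matcher(current)
    if !m.matches() then None
    else
      val captured = m.group(group)
      // A null or empty capture means no iterations remain.
      if captured == null || captured.isEmpty then Some(acc)
      // The new capture must end where the deleted one began (contiguity check).
      else if expectedEnd.exists(_ != m.end(group)) then None
      else
        val start = m.start(group)
        val remaining = current.substring(0, start) + current.substring(m.end(group))
        // Iterations are found last-first, so prepending restores source order.
        loop(remaining, Some(start), captured :: acc)
  loop(input, None, Nil)
```

Applied to the pattern ([a-z]+, )*and ([a-z]+) and the input "parsley, sage, rosemary, and thyme" with group 1, this returns the three iterations "parsley, ", "sage, " and "rosemary, " in source order.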
