
yoav-lavi / melody

4.6K 17.0 54.0 3.82 MB

Melody is a language that compiles to regular expressions and aims to be more readable and maintainable

Home Page: https://yoav-lavi.github.io/melody/book/

License: Apache License 2.0

Rust 100.00%
regular-expression language regex compiler melody melodylang regexp rust

melody's People

Contributors

aclueless, addisoncrump, alpheratz0, amirali, ilai-deutel, joshuakb2, jyooru, legoandmars, omikorin, therealprohacker, tigermouthbear, yoav-lavi

melody's Issues

Support for high-level predefined patterns like <date>

Imagine that melody had predefined patterns like

  • <date> matching a valid date,
  • <datetime> matching a valid date and time,
  • <float> matching a floating-point number, including scientific notation,
  • <unixtime> matching a UNIX timestamp,
  • <url> matching a valid URL,
  • <ip> matching an IP address,
    etc.

In that case, Melody would offer not only a clear, readable, and maintainable syntax, but would also remove a lot of boilerplate from common expressions.

The hard part is that while the regexes for some of these (<unixtime> or <float>) would be simple, others would not be. There are whole StackOverflow threads debating which regex matches them most correctly, so the patterns would necessarily be opinionated.

Looking at the code, adding them seems rather straightforward. Alternatively, such patterns could be provided as external libraries, but there is no such functionality.
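As a rough illustration of the idea, a lookup from symbol name to regex could look like the sketch below. The function name `predefined_pattern` and both regexes are assumptions made for this example; they are not part of Melody, and the patterns are opinionated in exactly the way described above (for instance, `unixtime` here only accepts non-negative integers).

```rust
/// Hypothetical lookup table for high-level predefined patterns.
/// Neither the function nor the regexes come from Melody itself.
fn predefined_pattern(name: &str) -> Option<&'static str> {
    match name {
        // Assumption: a UNIX timestamp is a run of decimal digits.
        "unixtime" => Some(r"\d+"),
        // Assumption: optional sign, digits with an optional fraction
        // (or a bare fraction), and an optional exponent.
        "float" => Some(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"),
        // <date>, <url>, <ip> and friends are left out on purpose:
        // their "correct" regex is a matter of debate.
        _ => None,
    }
}
```

Returning `Option` leaves room for the compiler to report an unknown symbol instead of silently emitting nothing.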

Comparison with Pomsky

Hello Yoav,

(This is the continuation of this comment.)

I'm the maintainer of Pomsky. I wrote a page that compares Pomsky with several other tools and languages, including Melody. You can find it here.

Since I'm not all that familiar with Melody, I'd appreciate it if you could check that all the information about melody is correct, or if any part needs clarification. I put a lot of work into accumulating this data, but I want to be sure that I don't misrepresent your project, since I'm obviously biased.

Thank you in advance! And if you have any questions about Pomsky, feel free to ask!

Please make this a Crate for third party libraries

This project seems really interesting and would be nice to use in web-related things like a Node.js server.
I would love to write some small bindings for Node and Deno through their FFI systems, but sadly this is not a crate.

Would there be any way that this could also be a crate?

Quantifiers add unnecessary non-capturing group

Describe the bug
When using some of ... with symbols, a non-capturing group is added unconditionally. This is unnecessary for most individual symbols (e.g. <word>).

To Reproduce
Steps to reproduce the behavior:

  1. Open the Melody Playground
  2. Write a program which only uses some of ... with a single symbol, e.g.:
    some of <word>;
    
  3. Review the output

Expected behavior
RegEx output should only wrap things in non-capturing groups if necessary or if explicitly requested (via match {}).

Examples

any of <word>;
// Outputs /(?:\w)*/

any of "a";
// Outputs /a*/

some of <word>;
// Outputs /(?:\w)+/

some of "a";
// Outputs /a+/

option of <word>;
// Outputs /(?:\w)?/

option of "a";
// Outputs /a?/
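One possible way to get the expected output is to wrap the quantified expression only when it is not already a single atom. The sketch below is an assumption about how such a check might look, not Melody's actual code generator; `is_single_atom` and `quantify` are hypothetical names, and the atom check is deliberately simplified to one literal character or one two-character escape such as `\w`.

```rust
/// Simplified check: a single literal character, or a backslash
/// followed by exactly one character (e.g. `\w`, `\d`).
/// Character classes and existing groups are not recognised here.
fn is_single_atom(expr: &str) -> bool {
    let mut chars = expr.chars();
    match (chars.next(), chars.next(), chars.next()) {
        (Some(c), None, _) => c != '\\',
        (Some('\\'), Some(_), None) => true,
        _ => false,
    }
}

/// Appends `quantifier` to `expr`, adding a non-capturing group
/// only when the quantifier would otherwise bind to the last
/// character of a longer expression.
fn quantify(expr: &str, quantifier: &str) -> String {
    if is_single_atom(expr) {
        format!("{expr}{quantifier}")
    } else {
        format!("(?:{expr}){quantifier}")
    }
}
```

With this rule, `some of <word>` would come out as `\w+`, while a multi-character literal such as "test" would still be grouped.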

Some syntax ideas

Just some syntax ideas you may find useful.

Melody                                        | Regex      | Status
maybe a little of or lazily                   | *?         | ❔
rematch 𝐷                                     | \𝐷         | ❔
rematch 𝑛𝑎𝑚𝑒                                  | \k<𝑛𝑎𝑚𝑒>   | ❔
unicode class ...                             | \p{...}    | ❔
unicode except class ...                      | \P{...}    | ❔
U+𝑋𝑋𝑋𝑋                                        | \u𝑋𝑋𝑋𝑋     | ❔
X+𝑋𝑋                                          | \x𝑋𝑋       | ❔
o𝐷𝐷𝐷 (e.g. o700)                              | \𝐷𝐷𝐷       | ❔
^𝑌                                            | \c𝑌        | ❔
word boundary                                 | \b         | ❔
word non boundary                             | \B         | ❔
rematch 1                                     | $1         | ❔
insert { until match }                        | $`         | ❔
insert { full match }                         | $&         | ❔
Could not find this notation anywhere. Typo?  | x20        | ❔
Could not find this notation anywhere. Typo?  | x{06fa}    | ❔

Also, allow multiple regexps per file and let them refer to previous ones, e.g.:

regexp iso-date {
  4 of digit;
  "-";
  2 of digit;
  "-";
  2 of digit;
}

regexp iso-range {
  match iso-date;
  "/";
  match iso-date;
}

Why <space>?

I don't understand why there's <space> instead of " "?

Invalid repetition range should give an error

Describe the bug
The following Melody code should not compile, but it does, and it generates an invalid regex: a repetition range cannot run from 2 to 0.

2 to 0 of match { "test"; }

crates/melody_compiler/src/ast/source_to_ast.rs

Rule::quantifier_range => {
    let (start, end) = first_last_inner_str(kind)?;

    // maybe check here that end >= start?

    Node::Quantifier(Quantifier {
        kind: QuantifierKind::Range {
            start: start.to_owned(),
            end: end.to_owned(),
        },
        lazy,
        expression: Box::new(expression),
    })
}

To Reproduce

$ echo '2 to 0 of match { "test"; }' | melody /dev/stdin
> (?:test){2,0}

Expected behavior
An error.
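The check hinted at in the comment above could be sketched as a small standalone function. This is an illustration under assumptions, not the compiler's code: `validate_range` is a hypothetical helper, it takes the bounds as strings because `start` and `end` are string slices in the snippet above, and it reports failures as a plain `String` rather than Melody's own error type.

```rust
/// Hypothetical validation for a quantifier range such as `2 to 5`.
/// Parses both bounds and rejects ranges whose end is below the start.
fn validate_range(start: &str, end: &str) -> Result<(usize, usize), String> {
    let start_n: usize = start
        .parse()
        .map_err(|_| format!("invalid range start: {start}"))?;
    let end_n: usize = end
        .parse()
        .map_err(|_| format!("invalid range end: {end}"))?;

    if end_n < start_n {
        return Err(format!(
            "invalid range: {start_n} to {end_n} (end must be >= start)"
        ));
    }

    Ok((start_n, end_n))
}
```

Called with "2" and "0", this returns an error instead of letting `{2,0}` reach the output.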

Consider dedicated syntax for unit tests

Imagine something like this:

some of <word>;
<space>;
capture {
  1 to 9;
  2 of <digit>;
}

should match "Econ 101" capturing "101";
should not match "305";
should not match "Physics 022"

If you have the ability to embed tiny unit tests in your regex declaration, then this could substantially help both to catch regressions and to document the intent behind the regex. The unit tests would be run at compile time, raising something akin to a syntax error if they fail (with clear output as to why they failed, to make it easier to fix).

`not <char>` emits `[^.]`

Describe the bug
not <char> produces incorrect output, since the dot does not match an arbitrary character inside a character set.

To Reproduce
Open this playground.

Expected behavior
An error message.

Consider parser generator

It might be beneficial to implement more robust AST parsing with a parser generator like Pest.

Possible grammar draft:

single_digit =  _{ '0'..'9' }

digit = { '0'..'9' }

amount = { '0'..'9' }

number = { digit+ } 

content = @{ char* }

char = _{
    !("\"" | "\\") ~ ANY
    | "\\" ~ ("\"" | "\\" | "/" | "b" | "f" | "n" | "r" | "t")
    | "\\" ~ ("u" ~ ASCII_HEX_DIGIT{4})
}

char_keyword = {
  "char"
}

keyword = { (char_keyword) }

string = ${ ("\"" ~ content ~ "\"" | "'" ~ content ~ "'")  }

atom = _{ (string | range | keyword) ~ ";" }

expression = {
  (atom | capture_group | match_group)
}

range = {
  number ~ " " ~ any_spaces ~ "to" ~ " " ~  any_spaces ~ number
}

start = {single_digit}
end = {single_digit}

quantifier_range = {
  start ~ " " ~ any_spaces ~ "to" ~ " " ~  any_spaces ~ end
}

quantifier_expression = {
 (quantifier_range | amount) ~ " of " ~ expression
}

any_newlines = _{
 ("\n"*)?
}

any_spaces = _{
 (" "*)?
}

block = { "{" ~ any_spaces ~ any_newlines ~ any_spaces ~ (expression ~ any_spaces ~ any_newlines ~ any_spaces)+ ~ any_spaces ~ any_newlines ~ any_spaces ~ "}"}

name = { ASCII_ALPHA+ }

capture_group = { "capture" ~ (" " ~ any_spaces ~ name)? ~ any_spaces ~ block }

match_group = { "match" ~ (" " ~ any_spaces ~ name)? ~ any_spaces ~ block }

Feature: Web App for compiling and trying melody

It would be great to have a web app where users can try it out without installing it on their local system.

I think we can compile this to WASM to do that, or if you have any plans, let me know.

Also, if you can create the WASM file, I may be able to put together a simple web app.

Windows support?

Describe the solution you'd like
Any planned support for Windows in the future?
Windows Subsystem for Linux exists, yes. But is there a planned support for native Windows?

Allow printing regex without wrapping `/` and trailing newline

Describe the solution you'd like

Allow generating regular expressions without leading and trailing / as well as trailing newline.

Additional context

This would make it easier to use in various configuration file templating solutions, such as m4. I'm generating a configuration file for ArchiveBox, and I need to set up a URL_BLACKLIST variable, which is a regex literal. Naturally, I would prefer to write

"http";
option of "s";
"://";
`(.+)?`;
either {
    "amazon.com";
    "youtube.com";
}
`.*`;

over r'https?://(.+)?(?:amazon\.com|youtube\.com).*', but in order to interpolate melody regex results, I had to use it in conjunction with sed and tr.

URL_BLACKLIST = r'syscmd(melody link.melody | sed 's!^/!! ; s!/$!!' | tr -d '\n')'

This is not terribly difficult to add and maintain, but it is still worse than

URL_BLACKLIST = r'syscmd(melody --standalone-regex link.melody)'
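For reference, the post-processing done by the sed and tr pipeline above is small enough to sketch in Rust. The helper name `strip_regex_delimiters` is made up for this example, and `--standalone-regex` is only the flag proposed in this issue, not an existing option; the sketch just assumes the output has the form `/.../` followed by a newline.

```rust
/// Hypothetical helper mirroring `sed 's!^/!! ; s!/$!!' | tr -d '\n'`:
/// drops trailing line breaks, then one leading and one trailing `/`.
fn strip_regex_delimiters(output: &str) -> &str {
    // Remove the trailing newline (and a carriage return, if any).
    let trimmed = output.trim_end_matches(|c: char| c == '\n' || c == '\r');
    // Remove a single leading slash if present.
    let trimmed = trimmed.strip_prefix('/').unwrap_or(trimmed);
    // Remove a single trailing slash if present.
    trimmed.strip_suffix('/').unwrap_or(trimmed)
}
```

Input that is not wrapped in slashes passes through unchanged apart from the trailing newline.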

Incorrectly escaping `-` character

Describe the bug
According to the regress crate, Melody incorrectly escapes the - character.
I am not very familiar with regex, so I am not sure if this is a bug with melody or regress.

To Reproduce
src/main.rs

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let regex = r#"
    "-";
    "#;
    _ = regress::Regex::new(&melody_compiler::compiler(regex)?)?;
    Ok(())
}

Cargo.toml

[package]
name = "re"
version = "0.1.0"
edition = "2021"

[dependencies]
melody_compiler = "0.18.1"
regress = "0.4.1"

$ cargo run
   Compiling re v0.1.0 (re)
    Finished dev [unoptimized + debuginfo] target(s) in 0.25s
     Running `target/debug/re`
Error: Error { text: "Invalid character escape" }

Expected behavior
Expected the code to compile the regex without any errors.

Desktop

  • OS: macOS
  • Version: 12.4

Consider renaming `<word>` and `<space>`

These have a bit of unintuitive naming (which matches the regex equivalent) since \w also matches numbers and underscores and \s matches any kind of whitespace.

Also consider adding symbols that match [a-zA-Z] and only a space character.

Examples

Hey,
really like this project!

What do you think about writing some examples? Maybe a well-written example (with comments) that matches email addresses, to really show the benefit of using this.

Support for variable interpolation inside template strings

Copied from this Reddit conversation

I'd love to see support for interpolation of template literals in the compiled RegEx from the Babel plugin. For example, I can almost accomplish this with the Babel compiler using Melody's raw method:

// Original
new RegExp(/*melody*/ `
  "foo"; 
  \`\${bar}\`;
  "baz";
`)

// Compiled
new RegExp("foo${bar}baz")

It seems that this could be fixed just by wrapping the string output in backticks instead of quotes. I originally assumed this would have unintended consequences, but since $, {, and } are all special RegEx characters, they're automatically escaped in string literals. This protects us from misinterpreted literals:

// Original
new RegExp(/*melody*/ `
  "foo";
  "${bar}";
  "baz";
`)

// Compiled
new RegExp("foo\$\{bar\}baz")

I did come up with this while sleepy, so it's entirely possible that I may be missing something. 🤔

Unable to match a single `\` character

Describe the bug
Unable to write melody that generates correct regex that will match a single \.

To Reproduce
test.mldy

<start>;
"\";
<end>;

$ melody test.mldy
Error:  --> 2:1
  |
2 | "\";
  | ^---
  |
  = expected EOI, literal, raw, not, range, quantifier_quantity, group_declaration, assertion_declaration, variable_declaration, or variable_invocation

Expected behavior
Expected melody to properly escape \.

Desktop (please complete the following information):

  • OS: macOS
  • Version: 12.4

Additional context
I tried some other methods to generate regex that matches \, but none of them worked.

"\\";

compiles fine but escapes both backslashes, generating a regex that matches \\ instead of \.
I also tried using the raw syntax (none of them worked):

`\`;
`\\`;
`[\\]`;
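The behavior expected here amounts to two separate steps: first undo the string literal's own escaping, then escape the result for regex. The sketch below illustrates that order under stated assumptions. `literal_to_regex` is a hypothetical function, not Melody's implementation; it treats a backslash as "take the next character literally" (so it does not handle escapes like `\n`), and its set of escaped metacharacters is an assumption that deliberately leaves `-` alone.

```rust
/// Hypothetical two-step conversion of a string literal's contents
/// (the text between the quotes) into a regex fragment.
fn literal_to_regex(source: &str) -> String {
    // Step 1: resolve literal escapes, so `\\` becomes one backslash
    // and `\"` becomes a quote. A lone trailing backslash is kept.
    let mut unescaped = String::new();
    let mut chars = source.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some(next) => unescaped.push(next),
                None => unescaped.push('\\'),
            }
        } else {
            unescaped.push(c);
        }
    }

    // Step 2: escape regex metacharacters exactly once.
    // Assumption: `-` needs no escaping outside a character class.
    let mut regex = String::new();
    for c in unescaped.chars() {
        if r"\^$.|?*+()[]{}/".contains(c) {
            regex.push('\\');
        }
        regex.push(c);
    }
    regex
}
```

Under these assumptions, the literal "\\" yields the two-character regex `\\`, which matches a single backslash.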
