lazy-regex's Introduction

lazy-regex

With lazy-regex macros, regular expressions

  • are checked at compile time, with clear error messages
  • are wrapped in once_cell lazy static initializers so that they're compiled only once
  • can hold flags as suffix: let case_insensitive_regex = regex!("ab*"i);
  • are defined in a less verbose way

The regex! macro returns references to normal instances of regex::Regex or regex::bytes::Regex so all the usual features are available.

Other macros are specialized for testing a match, replacing with concise closures, or capturing groups as substrings in some common situations:

  • regex_is_match!
  • regex_find!
  • regex_captures!
  • regex_replace!
  • regex_replace_all!

All of them support the B flag for the regex::bytes::Regex variant.

Some structs of the regex crate are reexported to ease dependency management. The regex crate itself is also reexported, so you don't need to synchronize versions or flavors (see Features below).

Build Regexes

use lazy_regex::regex;

// build a simple regex
let r = regex!("sa+$");
assert_eq!(r.is_match("Saa"), false);

// build a regex with flag(s)
let r = regex!("sa+$"i);
assert_eq!(r.is_match("Saa"), true);

// you can use a raw literal
let r = regex!(r#"^"+$"#);
assert_eq!(r.is_match("\"\""), true);

// or a raw literal with flag(s)
let r = regex!(r#"^\s*("[a-t]*"\s*)+$"#i);
assert_eq!(r.is_match(r#" "Aristote" "Platon" "#), true);

// build a regex that operates on &[u8]
let r = regex!("(byte)?string$"B);
assert_eq!(r.is_match(b"bytestring"), true);

// there's no problem using the multiline definition syntax
let r = regex!(r#"(?x)
    (?P<name>\w+)
    -
    (?P<version>[0-9.]+)
"#);
assert_eq!(r.find("This is lazy_regex-2.2!").unwrap().as_str(), "lazy_regex-2.2");
// (look at the regex_captures! macro to easily extract the groups)
// this line doesn't compile because the regex is invalid:
let r = regex!("(unclosed");

Supported regex flags: i, m, s, x, U.

See regex::RegexBuilder.

Test a match

use lazy_regex::regex_is_match;

let b = regex_is_match!("[ab]+", "car");
assert_eq!(b, true);

Extract a value

use lazy_regex::regex_find;

let f_word = regex_find!(r#"\bf\w+\b"#, "The fox jumps.");
assert_eq!(f_word, Some("fox"));
let f_word = regex_find!(r#"\bf\w+\b"#B, b"The forest is silent.");
assert_eq!(f_word, Some(b"forest" as &[u8]));

Capture

use lazy_regex::regex_captures;

let (_, letter) = regex_captures!("([a-z])[0-9]+"i, "form A42").unwrap();
assert_eq!(letter, "A");

let (whole, name, version) = regex_captures!(
    r#"(\w+)-([0-9.]+)"#, // a literal regex
    "This is lazy_regex-2.0!", // any expression
).unwrap();
assert_eq!(whole, "lazy_regex-2.0");
assert_eq!(name, "lazy_regex");
assert_eq!(version, "2.0");

There's no limit to the size of the tuple. It's checked at compile time to ensure you have the right number of capturing groups.

You receive "" for optional groups with no value.

Replace with captured groups

The regex_replace! and regex_replace_all! macros bring compile-once behavior and compile-time checks to the replace and replace_all functions.

Replacing with a closure

use lazy_regex::regex_replace_all;

let text = "Foo8 fuu3";
let text = regex_replace_all!(
    r#"\bf(\w+)(\d)"#i,
    text,
    |_, name, digit| format!("F<{}>{}", name, digit),
);
assert_eq!(text, "F<oo>8 F<uu>3");

The number of arguments given to the closure is checked at compilation time to match the number of groups in the regular expression.

If it doesn't match you get, at compilation time, a clear error message.

Replacing with another kind of Replacer

use lazy_regex::regex_replace_all;
let text = "UwU";
let output = regex_replace_all!("U", text, "O");
assert_eq!(&output, "OwO");

Shared lazy static

When a regular expression is used in several functions, you sometimes don't want to repeat it but have a shared static instance.

The regex! macro, while being backed by a lazy static regex, returns a reference.

If you want to have a shared lazy static regex, use the lazy_regex! macro:

use lazy_regex::*;

pub static GLOBAL_REX: Lazy<Regex> = lazy_regex!("^ab+$"i);

As with the other macros, the regex is static, checked at compile time, and lazily built at first use.

Features and reexport

With default features, lazy-regex uses the regex crate with its default features, tailored for performance and complete Unicode support.

You may enable a different set of regex features by directly enabling them when importing lazy-regex.

It's also possible to use the regex-lite crate instead of the regex crate by declaring the lite feature:

lazy-regex = { version = "3.0", default-features = false, features = ["lite"] }

The lite flavor comes with slightly lower performance and reduced Unicode support (see crate documentation) but also a much smaller binary size.

If you need to refer to the regex crate in your code, prefer to use the reexport (i.e. use lazy_regex::regex;) so that you don't have a version or flavor conflict. When the lite feature is enabled, lazy_regex::regex refers to regex_lite so you don't have to change your code when switching regex engine.

lazy-regex's People

Contributors

alephalpha, alexanderkjall, canop, enet4, fsmaxb, jamesmunns, jplatte, msrd0, nc7s, necauqua

lazy-regex's Issues

Feature Request: Add regex_replace_all! macro

I have code that looks similar to this:

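use once_cell::sync::Lazy;
use regex::{Captures, Regex};

// `Regex::new` returns a Result, so it must be unwrapped inside the initializer
static REGEX: Lazy<Regex> = Lazy::new(|| Regex::new("f(?P<suffix>[ou]+)").unwrap());
let text = "foo fuu";
let text = REGEX.replace_all(text, |captures: &Captures<'_>| format!("f<{}>", &captures["suffix"]));
assert_eq!(text, "f<oo> f<uu>");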

Similar to your regex_captures! macro, I'd prefer to write something like this instead:

let text = regex_replace_all!("f([ou]+)", "foo fuu", |(_, suffix)| format!("f<{}>", suffix));
assert_eq!(text, "f<oo> f<uu>");

Request: Support /.../i syntax

Raw strings are great for avoiding escaping issues, but they add quite a bit of noise to the syntax. I was wondering if you would consider supporting the familiar /pattern/flags syntax used by tools and languages such as sed and JavaScript, such that

regex!(/^\s*("[a-t]*"\s*)+$/i);

would be equivalent to:

regex!(r#"^\s*("[a-t]*"\s*)+$"#i);

(except that /'s would need to be escaped in the /pattern/flags syntax)
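To make the escaping caveat concrete, here is a minimal std-only sketch of how a /pattern/flags literal could be split into its two parts. It is not part of lazy-regex and says nothing about how a macro would tokenize such input; the function name split_slash_literal and its rules (only \/ is unescaped, flags must be ASCII letters) are assumptions made for illustration.

```rust
// Splits a hypothetical `/pattern/flags` literal into (pattern, flags).
// `\/` inside the pattern is unescaped to `/`; other escapes are kept as-is.
fn split_slash_literal(input: &str) -> Option<(String, String)> {
    let body = input.strip_prefix('/')?;
    let mut pattern = String::new();
    let mut chars = body.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            // An unescaped slash ends the pattern; the rest are flags.
            '/' => {
                let flags = &body[i + 1..];
                if flags.chars().all(|f| f.is_ascii_alphabetic()) {
                    return Some((pattern, flags.to_string()));
                }
                return None;
            }
            '\\' => match chars.next() {
                // Escaped slash: keep the slash, drop the backslash.
                Some((_, '/')) => pattern.push('/'),
                // Any other escape is passed through untouched.
                Some((_, escaped)) => {
                    pattern.push('\\');
                    pattern.push(escaped);
                }
                // Dangling backslash at the end of the input.
                None => return None,
            },
            _ => pattern.push(c),
        }
    }
    // No closing slash was found.
    None
}
```

Under these assumptions, the pattern from the request splits into the same regex text as the raw-literal form, with i as the flag.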

[Enhancement] Constant support

Currently, lazy-regex supports string literals only.

Is there any way that constants can be (maybe partially) supported?

As I have used lazy-regex more, my regexes have grown more complex, so I use const_format to modularize them, but that also means I can no longer use lazy-regex.

`raw string literal` with a suffix is invalid

Hi, I'm trying to use this crate for the first time. I have this regex: server=(?'url'\\S+?)\\& which is meant to extract part of a URL.

This regex is valid and works fine elsewhere, but when I tried to use it with lazy_regex! or regex! I got this error:

raw string literal with a suffix is invalid

static REG: Lazy<Regex> = lazy_regex!(r#"server=(?'url'\\S+?)\\&"#im);

and it also says:

regex parse error: server=(?'url'\S+?)\& 
                   ^ error: unrecognized flag

(It's pointing to the 's' in server, which doesn't make any sense.)

Broken semver

It seems that the required Rust version was changed without bumping the MAJOR version. This may break compilation of crates depending on lazy-regex when the required Rust version is not available. In my case, cargo-make no longer compiles.

regex_replace and regex_replace_all should expect any expression, not just a closure

Methods Regex::replace and Regex::replace_all accept a Replacer, which allows passing both closures as well as just strings / string slices in simple cases.

However, regex_replace! and regex_replace_all! expect an ExprClosure, which makes it impossible to use them like this:

let text = "UwU";
let output = regex_replace_all!("U", text, "O");
assert_eq!(&output, "OwO");

Macros should lift this restriction and expect just any Expr.

no_std support

Hey. Thanks for this crate 🙏

It would be useful to have a no_std feature for lazy-regex, as regex already supports this via --no-default-features. This should be doable just by refactoring the feature list.

Currently, lazy-regex doesn't compile with --no-default-features, because regex is an optional dependency and is no longer pulled in.

$ cd lazy-regex
$ cargo build --no-default-features
...
error[E0432]: unresolved import `regex`
   --> src/lib.rs:184:9
    |
184 |         self,
    |         ^^^^ no external crate `regex`
...

Request: add support for `regex::bytes::Regex`

The doc of regex::bytes says:

This module provides a nearly identical API to the one found in the top-level of this crate. There are two important differences:

  1. Matching is done on &[u8] instead of &str. Additionally, Vec<u8> is used where String would have been used.
  2. Unicode support can be disabled even when disabling it would result in matching invalid UTF-8 bytes.

So hopefully the change would be as easy as changing from pub regex: regex::Regex to pub regex: regex::bytes::Regex.
If there is nothing more to consider, I'll happily make a PR.

Cannot compile for wasm32-unknown-unknown target

thread 'rustc' panicked at 'Failed to get crate data for crate15', compiler/rustc_metadata/src/creader.rs:136:32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.52.1 (9bc8c42bb 2021-05-09) running on x86_64-unknown-linux-gnu

note: compiler flags: -C embed-bitcode=no -C debuginfo=2 -C incremental --crate-type lib

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
end of query stack
error: could not compile `lazy-regex`

feature "std" is always required

I have packaged lazy-regex for Debian, and as part of CI routines set it up to build with each feature separately - but all of those except "std" fail like this:

error: `std` feature is currently required to build this crate
   --> /tmp/tmp.mLGTluLWVL/registry/regex-1.6.0/src/lib.rs:613:1
    |
613 | compile_error!("`std` feature is currently required to build this crate");
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0432]: unresolved import `crate::Error`
  --> /tmp/tmp.mLGTluLWVL/registry/regex-1.6.0/src/compile.rs:16:5
   |
16 | use crate::Error;
   |     ^^^^^^^-----
   |     |      |
   |     |      help: a similar name exists in the module: `error`
   |     no `Error` in the root

For more information about this error, try `rustc --explain E0432`.
error: could not compile `regex` due to 2 previous errors

Patching Cargo.toml to enable feature "std" for all other features succeeds, but I suspect that to be a workaround, and the real fix is to somehow allow code to build without "std" feature.

Use `OnceCell` instead of `Lazy` to enable inlining

The Lazy and OnceCell types have essentially the same semantics, except for one thing.
OnceCell is constructed with an empty constructor (OnceCell::new()), and the closure that initializes it is passed at the call site via cell.get_or_init(/* closure */). This lets the get_or_init() call be generic over the closure type, so rustc is better able to optimize it by inlining the closure.

The Lazy type, however, is designed to be friendlier at the definition site, where the initializing closure is given at construction via Lazy::new(/* closure */). The problem is that the closure's type is erased at that point: the closure is coerced to a function pointer. If you take a look at the definition of the Lazy type, you will see that it is defined as

pub struct Lazy<T, F = fn() -> T> { /* */ } 

See that the F generic parameter defaults to a function pointer. That's why, due to this coercion to a function pointer, rustc has less type information and thus is more likely not to inline the closure invocation.

I suggest switching the type that regex!() returns from Lazy<T> to once_cell::sync::OnceCell<T>.
Unfortunately, it is a breaking change for the users that depend on the type of the value returned from regex!() macro to be &Lazy<Regex>, so it means a major update if this is implemented. I wish that the regex!() macro returned &'static Regex (via Lazy::deref impl) from the start, otherwise it would be a patch update.
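The difference between the two initialization styles can be sketched with the standard library's counterparts, std::sync::LazyLock and std::sync::OnceLock (LazyLock requires Rust 1.80 or later), which have the same shape as once_cell's Lazy and OnceCell. This is an illustration only, not lazy-regex's implementation: build_pattern, pattern_via_lazy, and pattern_via_once are hypothetical names, and a String stands in for a compiled regex.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{LazyLock, OnceLock};

// Counts how many times the (hypothetical) expensive initializer runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for an expensive construction such as compiling a regex.
fn build_pattern() -> String {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    String::from("^ab+$")
}

// Definition-site initialization: the initializer is stored in the static,
// and with the default `F = fn() -> T` it is held as a function pointer.
static VIA_LAZY: LazyLock<String> = LazyLock::new(build_pattern);

fn pattern_via_lazy() -> &'static String {
    // Deref coercion turns `&LazyLock<String>` into `&String`.
    &VIA_LAZY
}

// Call-site initialization: the cell starts empty and `get_or_init` is
// generic over the closure type, so the compiler sees the concrete closure.
fn pattern_via_once() -> &'static String {
    static CELL: OnceLock<String> = OnceLock::new();
    CELL.get_or_init(|| build_pattern())
}
```

Both functions return a plain &'static reference, so callers do not depend on which cell type backs the value. Whether the second form actually gets inlined is up to the optimizer.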

`mut` closure support (cannot borrow `fun` as mutable, as it is not declared as mutable)

I'm trying to write errors to a vec in the replacer function:

let mut errors = Vec::new();
let mut content = lazy_regex::regex_replace_all!(
    "(?:from \"(\\..*)\"|import \"(\\..*)\")",
    &content,
    |whole: &str, from_request: &str, import_request: &str| {
        match replacer(from_request, import_request) {
            Ok(replacement) => replacement,
            Err(e) => {
                errors.push(e);
                whole.to_string()
            }
        }
    },
)
.to_string();

but this unfortunately leads to an error:

error[E0596]: cannot borrow `fun` as mutable, as it is not declared as mutable
{...}
404 | |                     errors.push(e);
    | |                     ------ calling `fun` requires mutable binding due to mutable borrow of `errors`

is there any way to allow mut closures inside the macros?
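Judging from the error message, the macro appears to bind the closure to a non-mut variable named fun, so a closure that mutates captured state is rejected. One possible workaround, independent of any change to the macro, is to keep the closure non-mutating and collect errors through interior mutability. The sketch below is std-only: replace_each_word is a hypothetical stand-in for an API that accepts only an Fn closure, and replacer imitates the reporter's fallible helper.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for an API that only accepts a non-mutating closure,
// as the macro in the report effectively does.
fn replace_each_word<F>(text: &str, fun: F) -> String
where
    F: Fn(&str) -> String,
{
    text.split(' ').map(|word| fun(word)).collect::<Vec<_>>().join(" ")
}

// Hypothetical fallible replacer: upper-cases a word, rejects words with digits.
fn replacer(word: &str) -> Result<String, String> {
    if word.chars().any(|c| c.is_ascii_digit()) {
        Err(format!("unexpected digit in {word:?}"))
    } else {
        Ok(word.to_uppercase())
    }
}

// Collects errors through a RefCell, so the closure only needs `Fn`.
fn replace_collecting_errors(text: &str) -> (String, Vec<String>) {
    let errors = RefCell::new(Vec::new());
    let output = replace_each_word(text, |word| match replacer(word) {
        Ok(replacement) => replacement,
        Err(e) => {
            // Shared borrow of `errors`; mutation happens inside the RefCell.
            errors.borrow_mut().push(e);
            word.to_string()
        }
    });
    (output, errors.into_inner())
}
```

The same RefCell pattern should carry over to a closure passed to regex_replace_all!, since the closure then captures errors by shared reference only.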
