cel-rust issue discussions (clarkmcc/cel-rust)

Interpreter panics on invalid comparisons, operations, missing keys, etc

Provide a mechanism for limiting the number of expressions

Providing massive expressions is a good way to bring down a server. It's pretty trivial to, for example, write a little script that creates an "or" chain 100,000 Booleans long. Because of this, it'd be good to provide support for optionally limiting the number of expressions evaluated by a Program.
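
One way to picture such a limit is a node budget that the evaluator charges as it walks the expression tree. The sketch below uses a toy `Expr` type, not cel-rust's real AST; `eval_with_budget`, `EvalError::BudgetExceeded` and `or_chain` are hypothetical names chosen for illustration.

```rust
/// Toy expression tree standing in for a parsed CEL expression (assumption).
#[derive(Debug)]
pub enum Expr {
    Bool(bool),
    Or(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
pub enum EvalError {
    /// The configured node budget was used up before evaluation finished.
    BudgetExceeded { limit: usize },
}

/// Evaluates `expr`, charging one unit of budget per visited node.
pub fn eval_with_budget(expr: &Expr, limit: usize) -> Result<bool, EvalError> {
    let mut remaining = limit;
    eval(expr, &mut remaining, limit)
}

fn eval(expr: &Expr, remaining: &mut usize, limit: usize) -> Result<bool, EvalError> {
    if *remaining == 0 {
        return Err(EvalError::BudgetExceeded { limit });
    }
    *remaining -= 1;
    match expr {
        Expr::Bool(b) => Ok(*b),
        // `||` and `&&` short-circuit, so skipped operands cost nothing.
        Expr::Or(l, r) => Ok(eval(l, remaining, limit)? || eval(r, remaining, limit)?),
        Expr::And(l, r) => Ok(eval(l, remaining, limit)? && eval(r, remaining, limit)?),
    }
}

/// Builds a left-nested "or" chain with `n` `false` operands.
pub fn or_chain(n: usize) -> Expr {
    let mut expr = Expr::Bool(false);
    for _ in 1..n {
        expr = Expr::Or(Box::new(expr), Box::new(Expr::Bool(false)));
    }
    expr
}
```

A chain of three operands visits five nodes, so a limit of 5 succeeds and a limit of 4 fails. A real implementation would probably also want a limit at parse time, since a deeply nested expression can exhaust the stack before evaluation starts.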

Unresolved method/function calls panic

Could be a regression because of #59 - though looks like the unwrap() on the lookup was there before.
Anyways, what led me to start working on #64 is hitting that very problem invoking bytes():

panicked at interpreter/src/objects.rs:526:58:
called `Option::unwrap()` on a `None` value

Unsure what error should be reported, but panicking is certainly not the right thing. I'll look into it.
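
As a rough illustration of the non-panicking alternative, a failed lookup can be turned into an error value with `ok_or_else` instead of `unwrap()`. The `Functions` registry and the `ExecError::UndeclaredFunction` variant below are hypothetical stand-ins, not the types used in interpreter/src/objects.rs.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum ExecError {
    /// No function with this name is registered (hypothetical variant).
    UndeclaredFunction(String),
}

/// Toy function signature; real CEL functions take and return CEL values.
pub type Func = fn(&[i64]) -> i64;

pub struct Functions {
    table: HashMap<String, Func>,
}

impl Functions {
    pub fn new() -> Self {
        Functions { table: HashMap::new() }
    }

    pub fn register(&mut self, name: &str, f: Func) {
        self.table.insert(name.to_string(), f);
    }

    /// Looks the function up and reports a missing name as an error
    /// rather than unwrapping the `Option`.
    pub fn call(&self, name: &str, args: &[i64]) -> Result<i64, ExecError> {
        let f = self
            .table
            .get(name)
            .ok_or_else(|| ExecError::UndeclaredFunction(name.to_string()))?;
        Ok(f(args))
    }
}

/// Example function to register.
pub fn sum(args: &[i64]) -> i64 {
    args.iter().sum()
}
```

With this shape, calling an unregistered name such as `bytes` yields `Err(UndeclaredFunction("bytes"))`, which the embedding program can report or handle.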

Constructing rust types in cel

From the CEL doc it seems like it's possible to construct strongly typed types inside CEL and pass them back. Does cel-rust support this? I couldn't find documentation about it.

Example:

use cel_interpreter::{Context, Program};
use serde::Serialize;

// An example struct that derives Serialize
#[derive(Serialize)]
struct MyStruct {
    a: i32,
    b: i32,
}

fn main() {
    let program = Program::compile("MyStruct { a: 0, b: 0}").unwrap();
    let context = Context::default();

    let value = program.execute(&context).unwrap();
    assert_eq!(value, true.into());
    println!("{value:?}");
}

Make functions module public

At the moment, the functions module is only available within the crate. I think it would be useful to make those public so that people can opt into/out of specific functions in this crate when initializing their Context (for example if I only want contains, has, and exists in my context).

I'm happy to PR this.

Dynamic value support

It would be nice if you could dynamically resolve values in case your data set is too large to provide everything upfront via context variables.

I'm thinking of something like:

// You can resolve dynamic values by providing a custom resolver function.
let external_values = Arc::new(HashMap::from([("hello".to_string(), "world".to_string())]));
let resolver = Some(move |ident: Arc<String>| {
    let name: String = ident.to_string();
    match external_values.get(name.as_str()) {
        Some(v) => Ok(Value::String(v.clone().into())),
        None => Err(ExecutionError::UndeclaredReference(name.into())),
    }
});

let mut ctx = Context::default();
ctx.set_dynamic_resolver(resolver);
assert_eq!(test_script("hello == 'world'", Some(ctx)), Ok(true.into()));

I think the above should be doable without too many changes to the current apis and allow you to dynamically resolve simple variables.

But I don't think that's enough. It would be nice if you could also write expressions like this:

assert_eq!(test_script("github.orgs['clarkmcc'].projects['cel-rust'].license == 'MIT'", Some(ctx)), Ok(true.into()));

For that to work, I think the resolver would need to return a Value that is treated like a Map, but whose members are resolved dynamically and whose data is backed by a user-defined data type. In other words, Value may need to carry a generic variant for custom data types. Thoughts?
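
To make the idea concrete, here is a minimal sketch of a value enum with a trait-object variant for user-backed maps, plus a context that falls back to a resolver closure. Everything here (`DynamicMap`, `Value::Dynamic`, `resolve_path`, the `Repo` type) is an assumption for illustration; it does not reflect cel-rust's actual `Value` or `Context`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Map-like behavior supplied by a user-defined type (hypothetical trait).
pub trait DynamicMap {
    fn get(&self, key: &str) -> Option<Value>;
}

#[derive(Clone)]
pub enum Value {
    String(Arc<str>),
    /// A map whose members are looked up on demand.
    Dynamic(Arc<dyn DynamicMap>),
}

impl Value {
    pub fn as_str(&self) -> Option<&str> {
        match self {
            Value::String(s) => Some(&**s),
            Value::Dynamic(_) => None,
        }
    }
}

pub struct Context {
    variables: HashMap<String, Value>,
    resolver: Option<Box<dyn Fn(&str) -> Option<Value>>>,
}

impl Context {
    pub fn new() -> Self {
        Context { variables: HashMap::new(), resolver: None }
    }

    pub fn set_dynamic_resolver<F>(&mut self, f: F)
    where
        F: Fn(&str) -> Option<Value> + 'static,
    {
        self.resolver = Some(Box::new(f));
    }

    /// Static variables win; the resolver is only asked for unknown names.
    pub fn resolve(&self, name: &str) -> Option<Value> {
        self.variables
            .get(name)
            .cloned()
            .or_else(|| self.resolver.as_ref().and_then(|r| r(name)))
    }

    /// Resolves `a.b.c`-style paths: the first segment is a variable,
    /// every further segment is a member lookup on a dynamic map.
    pub fn resolve_path(&self, path: &[&str]) -> Option<Value> {
        let (first, rest) = path.split_first()?;
        let mut current = self.resolve(first)?;
        for key in rest {
            current = match current {
                Value::Dynamic(map) => map.get(key)?,
                Value::String(_) => return None,
            };
        }
        Some(current)
    }
}

/// Example user-defined backing type.
pub struct Repo {
    pub license: String,
}

impl DynamicMap for Repo {
    fn get(&self, key: &str) -> Option<Value> {
        match key {
            "license" => Some(Value::String(self.license.as_str().into())),
            _ => None,
        }
    }
}
```

A trait object keeps `Value` non-generic, at the cost of dynamic dispatch on each member lookup; a generic variant, as floated above, would trade that for a type parameter that spreads through the API.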

Support non wasm-bindgen targets…

Currently the dependency on chrono enables all default features, which includes wasmbind. As far as I can tell, though, only alloc is really required. When you target a wasm runtime that's not the browser, chances are high that some bindings won't be available, and they are mostly not needed anyway (i.e. no JavaScript).

The "simple fix" is to declare chrono = { version = "0.4.26", default-features = false, features = ["alloc"] }, so that it only depends on what's needed. But I could also see a wasm/wasi feature here that would do the appropriate thing. wdyt? What sounds better: limiting to what's needed, or introducing a "wasm profile" with wasmbind possibly as a default feature that users can then disable (which would map to what chrono does)? I can create the PR, no worries!

Ability to check which functions/variables a script references

It would be nice if the parser took note of which functions and variables a given script actually made use of and made that information available to the embedding program.

My primary motivation for wanting this is so that I can cache some variables to the scripts that reference them, and only run scripts when certain variables change. Currently, there is no way to tell which scripts actually reference specific functions/variables, so there's no way to make any kind of mapping between references and which scripts reference them. Besides this, it could be used for general sandboxing notifications, like if certain functions/variables were only valid in certain contexts, the user of this library could print a coherent diagnostic about the offending reference, instead of reporting that a reference simply doesn't exist. And, even besides that, it would just be nice for debugging and diagnostics.
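
A minimal sketch of what such a pass could look like: one walk over the parsed tree that records identifiers and call targets in separate sets. The `Expr` shape and the `references` function are hypothetical; cel-rust's parser types may look quite different.

```rust
use std::collections::BTreeSet;

/// Toy AST standing in for a parsed CEL expression (assumption).
pub enum Expr {
    Literal(i64),
    Ident(String),
    Call { function: String, args: Vec<Expr> },
    Binary(Box<Expr>, Box<Expr>),
}

/// Names a script refers to, split by kind.
#[derive(Debug, Default, PartialEq)]
pub struct References {
    pub variables: BTreeSet<String>,
    pub functions: BTreeSet<String>,
}

/// Collects every variable and function name used anywhere in `expr`.
pub fn references(expr: &Expr) -> References {
    let mut refs = References::default();
    collect(expr, &mut refs);
    refs
}

fn collect(expr: &Expr, refs: &mut References) {
    match expr {
        Expr::Literal(_) => {}
        Expr::Ident(name) => {
            refs.variables.insert(name.clone());
        }
        Expr::Call { function, args } => {
            refs.functions.insert(function.clone());
            for arg in args {
                collect(arg, refs);
            }
        }
        Expr::Binary(left, right) => {
            collect(left, refs);
            collect(right, refs);
        }
    }
}
```

For a tree representing `size(requests) > limit`, this reports the function `size` and the variables `requests` and `limit`, which is enough to build a variable-to-script index.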

Indexed-based map access is not supported

fn main() {
    let program = Program::compile("headers[\"Content-Type\"].contains(\"application/json\")").unwrap();
    let mut context = Context::default();
    let mut headers = HashMap::new();
    headers.insert("Content-Type","application/json".to_string());
    println!("{}",headers["Content-Type"]);
    context.add_variable("headers", headers);
    let value = program.execute(&context).unwrap();
    assert_eq!(value, true.into());
}

Using headers["Content-Type"] like this fails with: thread 'main' panicked at 'not implemented'
How can this be made to work? I know the headers.status style works, but that requires changing the expression.

Support adding T: Serialize variables to context

We should be able to add any type that implements Serialize as a variable to the context

Context.clone() method used by macros is private to the crate

The official cel-spec has some additional macros / functions that might be useful for some of our use cases like: exists or exists_one.

Unfortunately I am not able to implement them by myself because: Context.clone() is pub(crate) and not pub:

cel-rust/interpreter/src/context.rs

Line 115 in f4fa854

pub(crate) fn clone(&self) -> Context {

Am I missing something or wouldn't it be useful to provide the clone functionality and context shadowing also for custom extension functions?

Zero-allocation* redesign

This issue is meant to be a scratchpad for ideas and feedback on improving how types and values are handled in the interpreter. There have been several different feature requests recently that could be solved by a fundamental shift in how CEL values are managed.

#58 - Proposes a way to avoid having to serialize Rust types into CEL Value when only some fields in those types are actually referenced.
#73 - Requests a way to create actual Rust types in CEL expressions presumably without the intermediate ser/de using a specific format like JSON.
#68 (comment) - Requests a way to deserialize the result of a CEL expression to a serde_json::Value.

Today any value referenced in a CEL expression is owned by either the Program or the Context. The Program owns values that were created directly in the CEL expression, for example in the expression ("foo" == "bar") == false, there are three values owned by the Program, "foo", "bar", and false. Values that are owned by the Context are values that are dynamically provided at execution time to a Program as variables, like foo == bar, both foo and bar are values owned by the Context.

I like the idea of fundamentally changing Context so that it does not own any data, meaning that you do not need to clone your types to use them in CEL. Instead I'd like the Context to have references to that data.
In the case of deeply nested structs, we would provide a derive macro to generate field accessors that the interpreter would call when doing variable resolution. When referencing a property on a struct, CEL would operate on a reference to that data.

Questions:

Would we even need Arc/Rc's anymore, or could we get away with just this, since we would assume the caller owns all the data? Right now, an Arc is required for values owned by Program because a Program can be executed one or more times simultaneously.
```
pub enum Value<'a> {
    String(&'a str)
}
```
We can easily get references to, and represent primitive types in the interpreter, but what if an expression returned a property whose type was some other user-defined struct? How would we work with that? Perhaps instead of a Value enum, we have a Value trait that exposes every behavior supported in a CEL expression, i.e.:
```
pub trait Value {
    fn get_property(&self, key: &str) -> Box<dyn Value>;

    fn add(&self, other: &dyn Value) -> Box<dyn Value>;

    fn sub(&self, other: &dyn Value) -> Box<dyn Value>;
}
```

Serialize Value to JSON String

I’m fairly new to Rust and am using this project. I’d like to convert the executed Value into a String so I can serialize it to JSON.

Does Value need to implement From for this to work?

The type of serialized unsigned integer data does not match the default type of numbers in the expression.

#[derive(Serialize)]
struct MidType<'a> {
    body: &'a [u8],
    raw: &'a [u8]
}

fn main() {
    let program = Program::compile("foo.body.contains(1)").unwrap();
    let mut context = Context::default();
    context.add_variable("foo", MidType {
        body: &[1,2,3],
        raw: &[]
    }).unwrap();
    let v = program.execute(&context).unwrap();
    println!("{:?}",v);
}

The body will be serialized as List[UInt], but the number 1 is of type Int. Output: Bool(false)
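
One possible behavior is to compare signed and unsigned integers by numeric value rather than by variant. The sketch below shows that comparison on a toy `Num` type; whether cel-rust should adopt it is a design question, and `num_eq` and `contains` are hypothetical helpers.

```rust
/// Toy numeric value with the two integer kinds from the report.
#[derive(Debug, Clone, Copy)]
pub enum Num {
    Int(i64),
    UInt(u64),
}

/// Compares by numeric value, so `Int(1)` equals `UInt(1)`.
pub fn num_eq(a: Num, b: Num) -> bool {
    match (a, b) {
        (Num::Int(x), Num::Int(y)) => x == y,
        (Num::UInt(x), Num::UInt(y)) => x == y,
        (Num::Int(i), Num::UInt(u)) | (Num::UInt(u), Num::Int(i)) => {
            // A negative signed value can never equal an unsigned one;
            // the cast is only reached for non-negative `i`, where it is lossless.
            i >= 0 && (i as u64) == u
        }
    }
}

/// List membership using value-based numeric equality.
pub fn contains(list: &[Num], needle: Num) -> bool {
    list.iter().any(|item| num_eq(*item, needle))
}
```

With this rule, a list of `UInt` values 1, 2, 3 contains `Int(1)`, while `Int(-1)` still never matches any unsigned value, including `u64::MAX`.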

Can't reference a variable named like a function

This simple expression, taken straight from the spec, fails: size(requests) > size

Reproducible test case:

let program = Program::compile("size(requests) > size").unwrap();
let mut context = Context::default();
let requests = vec![Value::Int(42), Value::Int(42), Value::Int(42)];
context.add_variable("requests", Value::List(Arc::new(requests))).unwrap();
context.add_variable("size", Value::Int(42)).unwrap();
program.execute(&context) // Err value: ValuesNotComparable(Int(3), Function("size", None))

While as per the doc:

the first size is a function, and the second is a variable.

Here both occurrences of size resolve to the function, so the variable is always shadowed by the function.
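
A sketch of one way to get the behavior the spec describes: keep functions and variables in separate tables, and let the position of the identifier (call or bare name) decide which table is consulted. The `Context` below is a toy with hypothetical method names, not cel-rust's `Context`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    List(Vec<Value>),
}

pub type Func = fn(&[Value]) -> Option<Value>;

/// Context with two independent namespaces.
#[derive(Default)]
pub struct Context {
    variables: HashMap<String, Value>,
    functions: HashMap<String, Func>,
}

impl Context {
    pub fn add_variable(&mut self, name: &str, value: Value) {
        self.variables.insert(name.to_string(), value);
    }

    pub fn add_function(&mut self, name: &str, f: Func) {
        self.functions.insert(name.to_string(), f);
    }

    /// A bare identifier such as `size` is only looked up among variables.
    pub fn get_variable(&self, name: &str) -> Option<&Value> {
        self.variables.get(name)
    }

    /// An identifier in call position such as `size(...)` is only
    /// looked up among functions.
    pub fn call(&self, name: &str, args: &[Value]) -> Option<Value> {
        self.functions.get(name).and_then(|f| f(args))
    }
}

/// Toy `size` implementation for lists.
pub fn size(args: &[Value]) -> Option<Value> {
    match args {
        [Value::List(items)] => Some(Value::Int(items.len() as i64)),
        _ => None,
    }
}
```

With both a `size` function and a `size` variable registered, `size(requests)` yields `Int(3)` and the bare `size` yields `Int(42)`, so the comparison operates on two integers.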

Wrong evaluation result

I've been playing around with this CEL implementation and I noticed one odd thing with the following expressions:

b && (c == "string")

b && c == "string"

c == "string" && b

Given this context

{"b": True, "c": "string"}

they should all evaluate to true, but this is not what's happening:

True <= b && (c == "string")
False <= b && c == "string"
True <= c == "string" && b

Here's a simple reproducer:

use cel_interpreter::{Context,Program, Value};

fn main() {
    let expressions = [
        "b && (c == \"string\")",
        "b && c == \"string\"",
        "c == \"string\" && b",
    ];

    for expression in expressions {
        let program = Program::compile(expression).unwrap();
        let mut context = Context::default();
        context.add_variable("b", Value::Bool(true));
        context.add_variable("c", Value::String(String::from("string").into()));

        let result = program.execute(&context);

        println!("{:?} <= {}", result, expression)
    }
}

It produces the following output:

Ok(Bool(true)) <= b && (c == "string")
Ok(Bool(false)) <= b && c == "string"
Ok(Bool(true)) <= c == "string" && b

It seems like in the case of b && c == "string" the interpreter effectively evaluates this expression

(b && c) == "string"
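
For reference, the expected grouping follows from operator precedence: `==` binds tighter than `&&`, which binds tighter than `||`. The toy evaluator below illustrates this with precedence climbing. It is a self-contained sketch, unrelated to cel-rust's LALRPOP grammar, and it does not short-circuit.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Str(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Token { Ident(String), Str(String), And, Or, Eq, LParen, RParen }

/// Splits the source into tokens; returns `None` on unexpected input.
fn tokenize(src: &str) -> Option<Vec<Token>> {
    let chars: Vec<char> = src.chars().collect();
    let (mut tokens, mut i) = (Vec::new(), 0);
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '(' || c == ')' {
            tokens.push(if c == '(' { Token::LParen } else { Token::RParen });
            i += 1;
        } else if c == '"' {
            let start = i + 1;
            i = start;
            while i < chars.len() && chars[i] != '"' { i += 1; }
            tokens.push(Token::Str(chars[start..i].iter().collect()));
            i += 1; // skip the closing quote
        } else if c.is_alphabetic() {
            let start = i;
            while i < chars.len() && chars[i].is_alphanumeric() { i += 1; }
            tokens.push(Token::Ident(chars[start..i].iter().collect()));
        } else {
            // Everything else must be a two-character operator.
            let op: String = chars.get(i..i + 2)?.iter().collect();
            tokens.push(match op.as_str() {
                "&&" => Token::And,
                "||" => Token::Or,
                "==" => Token::Eq,
                _ => return None,
            });
            i += 2;
        }
    }
    Some(tokens)
}

/// Binding power per operator: a higher number binds tighter.
fn precedence(token: &Token) -> Option<u8> {
    match token {
        Token::Or => Some(1),
        Token::And => Some(2),
        Token::Eq => Some(3),
        _ => None,
    }
}

struct Parser<'a> { tokens: Vec<Token>, pos: usize, env: &'a HashMap<String, Value> }

impl<'a> Parser<'a> {
    /// Parses and evaluates operators whose precedence is at least `min_prec`.
    fn expr(&mut self, min_prec: u8) -> Option<Value> {
        let mut left = self.primary()?;
        while let Some(op) = self.tokens.get(self.pos).cloned() {
            let prec = match precedence(&op) { Some(p) if p >= min_prec => p, _ => break };
            self.pos += 1;
            // The right operand may only contain tighter-binding operators.
            let right = self.expr(prec + 1)?;
            left = match (op, left, right) {
                (Token::Eq, l, r) => Value::Bool(l == r),
                (Token::And, Value::Bool(l), Value::Bool(r)) => Value::Bool(l && r),
                (Token::Or, Value::Bool(l), Value::Bool(r)) => Value::Bool(l || r),
                _ => return None,
            };
        }
        Some(left)
    }

    fn primary(&mut self) -> Option<Value> {
        let token = self.tokens.get(self.pos)?.clone();
        self.pos += 1;
        match token {
            Token::Ident(name) => self.env.get(&name).cloned(),
            Token::Str(s) => Some(Value::Str(s)),
            Token::LParen => {
                let inner = self.expr(1)?;
                let closed = self.tokens.get(self.pos) == Some(&Token::RParen);
                self.pos += 1;
                if closed { Some(inner) } else { None }
            }
            _ => None,
        }
    }
}

/// Evaluates `src` against `env`; `None` signals a parse or type error.
pub fn evaluate(src: &str, env: &HashMap<String, Value>) -> Option<Value> {
    let mut parser = Parser { tokens: tokenize(src)?, pos: 0, env };
    let value = parser.expr(1)?;
    if parser.pos == parser.tokens.len() { Some(value) } else { None }
}
```

Because the right operand of `&&` is parsed with a minimum precedence above `&&` itself, `c == "string"` is consumed as a unit before the conjunction is applied, so all three expressions from the report evaluate to true here.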

I'm also using a Python version of CEL interpreter and it evaluates it properly:

import celpy

expressions = [
    'b && (c == "string")',
    'b && c == "string"',
    'c == "string" && b',
]

for expression in expressions:
    env = celpy.Environment()
    ast = env.compile(expression)
    prgm = env.program(ast)

    activation = celpy.json_to_cel({"a": 1, "b": True, "c": "string"})
    result = prgm.evaluate(activation)
    print(f"{result} <= {expression}")

Produces

True <= b && (c == "string")
True <= b && c == "string"
True <= c == "string" && b

Regex support?

This is more of a question really than an actual issue... for now at least.

If I read the CEL spec properly, an implementation is expected to support regular expressions in the RE2 flavor.

Trying this out:

#[test]
fn test_matches() {
  let tests = vec![
      ("map", "{1: 'abc', 2: 'def', 3: 'ghi'}.all(key, key.matches('^[a-zA-Z]*$')) == true"),
      ("string", "'foobar'.matches('^[a-zA-Z]*$') == true"),
  ];

  for (name, script) in tests {
      assert_eq!(test_script(script, None), Ok(true.into()), "{}", name);
  }
}

I'm getting an Err(NoSuchKey("matches")) for both the map and string tests. I couldn't see anything mentioning regular expressions either. So here are the questions: is this a conscious decision to not support them? Looking around, I couldn't find a good candidate regex lib to start implementing that support, especially as in our use case we're looking at targeting wasm...

Have you considered that side of the spec? Any conclusion you came to already?

Implement missing macros `exists` and `exists_one`

https://github.com/google/cel-spec/blob/master/doc/langdef.md#macros

Indexed-based map access not implemented

The library supports traversing maps using dot notation, but index notation is not supported

// Dot notation
foo.bar

// Index notation
foo["bar"]
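
Conceptually, both notations can share one lookup: dot notation passes the field name directly, while index notation first evaluates the index expression and then checks that it is a string key. The sketch below shows this on a toy `Value`; the `member` and `index` functions and the `AccessError` variants are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Map(HashMap<String, Value>),
}

#[derive(Debug, PartialEq)]
pub enum AccessError {
    NoSuchKey(String),
    NotAMap,
    UnsupportedIndex,
}

/// Lookup shared by `foo.bar` and `foo["bar"]`.
pub fn member<'a>(target: &'a Value, key: &str) -> Result<&'a Value, AccessError> {
    match target {
        Value::Map(entries) => entries
            .get(key)
            .ok_or_else(|| AccessError::NoSuchKey(key.to_string())),
        _ => Err(AccessError::NotAMap),
    }
}

/// `foo[key]`: the key is an already-evaluated value, so its type is
/// checked here before delegating to the shared lookup.
pub fn index<'a>(target: &'a Value, key: &Value) -> Result<&'a Value, AccessError> {
    match key {
        Value::String(name) => member(target, name),
        _ => Err(AccessError::UnsupportedIndex),
    }
}
```

Returning an error for a missing key or a non-map target, rather than panicking, also lets an expression such as `headers["Content-Type"]` fail in a way the caller can handle.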

Timestamp issue?

I'm not sure whether I'm the one doing something wrong here, but I find this slightly confusing:

let script = "ts == timestamp('2023-05-28T00:00:00+00:00')";
let program = Program::compile(script).unwrap();
let mut context = Context::default();
let ts: DateTime<FixedOffset> = DateTime::parse_from_rfc3339("2023-05-28T00:00:00+00:00").unwrap();
context.add_variable("ts", Value::Timestamp(ts)).unwrap();
assert_eq!(program.execute(&context), Ok(true.into()));

Interestingly, this yields comparing: Timestamp(2023-05-28T00:00:00+00:00) vs String("2023-05-28T00:00:00+00:00")
Here the lhs is timestamp('2023-05-28T00:00:00+00:00'), but for some reason ts ends up being a ... String? Am I missing something here?

v0.7.0: Expression `size == "50"` causes a panic.

I would expect that no expression can panic the interpreter.

unable to compare String("50") with Function("size", None)
thread 'limit::tests::cel::size_function_and_size_var' panicked at /Users/chirino/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cel-interpreter-0.7.0/src/objects.rs:270:23:
unable to compare String("50") with Function("size", None)

Support thread-safe program execution

I'm experimenting with writing a Python extension for this library using pyo3 and running into issues when it comes down to concurrency. I'm not really well-versed in Rust, but I asked about it here. As far as I understand it boils down to using Arc instead of Rc.

I'm currently using a Python version of the CEL interpreter, but its performance leaves a lot to be desired, so I'm looking for an alternative. I use CEL for feature flags, so we have multiple compiled expressions which are evaluated from different threads.

What are your thoughts about it? What would it take to make the interpreter thread-safe?

I'm willing to help, but my Rust knowledge is very, very limited :)
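
For context on why `Arc` matters here: `Rc` is not `Send`, so a value that contains one cannot be moved into `std::thread::spawn`, whereas an immutable value behind `Arc` can be shared freely. The sketch below uses a stand-in `Program` type (an assumption, not the real cel-rust type) to show the sharing pattern.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for a compiled program: immutable after construction.
pub struct Program {
    threshold: i64,
}

impl Program {
    pub fn compile(threshold: i64) -> Self {
        Program { threshold }
    }

    /// Read-only execution, so `&self` is enough.
    pub fn execute(&self, input: i64) -> bool {
        input > self.threshold
    }
}

/// Runs one shared program on several threads, one input per thread.
pub fn evaluate_in_parallel(program: Arc<Program>, inputs: Vec<i64>) -> Vec<bool> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            // Each thread gets its own handle to the same program.
            let program = Arc::clone(&program);
            thread::spawn(move || program.execute(input))
        })
        .collect();

    // Joining in spawn order keeps results aligned with the inputs.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

This compiles only because every field of the toy `Program` is `Send + Sync`; if any internal value held an `Rc`, the `thread::spawn` call would be rejected at compile time, which matches the symptom described above.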

Switch to Chumsky for parsing

Benefits of using chumsky for parsing:

Easier to read and modify than LALRPOP grammar
Much better error reporting and syntax assistance

High level plan:

Do you want to keep both parsers? If so, how should the API work to pick between them? Assume it wouldn't be too tricky to add unsigned ints and un-escaped strings to the current lalrpop version?
