clarkmcc / cel-rust
Common Expression Language interpreter written in Rust
Home Page: https://crates.io/crates/cel-interpreter
License: MIT License
Providing massive expressions is a good way to bring down a server. It is trivial, for example, to write a little script that creates an "or" chain 100,000 Booleans long. Because of this, it would be good to support optionally limiting the number of expressions evaluated by a Program.
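One way such a limit could work is to charge a counter for every node visited during evaluation and bail out once it runs dry. The sketch below is only an illustration on a toy boolean tree: `Expr`, `Budget`, `EvalError`, `eval`, and `or_chain` are hypothetical names and are not part of cel-interpreter.

```rust
// Hypothetical sketch: none of these types come from cel-interpreter.

/// A toy boolean expression tree standing in for a parsed CEL expression.
#[derive(Debug)]
pub enum Expr {
    Bool(bool),
    Or(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
pub enum EvalError {
    /// Evaluation visited more nodes than the caller allowed.
    BudgetExceeded { limit: usize },
}

/// Tracks how many more nodes may be evaluated.
pub struct Budget {
    limit: usize,
    remaining: usize,
}

impl Budget {
    pub fn new(limit: usize) -> Self {
        Budget { limit, remaining: limit }
    }

    /// Charge one node visit, failing once the budget is used up.
    fn charge(&mut self) -> Result<(), EvalError> {
        if self.remaining == 0 {
            return Err(EvalError::BudgetExceeded { limit: self.limit });
        }
        self.remaining -= 1;
        Ok(())
    }
}

/// Evaluate `expr`, charging the budget once per visited node.
/// Operands skipped by short-circuiting are never charged.
pub fn eval(expr: &Expr, budget: &mut Budget) -> Result<bool, EvalError> {
    budget.charge()?;
    match expr {
        Expr::Bool(b) => Ok(*b),
        Expr::Or(l, r) => Ok(eval(l, budget)? || eval(r, budget)?),
        Expr::And(l, r) => Ok(eval(l, budget)? && eval(r, budget)?),
    }
}

/// Build `false || (false || (... || true))` with `n` `false` operands.
pub fn or_chain(n: usize) -> Expr {
    let mut expr = Expr::Bool(true);
    for _ in 0..n {
        expr = Expr::Or(Box::new(Expr::Bool(false)), Box::new(expr));
    }
    expr
}
```

A chain built with `or_chain(100)` visits 201 nodes, so a budget of 201 succeeds and a budget of 200 returns `BudgetExceeded`. A real implementation would probably also want to cap expression size or nesting depth at compile time, since this sketch only limits work done during evaluation.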
Could be a regression because of #59, though it looks like the unwrap() on the lookup was there before.
Anyway, what led me to start working on #64 is hitting that very problem when invoking bytes():
panicked at interpreter/src/objects.rs:526:58:
called `Option::unwrap()` on a `None` value
I'm unsure what error should be reported, but panicking is certainly not the right thing. I'll look into it.
From the CEL docs it seems possible to construct strongly typed values inside CEL and pass them back. Does cel-rust support this? I couldn't find documentation about it.
Example:
use cel_interpreter::{Context, Program};
use serde::Serialize;

// An example struct that derives Serialize
#[derive(Serialize)]
struct MyStruct {
    a: i32,
    b: i32,
}

fn main() {
    let program = Program::compile("MyStruct { a: 0, b: 0}").unwrap();
    let context = Context::default();
    let value = program.execute(&context).unwrap();
    assert_eq!(value, true.into());
    println!("{value:?}");
}
At the moment, the functions module is only available within the crate. I think it would be useful to make those functions public so that people can opt into or out of specific functions in this crate when initializing their Context (for example, if I only want contains, has, and exists in my context).
I'm happy to PR this.
It would be nice if you could resolve values dynamically in case your data set is too large to provide everything up front via context variables.
I'm thinking of something like:
// You can resolve dynamic values by providing a custom resolver function.
let external_values = Arc::new(HashMap::from([("hello".to_string(), "world".to_string())]));
let resolver = Some(move |ident: Arc<String>| {
    let name: String = ident.to_string();
    match external_values.get(name.as_str()) {
        Some(v) => Ok(Value::String(v.clone().into())),
        None => Err(ExecutionError::UndeclaredReference(name.into())),
    }
});
let mut ctx = Context::default();
ctx.set_dynamic_resolver(resolver);
assert_eq!(test_script("hello == 'world'", Some(ctx)), Ok(true.into()));
I think the above should be doable without too many changes to the current APIs and would allow you to dynamically resolve simple variables.
But I don't think that's enough; it would be nice if you could also write expressions like this:
assert_eq!(test_script("github.orgs['clarkmcc'].projects['cel-rust'].license == 'MIT'", Some(ctx)), Ok(true.into()));
For that to work, I think the resolver would need to return a Value that is treated like a Map whose members are dynamic and whose data is backed by a user-defined data type. In other words, the above may need Value to carry a generic variant for custom data types. Thoughts?
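To make the simple-variable half of the proposal concrete, here is a minimal sketch of a context that falls back to a resolver closure when a name is not registered statically. Everything in it (`Value`, `Context`, `set_dynamic_resolver`, `lookup`, `example_context`) is a hypothetical stand-in written for illustration, not the cel-interpreter API, and it does not attempt the nested `github.orgs[...]` case.

```rust
use std::collections::HashMap;

// Hypothetical sketch: `Value`, `Context`, and `set_dynamic_resolver` are
// stand-ins written for illustration, not the cel-interpreter API.

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Int(i64),
}

/// A resolver is asked for any identifier the context does not know statically.
type Resolver = Box<dyn Fn(&str) -> Option<Value>>;

#[derive(Default)]
pub struct Context {
    variables: HashMap<String, Value>,
    resolver: Option<Resolver>,
}

impl Context {
    pub fn add_variable(&mut self, name: &str, value: Value) {
        self.variables.insert(name.to_string(), value);
    }

    pub fn set_dynamic_resolver<F>(&mut self, resolver: F)
    where
        F: Fn(&str) -> Option<Value> + 'static,
    {
        self.resolver = Some(Box::new(resolver));
    }

    /// Static variables win; otherwise fall back to the resolver, if any.
    pub fn lookup(&self, name: &str) -> Result<Value, String> {
        if let Some(v) = self.variables.get(name) {
            return Ok(v.clone());
        }
        self.resolver
            .as_ref()
            .and_then(|resolve| resolve(name))
            .ok_or_else(|| format!("undeclared reference: {name}"))
    }
}

/// Builds a context whose `hello` identifier is resolved lazily from `external`.
pub fn example_context() -> Context {
    let external = HashMap::from([("hello".to_string(), "world".to_string())]);
    let mut ctx = Context::default();
    ctx.add_variable("answer", Value::Int(42));
    ctx.set_dynamic_resolver(move |name| external.get(name).cloned().map(Value::String));
    ctx
}
```

With `example_context()`, looking up `hello` goes through the resolver, `answer` is served from the static map, and any other name produces an "undeclared reference" error.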
Currently the dependency on chrono enables all default features, which includes wasmbind. As far as I can tell, though, only alloc is really required. When you target a wasm runtime that's not the browser, chances are high that some bindings won't be available and mostly aren't needed (i.e. no JavaScript).
The "simple fix" is to depend only on what's needed:
chrono = { version = "0.4.26", default-features = false, features = ["alloc"] }
But I could also see a wasm/wasi feature here that would do the appropriate thing. What do you think sounds better: limiting the dependency to what's needed, or introducing a "wasm profile", possibly with wasmbind as a default feature that users can then disable (which would map to what chrono does)? I can create the PR, no worries!
It would be nice if the parser took note of which functions and variables a given script actually made use of and made that information available to the embedding program.
My primary motivation for wanting this is so that I can cache some variables to the scripts that reference them, and only run scripts when certain variables change. Currently, there is no way to tell which scripts actually reference specific functions/variables, so there's no way to make any kind of mapping between references and which scripts reference them. Besides this, it could be used for general sandboxing notifications, like if certain functions/variables were only valid in certain contexts, the user of this library could print a coherent diagnostic about the offending reference, instead of reporting that a reference simply doesn't exist. And, even besides that, it would just be nice for debugging and diagnostics.
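As a rough illustration of what collecting references could look like, the sketch below walks a toy expression tree and records the variable and function names it finds. The `Expr` and `References` types and both functions are hypothetical and are not the cel-parser AST; a real implementation would also have to handle macros that introduce their own bound variables.

```rust
use std::collections::BTreeSet;

// Hypothetical sketch: this toy AST is not the cel-parser AST.

#[derive(Debug)]
pub enum Expr {
    Literal(i64),
    Ident(String),
    Call { function: String, args: Vec<Expr> },
    Binary(Box<Expr>, Box<Expr>),
}

/// The names a script depends on, split by kind.
#[derive(Debug, Default, PartialEq)]
pub struct References {
    pub variables: BTreeSet<String>,
    pub functions: BTreeSet<String>,
}

/// Walk the tree once and record every identifier and function name.
pub fn collect_references(expr: &Expr, refs: &mut References) {
    match expr {
        Expr::Literal(_) => {}
        Expr::Ident(name) => {
            refs.variables.insert(name.clone());
        }
        Expr::Call { function, args } => {
            refs.functions.insert(function.clone());
            for arg in args {
                collect_references(arg, refs);
            }
        }
        Expr::Binary(lhs, rhs) => {
            collect_references(lhs, refs);
            collect_references(rhs, refs);
        }
    }
}

/// Convenience wrapper returning a fresh `References`.
pub fn references(expr: &Expr) -> References {
    let mut refs = References::default();
    collect_references(expr, &mut refs);
    refs
}
```

An embedding program could build a map from each variable name to the scripts whose `References` contain it, and re-run only those scripts when that variable changes.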
use cel_interpreter::{Context, Program};
use std::collections::HashMap;

fn main() {
    let program = Program::compile("headers[\"Content-Type\"].contains(\"application/json\")").unwrap();
    let mut context = Context::default();
    let mut headers = HashMap::new();
    headers.insert("Content-Type", "application/json".to_string());
    println!("{}", headers["Content-Type"]);
    context.add_variable("headers", headers);
    let value = program.execute(&context).unwrap();
    assert_eq!(value, true.into());
}
Using headers["Content-Type"] like this fails with: thread 'main' panicked at 'not implemented'
How can this be made to work? I know the headers.status form works, but that requires changing the expression.
We should be able to add any type that implements Serialize as a variable to the context
The official cel-spec has some additional macros / functions that might be useful for some of our use cases, like exists or exists_one.
Unfortunately I am not able to implement them myself because Context.clone() is pub(crate) and not pub:
cel-rust/interpreter/src/context.rs
Line 115 in f4fa854
Am I missing something, or wouldn't it be useful to provide the clone functionality and context shadowing for custom extension functions as well?
This issue is meant to be a scratchpad for ideas and feedback on improving how types and values are handled in the interpreter. There have been several different feature requests recently that could be solved by a fundamental shift in how CEL values are managed.
Value when only some fields in those types are actually referenced.
serde_json::Value.
Today any value referenced in a CEL expression is owned by either the Program or the Context. The Program owns values that were created directly in the CEL expression; for example, in the expression ("foo" == "bar") == false, there are three values owned by the Program: "foo", "bar", and false. Values owned by the Context are values that are dynamically provided at execution time to a Program as variables; in foo == bar, both foo and bar are values owned by the Context.
Context so that it does not own any data, meaning that you do not need to clone your types to use them in CEL. Instead I'd like the Context to have references to that data.
Questions:
Arc/Rc's anymore, or could we get away with just this, since we would assume the caller owned all the data? Right now, an Arc is required for values owned by Program because a Program can be executed one or more times simultaneously.
pub enum Value<'a> {
    String(&'a str),
}
Value enum, we have a Value trait that exposes every behavior supported in a CEL expression, i.e.:
pub trait Value {
    fn get_property(&self, key: &str) -> Box<dyn Value>;
    fn add(&self, other: &dyn Value) -> Box<dyn Value>;
    fn sub(&self, other: &dyn Value) -> Box<dyn Value>;
}
I’m fairly new to Rust and am using this project. I’d like to convert the executed Value into a String so I can serialize it to JSON.
Does Value need to implement From for this to work?
use cel_interpreter::{Context, Program};
use serde::Serialize;

#[derive(Serialize)]
struct MidType<'a> {
    body: &'a [u8],
    raw: &'a [u8],
}

fn main() {
    let program = Program::compile("foo.body.contains(1)").unwrap();
    let mut context = Context::default();
    context.add_variable("foo", MidType {
        body: &[1, 2, 3],
        raw: &[],
    }).unwrap();
    let v = program.execute(&context).unwrap();
    println!("{:?}", v);
}
The body is serialized as List[UInt], but the literal 1 has type Int, so the comparison never matches. Output: Bool(false)
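One possible way to avoid this kind of surprise is to compare numbers by their mathematical value rather than by variant. The sketch below shows that idea on a stand-in `Num` type; `Num`, `num_eq`, and `contains` are hypothetical names and this is not how cel-interpreter's `Value` is defined.

```rust
// Hypothetical sketch: a stand-in numeric type, not cel-interpreter's `Value`.

#[derive(Debug, Clone, Copy)]
pub enum Num {
    Int(i64),
    UInt(u64),
}

/// Compare by mathematical value so that `Int(1)` equals `UInt(1)`.
pub fn num_eq(a: Num, b: Num) -> bool {
    match (a, b) {
        (Num::Int(x), Num::Int(y)) => x == y,
        (Num::UInt(x), Num::UInt(y)) => x == y,
        // A negative Int can never equal a UInt; a non-negative i64 always
        // fits in a u64, so the cast below is lossless.
        (Num::Int(i), Num::UInt(u)) | (Num::UInt(u), Num::Int(i)) => {
            i >= 0 && (i as u64) == u
        }
    }
}

/// `contains` built on top of the value-based comparison.
pub fn contains(list: &[Num], needle: Num) -> bool {
    list.iter().any(|item| num_eq(*item, needle))
}
```

With this comparison, a list of `UInt(1), UInt(2), UInt(3)` contains `Int(1)` but not `Int(-1)`. The alternative is to keep strict typing and require the expression to use an unsigned literal.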
This simple expression, taken straight from the spec, fails: size(requests) > size
Reproducible test case:
let program = Program::compile("size(requests) > size").unwrap();
let mut context = Context::default();
let requests = vec![Value::Int(42), Value::Int(42), Value::Int(42)];
context.add_variable("requests", Value::List(Arc::new(requests))).unwrap();
context.add_variable("size", Value::Int(42)).unwrap();
program.execute(&context) // `Err` value: ValuesNotComparable(Int(3), Function("size", None))
Whereas, as per the doc:
the first size is a function, and the second is a variable.
Here both occurrences of size resolve to the function, so the variable is always shadowed by the function.
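The behaviour the spec describes falls out naturally if functions and variables are kept in separate namespaces and the lookup depends on whether the name is in call position. A minimal sketch of that idea follows; `Scope`, `Value`, `size`, and `size_of_requests_exceeds_size` are hypothetical and only mimic the shape of the problem, they are not the crate's types.

```rust
use std::collections::HashMap;

// Hypothetical sketch: a toy scope, not cel-interpreter's `Context`.

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    List(Vec<Value>),
}

type Function = fn(&[Value]) -> Result<Value, String>;

/// Functions and variables live in separate maps, so one name can be both.
#[derive(Default)]
pub struct Scope {
    variables: HashMap<String, Value>,
    functions: HashMap<String, Function>,
}

impl Scope {
    pub fn add_variable(&mut self, name: &str, value: Value) {
        self.variables.insert(name.to_string(), value);
    }

    pub fn add_function(&mut self, name: &str, f: Function) {
        self.functions.insert(name.to_string(), f);
    }

    /// Used when the name appears as a plain identifier, e.g. `size`.
    pub fn resolve_ident(&self, name: &str) -> Result<Value, String> {
        self.variables
            .get(name)
            .cloned()
            .ok_or_else(|| format!("undeclared variable: {name}"))
    }

    /// Used when the name appears in call position, e.g. `size(requests)`.
    pub fn call(&self, name: &str, args: &[Value]) -> Result<Value, String> {
        let f = self
            .functions
            .get(name)
            .ok_or_else(|| format!("undeclared function: {name}"))?;
        f(args)
    }
}

/// A toy `size` function that only accepts a single list argument.
pub fn size(args: &[Value]) -> Result<Value, String> {
    match args {
        [Value::List(items)] => Ok(Value::Int(items.len() as i64)),
        _ => Err("size expects a single list".to_string()),
    }
}

/// Evaluates `size(requests) > size` by hand against the scope.
pub fn size_of_requests_exceeds_size(scope: &Scope) -> Result<bool, String> {
    let requests = scope.resolve_ident("requests")?;
    let lhs = scope.call("size", &[requests])?;
    let rhs = scope.resolve_ident("size")?;
    match (lhs, rhs) {
        (Value::Int(l), Value::Int(r)) => Ok(l > r),
        _ => Err("operands are not comparable".to_string()),
    }
}
```

With three requests and a `size` variable of 42, the hand-evaluated expression yields `false` instead of a comparison error, because the bare identifier never touches the function map.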
I've been playing around with this CEL implementation and I noticed one odd thing with the following expressions:
b && (c == "string")
b && c == "string"
c == "string" && b
Given this context
{"b": True, "c": "string"}
they should all evaluate to true, but this is not what's happening:
True <= b && (c == "string")
False <= b && c == "string"
True <= c == "string" && b
Here's a simple reproducer:
use cel_interpreter::{Context, Program, Value};

fn main() {
    let expressions = [
        "b && (c == \"string\")",
        "b && c == \"string\"",
        "c == \"string\" && b",
    ];
    for expression in expressions {
        let program = Program::compile(expression).unwrap();
        let mut context = Context::default();
        context.add_variable("b", Value::Bool(true));
        context.add_variable("c", Value::String(String::from("string").into()));
        let result = program.execute(&context);
        println!("{:?} <= {}", result, expression);
    }
}
It produces the following output:
Ok(Bool(true)) <= b && (c == "string")
Ok(Bool(false)) <= b && c == "string"
Ok(Bool(true)) <= c == "string" && b
It seems that in the case of b && c == "string" the interpreter effectively evaluates this expression:
(b && c) == "string"
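In CEL, == binds tighter than &&, which in turn binds tighter than ||. To show the grouping a parser should arrive at, here is a small precedence-climbing sketch that takes whitespace-separated tokens and returns a fully parenthesized string. It is a toy written for illustration (the `precedence`, `group`, and `parse_expr` names are hypothetical) and has nothing to do with the crate's actual grammar; it handles only single-token operands and these four operators.

```rust
// Hypothetical sketch: a toy parser, unrelated to the crate's grammar.

/// Binding power of the binary operators handled here; higher binds tighter.
fn precedence(op: &str) -> Option<u8> {
    match op {
        "||" => Some(1),
        "&&" => Some(2),
        "==" | "!=" => Some(3),
        _ => None,
    }
}

/// Parse whitespace-separated tokens and return a fully parenthesized string
/// that makes the grouping explicit. Returns `None` on malformed input.
pub fn group(input: &str) -> Option<String> {
    let tokens: Vec<&str> = input.split_whitespace().collect();
    let mut pos = 0;
    let grouped = parse_expr(&tokens, &mut pos, 1)?;
    // Reject trailing tokens that were not consumed.
    if pos == tokens.len() {
        Some(grouped)
    } else {
        None
    }
}

/// Precedence climbing: parse an operand, then absorb every operator whose
/// precedence is at least `min_prec`.
fn parse_expr(tokens: &[&str], pos: &mut usize, min_prec: u8) -> Option<String> {
    let operand = *tokens.get(*pos)?;
    if precedence(operand).is_some() {
        return None; // an operator where an operand was expected
    }
    let mut lhs = operand.to_string();
    *pos += 1;
    while let Some(&op) = tokens.get(*pos) {
        let prec = match precedence(op) {
            Some(p) if p >= min_prec => p,
            _ => break,
        };
        *pos += 1;
        // Left-associative: the right side may only take tighter operators.
        let rhs = parse_expr(tokens, pos, prec + 1)?;
        lhs = format!("({lhs} {op} {rhs})");
    }
    Some(lhs)
}
```

Here `group("b && c == \"string\"")` returns `(b && (c == "string"))`, which is the grouping the Python implementation appears to use, rather than `((b && c) == "string")`.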
I'm also using a Python CEL interpreter, and it evaluates these expressions properly:
import celpy

expressions = [
    'b && (c == "string")',
    'b && c == "string"',
    'c == "string" && b',
]
for expression in expressions:
    env = celpy.Environment()
    ast = env.compile(expression)
    prgm = env.program(ast)
    activation = celpy.json_to_cel({"a": 1, "b": True, "c": "string"})
    result = prgm.evaluate(activation)
    print(f"{result} <= {expression}")
Produces
True <= b && (c == "string")
True <= b && c == "string"
True <= c == "string" && b
This is more of a question really than an actual issue... for now at least.
If I read the CEL spec properly, implementations are expected to support regular expressions in the RE2 flavor.
Trying this out:
#[test]
fn test_matches() {
    let tests = vec![
        ("map", "{1: 'abc', 2: 'def', 3: 'ghi'}.all(key, key.matches('^[a-zA-Z]*$')) == true"),
        ("string", "'foobar'.matches('^[a-zA-Z]*$') == true"),
    ];
    for (name, script) in tests {
        assert_eq!(test_script(script, None), Ok(true.into()), "{}", name);
    }
}
I'm getting an Err(NoSuchKey("matches")) for both the map and string tests. I couldn't see anything mentioning regular expressions either. So here are the questions: is this a conscious decision not to support them? Looking around, I couldn't find a good candidate regex lib to start implementing that support, especially as in our use case we're targeting wasm...
Have you considered that side of the spec? Any conclusion you came to already?
The library supports traversing maps using dot notation, but index notation is not supported:
// Dot notation
foo.bar
// Index notation
foo["bar"]
I'm not sure whether I'm the one doing something wrong here, but I find this slightly confusing:
let script = "ts == timestamp('2023-05-28T00:00:00+00:00')";
let program = Program::compile(script).unwrap();
let mut context = Context::default();
let ts: DateTime<FixedOffset> = DateTime::parse_from_rfc3339("2023-05-28T00:00:00+00:00").unwrap();
context.add_variable("ts", Value::Timestamp(ts)).unwrap();
assert_eq!(program.execute(&context), Ok(true.into()));
Interestingly, this yields comparing: Timestamp(2023-05-28T00:00:00+00:00) vs String("2023-05-28T00:00:00+00:00")
where the lhs is the timestamp('2023-05-28T00:00:00+00:00'), but for some reason ts ends up being a ... String? Am I missing something here?
I would expect that no expression can panic the interpreter.
unable to compare String("50") with Function("size", None)
thread 'limit::tests::cel::size_function_and_size_var' panicked at /Users/chirino/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cel-interpreter-0.7.0/src/objects.rs:270:23:
unable to compare String("50") with Function("size", None)
I'm experimenting with writing a Python extension for this library using pyo3 and running into issues when it comes to concurrency. I'm not really well-versed in Rust, but I asked about it here. As far as I understand, it boils down to using Arc instead of Rc.
I'm currently using a Python CEL interpreter, but its performance leaves a lot to be desired, so I'm looking for alternatives. I use CEL for feature flags, so we have multiple compiled expressions which are evaluated from different threads.
What are your thoughts about it? What would it take to make the interpreter thread-safe?
I'm willing to help, but my Rust knowledge is very, very limited :)
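For readers less familiar with Rust, the difference is that Rc uses a non-atomic reference count and is therefore neither Send nor Sync, while Arc uses an atomic count and can be shared across threads. The sketch below shows the shape of a program that can be evaluated from several threads; `Program` here is a hypothetical stand-in with a trivial rule, not cel-interpreter's type.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical sketch: `Program` here is a stand-in, not cel-interpreter's type.

/// A compiled program whose shared data sits behind `Arc`, which keeps the
/// struct `Send + Sync`. With `Rc` in place of `Arc`, the `thread::spawn`
/// call below would be rejected by the compiler.
pub struct Program {
    threshold: Arc<i64>,
}

impl Program {
    pub fn compile(threshold: i64) -> Self {
        Program { threshold: Arc::new(threshold) }
    }

    /// Evaluates the fixed rule `input > threshold`.
    pub fn execute(&self, input: i64) -> bool {
        input > *self.threshold
    }
}

/// Evaluate one shared program from several threads at once.
/// Results come back in the same order as `inputs`.
pub fn evaluate_in_parallel(program: Arc<Program>, inputs: Vec<i64>) -> Vec<bool> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let program = Arc::clone(&program);
            thread::spawn(move || program.execute(input))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

For a pyo3 extension, the practical consequence is that every type reachable from the compiled program would need to be free of Rc before the program could be handed to multiple threads.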
Benefits of using chumsky for parsing:
High level plan:
&&, ||, in and ternary operations
'1'.double() or 10.double()
Do you want to keep both parsers? If so, how should the API work to pick between them? I assume it wouldn't be too tricky to add unsigned ints and un-escaped strings to the current lalrpop version?