Background
This came up in #18 as an implementation detail, but I think it's incredibly useful in its own right.
Sample use cases and prior art:
I pushed two util functions a few days ago (getByPath() and setByPath()), but I think we should clean up the code, make it more flexible, and expose them at the top level.
Some design decisions, requirements, questions below.
Signature
Function names
I’m leaning towards the simple get() and set() that are already established in prior work. Is there anything else we may want to save the names get() and set() for?
Arguments
Strawman:
get(obj, path [, options])
set(obj, path, value [, options])
It might be worth adding overloads later that allow specifying value and path as part of the options object, but I don't see a compelling motivation to include that in the MVP.
Should there be a way to set value via a function that takes the path and current value as parameters? Or is it encroaching too much into transform() territory at this point?
Path structure
Data type
Paths should be provided as arrays; we don’t want to deal with string parsing and trying to distinguish paths from property names. Strings/numbers should be accepted as well, but they’re just a path of length 1.
We may want to also support objects to provide additional metadata (see below).
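To make the strawman concrete, here is a minimal sketch of get() for entirely literal paths. The helper name toPath() is hypothetical, and options is accepted only to match the proposed signature; nothing here is the existing getByPath() code.

```javascript
// Hypothetical helper: a string or number is treated as a path of length 1.
function toPath(path) {
  return Array.isArray(path) ? path : [path];
}

// Sketch of get(obj, path [, options]) for literal paths only.
// `options` matches the strawman signature but is unused here.
function get(obj, path, options = {}) {
  let current = obj;
  for (const key of toPath(path)) {
    // Missing intermediate values yield undefined instead of throwing
    if (current === null || current === undefined) return undefined;
    current = current[key];
  }
  return current;
}

const data = { user: { "first.name": "Ada", tags: ["a", "b"] } };
get(data, ["user", "first.name"]); // "Ada": the dot is part of the property name
get(data, ["user", "tags", 1]);    // "b"
get(data, "user") === data.user;   // true
get(data, ["nope", "deeper"]);     // undefined
```

Because the path is an array, a property name containing a dot needs no escaping, which is the main argument for avoiding string parsing.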
Predicates
It seems obvious that entirely literal paths will not suffice (at the very least we need wildcards).
Should we just use JSON Path? Hell no! First it's overkill for these use cases, and second once you go beyond literal property names + wildcards, the syntax becomes cryptic AF. And despite its complexity, there are some pretty common use cases like case insensitivity it doesn’t seem to support.
So since we can’t just use JSON Path, what do we use? What predicates do we want to support? Examples:
- Wildcards (any property at this level)
- Case-insensitive property names?
- Alternatives? (e.g. "foo or bar")
- Ranges of numbers? (e.g. "top 3 items")
- Property queries (e.g. "get items with id=foo"). This is essentially the path version of CSS :has(), so we'd probably want to frame it that way, i.e. "children that match this path", so I’ll call them child queries from now on
- Property names that start/end with a given string?
- Property name regex?
We generally want to keep the MVP simple until use cases emerge, but it helps to take these things into account at the design stage so that the API has room to expand.
As mentioned above, wildcards are certainly needed.
Case-insensitive matching might be worth including in the MVP, since at least the Mavo use cases need it.
The rest we can probably ship without and add as needed.
Syntax for predicates
So that raises the question: how do we express these predicates?
Special syntax. This works decently for some of them:
- Wildcards: *
- Alternatives: foo|bar
- Number ranges: 0-3 or 0 .. 3
- Property queries: id=foo
However, there is no obvious fit for any of the others. Also, inventing a new microsyntax has several drawbacks:
- The larger the syntax space for special syntax, the more challenging it is to distinguish it from literal property names. How do you do the escaping? Backslashes? How do you distinguish a literal "\*" property then? More backslashes? It's backslashes all the way down!
- Devs would need to use string concatenation when the criteria are variable, which is awkward. E.g. Mavo paths support property queries like id=foo, and I now think that's a terrible idea; we dropped that kind of support from get() (it's now only supported in mv-path, which, being an HTML attribute, only takes strings, so it can't take anything more structured).
- It forces you to come up with syntax for things where there is no obvious syntax to use, resulting in a cryptic language.
So instead, I think we should go with an approach of strings for literals + wildcards as the only exception, since these are very common and have a very obvious syntax. Anything else would require making that part of the path an object literal.
This means that even if wildcards are the only predicate we ship, we need to support object literals, at least as a way to escape the wildcard and specify that something is a literal property name. With that escape hatch in place, we could later explore adding syntax as a shortcut for cases where a readable option is obvious (e.g. "foo|bar" for alternatives).
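A rough sketch of how "strings for literals, * as the only special syntax, object literals as the escape hatch" could behave. The function name getAll() and the { name } key follow the strawman in this proposal but are assumptions, not a settled API.

```javascript
// Hypothetical sketch: "*" is a wildcard, any other string/number is a literal key,
// and { name: "*" } escapes the wildcard to mean a property literally named "*".
function getAll(obj, path) {
  const segments = Array.isArray(path) ? path : [path];
  let current = [obj];
  for (const segment of segments) {
    const next = [];
    for (const node of current) {
      if (node === null || typeof node !== "object") continue; // nothing to descend into
      if (segment === "*") {
        next.push(...Object.values(node));
      } else {
        const isObject = segment !== null && typeof segment === "object";
        const key = isObject ? segment.name : segment;
        if (Object.prototype.hasOwnProperty.call(node, key)) next.push(node[key]);
      }
    }
    current = next;
  }
  return current;
}

const data = { users: { alice: { age: 30 }, bob: { age: 25 } }, "*": "star" };
getAll(data, ["users", "*", "age"]); // [30, 25]
getAll(data, [{ name: "*" }]);       // ["star"]
```

No backslashes are needed anywhere: the object form is unambiguous, and the string form stays readable for the common case.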
Predicate schema
Strawman for all of the above predicates (even though we don't plan to implement them all):
- Path: string | (string | PathSegment)[]
- PathSegment: Object with keys (all optional):
  - name: Literal property name (string), but maybe could also be a RegExp?
  - ignoreCase (boolean)
  - range: Numerical range (number[2] or {from, to} or even {gt, gte, lt, lte}?)
  - or: Alternatives ((string | PathSegment[]))
  - has: Return only children for which this would be non-empty (Path)
  - startsWith
  - endsWith
  - regexp
Notes:
- ignoreCase is special. All other criteria are independent, but ignoreCase affects how other criteria work, i.e. it is a modifier rather than a predicate:
  - name: from strict equality to equality after .toLowerCase()
  - regexp: Adds the i flag if not present
  - startsWith/endsWith: applies .toLowerCase() before matching
  - or and has: inherits to any path segments that don't have their own ignoreCase
- Are there any other modifiers that we may conceivably want to support in the future (so we can take them into account in the design)?
- Multiple independent criteria can be specified, and the result is the intersection. This way, since we already have or, complex logical criteria can be created by just nesting these. 😁
- Should we also handle arrays as sugar for {or: array}?
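The notes above can be sketched as a single key matcher. This is only an illustration under assumptions: matchesSegment() is a hypothetical name, range is taken to be an inclusive number[2], and has is left out because it depends on the child value rather than the key.

```javascript
// Hypothetical matcher: does `key` satisfy every criterion in `segment`?
// All present criteria must pass (intersection); ignoreCase modifies the others.
function matchesSegment(key, segment, inheritedIgnoreCase = false) {
  if (segment === null || typeof segment !== "object") {
    // Plain strings/numbers are literal names, except the "*" wildcard
    segment = segment === "*" ? {} : { name: segment };
  }
  const ignoreCase = segment.ignoreCase ?? inheritedIgnoreCase;
  const fold = (s) => (ignoreCase ? String(s).toLowerCase() : String(s));
  const k = fold(key);

  if (segment.name !== undefined && fold(segment.name) !== k) return false;
  if (segment.startsWith !== undefined && !k.startsWith(fold(segment.startsWith))) return false;
  if (segment.endsWith !== undefined && !k.endsWith(fold(segment.endsWith))) return false;
  if (segment.regexp !== undefined) {
    let re = segment.regexp;
    // Add the i flag if not present
    if (ignoreCase && !re.flags.includes("i")) re = new RegExp(re.source, re.flags + "i");
    re.lastIndex = 0; // avoid stale state on g/y regexps
    if (!re.test(String(key))) return false;
  }
  if (segment.range !== undefined) {
    const [from, to] = segment.range; // assumed inclusive on both ends
    if (!/^\d+$/.test(String(key))) return false; // only index-like keys can be in a range
    const n = Number(key);
    if (n < from || n > to) return false;
  }
  if (segment.or !== undefined) {
    // Alternatives inherit ignoreCase unless they set their own
    if (!segment.or.some((alt) => matchesSegment(key, alt, ignoreCase))) return false;
  }
  return true;
}

matchesSegment("Foo", { name: "foo", ignoreCase: true });        // true
matchesSegment("Foo", "foo");                                    // false
matchesSegment("2", { range: [0, 3] });                          // true
matchesSegment("BAR", { or: ["foo", "bar"], ignoreCase: true }); // true
```

Nesting or inside or gives arbitrary boolean combinations without any new syntax, at the cost of verbosity.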
How do predicates work with set()?
Setting is only an issue for the last part of the path — until then it's still a getting task.
So if the last part of the path is a…
- Wildcard: Set every property that exists?
- Alternatives: Set every property among the alternatives or only those that already exist on the object?
- Numerical ranges: set every number in the range?
- Child queries: 🤷🏽‍♀️ Replace these objects with the value?
- Regexps or Starts/ends with: 🤷🏽‍♀️🤷🏽‍♀️🤷🏽‍♀️
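To make the wildcard question easier to discuss, here is a sketch of one possible answer: a trailing * sets every property that already exists and creates nothing. The name setAll() is hypothetical, and the leading segments are literal-only to keep the example short.

```javascript
// Hypothetical sketch: a trailing "*" assigns `value` to every existing own property.
function setAll(obj, path, value) {
  const segments = Array.isArray(path) ? path : [path];
  if (segments.length === 0) return obj;

  // Everything before the last segment is still a "getting" task
  let target = obj;
  for (const key of segments.slice(0, -1)) {
    if (target === null || typeof target !== "object") return obj;
    target = target[key];
  }
  if (target === null || typeof target !== "object") return obj;

  const last = segments[segments.length - 1];
  if (last === "*") {
    for (const key of Object.keys(target)) target[key] = value;
  } else {
    target[last] = value;
  }
  return obj;
}

setAll({ flags: { a: true, b: true } }, ["flags", "*"], false);
// { flags: { a: false, b: false } }
```

Under this interpretation a wildcard on an empty or missing object is a no-op, which may or may not be what callers expect.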
Return value
- One or more values? For static paths, there can only be a single return value. However, when predicates are involved, there could be multiple, and it's impossible to tell whether one or more values are expected.
- Array or object subset? When returning multiple values, how much of the original object structure do we want to preserve? There are use cases for getting a completely flat array, and use cases for subsetting the object, i.e. using the path as an allowlist of properties while preserving the original object structure.
Following the design principle that function return values should not vary wildly based on the options passed, perhaps we actually need more than just a single get() function:
- get(): Array of all values
- first(): First value only
- subset(): Subset of object
Or perhaps get() for one value and getAll() for multiple?
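A sketch of how the three return shapes could share one traversal, following the first naming option above. The walk() generator and the helper names are assumptions for illustration, only literal keys and * are handled, and subset() rebuilds matched arrays as plain objects for brevity.

```javascript
const toPath = (path) => (Array.isArray(path) ? path : [path]);
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical walker: yields [keys, value] for every match of the path.
function* walk(node, segments, keys = []) {
  if (segments.length === 0) {
    yield [keys, node];
    return;
  }
  if (node === null || typeof node !== "object") return;
  const [segment, ...rest] = segments;
  const candidates = segment === "*" ? Object.keys(node) : [segment];
  for (const key of candidates) {
    if (hasOwn(node, key)) yield* walk(node[key], rest, [...keys, key]);
  }
}

// Array of all values
function get(obj, path) {
  return Array.from(walk(obj, toPath(path)), ([, value]) => value);
}

// First value only; stops walking as soon as a match is found
function first(obj, path) {
  for (const [, value] of walk(obj, toPath(path))) return value;
  return undefined;
}

// Subset of object: the path acts as an allowlist
function subset(obj, path) {
  const result = {};
  for (const [keys, value] of walk(obj, toPath(path))) {
    if (keys.length === 0) return value; // empty path: the object itself
    let target = result;
    for (const key of keys.slice(0, -1)) {
      if (!hasOwn(target, key)) target[key] = {};
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return result;
}

const data = { items: { a: { id: 1, label: "x" }, b: { id: 2, label: "y" } } };
get(data, ["items", "*", "id"]);    // [1, 2]
first(data, ["items", "*", "id"]);  // 1
subset(data, ["items", "*", "id"]); // { items: { a: { id: 1 }, b: { id: 2 } } }
```

Each function has a stable return type regardless of the path, which is the design principle being argued for.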
Options for the whole path
These will be passed to the functions as part of the options dictionary.
- Case insensitive matching (when we want it for the whole path)
- set() only: What object to create when part of the path doesn’t exist? {} by default. Might be useful to take a function to customize based on the path.
- We definitely don't want to throw if the path doesn't exist, since avoiding that is one of the primary reasons to use such a helper. Is there value in having an opt-in to stricter handling?
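A sketch of what these whole-path options might look like on set(). The option names create and strict are placeholders invented for this example, as is the choice to pass the next key to create so it can pick between an array and an object.

```javascript
// Hypothetical sketch of set(obj, path, value [, options]) for literal paths.
// options.create(pathSoFar, nextKey): returns the container to create (default {})
// options.strict: opt-in that throws instead of creating missing containers
function set(obj, path, value, options = {}) {
  const segments = Array.isArray(path) ? path : [path];
  if (segments.length === 0) return obj;
  const create = options.create ?? (() => ({}));

  let current = obj;
  for (let i = 0; i < segments.length - 1; i++) {
    const key = segments[i];
    const child = current[key];
    // Note: an existing primitive at this position is replaced by a container
    if (child === null || typeof child !== "object") {
      const pathSoFar = segments.slice(0, i + 1);
      if (options.strict) {
        throw new TypeError(`No object at path ${JSON.stringify(pathSoFar)}`);
      }
      current[key] = create(pathSoFar, segments[i + 1]);
    }
    current = current[key];
  }
  current[segments[segments.length - 1]] = value;
  return obj;
}

set({}, ["a", "b"], 1); // { a: { b: 1 } }

// Create arrays when the next key is a number
const create = (pathSoFar, nextKey) => (typeof nextKey === "number" ? [] : {});
set({}, ["list", 0, "name"], "x", { create }); // { list: [{ name: "x" }] }
```

Keeping strict off by default preserves the "never throw on a missing path" behavior, while still giving callers who want validation a way to ask for it.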