Introduction
Today we have two different array traversal semantics:
- The one in production in Sanity.io today (v1)
- An improved one that's currently implemented in GROQ-JS and the latest internal version of GROQ in Sanity.io (v2).
This is a third proposal which attempts to make
The main problem with the original proposal for improved array traversal
In v2, this will no longer work: *[_type == "user"]._id. This is because ._id accesses an attribute, and arrays don't have attributes. Instead, you would have to write *[_type == "user"][]._id. This is a highly breaking change. However, the following still works: *[_type == "user"]{foo{bar}}. Why doesn't this have to be *[_type == "user"][]{foo{bar}}? (Because {…} has special behavior.)
This leads to a strange situation: We are introducing a highly breaking change in one area, but we're not achieving full consistency.
It should also be mentioned that *[_type == "user"]._id has no sensible meaning other than "apply ._id to each element". It seems unfortunate to discard a syntax that is unambiguous and clear.
The new proposal
Goals
The goals of this proposal are to:
- break as little existing GROQ as possible.
- be as consistent as possible.
- enable new use cases.
- be completely determined at compile/parse time.
Overview
The main complication of supporting *[…]._id is knowing when ._id should be mapped inside the array without depending on runtime information. In *[_type == "user"]{"a": bar._id} we want ._id to mean "access _id on the bar object" and never to be treated as an array projection.
The solution in this proposal is to treat traversal ([]), filtering ([_type == "user"]) and slicing ([0..5]) as array-coercing markers. These coerce the left-hand side to an array (meaning that if it's not an array, it is converted to null), so any .foo that follows them can safely be treated as a mapping over the array's elements.
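To make the "compile/parse time" claim concrete, here is a minimal TypeScript sketch that decides, from syntax alone, how many array levels each attribute access is mapped inside. The token model (`Op`, `attrDepths`) is hypothetical and is not how GROQ-JS represents its syntax tree. It assumes that `*` always starts an array, that consecutive array markers work on the same array, and that an array marker following an attribute access opens a nested level.

```typescript
// Hypothetical token model for a chain of GROQ postfix operations.
type Op =
  | { kind: "star" }                // *       (an ArrExpr)
  | { kind: "ident"; name: string } // bar     (a BasicExpr)
  | { kind: "attr"; name: string }  // .foo    (simple mapper)
  | { kind: "traverse" }            // []      (array mapper)
  | { kind: "filter" }              // [cond]  (array mapper)
  | { kind: "slice" };              // [0..5]  (array mapper)

// For every attribute access, report how many array levels it is mapped
// inside of, counted from the start of the chain. Depth 0 means plain access
// on the value. Only the syntax is inspected; no runtime value is needed.
function attrDepths(ops: Op[]): Array<[string, number]> {
  const result: Array<[string, number]> = [];
  let depth = 0;
  let afterArrayMarker = false;
  for (const op of ops) {
    switch (op.kind) {
      case "star":
        depth = 1; // `*` is always treated as an array expression
        afterArrayMarker = true;
        break;
      case "ident":
        depth = 0;
        afterArrayMarker = false;
        break;
      case "traverse":
      case "filter":
      case "slice":
        // Consecutive array mappers act on the same array; an array mapper
        // after an attribute access starts a new, nested array level.
        if (!afterArrayMarker) depth++;
        afterArrayMarker = true;
        break;
      case "attr":
        result.push([op.name, depth]);
        afterArrayMarker = false;
        break;
    }
  }
  return result;
}
```

Under this model, `bar._id` gets depth 0 (plain attribute access, as in the projection example), while in `*[_type == "user"].roles[].title` the `.roles` access is mapped over the users (depth 1) and `.title` over each user's roles (depth 2).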
Details
The core idea behind this proposal is to introduce the concept of a mapper. .foo, [_type == "bar"] and ->name are all mappers. A mapper is an operation that you can apply to a value, but it is not a valid expression by itself. We can group mappers into two categories, simple mappers and array mappers, with the distinction being that array mappers work on arrays.
We have the following simple mappers:
- Attribute access: .foo
- Object projection: {foo} (as in *{foo}, not as a standalone expression)
- Dereferencing: -> and ->foo
- Group projection: .(foo). This is a new invention which would allow more flexibility in how you compose mappers. For example, *[_type == "user"].(count(books)) would return an array of numbers.
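The following sketch models the simple mappers as plain functions from a JSON-like value to a value. It is an illustration under assumptions, not GROQ-JS code: object projection is reduced to a list of attribute names, a group projection takes its expression as a function, and dereferencing looks IDs up in a hypothetical in-memory `Map` instead of a real dataset.

```typescript
type Value = null | boolean | number | string | Value[] | { [key: string]: Value };
type Obj = { [key: string]: Value };

const isObject = (v: Value): v is Obj =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// .foo: attribute access. Arrays have no attributes, so they yield null.
const attr = (name: string) => (v: Value): Value =>
  isObject(v) && Object.prototype.hasOwnProperty.call(v, name) ? v[name] : null;

// {a, b}: object projection, simplified here to a list of attribute names.
// Anything that is not an object (arrays included) projects to null.
const project = (names: string[]) => (v: Value): Value => {
  if (!isObject(v)) return null;
  const out: Obj = {};
  for (const n of names) out[n] = attr(n)(v);
  return out;
};

// .(expr): group projection. Any expression over the input can act as a
// mapper; in this sketch the expression is simply a function of the value.
const groupProjection = (expr: (v: Value) => Value) => (v: Value): Value => expr(v);

// ->: dereferencing via a hypothetical _id -> document map.
// ->foo is this mapper followed by attr("foo").
const deref = (store: Map<string, Value>) => (v: Value): Value => {
  const ref = attr("_ref")(v);
  return typeof ref === "string" ? (store.get(ref) ?? null) : null;
};
```

For example, `attr("name")(deref(store)(attr("boss")(user)))` plays the role of `boss->name`, and `groupProjection` with a function that counts `books` plays the role of `.(count(books))`.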
And then we have the following array mappers:
- Slicing: [0..5]
- Filtering: [_type == "user"]
- Traversal: []. This is the same as [true] or [0..-1]; it merely marks the value as an array.
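The array mappers can be sketched the same way, as functions from an array to an array. This sketch assumes GROQ's slice semantics, where `[a..b]` includes index `b` and negative indices count from the end. Coercion of non-arrays is assumed to happen before these functions run, so they only ever see arrays.

```typescript
type Value = null | boolean | number | string | Value[] | { [key: string]: Value };

// [start..end]: inclusive slice; negative indices count from the end.
const slice = (start: number, end: number) => (arr: Value[]): Value[] => {
  const n = arr.length;
  const s = Math.max(start < 0 ? n + start : start, 0);
  const e = end < 0 ? n + end : end;
  // +1 because the end is inclusive; clamp so a negative bound never wraps.
  return arr.slice(s, Math.max(e + 1, 0));
};

// [cond]: keep the elements for which the condition holds.
const filter = (cond: (el: Value) => boolean) => (arr: Value[]): Value[] =>
  arr.filter(cond);

// []: traversal only marks the value as an array and changes nothing else.
const traverse = (arr: Value[]): Value[] => arr;
```

With these definitions, `traverse(xs)`, `filter(() => true)(xs)` and `slice(0, -1)(xs)` all return the elements of `xs` unchanged, which matches the equivalence of `[]`, `[true]` and `[0..-1]`.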
Pipes (|) are supported in various places to handle backwards compatibility.
Here's the grammar for composing mappers:
MapSimple ::= …
MapArray ::= …
MapSimpleMulti ::= MapSimple+ MapArrayMulti?
MapArrayMulti ::= MapArray+ MapSimpleMulti?
ArrExpr ::= Star
BasicExpr ::= …
Expr ::=
    BasicExpr | ArrExpr |
    BasicExpr MapSimpleMulti |
    BasicExpr MapArrayMulti |
    ArrExpr MapSimpleMulti |
    ArrExpr MapArrayMulti
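One way to see what the grammar does is to parse a flat sequence of mappers into the nested, composed structure. The sketch below assumes a hypothetical string token per mapper and classifies a token as an array mapper if it starts with `[`. It covers only the `MapSimpleMulti`/`MapArrayMulti` rules, not `Expr`.

```typescript
// A composed mapper, mirroring the grammar:
//   MapSimpleMulti ::= MapSimple+ MapArrayMulti?
//   MapArrayMulti  ::= MapArray+  MapSimpleMulti?
interface Multi {
  kind: "simple" | "array";
  mappers: string[]; // one maximal run of same-kind mappers
  rest?: Multi;      // always of the other kind, or absent
}

// Hypothetical classification: bracketed tokens ([], [cond], [0..5]) are
// array mappers; everything else (.foo, {foo}, ->, .(foo)) is simple.
const kindOf = (token: string): Multi["kind"] =>
  token.startsWith("[") ? "array" : "simple";

// Take the longest run of same-kind mappers, then parse the remainder as the
// optional trailing composed mapper. Kinds alternate, as the grammar requires.
function parseMappers(tokens: string[], i = 0): Multi | undefined {
  if (i >= tokens.length) return undefined;
  const kind = kindOf(tokens[i]);
  const mappers: string[] = [];
  while (i < tokens.length && kindOf(tokens[i]) === kind) {
    mappers.push(tokens[i]);
    i++;
  }
  return { kind, mappers, rest: parseMappers(tokens, i) };
}

// Render the nesting: A(...) is a MapArrayMulti, S(...) a MapSimpleMulti.
function show(m: Multi | undefined): string {
  if (m === undefined) return "";
  const body = m.mappers.join("") + (m.rest ? " " + show(m.rest) : "");
  return `${m.kind === "array" ? "A" : "S"}(${body})`;
}
```

For the mappers after `*` in `*[_type == "user"].roles[].title`, `show(parseMappers(['[_type == "user"]', ".roles", "[]", ".title"]))` gives `A([_type == "user"] S(.roles A([] S(.title))))`. Each composed mapper carries at most one trailing composed mapper of the other kind, and that nesting is what produces nested arrays in the results.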
Explanation:
MapArrayMulti and MapSimpleMulti represent composed mappers: mappers that are built on top of other mappers.
MapSimpleMulti is quite simple: when applied to a value, it applies the simple mappers and then the MapArrayMulti (if any), in order.
MapArrayMulti is a bit more complicated:
- When applied to a value, it first coerces the value to an array. If the value is not an array, it returns null immediately.
- Then it applies all the array mappers (e.g. filtering, slicing) to that array.
- If there's a MapSimpleMulti, it applies that mapper to each element of the array.
In addition, * is interpreted as an array expression. The only impact this has is that a MapSimpleMulti applied to an ArrExpr will apply the mapping to each element instead of to the value itself. This causes *{foo} to be interpreted as intended.
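The evaluation rules for composed mappers translate almost line by line into code. This TypeScript sketch assumes mappers are already compiled to functions (`SimpleFn`, `ArrayFn`) and defines a hypothetical `attr` helper for attribute access; it is not the GROQ-JS evaluator.

```typescript
type Value = null | boolean | number | string | Value[] | { [key: string]: Value };
type SimpleFn = (v: Value) => Value;      // one simple mapper
type ArrayFn = (arr: Value[]) => Value[]; // one array mapper

interface SimpleMulti { simple: SimpleFn[]; then?: ArrayMulti }
interface ArrayMulti { array: ArrayFn[]; then?: SimpleMulti }

// MapSimpleMulti: apply each simple mapper in order, then the trailing
// MapArrayMulti (if any) to the result.
function applySimpleMulti(m: SimpleMulti, v: Value): Value {
  let out = v;
  for (const f of m.simple) out = f(out);
  return m.then ? applyArrayMulti(m.then, out) : out;
}

// MapArrayMulti: coerce to an array (non-arrays give null), apply the array
// mappers to the whole array, then the trailing MapSimpleMulti (if any) to
// each element.
function applyArrayMulti(m: ArrayMulti, v: Value): Value {
  if (!Array.isArray(v)) return null;
  let arr: Value[] = v;
  for (const f of m.array) arr = f(arr);
  const each = m.then;
  return each ? arr.map((el) => applySimpleMulti(each, el)) : arr;
}

// `*` is an ArrExpr: a MapSimpleMulti applied to it runs on each document,
// while a MapArrayMulti runs on the array as a whole.
function applyToStar(docs: Value[], m: SimpleMulti | ArrayMulti): Value {
  if ("simple" in m) {
    const perDoc: SimpleMulti = m;
    return docs.map((doc) => applySimpleMulti(perDoc, doc));
  }
  return applyArrayMulti(m, docs);
}

// Hypothetical helper for examples: attribute access (.name).
const attr = (name: string): SimpleFn => (v) =>
  typeof v === "object" && v !== null && !Array.isArray(v) &&
  Object.prototype.hasOwnProperty.call(v, name)
    ? v[name]
    : null;
```

For example, if `onlyUsers` is an `ArrayFn` that keeps documents with `_type == "user"`, then `applyToStar(docs, { array: [onlyUsers], then: { simple: [attr("_id")] } })` plays the role of `*[_type == "user"]._id`, while `applyArrayMulti({ array: [] }, "not an array")` returns `null` because of the coercion step.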
Implications
- *[_type == "user"]._id returns the _id of every user document.
- *[_type == "user"].slug.title returns slug.title for each user.
- *[_type == "user"].roles[].title returns a nested array of role titles. If there are two users who have the roles (A, B) and (C), this returns [["A", "B"], ["C"]].
- In *[_type == "user"]{foo{bar}}, foo must be an object. If it's an array, the projected foo will be null.
- In *[_type == "user"]{foo[]{bar}}, foo must be an array.
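These implications can be checked by desugaring the queries by hand into ordinary array operations. In this sketch `get`, `mapArray` and `projectBar` are hypothetical helpers, the sample documents are made up, and only the value of the projected `foo` field is shown for the two projection cases.

```typescript
type Value = null | boolean | number | string | Value[] | { [key: string]: Value };

// Attribute access: null for non-objects, arrays included.
function get(v: Value, name: string): Value {
  if (typeof v !== "object" || v === null || Array.isArray(v)) return null;
  return Object.prototype.hasOwnProperty.call(v, name) ? v[name] : null;
}

// Array coercion followed by per-element mapping: null unless v is an array.
function mapArray(v: Value, f: (el: Value) => Value): Value {
  return Array.isArray(v) ? v.map(f) : null;
}

// Object projection {bar}: null unless v is an object.
function projectBar(v: Value): Value {
  if (typeof v !== "object" || v === null || Array.isArray(v)) return null;
  return { bar: get(v, "bar") };
}

const users: Value[] = [
  { _type: "user", roles: [{ title: "A" }, { title: "B" }], foo: { bar: 1 } },
  { _type: "user", roles: [{ title: "C" }], foo: [{ bar: 2 }] },
];

// *[_type == "user"].roles[].title: one array per user, never flattened.
const titles = users
  .filter((u) => get(u, "_type") === "user")
  .map((u) => mapArray(get(u, "roles"), (r) => get(r, "title")));

// *[_type == "user"]{foo{bar}}: foo must be an object.
const asObject = users.map((u) => projectBar(get(u, "foo")));

// *[_type == "user"]{foo[]{bar}}: foo must be an array.
const asArray = users.map((u) => mapArray(get(u, "foo"), projectBar));
```

Under these assumptions `titles` is `[["A", "B"], ["C"]]`, `asObject` is `[{ bar: 1 }, null]` and `asArray` is `[null, [{ bar: 2 }]]`: nothing is flattened, and a value of the wrong shape becomes `null` rather than an error.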
How do we teach this?
Here are some phrases which can be used to explain the behavior:
- "When GROQ knows that you're dealing with an array, you can add .foo to get the foo attribute of all of the documents/elements/objects."
- "We can also dereference an array of references the same way: just add -> at the end."
- "Here GROQ doesn't know that it's an array, so we'll have to add []."
How to deal with flattening?
There's never any flattening happening here. I propose that we separately introduce a flat function (named after JavaScript's Array.prototype.flat): flat(*[_type == "user"].roles[].title) would flatten the result by one level.
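A one-level `flat` is straightforward to express. This sketch follows the default depth of 1 used by JavaScript's `Array.prototype.flat`; returning `null` for non-array input is an assumption made here, not something the proposal specifies.

```typescript
type Value = null | boolean | number | string | Value[] | { [key: string]: Value };

// Hypothetical GROQ flat(): remove exactly one level of nesting.
// Non-array input returns null (an assumption of this sketch).
function flat(v: Value): Value {
  if (!Array.isArray(v)) return null;
  const out: Value[] = [];
  for (const el of v) {
    if (Array.isArray(el)) out.push(...el); // splice nested elements in
    else out.push(el);                      // keep non-arrays as they are
  }
  return out;
}
```

`flat([["A", "B"], ["C"]])` gives `["A", "B", "C"]`, whereas `flat([[["x"]], "y"])` gives `[["x"], "y"]`, since only one level is removed.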