expath / xpath-ng

Wishlist for XPath Syntax Extensions

License: Creative Commons Attribution 4.0 International

xpath xquery xslt xforms xproc

xpath-ng's Introduction

XPath Syntax Extensions Wishlist 🧚

The purpose of this repository is to collaborate on and collate syntax extensions that XPath users would like to see in a future version of XPath.

This repository is not for proposing extension modules or functions for XPath (or XPath-derived languages); such efforts belong elsewhere. The focus here is extending the core grammar of the XPath language.

Sometimes it is not immediately obvious whether a language extension is applicable to the whole XPath sphere or just to XQuery. We are also happy to accept proposals which may only be destined for XQuery. When a proposal is only relevant for XQuery, we will mark it as such.

Proposal Process

Fork the GitHub repository.
Make a copy of the proposal-template.md file to your-short-proposal-name.md.
Write up your proposal in your your-short-proposal-name.md file using Markdown syntax.
Git commit the your-short-proposal-name.md file.
Modify this README.md file to add your proposal to the Proposed Syntax Extensions list, and commit it.
Send a Pull Request just for your your-short-proposal-name.md file.
Await comments and feedback in the Pull Request.

Pull Request comments will be used to discuss a proposal. The author can add further commits to a Pull Request to incorporate feedback.

TODO - we need some sort of disagreement resolution policy / vote procedure.

Code of Conduct

Yes, we all have different ideas about what we consider to be beautiful syntax. Please consider the underlying ideas of a proposal when commenting, and not just whether you like the syntax. Once the idea being proposed is concrete enough, we can discuss the exact syntax at the end.

Please treat others better than you would like to be treated yourself and avoid comments which might be interpreted as personally inappropriate. We have adopted the Contributor Covenant, so if in doubt please see the Code of Conduct.

Proposed Syntax Extensions

Here we categorise, list and link to each syntax extension that has been proposed so far.

Any proposals that are XQuery-specific should be clearly marked as such with: "(XQuery only)"

TODO link proposals here.

xpath-ng's People

rhdunn saxonica bmix liamquin

xpath-ng's Issues

Reducing proliferation of function namespaces

As the number of function libraries increases, the number of namespace declarations required proliferates. There are several practical consequences: the code becomes more cluttered, and it also becomes slower, because namespace declarations have to be maintained at run-time; they aren't just used for compile-time disambiguation. Even when exclude-result-prefixes="#all" is used in XSLT to prevent the namespaces finding their way into the result tree, there are other operations where a large static context becomes a nuisance. For example, when we compile code to SEF files, a significant proportion of the size of the compiled file is taken up with namespace information, much of which is never used. The more namespaces there are, the more likely it becomes that two functions have different namespace contexts, and when functions are inlined, the different namespace contexts need to be maintained in the optimized code.

Many languages have some kind of mechanism to allow functions to have a "full name" and a "short name" of some kind, with a reasonably flexible mechanism to allow the short name to be expanded statically to the full name. It would be good to do this without the necessity to proliferate namespace declarations that have to be maintained at run-time.

An idea for doing this, using XSLT syntax, is to allow something like this:

<xsl:function-library>
    <xsl:import-functions namespace="http://www.w3.org/2005/xpath-functions">
       <xsl:alias-function name="put#1" as="update-put"/>
    </xsl:import-functions>
    <xsl:import-functions namespace="http://www.w3.org/2005/xpath-functions/map">
       <xsl:alias-function name="put" as="map-put"/>
    </xsl:import-functions>
    <xsl:import-functions namespace="http://www.w3.org/2005/xpath-functions/array">
       <xsl:alias-function name="put" as="array-put"/>
       <xsl:alias-function name="get" as="array-get"/>
    </xsl:import-functions>
</xsl:function-library>

When an unprefixed function name is referenced, the local name (and arity) is resolved using the declared function library. The basic rule is that the reference must be unambiguous: if two of the imported function namespaces overlap, making the local-name/arity combination ambiguous, then that function is not accessible by local-name/arity, unless it has been assigned an alias.

(This rule is designed so that if one of the function libraries expands over time, causing an ambiguity to arise where there was none before, then (a) there is no failure unless the affected function is actually used, and (b) if it is used, then a static error is reported; this situation never causes the wrong function to be executed.)
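The resolution rule can be modelled outside XSLT to check that it behaves as described. The following is only a minimal Python sketch, not part of the proposal: the class and method names are hypothetical, the signature lists are an abbreviated illustrative subset (put#1 simply follows the example above), and it assumes that an alias is looked up before imported local names.

```python
from dataclasses import dataclass, field

FN = "http://www.w3.org/2005/xpath-functions"
MAP = FN + "/map"
ARRAY = FN + "/array"


class StaticError(Exception):
    """Raised when a short name cannot be resolved unambiguously."""


@dataclass
class FunctionLibrary:
    # namespace URI -> set of (local-name, arity) pairs it exports
    imports: dict = field(default_factory=dict)
    # alias -> (namespace URI, local-name, arity or None for every arity)
    aliases: dict = field(default_factory=dict)

    def import_functions(self, namespace, signatures):
        self.imports[namespace] = set(signatures)

    def alias_function(self, namespace, name, alias):
        # "put#1" restricts the alias to one arity; "put" covers all arities.
        local, _, arity = name.partition("#")
        self.aliases[alias] = (namespace, local, int(arity) if arity else None)

    def resolve(self, name, arity):
        """Statically expand a short name to (namespace, local-name, arity)."""
        if name in self.aliases:  # assumption: aliases are checked first
            namespace, local, alias_arity = self.aliases[name]
            if alias_arity in (None, arity) and (local, arity) in self.imports[namespace]:
                return (namespace, local, arity)
            raise StaticError(f"alias {name}#{arity} matches no imported function")
        owners = [ns for ns, sigs in self.imports.items() if (name, arity) in sigs]
        if len(owners) == 1:
            return (owners[0], name, arity)
        if not owners:
            raise StaticError(f"unknown function {name}#{arity}")
        # Ambiguity only fails when the function is actually referenced.
        raise StaticError(f"ambiguous function {name}#{arity}: {owners}")


# Mirrors the xsl:function-library example above (illustrative signatures).
library = FunctionLibrary()
library.import_functions(FN, {("put", 1), ("count", 1)})
library.import_functions(MAP, {("put", 3), ("get", 2)})
library.import_functions(ARRAY, {("put", 3), ("get", 2)})
library.alias_function(FN, "put#1", "update-put")
library.alias_function(MAP, "put", "map-put")
library.alias_function(ARRAY, "put", "array-put")
library.alias_function(ARRAY, "get", "array-get")
```

In this sketch, resolving get with arity 2 raises StaticError because both the map and array namespaces export it, while array-get resolves to the array function. A name that no reference uses never triggers an error, matching point (a) above.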

To keep independence between modules, it would probably make sense for a function library to be named and for the name to be scoped to a package, and for individual modules to say explicitly what library they are using with a declaration such as <xsl:use-function-library name="xxx"/> which has module scope.

The impact on XPath, I think, is that the concept of "default function namespace" would be replaced by "unqualified function name resolution algorithm", whose value is a procedure for statically resolving a local-name/arity to a fully qualified function name; different host languages could use different resolution algorithms.

Propose Type Aliases

Often I find that I want to have a name alias for a complex type definition. This would be particularly helpful in a number of scenarios:

When defining an API, it would be nice to have a more descriptive name for the users of the API. I believe that this can increase code readability.
Less typing: some type definitions are quite complex, and a simple alias could reduce the amount of copy-and-paste.

Suppose I have a function that takes an argument which is rather complex:

declare function local:my-function($a as (function() as map(xs:string, map(xs:string, (function(xs:integer) as xs:float+)*)))+)

It would be easier/nicer if I could have something like the following:

declare type alias local:lazy-math-functions
    as (function() as map(xs:string, map(xs:string, (function(xs:integer) as xs:float+)*)))+;

declare function local:my-function($a as local:lazy-math-functions) {
...
};

declare function local:my-other-function($a as local:lazy-math-functions, $b as xs:int+) {
...
};

Aliases are likely only syntactic sugar; they are not types themselves. The query processor should always report errors based on the real types. Of course, if the processor wants to keep the name of the type alias around for better error reporting, then it is free to do so.

We could also consider the visibility of a module's type aliases; perhaps we want to allow %public- and %private-like annotations.
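To illustrate the "syntactic sugar" reading, here is a rough Python sketch that expands alias names in a sequence type textually before any type checking happens. It is an assumption-laden model only: the function name expand_aliases is hypothetical, the QName pattern is simplified, and it performs a single pass, so aliases that refer to other aliases are not handled.

```python
import re

# Simplified prefixed-QName pattern; real XML names allow more characters.
QNAME = re.compile(r"[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*")

ALIASES = {
    "local:lazy-math-functions":
        "(function() as map(xs:string, map(xs:string, "
        "(function(xs:integer) as xs:float+)*)))+",
}


def expand_aliases(type_text, aliases):
    """Replace every alias QName with the type it stands for (one pass)."""
    return QNAME.sub(lambda m: aliases.get(m.group(0), m.group(0)), type_text)


signature = "$a as local:lazy-math-functions, $b as xs:int+"
expanded = expand_aliases(signature, ALIASES)
```

After expansion the processor only ever sees the real type, which is why errors can be reported against it; keeping the original alias name alongside would be an optional nicety.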

Propose an Enumeration Type

I have no idea right now how this would work or what it would look like! However, I have a feeling that the ability to define a sealed type of enumerations could be useful.

For example if I have a function like compute-checksum($algorithm, $data), it might be useful if I could define that the $algorithm is one of a number of enumerated (supported) values and nothing else.

A further example to start a discussion:

declare enumeration local:animal := {
    CAT
    DOG
    PIG
};

declare function local:sound($animal as enumeration(local:animal)) as xs:string {
    if ($animal eq local:animal:CAT) then
        "meow"
    else if ($animal eq local:animal:DOG) then
        "woof"
    else
        "oink"
};
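For comparison, this is roughly how a sealed set of values behaves in a language that already has enumerations. The Python sketch below is only an analogy to start the discussion, not a proposed XQuery semantics; the names Animal and sound mirror the example above.

```python
from enum import Enum


class Animal(Enum):
    CAT = "CAT"
    DOG = "DOG"
    PIG = "PIG"


SOUNDS = {Animal.CAT: "meow", Animal.DOG: "woof", Animal.PIG: "oink"}


def sound(animal):
    # Anything that is not a member of the enumeration is rejected,
    # which is the "one of these values and nothing else" guarantee.
    if not isinstance(animal, Animal):
        raise TypeError(f"expected an Animal, got {animal!r}")
    return SOUNDS[animal]
```

Note that the string "CAT" is rejected even though it spells a member name; whether an XQuery enumeration should be equally strict is an open question.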

Slicing (sequences and arrays)

I'm proposing an extension to range expressions so you can do

0 by 2 to 10 => 0,2,4,6,8,10
5 by -1 to 1 => 5,4,3,2,1

And then a slice() function so that

slice((A,B,C,D,E), 3 to 5) => C, D, E
slice((A,B,C,D,E), -1) => E
slice((A,B,C,D,E), 1 by 2 to 5) => A, C, E
slice((A,B,C,D,E), 5 by -1 to 2) => E, D, C, B
slice((A,B,C,D,E), -1 to 4) => E, A, B, C, D
slice((A,B,C,D,E), (1,5,4)) => A, E, D
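The examples above imply a set of rules that can be written down and checked. The Python sketch below is one reading of them, with these assumptions: ranges are inclusive at both ends, a zero step is an error, positions are 1-based, a negative position counts from the end, and position 0 or any out-of-range position selects nothing. The names range_by and slice_items are hypothetical.

```python
def range_by(start, step, end):
    """Model of 'start by step to end', inclusive of end when reachable."""
    if step == 0:
        raise ValueError("step must not be zero")
    stop = end + 1 if step > 0 else end - 1
    return list(range(start, stop, step))


def slice_items(items, positions):
    """Model of slice($items, $positions) using 1-based positions."""
    if isinstance(positions, int):
        positions = [positions]
    size = len(items)
    result = []
    for position in positions:
        # Convert to a 0-based index; negative positions count from the end.
        index = position - 1 if position > 0 else size + position
        if position != 0 and 0 <= index < size:
            result.append(items[index])
    return result


letters = ["A", "B", "C", "D", "E"]
wrapped = slice_items(letters, range_by(-1, 1, 4))
```

Under these rules -1 to 4 expands to -1, 0, 1, 2, 3, 4; position 0 contributes nothing, which reproduces the E, A, B, C, D result listed above.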

Propose several CSS Selector style shorthands (first-child, last-child, etc.)

I've found myself wanting to write XPath expressions equivalent or similar to the CSS E:first-child selector. The two use cases I have so far are:

first child element is a given element -- child::*[1]/self::element-name;
first child node is a text node -- child::node()[1]/self::text().

These are harder to read in XPath as they are spanning multiple steps, so having shorthand selectors for these could be useful.

NOTE: I don't have any concrete proposals for this.

Using first-child::element-name and first-child::text() would be easier to read. The former is CSS-like (match elements only), while the latter is XPath-like (match any nodes). Using different context behaviour would be confusing, and inconsistent with the other XPath axis selectors. It would also prevent supporting a child::node()[1]/self::element-name equivalent.

Another possibility would be to create two separate axes, such as first-child and first-child-element. This would be consistent with axes like ancestor/ancestor-or-self, but could quickly increase the number of axes. This version would add at least 4 new axes:

first-child:: (forward);
first-child-element:: (forward);
last-child:: (reverse);
last-child-element:: (reverse).
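The difference between the node-based and element-based variants is easy to see on a small document. The following Python sketch models the four candidate axes with the standard xml.dom.minidom module; the function names simply mirror the axis names above and are not an existing API.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def first_child(node):
    """Like child::node()[1]: the first child of any kind, or None."""
    return node.firstChild


def last_child(node):
    """Like child::node()[last()]."""
    return node.lastChild


def first_child_element(node):
    """Like child::*[1]: the first child that is an element, or None."""
    return next((c for c in node.childNodes
                 if c.nodeType == Node.ELEMENT_NODE), None)


def last_child_element(node):
    """Like child::*[last()]."""
    return next((c for c in reversed(node.childNodes)
                 if c.nodeType == Node.ELEMENT_NODE), None)


paragraph = parseString("<p>Hello <b>world</b><i>!</i></p>").documentElement

# first-child::text() use case: is the first child node a text node?
starts_with_text = first_child(paragraph).nodeType == Node.TEXT_NODE
# first-child-element::b use case: is the first child element a <b>?
first_element_is_b = first_child_element(paragraph).tagName == "b"
```

Here the first child node is the text "Hello " while the first child element is b, so a single first-child axis cannot serve both use cases without changing meaning between node tests.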

NOTE: Only the tree traversal CSS selectors would be applicable. CSS selectors like E:hover require context information that is not available, nor relevant to XPath.

Reference: https://drafts.csswg.org/selectors-4/#overview

Expression for exploding "complex atomics" to maps

This is a counterproposal to the extension of the ? syntax to "complex atomics" such as dates.

The idea is that

Expr 🎆

explodes an atomic (like date, etc...) returned by Expr into a map (with fields year, month, etc).

(the choice of 🎆 is arbitrary and a placeholder for any other reasonable choice of syntax)

It can also work on a sequence of atomics, returning a sequence of maps.

Then one can write

(current-date() 🎆)?year

where the semantics of ? is completely unchanged.

Parentheses are here for clarity on precedence, but may or may not be needed depending on the relative precedence of 🎆 and ?.

This way, 🎆 could be combined and used with all other functionality applying to maps (functions, obtaining the keys, the values, etc), providing a more general functionality than only for ?-based lookup.
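As a rough model of the idea, the Python sketch below explodes date values into plain dictionaries, after which only ordinary map operations are needed. The function name explode is hypothetical, and every field name other than year and month is an assumption.

```python
from datetime import date, datetime


def explode(value):
    """Turn a date-like atomic, or a sequence of them, into map(s)."""
    if isinstance(value, (list, tuple)):
        # A sequence of atomics yields a sequence of maps.
        return [explode(item) for item in value]
    if isinstance(value, datetime):  # checked first: datetime subclasses date
        return {"year": value.year, "month": value.month, "day": value.day,
                "hours": value.hour, "minutes": value.minute,
                "seconds": value.second}
    if isinstance(value, date):
        return {"year": value.year, "month": value.month, "day": value.day}
    raise TypeError(f"cannot explode {type(value).__name__}")


exploded = explode(date(2018, 11, 1))
year = exploded["year"]        # plays the role of ( Expr 🎆 )?year
fields = sorted(exploded)      # ordinary map operations keep working
```

The lookup on the result is the unmodified map lookup, which is the point of the counterproposal: the new operator carries all of the new semantics.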

As raised in Slack, it would be clear to anybody seeing 🎆 and getting an error with an XQuery 3.1 query that this is something new, rather than an existing ? with modified semantics.

Provide a QT3 style test file for each proposal?

This is something I have been thinking about. Should proposals add a conformance test XML file for the behaviour of the proposal, as is available for XPath and XQuery?

Some advantages to this would be that it would help make different implementations consistent, and make it easier to add the proposals to the existing test suite.

They should ideally be in the same structure as the test suite (prod/[GrammarSymbol].xml) to make it easier to run the tests and add them to the QT3 test suite.

Compatibility between implementations

I had been thinking about how we can release proposals or unleash experimental syntax features on our users without sacrificing the portability of XQuery code between implementations.

I hadn't formulated any particularly strong ideas, apart from noting the obvious: the XQuery Version Declaration is meant for that purpose.

Previously, xquery version "1.0"; very much meant W3C XML Query Language version 1.0.

Today I note that BaseX has released version 9.1, and for us, it is very exciting to see that they have included XQuery syntax extensions for: ternary if, the Elvis operator, and if without else.

It is not clear to me from their documentation whether these are enabled by default, or can be enabled and disabled by some configuration.

My concern is that BaseX users will happily use these features under the xquery version "3.1"; declaration, but by doing so their code becomes incompatible with other implementations. We have many users out there who learn from each other by sharing XQuery code between implementations. Furthermore, the features that BaseX has added are not even accepted "Proposals" here yet.

Now I want to be clear, I am not criticising BaseX in any way. I have the utmost respect for the product, @ChristianGruen and the team. I congratulate them on their new release and on adding these wonderful extensions.

For the sake of portability, I wonder if we need to create the notion of xpath-ng profiles, where we could define something like:

xpath-ng profile 20181101

Syntax extensions from the following proposals:

Of course none of the above are at "proposal" level yet, so we would need to get there first! However the idea would then be that implementations could do something like:

xquery version "3.1+xng20181101";

Multiple profiles would be allowed if necessary, e.g.

xquery version "3.1+xng20181101+xng20190101";
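To make the idea concrete, here is a small Python sketch of how an implementation might split such a version string and decide whether it can run the query. Everything beyond the "3.1+xngYYYYMMDD" shape shown above is an assumption, including the function names and the choice to reject unknown suffixes.

```python
import re

# Assumed profile token shape: "xng" followed by an eight-digit date.
PROFILE = re.compile(r"xng(\d{8})")


def parse_version(declared):
    """Split '3.1+xng20181101+...' into a base version and profile dates."""
    base, *extensions = declared.split("+")
    profiles = []
    for extension in extensions:
        match = PROFILE.fullmatch(extension)
        if match is None:
            raise ValueError(f"unrecognised profile: {extension!r}")
        profiles.append(match.group(1))
    return base, profiles


def is_supported(declared, base_versions, supported_profiles):
    """True when the processor knows the base version and every profile."""
    base, profiles = parse_version(declared)
    return base in base_versions and all(p in supported_profiles for p in profiles)


base, profiles = parse_version("3.1+xng20181101+xng20190101")
```

A processor that supports only the 20181101 profile would then reject the two-profile declaration up front, rather than failing later on an unfamiliar piece of syntax.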

Thoughts?

Provide a combined grammar and grammar checker

This has been brought up in various proposal discussions (e.g. #2 (comment)) regarding compatibility of the proposals with the current XPath and XQuery grammars, and the various extensions (Full Text, Updating, Scripting).

At a minimum, this should provide an XPath and an XQuery EBNF grammar file, and tools to verify them (as a separate project).

Should this also include full text (xpath and xquery variants), updating (xquery), and scripting (xquery) grammars?

Some options:

Reuse the tools used by the W3C WG to verify the grammar.
Use the parser that is used by the xquerydoc tool that generates an xquery parser in xquery from an ebnf.
Use some other parser generation tools (Java based?).