Coder Social home page Coder Social logo

pycql2's Introduction

PyCQL2

Overview

Pydantic models and lark parser for OGC CQL2. As the the specification is still in draft format, changes may be made and cause this to become incorrect.

Representations are not perfectly transitive. In cql2-json and cql2-text there are slightly different ways to represent everything. Internally everything is represented as cql2-json and some details of the cql2-text are no longer needed after being parsed. So, it is impossible to guarantee a round trip operation: cql2-text -> cql2-json -> cql2-text will result in an identical string. The meaning will not be changed but the representation might.

When parsing cql2-text to cql2-json.

  • Due to how the JSON representation works, NOT is pulled out in front of comparison predicates.
    • ... NOT LIKE ... becomes NOT ... LIKE ...
    • ... NOT BETWEEN ... becomes NOT ... BETWEEN ...
    • ... NOT IN ... become NOT ... IN ...
    • ... IS NOT NULL becomes NOT ... IS NULL
  • Negative arithmetic operands become a multiply by -1: {"op": "*", "params": [-1, <arithmetic_operand>]}
  • Within a Character Literal, the character (' or \) used to escape a single quote is not preserved.
    • Any '' will become \'

The cql2-text output from the pydantic models is opinionated and explicit. These choices have been made to keep the logic simple while ensuring the correctness of the output.

  • All property names are double quoted ".
  • Character Literals escape single quotes with a backslash \'.
  • Parenthesis () are placed around all comparison and arithmetic operations.
    • This means that many outputs include a set of parentheses around the whole string.
    • This may not be not ideal, but it is also not incorrect.
    • Additional testing may be done in the future to determine if a safe and easy way exists to remove them.
  • Timestamps always contain decimal seconds out to 6 decimal places even when 0: .000000. It uses strftime with %f currently. Logic may be added later to adjust this.
  • Floats ending in .0 will include the .0 in the text. Where other libraries such as shapely will not include them in WKT.

The cql2-text spec was not strictly followed for WKT. Some tweaks were made to increase it is compatible with geojson-pydantic, as well as accept the WKT output.

  • Added optional Z to each geometry.
    • This does not enforce 2d / 3d, just allows the character to be there.
  • LineString coordinates require a minimum of 2 coordinates.
  • Added 'Linear Ring' for use in Polygons with a minimum of 4 coordinates.
    • This does not enforce the ring being closed, just that it contains enough coordinates to be one.
  • Moved BBOX so it cannot be included in GeometryCollection.
  • Added alternative MultiPoint syntax to maintain compatibility with other spatial tools and libraries.
    • Allows reading of MUTLIPOINT(0 0, 1 1) without the inner parenthesis.
    • This may be removed in the future if it is no longer necessary for compatibility. Removal will be considered a breaking change and be versioned as such.
    • Note: Output will always include the inner parenthesis (via geojson-pydantic).

There are a few things which may be issues with the spec but have not been fully addressed yet.

  • spatial_literal includes bbox, and geometry_collection allows for all spatial_literal within it. But bbox does not seem to be a part of WKT. This would mean the cql2-text -> cql2-json conversion would break where geojson-pydantic doesn't accept these cases.
  • The spec does not allow for EMPTY geometries.

Testing

The tests have been created to exercise various parts of the parsers, and are not meant to serve as realistic examples. Parts like geometries may not make sense but are valid per the specs.

Each file in tests/data/json/ is a standalone cql2-json example. There will be at least one corresponding file in tests/data/text which is a cql2-text equivalent. These corresponding examples should always convert back and forth identically. Since there are multiple ways to write the same thing in cql2-text there may be additional numbered alternative examples like -alt01. These will all parse to the same json, which in turn will output the main text example.

While 100% of the lines of code are covered, more complex examples with more nested logic will be added in the future. As well as more variety to various inputs, the current examples are mostly PropertyRef and numbers. Such as:

  • More complex identifiers with _, ., :, and non ascii letters.
  • Deeply nested logic.
  • Each type of scalar_expression on each side of a binary_comparison_predicate, etc.

Hypothesis

Support has been added for Hypothesis for cql2-text. The grammar is quite complex so this can be very slow, but a few bugs have been found with it. Strategies had to be tweaked to handle date / datetime as the grammar allows for dates like 0000-00-00 but python will not parse them. Additionally, a custom strategy was added for polygons, since the grammar has no ability to convey closing a polygon.

In addition to this, HypoFuzz was used to run coverage based testing. It ran 33,000 different tests including 22,961 of them without finding a new branches in the code to cover. This seems to indicate the cql2-text -> cql2-json transformation has been fairly thoroughly tested.

Support will be added for cql2-json later. There are additional custom strategies which will be necessary.

CQL2 Spec

Writing this parser has resulted in feedback and contributions to the ogcapi-features CQL2 spec:

  • Reported issue with Alpha and Symbols (fixed): #787
  • Submit PR for minor inconsistencies between schema formats (pending): #794
  • Added notes to Updating examples ticket and offered these tests back: #783
  • Reported observations about WKT grammar (pending): #800

pycql2's People

Contributors

eseglem avatar dependabot[bot] avatar pre-commit-ci[bot] avatar bitner avatar

Stargazers

 avatar James Harrison avatar James Harrison avatar Andreas Motl avatar  avatar Francesco Bartoli avatar Francis Charette-Migneault avatar Christos avatar Vincent Sarago avatar Pete Gadomski avatar

Watchers

 avatar Andreas Motl avatar  avatar  avatar

pycql2's Issues

Add Hypothesis tests for the Pydantic models.

There are Hypothesis tests for the Lark grammar, but still need to add them to the Pydantic models.

This requires some custom strategies to ensure the various strings are valid within the grammar. Certain characters are problematic. Can probably do from_regex and pull them from the grammar.

Cannot import module

Working from 8d93545 on Python 3.11, I cannot import the module:

python3.11/typing.py:1551: in __subclasscheck__
    return issubclass(cls, self.__origin__)
E   TypeError: issubclass() arg 1 must be a class

To reproduce (from a bare venv):

pip install poetry && poetry install && pytest

Output:

-- 8< --
_________________________________________________________ ERROR collecting tests/test_cql2_pydantic.py _________________________________________________________
tests/test_cql2_pydantic.py:1: in <module>
    from pycql2.cql2_pydantic import join_list, make_char_literal
pycql2/cql2_pydantic.py:231: in <module>
    class ArrayExpression(BaseModel):
pydantic/main.py:222: in pydantic.main.ModelMetaclass.__new__
    ???
pydantic/fields.py:506: in pydantic.fields.ModelField.infer
    ???
pydantic/fields.py:436: in pydantic.fields.ModelField.__init__
    ???
pydantic/fields.py:552: in pydantic.fields.ModelField.prepare
    ???
pydantic/fields.py:668: in pydantic.fields.ModelField._type_analysis
    ???
../../.local/share/rtx/installs/python/3.11.2/lib/python3.11/typing.py:1551: in __subclasscheck__
    return issubclass(cls, self.__origin__)
E   TypeError: issubclass() arg 1 must be a class
____________________________________________________________ ERROR collecting tests/test_pycql2.py _____________________________________________________________
tests/test_pycql2.py:6: in <module>
    from pycql2.cql2_pydantic import BooleanExpression
pycql2/cql2_pydantic.py:231: in <module>
    class ArrayExpression(BaseModel):
pydantic/main.py:222: in pydantic.main.ModelMetaclass.__new__
    ???
pydantic/fields.py:506: in pydantic.fields.ModelField.infer
    ???
pydantic/fields.py:436: in pydantic.fields.ModelField.__init__
    ???
pydantic/fields.py:552: in pydantic.fields.ModelField.prepare
    ???
pydantic/fields.py:668: in pydantic.fields.ModelField._type_analysis
    ???
../../.local/share/rtx/installs/python/3.11.2/lib/python3.11/typing.py:1551: in __subclasscheck__
    return issubclass(cls, self.__origin__)
E   TypeError: issubclass() arg 1 must be a class
-- 8< --

Explore Generics for Accenti / Casei

Currently Accenti and Casei are a bit tricky, where each one has subclasses for CharacterExpression and PatternExpression

  • Accenti
    • AccentiCharacterExpression
    • AccentiPatternExpression
  • Casei
    • CaseiCharacterExpression
    • CaseiPatternExpression

This allows the proper enforcement of types, but generics could be nice to have. The issue is with Pydantic and runtime typing not completely lining up. A lot of messy attempts have been made to get it working but still trying to find a good solution.

Document Usage

Currently the only way to determine usage is looking at the tests.

  • Include explanations about the extensive usage of __root__ throughout the models.
  • Explain cases where the grammar can parse, but not transform. Such as bad dates.
  • Explain cases where the pydantic model may output bad text. Such as weird characters.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.