
XSD regular expression flavor

atomczak commented on July 24, 2024
XSD regular expression flavor

from ids.

Comments (5)

CBenghi commented on July 24, 2024

From memory, I think I've probably removed that regex in the current Development branch because it conflicted with the datatype. Anyway:

IDS/Development/IDS_oma.ids

Lines 149 to 161 in 6d71cdf

<ids:property cardinality="optional" dataType="IFCLENGTHMEASURE" instructions="Derived length, for example length of the corridor.">
  <ids:propertySet>
    <ids:simpleValue>Qto_SpaceBaseQuantities</ids:simpleValue>
  </ids:propertySet>
  <ids:baseName>
    <ids:simpleValue>NominalLength</ids:simpleValue>
  </ids:baseName>
  <ids:value>
    <xs:restriction base="xs:string">
      <xs:pattern value="^(0*(\.\d+)|[1-9]\d*(\.\d+)?)$"/>
    </xs:restriction>
  </ids:value>
</ids:property>

My view is that IFCLENGTHMEASURE requires xs:double as the base type, which in turn disallows the pattern node.

Your point is of course still valid with respect to the need of documentation on regex flavour. My hope is to enforce it appropriately via the audit tool.

gverduci commented on July 24, 2024

> I'm not sure about the shorthand \d. I think it is supported by XSD and matches all Unicode digits: 0-9¹¾六௰Ⅹ೬Дに... but it would be good if someone could confirm.

@atomczak I think the shorthand \d is valid: this link shows all supported multi-character escapes:

https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/datatypes.html#cces-mce

According to that table, \d matches only \p{Nd} (decimal digit numbers; see the General Category property: https://www.unicode.org/reports/tr18/#General_Category_Property).

Using the Unicode database, it is possible to find all characters in this set:

https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt

janbrouwer commented on July 24, 2024

I think I made that regex as an experiment to see whether it is possible to validate a positive length measure. I believe the regex validation site mentioned in the IDS docs accepted it, but there are probably better ways to do this.
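
To see what the experiment accepts, here is a minimal sketch using Python's re module. It is only an approximation: Python's flavor is not the XSD flavor, so the leading ^ and trailing $ are dropped and re.fullmatch stands in for XSD's requirement that the pattern match the whole value. The helper name accepts and the sample values are made up for illustration.

```python
import re

# The pattern from the IDS example, without the leading ^ and trailing $.
# re.fullmatch already requires the whole value to match.
PATTERN = re.compile(r"0*(\.\d+)|[1-9]\d*(\.\d+)?")


def accepts(value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return PATTERN.fullmatch(value) is not None


# Hypothetical sample values, chosen to probe the edges of the pattern.
for sample in ("12", "12.50", "0.5", ".5", "0", "0.0", "-1", "1e3"):
    print(f"{sample!r:>8} accepted: {accepts(sample)}")
```

Under these assumptions the pattern rejects "0" but still accepts "0.0", so it does not strictly enforce a positive value, and it rejects exponent notation such as "1e3" that xs:double would allow.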

aothms commented on July 24, 2024

Great suggestions @gverduci, this indeed confirms @atomczak's suspicion:

$ grep ';Nd;' UnicodeData.txt | cut -d\; -f1 | xargs -I{} printf \\U000{} 2> /dev/null
𐒠𐒡𐒢𐒣𐒤𐒥𐒦𐒧𐒨𐒩𐴰𐴱𐴲𐴳𐴴𐴵𐴶𐴷𐴸𐴹𑁦𑁧𑁨𑁩𑁪𑁫𑁬𑁭𑁮𑁯𑃰𑃱𑃲𑃳𑃴𑃵𑃶𑃷𑃸𑃹𑄶𑄷𑄸𑄹𑄺𑄻𑄼𑄽𑄾𑄿𑇐𑇑𑇒𑇓𑇔𑇕𑇖𑇗𑇘𑇙𑋰𑋱𑋲𑋳𑋴𑋵𑋶𑋷𑋸𑋹...

(These are just a few of them; I couldn't quickly figure out how to convert the hex-formatted code points to printable characters in a generic way.)
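
One possible way to get the full list as printable characters is Python's unicodedata module instead of parsing UnicodeData.txt by hand. This is a sketch, not a replacement for the command above: the function name decimal_digits is made up here, and the result depends on the Unicode version bundled with the Python build, which may differ from the latest UnicodeData.txt.

```python
import sys
import unicodedata


def decimal_digits():
    """Return every character whose Unicode general category is Nd."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Nd"
    ]


if __name__ == "__main__":
    digits = decimal_digits()
    # The count varies with the Unicode version of the interpreter.
    print(len(digits), "Nd characters in Unicode", unicodedata.unidata_version)
    # First three blocks of ten digits as printable characters.
    print("".join(digits[:30]))
```

Because chr() handles code points of any length, this covers the four-digit code points that the printf approach above skips.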

atomczak commented on July 24, 2024

Thanks all, I mainly wanted to make sure I wasn't mistaken. And yes, this example has already been removed from the latest Development branch.

> My hope is to enforce it appropriately via the audit tool.

I see a potential problem with auditing regexes: ^ABC$ is not an invalid pattern, but in XSD it matches only the literal string that starts with a caret and ends with a dollar sign, whereas the user probably just wanted to allow the value 'ABC'. So it's not an error, but it deserves a soft warning :)
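
A soft warning of this kind could be sketched roughly as follows. This is a simple heuristic, not how the audit tool works: the function name anchor_warnings and the message wording are invented for illustration, and it only looks at the first and last character of the pattern.

```python
from typing import List


def anchor_warnings(pattern: str) -> List[str]:
    """Return soft warnings for anchors that XSD reads as literal characters."""
    warnings = []
    # XSD patterns always match the whole value, so a leading ^ is not an anchor.
    if pattern.startswith("^"):
        warnings.append("pattern starts with '^', which XSD treats as a literal caret")
    # Likewise, a trailing $ is an ordinary character in XSD.
    if pattern.endswith("$"):
        warnings.append("pattern ends with '$', which XSD treats as a literal dollar sign")
    return warnings


for candidate in ("^ABC$", "ABC", "[^A]BC"):
    print(candidate, "->", anchor_warnings(candidate) or "no warnings")
```

The check deliberately ignores a caret inside a character class such as [^A], where it is a legitimate negation, and it does not catch anchors buried inside alternations.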

> Using the unicode database it is possible to find all characters in this set

Thanks! If I read this right, \d in XSD matches every decimal digit in \p{Nd}, which is several hundred characters rather than ten. While this is fine for most cases, [0-9] serves my purpose better, as I only want those 10.
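
The difference can be illustrated with Python's re module. Python's flavor is not the XSD flavor, but for str patterns its \d also matches any Unicode decimal digit, so it serves as a rough stand-in here. The helper is_match and the sample values are assumptions made for this sketch.

```python
import re

# Arabic-Indic digits three and four, both in general category Nd.
ARABIC_INDIC = "\u0663\u0664"


def is_match(pattern: str, value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


print(r"\d+ on ASCII digits:        ", is_match(r"\d+", "42"))
print(r"\d+ on Arabic-Indic digits: ", is_match(r"\d+", ARABIC_INDIC))
print(r"[0-9]+ on ASCII digits:     ", is_match(r"[0-9]+", "42"))
print(r"[0-9]+ on Arabic-Indic:     ", is_match(r"[0-9]+", ARABIC_INDIC))
```

Only the [0-9] class rejects the non-ASCII digits, which is the behaviour wanted when a value must contain the ten ASCII digits only.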
