Coder Social home page Coder Social logo

Comments (6)

WelliSolutions avatar WelliSolutions commented on July 16, 2024

More details:
In package_model.py (of pyecma376_2), the check is made:

def check_part_name(part_name: str) -> None:
    """ Check if `part_name` is a valid OPC part name in URI (not IRI) representation.

    :raises ValueError: if it is not.
    """
    if not RE_PART_NAME.match(part_name):
        raise ValueError("{} is not an URI path with multiple segments (each not empty and not starting with '.') "
                         "or not starting with '/' or ending wit '/'".format(repr(part_name)))

And RE_PART_NAME does not allow spaces:

RE_PART_NAME = re.compile(r'^(/[A-Za-z0-9\-\._~%:@!$&\'()*+,;=]*[A-Za-z0-9\-_~%:@!$&\'()*+,;=])+$')

Let's first conclude that \. and \' are equivalent to . and ', because it's inside a bracket expression and those don't need to be escaped there.

The first bracket expression contains the ., the second doesn't. Also, the first bracket expression is optional, the second isn't. So, to me, the Regex is checking whether something does not end with a dot, rather than "not starting with '.'" as mentioned in the error message. So, at least this part may be a bug in the pyecma library (either the error message or the Regex).

Other than that, I don't have enough knowledge to judge whether ECMA 376 allows spaces in file names or not.

from basyx-python-sdk.

WelliSolutions avatar WelliSolutions commented on July 16, 2024

OPC Specification IMHO is here: https://standards.iso.org/ittf/PubliclyAvailableStandards/c077818_ISO_IEC_29500-2_2021(E).zip
(download link suggested by Wikipedia does not work).

The relevant chapter seems to be 6.2.2 Part Names. And 6.2.2.2 Syntax specifies:

No I18N segments shall end with a dot (“.”) character.

So this is really about ending with a dot.

Furthermore:

[...] where an I18N segment is a Unicode string that
matches the non-terminal isegment-nz and percent-encoding represents a character by the percent
character "%" followed by two hexadecimal digits, as specified in RFC 3986

So it seems to me that the Basyx LIbrar should percent-encode the space to %20.

from basyx-python-sdk.

jkhsjdhjs avatar jkhsjdhjs commented on July 16, 2024

I agree, perhaps we should already call check_part_name() within DictSupplementaryFileContainer.add_file(), and if that raises an exception, percent encode the part name? Or just always percent encode?

from basyx-python-sdk.

jkhsjdhjs avatar jkhsjdhjs commented on July 16, 2024

Alright, I did some digging:

When writing supplementary files, the AASXWriter does the following:

logger.debug("Writing supplementary file {} to AASX package ...".format(file_name))
with self.writer.open_part(file_name, content_type) as p:
file_store.write_file(file_name, p)
supplementary_file_names.append(pyecma376_2.package_model.normalize_part_name(file_name))
self._supplementary_part_names[file_name] = hash

For each supplementary file, it calls OPCPackageReader.open_part() with the filename supplied by the user, the way the file was added to the SupplementaryFileContainer. This determines the name the file will have in the AASX file.
Afterwards, it adds the normalized filename to the list supplementary_file_names, which will later be used to write the relationship files.

Normalization percent-encodes the string and converts it to lower case:
https://github.com/rwth-iat/PyECMA376-2/blob/e7c28ca9c9fbebfae3d3ab9a8abc104aa57f7913/pyecma376_2/package_model.py#L642-L645

Thus, it seems to me, that only the relationship files are expected to contain normalized, percent-encoded URIs, and the filenames are actually not expected to follow this format. If the filenames would be percent-encoded and converted to lower case, it would also be impossible to determine the original filename before packaging, and the assertion at the end of tutorial_aasx.py would never be true:

assert actual_file_name in new_file_store

Yet, OPCPackageWriter.open_part() and OPCPackageWriter.create_fragmented_part() both call check_part_name() on the passed filename, which validates the filename against the regular expression for part URIs.

So, I had a look at the specification you linked and it doesn't confirm my thoughts (well, partially):

7.3.4 Mapping logical item names to ZIP item names
For each logical item, the process of mapping of logical item names to ZIP item names shall involve the following steps, in order:

  1. Remove the leading forward slash (“/”) from the logical item name or, in the case of interleaved parts, from each of the logical item names within the complete sequence.
    Remove the leading forward slash (“/”) from the logical item name or, parts, from each of the logical item names within the complete sequence.
  2. Percent-encode every item non-ASCII character.

Validating the filenames via check_part_name() is wrong, as spaces are ASCII characters and thus allowed in filenames.
However, non-ASCII characters still need to be percent-encoded. Thus, the correct way would be to replace the call to check_part_name() by a call to a function that percent-encodes the passed filename, or at least to change the current validation.

from basyx-python-sdk.

s-heppner avatar s-heppner commented on July 16, 2024

Did I understand correctly, that this is a bug in the check_part_name() function of the pyecma376_2 library?

from basyx-python-sdk.

jkhsjdhjs avatar jkhsjdhjs commented on July 16, 2024

I think the check_part_name() function is correct, what's incorrect is that this function is used to validate filenames for the zip package. IMO, a function to correctly percent-encode filenames for AASX packages should be implemented in pyecma376_2.

pyecma376_2 already has a function to percent-encode and convert to lower-case part names. However, the way I understand it is that this percent-encoding for part-names is different from the percent-encoding for filenames, as it

  1. converts the entire string to lower-case and
  2. also encodes characters that would be allowed in filenames, such as spaces.

from basyx-python-sdk.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.