Coder Social home page Coder Social logo

Comments (27)

maxvol avatar maxvol commented on May 21, 2024 1

Aha, the reason is whitespace characters (newline, spaces) between equal sign and first quote:

</row><row r=
    "78" spans="1:5" x14ac:dyDescent="0.25">

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Hi @maxvol, many thanks for reporting this. I hope that issues with sheet3.xml and sheet2.xml are fixed in #14. As for sheet1.xml, could you please share that sheet1.xml file or .xlsx archive itself? You can also share it privately to this email.

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

Regarding sheet1.xml - my guess it has something to do with encoding. I can unpack it and get proper XML string from data with .utf8 encoding. Not sure which encoding XMLCoder is using internally.

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}

Apparently because quotes are escaped like: ="

<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\" xmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\"><dimension ref=\"A1:E128\"/><sheetViews><sheetView tabSelected=\"1\" workbookViewId=\"0\"/></sheetViews><sheetFormatPr defaultRowHeight=\"15\" x14ac:dyDescent=\"0.25\"/><cols><col min=\"1\" max=\"1\" width=\"26.42578125\" customWidth=\"1\"/><col min=\"2\" max=\"2\" width=\"16.7109375\" customWidth=\"1\"/><col min=\"3\" max=\"3\" width=\"9.140625\" style=\"1\"/><col min=\"4\" max=\"4\" width=\"82\" customWidth=\"1\"/><col min=\"6\" max=\"6\" width=\"16\" customWidth=\"1\"/></cols><sheetData> ...```

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

XMLCoder internally uses XMLParser from Foundation, which I guess picks up encoding from a standard XML header like this <?xml version="1.0" encoding="UTF-8"?>. Quotes could be an issue, but really interesting what could generate these quotes. Could you please share the source XML? I'm not sure I'd be able to investigate without having the actual XML at hand to investigate and test.

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

Here it is (without rows within sheetData, due to sensitive data) -

<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\" xmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\"><dimension ref=\"A1:E128\"/><sheetViews><sheetView tabSelected=\"1\" workbookViewId=\"0\"/></sheetViews><sheetFormatPr defaultRowHeight=\"15\" x14ac:dyDescent=\"0.25\"/><cols><col min=\"1\" max=\"1\" width=\"26.42578125\" customWidth=\"1\"/><col min=\"2\" max=\"2\" width=\"16.7109375\" customWidth=\"1\"/><col min=\"3\" max=\"3\" width=\"9.140625\" style=\"1\"/><col min=\"4\" max=\"4\" width=\"82\" customWidth=\"1\"/><col min=\"6\" max=\"6\" width=\"16\" customWidth=\"1\"/></cols><sheetData></sheetData><conditionalFormatting sqref=\"C56:C1048576 C1:C54\"><cfRule type=\"duplicateValues\" dxfId=\"2\" priority=\"48\"/></conditionalFormatting><conditionalFormatting sqref=\"C1:C1048576\"><cfRule type=\"duplicateValues\" dxfId=\"1\" priority=\"1\"/></conditionalFormatting><conditionalFormatting sqref=\"C2:C54\"><cfRule type=\"duplicateValues\" dxfId=\"0\" priority=\"359\"/></conditionalFormatting><pageMargins left=\"0.7\" right=\"0.7\" top=\"0.75\" bottom=\"0.75\" header=\"0.3\" footer=\"0.3\"/></worksheet>

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Sorry, I'm a bit confused, is that the whole XML without sheetData, just two symbols \r\n?

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

Huh? Apparently you cannot see it for some reason, I will check formatting in my previous post.
P.S. Done! I forgot to add triple-quote-xml. Pls have a look at the previous post again.

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Thanks, can see it now, will have a closer look. BTW, if this string was printed from the debugger, the quotes are escaped by default, to print an unescaped string you can use something like po NSString(string: worksheetString). By using that printing command we could verify if the original XML had quotes escaped or not.

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

Aha! Here it is -

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><dimension ref="A1:E128"/><sheetViews><sheetView tabSelected="1" workbookViewId="0"/></sheetViews><sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/><cols><col min="1" max="1" width="26.42578125" customWidth="1"/><col min="2" max="2" width="16.7109375" customWidth="1"/><col min="3" max="3" width="9.140625" style="1"/><col min="4" max="4" width="82" customWidth="1"/><col min="6" max="6" width="16" customWidth="1"/></cols><sheetData>
</sheetData><conditionalFormatting sqref="C56:C1048576 C1:C54"><cfRule type="duplicateValues" dxfId="2" priority="48"/></conditionalFormatting><conditionalFormatting sqref="C1:C1048576"><cfRule type="duplicateValues" dxfId="1" priority="1"/></conditionalFormatting><conditionalFormatting sqref="C2:C54"><cfRule type="duplicateValues" dxfId="0" priority="359"/></conditionalFormatting><pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/></worksheet>

P.S. The entire document printed with NSString(string: is deemed valid by XML editor. So perhaps that NSString(string: is not being called before feeding the worksheet to XMLCoder?

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Many thanks for this @maxvol! I think I've found a source of the issue and this can be parsed now in a separate branch that I'm preparing as a PR. Would you mind if I add this XML as a unit-test to CoreXLSX source code?

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

Sure, please do!

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Great, thanks again. I've just merged #18 to master, could you please try master then to parse your file? The test with that XML is now passing both for me locally and on CI, so unless there's some more breaking stuff in sheetData node, I hope this should work.

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

Still crashing with the same error. Why not make most attributes optional? Here's a few more lines from the same sheet -

        <row r="1" spans="1:5" s="2" customFormat="1" ht="15.75" thickBot="1" x14ac:dyDescent="0.3">
            <c r="A1" s="3" t="s">

        <row r="3" spans="1:5" x14ac:dyDescent="0.25">
            <c r="A3" t="s">

BTW is it possible to modify/save XLSX?

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

I've just added these rows and cells to the test and it passes with no modifications in CoreXLSX itself, I highly doubt it has something to do with optionality at this point. When you're saying "with the same error" do you mean this one?

Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}
▿ DecodingError
▿ dataCorrupted : Context
- codingPath : 0 elements
- debugDescription : "The given data was not valid XML."
▿ underlyingError : Optional
- some : Error Domain=NSXMLParserErrorDomain Code=111 "(null)"

If so, that's most probably caused by unscaped quotes somewhere in sheetData, but very hard to say without having the complete file to test. Maybe you could mangle the cell values in that file or isolate the error to only a few cells that could be shared publicly or privately to an email that I shared here before?

Currently there's no API in CoreXLSX for writing to an XLSX file, but in principle it wouldn't be very hard to add as XMLCoder supports both decoding and encoding. We'd only need to make sure to write correct model types to correct file names and create a zip archive with those file before writing a resulting output.

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

Unlikely, XML is valid when pasted into validating editor. Can it be missing 's' and 't' attributes in 'c' element? xml <c r="E2">

Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Just to be completely sure, I've added this cell with a single r attribute as well to the test, but it still passes.

It looks like an error is thrown from XMLParser in Foundation module, as neither CoreXLSX nor XMLCoder use NSXMLParserErrorDomain for their error reporting. This might mean that Apple's XMLParser is not able to parse this XML at all.

Looking at XMLParser docs, I'm not sure we could get more info out of the debugger directly than we currently have:

When this method is invoked, parsing is stopped. For further information about the error, you can query parseError or you can send the parser a parserError message. You can also send the parser lineNumber and columnNumber messages to further isolate where the error occurred. Typically you implement this method to display information about the error to the user.

These line and column numbers are already printed in the error. Here it looks like the line number is 2 (I assume line 1 is the XML header) and the column is 16328. I don't know an easy way to get a substring from a string by line and column within the debugger, but you could unzip the XLSX file (e.g. with unzip in command-line), open sheet1.xml and check that line/column in a text editor that's able to navigate to the column.

Another approach could to be to reformat sheet1.xml to contain one node (or even better one attribute) per line. If you use VSCode, there's an XML formatter plugin available here.
This way, you could unzip the XLSX file (e.g. with unzip in command-line), open sheet1.xml, reformat it and zip those files again into a single archive and debug again. This would probably take more time though as it requires archiving everything back and repeating the debugging cycle.

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

(attributeNotStartedError = 39)

Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}
["NSUnderlyingError": Error Domain=NSXMLParserErrorDomain Code=111 "(null)", "NSCodingPath": [], "NSDebugDescription": "The given data was not valid XML."]

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Thanks, this gives the same info though, the error is coming from Apple's XMLParser due to use of NSXMLParserErrorDomain and the offending character is at column 16328 line 2. I'd appreciate if you shared at least a single XML node with attributes around that column 16328 at line 2. Thinking more about it, you wouldn't even need to decompress the XLSX as you already have access to that raw XML in the debugger as you've shared snippets from it previously. You can just paste the whole XML to vim and navigate to line 2 and column 16328 as described here or any other editor of your choice that allows navigating to a specific column (I've also found a tip for Sublime Text here).

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

So, do you think you can fix it? Because that break occurs automatically, I have no control over it.

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

I've added this snippet and a few other similar variations to the test suite with a newline and also \r\n combination between equal sign and first quote here. CoreXLSX master branch without any changes is able to pass these tests as is and these snippets are parsed without a problem, which means the cause is probably different. Maybe quotes are different or some other (hidden?) symbols were lost before sharing on GitHub?

If you're getting these files some other system or app, would it be possible to generate a file with fake data that could be shared so that I would be able to reproduce it on my side? Or maybe you could delete all cells except the offending cell, which I hope wouldn't leave any sensitive data and it could be shared? That would tremendously help in diagnosing the issue. Thanks!

from corexlsx.

maxvol avatar maxvol commented on May 21, 2024

Still the same error. Here's the snippet including the offending row -

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><dimension ref="A1:E128"/><sheetViews><sheetView tabSelected="1" workbookViewId="0"/></sheetViews><sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/><cols><col min="1" max="1" width="26.42578125" customWidth="1"/><col min="2" max="2" width="16.7109375" customWidth="1"/><col min="3" max="3" width="9.140625" style="1"/><col min="4" max="4" width="82" customWidth="1"/><col min="6" max="6" width="16" customWidth="1"/></cols><sheetData><row r=
    "78" spans="1:5" x14ac:dyDescent="0.25"><c r="A78" t="s"><v>3</v></c><c r="B78" t="s"><v>4</v></c><c r="C78" s="1"><v>421</v></c><c r="D78" t="s"><v>56</v></c><c r="E78"><v>57</v></c></row></sheetData><conditionalFormatting sqref="C56:C1048576 C1:C54"><cfRule type="duplicateValues" dxfId="2" priority="48"/></conditionalFormatting><conditionalFormatting sqref="C1:C1048576"><cfRule type="duplicateValues" dxfId="1" priority="1"/></conditionalFormatting><conditionalFormatting sqref="C2:C54"><cfRule type="duplicateValues" dxfId="0" priority="359"/></conditionalFormatting><pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/></worksheet>

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

I've added the complete snippet to the test suite here, it is parsed successfully with all cells. If you unzip the file, remove all cells except the offending row and zip it back again, is the error still reproducible? Or maybe you could open the file in Microsoft Excel or Apple Numbers, delete all rows except row 78 and save a new version, would that version cause any issues in CoreXLSX? I suspect there might be something going on with encoding or compression that's lost when testing isolated XML snippets, so I hope testing a complete XLSX file end-to-end on my side would help.

from corexlsx.

nIgect avatar nIgect commented on May 21, 2024

Hi @MaxDesiatov , I have a similar problem
Error Domain=NSXMLParserErrorDomain Code=41 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=Specification mandates value for attribute s

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Hi @nIgect, could you please share the file that causes this issue? This would allow me to reproduce the problem on my side, find the cause and fix it. Thank you!

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Hi folks, sorry for the delay. I hope you could try the latest 0.4.0 version and pass a sufficiently large value (10, maybe 20 for example) as errorContextLength argument to XLSXFile initializer. This would give more debug info and provide a snippet of XML that failed to parse, which would help us to debug the issue. Thanks!

from corexlsx.

MaxDesiatov avatar MaxDesiatov commented on May 21, 2024

Hi everyone, if you've previously had issues, please try the latest 0.4.0 version or if those persist – the master branch and post the details here or create new issues. Otherwise if no updates are posted here I'll have to close this issue as outdated.

Thanks for understanding!

from corexlsx.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.