Comments (27)
Aha, the reason is whitespace characters (newline, spaces) between equal sign and first quote:
</row><row r=
"78" spans="1:5" x14ac:dyDescent="0.25">
from corexlsx.
Hi @maxvol, many thanks for reporting this. I hope that issues with sheet3.xml
and sheet2.xml
are fixed in #14. As for sheet1.xml
, could you please share that sheet1.xml
file or .xlsx archive itself? You can also share it privately to this email.
from corexlsx.
Regarding sheet1.xml
- my guess it has something to do with encoding. I can unpack it and get proper XML string from data with .utf8
encoding. Not sure which encoding XMLCoder
is using internally.
from corexlsx.
Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}
Apparently because quotes are escaped like: ="
<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\" xmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\"><dimension ref=\"A1:E128\"/><sheetViews><sheetView tabSelected=\"1\" workbookViewId=\"0\"/></sheetViews><sheetFormatPr defaultRowHeight=\"15\" x14ac:dyDescent=\"0.25\"/><cols><col min=\"1\" max=\"1\" width=\"26.42578125\" customWidth=\"1\"/><col min=\"2\" max=\"2\" width=\"16.7109375\" customWidth=\"1\"/><col min=\"3\" max=\"3\" width=\"9.140625\" style=\"1\"/><col min=\"4\" max=\"4\" width=\"82\" customWidth=\"1\"/><col min=\"6\" max=\"6\" width=\"16\" customWidth=\"1\"/></cols><sheetData> ...```
from corexlsx.
XMLCoder
internally uses XMLParser from Foundation
, which I guess picks up encoding from a standard XML header like this <?xml version="1.0" encoding="UTF-8"?>
. Quotes could be an issue, but really interesting what could generate these quotes. Could you please share the source XML? I'm not sure I'd be able to investigate without having the actual XML at hand to investigate and test.
from corexlsx.
Here it is (without rows within sheetData, due to sensitive data) -
<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" mc:Ignorable=\"x14ac\" xmlns:x14ac=\"http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac\"><dimension ref=\"A1:E128\"/><sheetViews><sheetView tabSelected=\"1\" workbookViewId=\"0\"/></sheetViews><sheetFormatPr defaultRowHeight=\"15\" x14ac:dyDescent=\"0.25\"/><cols><col min=\"1\" max=\"1\" width=\"26.42578125\" customWidth=\"1\"/><col min=\"2\" max=\"2\" width=\"16.7109375\" customWidth=\"1\"/><col min=\"3\" max=\"3\" width=\"9.140625\" style=\"1\"/><col min=\"4\" max=\"4\" width=\"82\" customWidth=\"1\"/><col min=\"6\" max=\"6\" width=\"16\" customWidth=\"1\"/></cols><sheetData></sheetData><conditionalFormatting sqref=\"C56:C1048576 C1:C54\"><cfRule type=\"duplicateValues\" dxfId=\"2\" priority=\"48\"/></conditionalFormatting><conditionalFormatting sqref=\"C1:C1048576\"><cfRule type=\"duplicateValues\" dxfId=\"1\" priority=\"1\"/></conditionalFormatting><conditionalFormatting sqref=\"C2:C54\"><cfRule type=\"duplicateValues\" dxfId=\"0\" priority=\"359\"/></conditionalFormatting><pageMargins left=\"0.7\" right=\"0.7\" top=\"0.75\" bottom=\"0.75\" header=\"0.3\" footer=\"0.3\"/></worksheet>
from corexlsx.
Sorry, I'm a bit confused, is that the whole XML without sheetData
, just two symbols \r\n
?
from corexlsx.
Huh? Apparently you cannot see it for some reason, I will check formatting in my previous post.
P.S. Done! I forgot to add triple-quote-xml. Pls have a look at the previous post again.
from corexlsx.
Thanks, can see it now, will have a closer look. BTW, if this string was printed from the debugger, the quotes are escaped by default, to print an unescaped string you can use something like po NSString(string: worksheetString)
. By using that printing command we could verify if the original XML had quotes escaped or not.
from corexlsx.
Aha! Here it is -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><dimension ref="A1:E128"/><sheetViews><sheetView tabSelected="1" workbookViewId="0"/></sheetViews><sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/><cols><col min="1" max="1" width="26.42578125" customWidth="1"/><col min="2" max="2" width="16.7109375" customWidth="1"/><col min="3" max="3" width="9.140625" style="1"/><col min="4" max="4" width="82" customWidth="1"/><col min="6" max="6" width="16" customWidth="1"/></cols><sheetData>
</sheetData><conditionalFormatting sqref="C56:C1048576 C1:C54"><cfRule type="duplicateValues" dxfId="2" priority="48"/></conditionalFormatting><conditionalFormatting sqref="C1:C1048576"><cfRule type="duplicateValues" dxfId="1" priority="1"/></conditionalFormatting><conditionalFormatting sqref="C2:C54"><cfRule type="duplicateValues" dxfId="0" priority="359"/></conditionalFormatting><pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/></worksheet>
P.S. The entire document printed with NSString(string:
is deemed valid by XML editor. So perhaps that NSString(string:
is not being called before feeding the worksheet to XMLCoder?
from corexlsx.
Many thanks for this @maxvol! I think I've found a source of the issue and this can be parsed now in a separate branch that I'm preparing as a PR. Would you mind if I add this XML as a unit-test to CoreXLSX
source code?
from corexlsx.
Sure, please do!
from corexlsx.
Great, thanks again. I've just merged #18 to master
, could you please try master
then to parse your file? The test with that XML is now passing both for me locally and on CI, so unless there's some more breaking stuff in sheetData
node, I hope this should work.
from corexlsx.
Still crashing with the same error. Why not make most attributes optional? Here's a few more lines from the same sheet -
<row r="1" spans="1:5" s="2" customFormat="1" ht="15.75" thickBot="1" x14ac:dyDescent="0.3">
<c r="A1" s="3" t="s">
<row r="3" spans="1:5" x14ac:dyDescent="0.25">
<c r="A3" t="s">
BTW is it possible to modify/save XLSX?
from corexlsx.
I've just added these rows and cells to the test and it passes with no modifications in CoreXLSX itself, I highly doubt it has something to do with optionality at this point. When you're saying "with the same error" do you mean this one?
Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}
▿ DecodingError
▿ dataCorrupted : Context
- codingPath : 0 elements
- debugDescription : "The given data was not valid XML."
▿ underlyingError : Optional
- some : Error Domain=NSXMLParserErrorDomain Code=111 "(null)"
If so, that's most probably caused by unscaped quotes somewhere in sheetData
, but very hard to say without having the complete file to test. Maybe you could mangle the cell values in that file or isolate the error to only a few cells that could be shared publicly or privately to an email that I shared here before?
Currently there's no API in CoreXLSX for writing to an XLSX file, but in principle it wouldn't be very hard to add as XMLCoder
supports both decoding and encoding. We'd only need to make sure to write correct model types to correct file names and create a zip archive with those file before writing a resulting output.
from corexlsx.
Unlikely, XML is valid when pasted into validating editor. Can it be missing 's' and 't' attributes in 'c' element? xml <c r="E2">
Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}
from corexlsx.
Just to be completely sure, I've added this cell with a single r
attribute as well to the test, but it still passes.
It looks like an error is thrown from XMLParser
in Foundation
module, as neither CoreXLSX
nor XMLCoder
use NSXMLParserErrorDomain
for their error reporting. This might mean that Apple's XMLParser
is not able to parse this XML at all.
Looking at XMLParser
docs, I'm not sure we could get more info out of the debugger directly than we currently have:
When this method is invoked, parsing is stopped. For further information about the error, you can query
parseError
or you can send the parser aparserError
message. You can also send the parserlineNumber
andcolumnNumber
messages to further isolate where the error occurred. Typically you implement this method to display information about the error to the user.
These line and column numbers are already printed in the error. Here it looks like the line number is 2 (I assume line 1 is the XML header) and the column is 16328. I don't know an easy way to get a substring from a string by line and column within the debugger, but you could unzip the XLSX file (e.g. with unzip
in command-line), open sheet1.xml
and check that line/column in a text editor that's able to navigate to the column.
Another approach could to be to reformat sheet1.xml
to contain one node (or even better one attribute) per line. If you use VSCode, there's an XML formatter plugin available here.
This way, you could unzip the XLSX file (e.g. with unzip
in command-line), open sheet1.xml
, reformat it and zip those files again into a single archive and debug again. This would probably take more time though as it requires archiving everything back and repeating the debugging cycle.
from corexlsx.
(attributeNotStartedError = 39)
Error Domain=NSXMLParserErrorDomain Code=39 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=AttValue: " or ' expected
}
["NSUnderlyingError": Error Domain=NSXMLParserErrorDomain Code=111 "(null)", "NSCodingPath": [], "NSDebugDescription": "The given data was not valid XML."]
from corexlsx.
Thanks, this gives the same info though, the error is coming from Apple's XMLParser
due to use of NSXMLParserErrorDomain
and the offending character is at column 16328 line 2. I'd appreciate if you shared at least a single XML node with attributes around that column 16328 at line 2. Thinking more about it, you wouldn't even need to decompress the XLSX as you already have access to that raw XML in the debugger as you've shared snippets from it previously. You can just paste the whole XML to vim
and navigate to line 2 and column 16328 as described here or any other editor of your choice that allows navigating to a specific column (I've also found a tip for Sublime Text here).
from corexlsx.
So, do you think you can fix it? Because that break occurs automatically, I have no control over it.
from corexlsx.
I've added this snippet and a few other similar variations to the test suite with a newline and also \r\n
combination between equal sign and first quote here. CoreXLSX master
branch without any changes is able to pass these tests as is and these snippets are parsed without a problem, which means the cause is probably different. Maybe quotes are different or some other (hidden?) symbols were lost before sharing on GitHub?
If you're getting these files some other system or app, would it be possible to generate a file with fake data that could be shared so that I would be able to reproduce it on my side? Or maybe you could delete all cells except the offending cell, which I hope wouldn't leave any sensitive data and it could be shared? That would tremendously help in diagnosing the issue. Thanks!
from corexlsx.
Still the same error. Here's the snippet including the offending row -
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><dimension ref="A1:E128"/><sheetViews><sheetView tabSelected="1" workbookViewId="0"/></sheetViews><sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/><cols><col min="1" max="1" width="26.42578125" customWidth="1"/><col min="2" max="2" width="16.7109375" customWidth="1"/><col min="3" max="3" width="9.140625" style="1"/><col min="4" max="4" width="82" customWidth="1"/><col min="6" max="6" width="16" customWidth="1"/></cols><sheetData><row r=
"78" spans="1:5" x14ac:dyDescent="0.25"><c r="A78" t="s"><v>3</v></c><c r="B78" t="s"><v>4</v></c><c r="C78" s="1"><v>421</v></c><c r="D78" t="s"><v>56</v></c><c r="E78"><v>57</v></c></row></sheetData><conditionalFormatting sqref="C56:C1048576 C1:C54"><cfRule type="duplicateValues" dxfId="2" priority="48"/></conditionalFormatting><conditionalFormatting sqref="C1:C1048576"><cfRule type="duplicateValues" dxfId="1" priority="1"/></conditionalFormatting><conditionalFormatting sqref="C2:C54"><cfRule type="duplicateValues" dxfId="0" priority="359"/></conditionalFormatting><pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/></worksheet>
from corexlsx.
I've added the complete snippet to the test suite here, it is parsed successfully with all cells. If you unzip the file, remove all cells except the offending row and zip it back again, is the error still reproducible? Or maybe you could open the file in Microsoft Excel or Apple Numbers, delete all rows except row 78 and save a new version, would that version cause any issues in CoreXLSX? I suspect there might be something going on with encoding or compression that's lost when testing isolated XML snippets, so I hope testing a complete XLSX file end-to-end on my side would help.
from corexlsx.
Hi @MaxDesiatov , I have a similar problem
Error Domain=NSXMLParserErrorDomain Code=41 "(null)" UserInfo={NSXMLParserErrorColumn=16328, NSXMLParserErrorLineNumber=2, NSXMLParserErrorMessage=Specification mandates value for attribute s
from corexlsx.
Hi @nIgect, could you please share the file that causes this issue? This would allow me to reproduce the problem on my side, find the cause and fix it. Thank you!
from corexlsx.
Hi folks, sorry for the delay. I hope you could try the latest 0.4.0 version and pass a sufficiently large value (10, maybe 20 for example) as errorContextLength
argument to XLSXFile
initializer. This would give more debug info and provide a snippet of XML that failed to parse, which would help us to debug the issue. Thanks!
from corexlsx.
Hi everyone, if you've previously had issues, please try the latest 0.4.0 version or if those persist – the master
branch and post the details here or create new issues. Otherwise if no updates are posted here I'll have to close this issue as outdated.
Thanks for understanding!
from corexlsx.
Related Issues (20)
- Unable to sort columns by intValue HOT 2
- Cannot read xlsx with inline string.
- Cannot initialize SchemaType from invalid String value http://customschemas.google.com/relationships/workbookmetadata HOT 2
- Cannot initialize SchemaType from invalid String value HOT 7
- Next Release? HOT 2
- Xcode 12.5 Compile Error HOT 1
- cell type is null HOT 1
- Get a dropdown list HOT 1
- Invalid default buffer size HOT 1
- Unable to parse excel with following code HOT 5
- XLSXFile fails to open .xlsx file located on iPad ("XLSX file at \(file_path) is corrupted or does not exist") HOT 5
- XLSX File is not Read from documentPicker HOT 8
- Support macOS APP development? HOT 2
- Distinguish date and number HOT 4
- File can not parsed HOT 2
- Excel Parser Truncates Leading 0's and Maybe Trailing 0's HOT 2
- Cell Formulas
- Not working when using FILTER function HOT 3
- Get named Area of a sheet
- SharedStrings doesn't include/process the XML tags <b/> <I/> <u/> and <strike/> contained in rich text elements <rPr>. HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from corexlsx.