Comments (12)
v3 is live to handle this issue
from fast-xml-parser.
Published
from fast-xml-parser.
I believe it should not parse HTML entities for CDATA.
from fast-xml-parser.
@amitguptagwl CDATA can also contain HTML entities: https://en.wikipedia.org/wiki/CDATA
It only indicates that the content is the text value of the node, not markup (children).
from fast-xml-parser.
Yes. :)
So what I expect from <tag><![CDATA[&lt;sender&gt;John Smith&lt;/sender&gt;]]></tag>
is
{
"tag" : "&lt;sender&gt;John Smith&lt;/sender&gt;"
}
But what I'm getting from the current parser is
{
"tag" : "<sender>John Smith</sender>"
}
Expectation 2: <tag><![CDATA[<sender>John Smith</sender>]]></tag>
is
{
"tag" : "<sender>John Smith</sender>"
}
Expectation 3: <tag>&lt;sender&gt;John Smith&lt;/sender&gt;</tag>
which is equivalent to <tag><![CDATA[<sender>John Smith</sender>]]></tag>
is
{
"tag" : "<sender>John Smith</sender>"
}
from fast-xml-parser.
But what I'm getting from current parser is
And it's right
I started a holy war at work over this )
The spec only specifies
left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;"
but it says nothing about other entities, and even this is not stated precisely: "need not" and "cannot" are not the same as "must not".
For example CDATA can contain HTML parts
<tag><![CDATA[<sender>&John Smith©</sender>]]></tag>
For consistency, you could disable entity parsing inside CDATA, or make it optional. But that needs more refactoring in the code because, as far as I know, you join CDATA and text together.
from fast-xml-parser.
As per the wiki,
Character data is character data, regardless of whether it is expressed via a CDATA section or ordinary markup. CDATA sections are useful for writing XML code as text data within an XML document.
if the numeric character reference &#240; appears in element content, it will be interpreted as the single Unicode character 00F0 (small letter eth). But if the same appears in a CDATA section, it will be parsed as six characters: ampersand, hash mark, digit 2, digit 4, digit 0, semicolon.
The above phrase in my own words (as I understand it):
if &#240; (which represents ð) appears in a CDATA section, it should not be parsed to ð but kept as &#240; only.
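The rule quoted above can be sketched in a few lines of plain JavaScript. This is only an illustration of the behaviour being discussed, not fast-xml-parser's implementation: `decodeEntities` and `readCharacterData` are hypothetical helper names, and the sketch assumes the input is the inner content of a single element with no child tags.

```javascript
// Hypothetical helpers for illustration only; not part of fast-xml-parser.
const PREDEFINED = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

// Decode the five predefined XML entities and numeric character references.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const code = body[1] === "x"
        ? parseInt(body.slice(2), 16)
        : parseInt(body.slice(1), 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities (for example HTML-only ones) are left as-is.
    return Object.prototype.hasOwnProperty.call(PREDEFINED, body)
      ? PREDEFINED[body]
      : match;
  });
}

// Return the character data of an element's inner content:
// ordinary text is entity-decoded, CDATA sections are copied verbatim.
function readCharacterData(inner) {
  const cdata = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  let out = "";
  let last = 0;
  let m;
  while ((m = cdata.exec(inner)) !== null) {
    out += decodeEntities(inner.slice(last, m.index)); // ordinary text
    out += m[1];                                       // CDATA, untouched
    last = cdata.lastIndex;
  }
  return out + decodeEntities(inner.slice(last));
}

console.log(readCharacterData("&#240;"));             // ð
console.log(readCharacterData("<![CDATA[&#240;]]>")); // &#240;
```

With this split, `&lt;sender&gt;` in ordinary text and `<sender>` inside CDATA both yield the same string, while entities written inside CDATA stay as typed.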
from fast-xml-parser.
@amitguptagwl I will do some refactoring and open a PR tomorrow
from fast-xml-parser.
Thanks @Delagen. If you are busy with other higher-priority work, we can just roll back the changes for the time being and handle them later. I'm planning to rewrite the parser anyway, but it'll take at least a month 😸
from fast-xml-parser.
Thanks @amitguptagwl for your work. I opened a PR to not decode CDATA.
But it seems the parser ignores sibling text when CDATA is present:
<a>
<![CDATA[asdf]]>a
</a>
result in:
{
"a":{
"#text":"asdf"
}
}
I know it's a rare case, but I am sure the result must be asdfa.
xml2js parses it correctly.
Example:
<a>a
<![CDATA[asdf&]]>&<![CDATA[asdf]]>
a</a>
{ "a": "a\nasdf&&asdf\na" }
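The expected behaviour for mixed text and CDATA siblings can be sketched as follows. This is a minimal illustration in plain JavaScript, not the code of fast-xml-parser or xml2js: `splitSegments` and `joinSegments` are hypothetical names, entity decoding is deliberately left out, and the outer whitespace is trimmed as an assumption to match the expected "asdfa".

```javascript
// Hypothetical helpers for illustration only.
// Split an element's inner content into ordered text and CDATA segments.
function splitSegments(inner) {
  const segments = [];
  const cdata = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  let last = 0;
  let m;
  while ((m = cdata.exec(inner)) !== null) {
    if (m.index > last) {
      segments.push({ type: "text", value: inner.slice(last, m.index) });
    }
    segments.push({ type: "cdata", value: m[1] });
    last = cdata.lastIndex;
  }
  if (last < inner.length) {
    segments.push({ type: "text", value: inner.slice(last) });
  }
  return segments;
}

// Concatenate every segment in document order, so text that follows
// a CDATA section is kept rather than dropped.
function joinSegments(inner) {
  return splitSegments(inner).map((s) => s.value).join("").trim();
}

console.log(joinSegments("\n<![CDATA[asdf]]>a\n")); // asdfa
```

Keeping the segments in an ordered list before joining also leaves room to treat text and CDATA differently later, for example decoding entities only in the `text` segments.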
from fast-xml-parser.
Sorry, I'm holding this PR because I'm rewriting the parser so that it can handle this situation and large files, reduce the time spent on separate validation, etc. I'm 60% done with the change and will update you on the progress.
from fast-xml-parser.
Thanks for your project. I would be glad to contribute if the project owner is interested in improving the code. )
from fast-xml-parser.