Decode HTML entities in text values (naturalintelligence/fast-xml-parser)

Comments (12)

amitguptagwl commented on July 18, 2024

v3 is live to handle this issue

from fast-xml-parser.

amitguptagwl commented on July 18, 2024

Published

amitguptagwl commented on July 18, 2024

I believe it should not parse HTML entities for CDATA.

Delagen commented on July 18, 2024

@amitguptagwl CDATA can also contain HTML entities: https://en.wikipedia.org/wiki/CDATA
It just indicates that the content is the node's text value, not markup (children).

amitguptagwl commented on July 18, 2024

Yes. :)
So what I expect from <tag><![CDATA[&lt;sender&gt;John Smith&lt;/sender&gt;]]></tag> is

{
   "tag" : "&lt;sender&gt;John Smith&lt;/sender&gt;"
}

But what I'm getting from current parser is

{
   "tag" : "<sender>John Smith</sender>"
}

Expectation 2: <tag><![CDATA[<sender>John Smith</sender>]]></tag> is

{
   "tag" : "<sender>John Smith</sender>"
}

Expectation 3: <tag>&lt;sender&gt;John Smith&lt;/sender&gt;</tag>, which is equivalent to <tag><![CDATA[<sender>John Smith</sender>]]></tag>, is

{
   "tag" : "<sender>John Smith</sender>"
}
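
The equivalence behind Expectation 3 can be sketched in a few lines. This is only an illustration: `escapeText` and `decodeText` are hypothetical helpers written for this example, not part of fast-xml-parser. It assumes CDATA content is taken verbatim and that ordinary element content uses the five predefined XML entities.

```javascript
// Hypothetical helpers for illustration only (not fast-xml-parser API).
const PREDEFINED = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

// Escape raw text so it can be written as ordinary element content.
// The ampersand must be escaped first, otherwise "&lt;" would be re-escaped.
function escapeText(raw) {
  return raw.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Decode the five predefined XML entities in a single pass,
// so "&amp;lt;" becomes "&lt;" and is not decoded a second time.
function decodeText(escaped) {
  return escaped.replace(/&(lt|gt|amp|quot|apos);/g, (_, name) => PREDEFINED[name]);
}

const raw = "<sender>John Smith</sender>"; // what a CDATA section would hold
const escaped = escapeText(raw);           // the same data as ordinary markup

console.log(escaped);                     // &lt;sender&gt;John Smith&lt;/sender&gt;
console.log(decodeText(escaped) === raw); // true
```

Under these assumptions, decoding the escaped form and reading the CDATA form verbatim yield the same string, which is why Expectations 2 and 3 have the same result.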

Delagen commented on July 18, 2024

"But what I'm getting from current parser is"

And that result is right.

I started a holy war at work over this. :)
The spec only says:

left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;"

but it says nothing about other entities, and even that wording is imprecise: "need not" and "cannot" are not the same as "must not".

For example, CDATA can contain HTML fragments:

<tag><![CDATA[<sender>&amp;John Smith&copy;</sender>]]></tag>

For consistency, you could disable entity decoding for CDATA or make it optional. But that needs more refactoring, because as far as I know the code joins CDATA and text together.

amitguptagwl commented on July 18, 2024

As per the wiki,

Character data is character data, regardless of whether it is expressed via a CDATA section or ordinary markup. CDATA sections are useful for writing XML code as text data within an XML document.

if the numeric character reference &#240; appears in element content, it will be interpreted as the single Unicode character 00F0 (small letter eth). But if the same appears in a CDATA section, it will be parsed as six characters: ampersand, hash mark, digit 2, digit 4, digit 0, semicolon.

The above in my own words (as I understand it):

if &#240; (which represents ð) appears in a CDATA section, it should not be parsed to ð but kept as the literal &#240;.
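
A small sketch of that distinction, assuming a hypothetical `decodeNumericRefs` helper (not part of fast-xml-parser) that is applied to ordinary element content only and never to CDATA content:

```javascript
// Hypothetical helper: decode decimal (&#240;) and hexadecimal (&#xF0;)
// numeric character references. String.fromCodePoint throws a RangeError
// for values outside the Unicode range; this sketch does not guard against that.
function decodeNumericRefs(text) {
  return text.replace(/&#(x[0-9a-fA-F]+|[0-9]+);/g, (_, ref) => {
    const codePoint = ref[0] === "x" ? parseInt(ref.slice(1), 16) : parseInt(ref, 10);
    return String.fromCodePoint(codePoint);
  });
}

const inElementContent = decodeNumericRefs("&#240;"); // ordinary text: decoded
const inCdata = "&#240;";                             // CDATA: left as written

console.log(inElementContent);        // ð
console.log(inElementContent.length); // 1
console.log(inCdata.length);          // 6
```

The six characters counted for the CDATA case are exactly the ones the quoted passage lists: ampersand, hash mark, 2, 4, 0, semicolon.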

Delagen commented on July 18, 2024

@amitguptagwl I will do some refactoring and open a PR tomorrow.

amitguptagwl commented on July 18, 2024

Thanks @Delagen. If you are busy with other, higher-priority work, we can just roll back the changes for the time being and handle them later. I'm planning to rewrite the parser anyway, but it'll take at least a month 😸

Delagen commented on July 18, 2024

Thanks @amitguptagwl for your work. I opened a PR to not decode CDATA.
But it seems the parser ignores sibling text when CDATA is present:

<a>
<![CDATA[asdf]]>a
</a>

results in:

{
   "a" : {
      "#text" : "asdf"
   }
}

I know it's a rare case, but I am sure the value must be asdfa.
xml2js parses it correctly.

Example:

<a>a
<![CDATA[asdf&amp;]]>&amp;<![CDATA[asdf]]>
a</a>

{ "a": "a\nasdf&amp;&asdf\na" }
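
The behaviour described here (text and CDATA siblings concatenated in document order, with entities decoded only outside CDATA) can be sketched as follows. This is not the fast-xml-parser implementation: `textValue` and `decodeEntities` are hypothetical names, and the sketch assumes the element content contains no child elements and only the five predefined XML entities.

```javascript
// Hypothetical sketch, not the fast-xml-parser implementation.
const ENTITIES = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

// Decode predefined entities in ordinary text (single pass, no double decoding).
function decodeEntities(text) {
  return text.replace(/&(lt|gt|amp|quot|apos);/g, (_, name) => ENTITIES[name]);
}

// Walk the element content, decoding plain text segments and copying
// CDATA segments verbatim, then join everything in document order.
function textValue(inner) {
  const cdata = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  let result = "";
  let last = 0;
  let match;
  while ((match = cdata.exec(inner)) !== null) {
    result += decodeEntities(inner.slice(last, match.index)); // text before CDATA
    result += match[1];                                        // CDATA, verbatim
    last = cdata.lastIndex;
  }
  return result + decodeEntities(inner.slice(last));           // trailing text
}

// First case from the comment: sibling text after CDATA is kept.
console.log(textValue("\n<![CDATA[asdf]]>a\n").trim());
// asdfa

// Second case: CDATA keeps "&amp;" literally, plain text decodes it to "&".
console.log(JSON.stringify(textValue("a\n<![CDATA[asdf&amp;]]>&amp;<![CDATA[asdf]]>\na")));
// "a\nasdf&amp;&asdf\na"
```

Whitespace trimming is left to the caller in this sketch, which is why the first call uses `.trim()`.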

amitguptagwl commented on July 18, 2024

Sorry, I'm holding this PR because I'm rewriting the parser so that it can handle this situation and large files, reduce the time spent on separate validation, etc. I'm 60% done with the change and will keep you updated on the progress.

Delagen commented on July 18, 2024

Thanks for your project. I'd be glad to contribute if the project owner is interested in improving the code. :)
