Comments (12)

amitguptagwl commented on July 18, 2024

v3 is live and handles this issue.

from fast-xml-parser.

amitguptagwl commented on July 18, 2024

Published

amitguptagwl commented on July 18, 2024

I believe the parser should not decode HTML entities inside CDATA.

Delagen commented on July 18, 2024

@amitguptagwl CDATA can also contain HTML entities: https://en.wikipedia.org/wiki/CDATA
It just indicates that the content is the text value of the node, not markup (children).

amitguptagwl commented on July 18, 2024

Yes. :)
So what I expect from <tag><![CDATA[&lt;sender&gt;John Smith&lt;/sender&gt;]]></tag> is

{
   "tag" : "&lt;sender&gt;John Smith&lt;/sender&gt;"
}

But what I'm getting from the current parser is

{
   "tag" : "<sender>John Smith</sender>"
}

Expectation 2: <tag><![CDATA[<sender>John Smith</sender>]]></tag> is

{
   "tag" : "<sender>John Smith</sender>"
}

Expectation 3: <tag>&lt;sender&gt;John Smith&lt;/sender&gt;</tag> which is equivalent to <tag><![CDATA[<sender>John Smith</sender>]]></tag> is

{
   "tag" : "<sender>John Smith</sender>"
}
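
For illustration, the three expectations above can be sketched in JavaScript. This is only a minimal sketch, not fast-xml-parser's implementation: `decodePredefinedEntities` and `tagValue` are hypothetical helpers, and the tag content is assumed to be either a single CDATA section or plain text with no child elements.

```javascript
// Hypothetical helper (not a fast-xml-parser API): decodes only the five
// predefined XML entities, in a single pass so "&amp;lt;" becomes "&lt;".
const PREDEFINED = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

function decodePredefinedEntities(text) {
  return text.replace(/&(lt|gt|amp|quot|apos);/g, (_, name) => PREDEFINED[name]);
}

// Hypothetical helper: returns the text value for the content of one tag.
// Assumes the content is either exactly one CDATA section or plain text.
function tagValue(inner) {
  const cdata = /^<!\[CDATA\[([\s\S]*)\]\]>$/.exec(inner);
  // CDATA content is taken verbatim; ordinary text has its entities decoded.
  return cdata ? cdata[1] : decodePredefinedEntities(inner);
}

// Expectation 1: entities inside CDATA stay as written.
console.log(tagValue("<![CDATA[&lt;sender&gt;John Smith&lt;/sender&gt;]]>"));
// &lt;sender&gt;John Smith&lt;/sender&gt;

// Expectation 2: literal markup inside CDATA stays as written.
console.log(tagValue("<![CDATA[<sender>John Smith</sender>]]>"));
// <sender>John Smith</sender>

// Expectation 3: entities in ordinary text are decoded.
console.log(tagValue("&lt;sender&gt;John Smith&lt;/sender&gt;"));
// <sender>John Smith</sender>
```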

Delagen commented on July 18, 2024

"But what I'm getting from current parser is"

And that is right.

I started a holy war at work )
The spec only says:

left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;"

but it says nothing about other entities, and even that is not specified precisely: NEED NOT and CANNOT are not the same as MUST NOT.

For example, CDATA can contain HTML fragments:

<tag><![CDATA[<sender>&amp;John Smith&copy;</sender>]]></tag>

For consistency, you could disable entity parsing in CDATA or make it optional. But that needs more refactoring in the code since, as far as I know, you join CDATA and text together.

amitguptagwl commented on July 18, 2024

As per the wiki,

Character data is character data, regardless of whether it is expressed via a CDATA section or ordinary markup. CDATA sections are useful for writing XML code as text data within an XML document.

if the numeric character reference &#240; appears in element content, it will be interpreted as the single Unicode character 00F0 (small letter eth). But if the same appears in a CDATA section, it will be parsed as six characters: ampersand, hash mark, digit 2, digit 4, digit 0, semicolon.

The above in my own words (my understanding):

if &#240; (which represents ð) appears in a CDATA section, it should not be parsed to ð but kept as &#240;.
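
A small JavaScript sketch of that reading, for illustration only: `decodeNumericReferences` and `characterData` are hypothetical helpers, not part of fast-xml-parser, and the `isCdata` flag is assumed to come from whatever tokenizer found the section.

```javascript
// Hypothetical helper (not a fast-xml-parser API): decodes decimal and
// hexadecimal numeric character references such as &#240; or &#xF0;.
function decodeNumericReferences(text) {
  return text.replace(/&#(?:x([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) =>
    String.fromCodePoint(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10))
  );
}

// Hypothetical helper: element content is decoded, CDATA content is kept verbatim.
function characterData(raw, isCdata) {
  return isCdata ? raw : decodeNumericReferences(raw);
}

const fromElement = characterData("&#240;", false);
const fromCdata = characterData("&#240;", true);

console.log(fromElement, fromElement.length); // ð 1
console.log(fromCdata, fromCdata.length);     // &#240; 6
```

The six characters reported for the CDATA case are the ones the wiki lists: ampersand, hash mark, 2, 4, 0, semicolon.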

Delagen commented on July 18, 2024

@amitguptagwl I will do some refactoring and open a PR tomorrow.

amitguptagwl commented on July 18, 2024

Thanks @Delagen. In case you are busy with other higher-priority work, we can just roll back the changes for the time being and handle them later. I'm planning to rewrite the parser anyway, but it'll take at least a month 😸

Delagen commented on July 18, 2024

Thanks @amitguptagwl for your work. I opened a PR to not decode CDATA.
But it seems the parser ignores sibling text when CDATA is present:

<a>
<![CDATA[asdf]]>a
</a>

results in:

{
   "a" : {
      "#text" : "asdf"
   }
}

I know it's a rare case, but I am sure it must be asdfa.
xml2js parses it correctly.

Example:

<a>a
<![CDATA[asdf&amp;]]>&amp;<![CDATA[asdf]]>
a</a>
{ "a": "a\nasdf&amp;&asdf\na" }
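
The joining behaviour described here could look roughly like the JavaScript sketch below. It is not the code from the PR or from xml2js: `decodeEntities`, `splitSegments` and `joinedValue` are hypothetical helpers, the content is assumed to hold only text and CDATA sections (no child elements), and trimming the surrounding whitespace in the first example is an assumption.

```javascript
// Hypothetical helper: decodes the five predefined XML entities in one pass.
function decodeEntities(text) {
  const map = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };
  return text.replace(/&(lt|gt|amp|quot|apos);/g, (_, name) => map[name]);
}

// Hypothetical helper: splits element content into ordered text and CDATA
// segments, so that text next to a CDATA section is never dropped.
function splitSegments(content) {
  const cdataPattern = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  const segments = [];
  let last = 0;
  for (const match of content.matchAll(cdataPattern)) {
    if (match.index > last) {
      segments.push({ type: "text", value: content.slice(last, match.index) });
    }
    segments.push({ type: "cdata", value: match[1] });
    last = match.index + match[0].length;
  }
  if (last < content.length) {
    segments.push({ type: "text", value: content.slice(last) });
  }
  return segments;
}

// Text segments are entity-decoded, CDATA segments are kept verbatim,
// then everything is concatenated in document order.
function joinedValue(content) {
  return splitSegments(content)
    .map((s) => (s.type === "cdata" ? s.value : decodeEntities(s.value)))
    .join("");
}

console.log(JSON.stringify(joinedValue("\n<![CDATA[asdf]]>a\n").trim()));
// "asdfa"
console.log(JSON.stringify(joinedValue("a\n<![CDATA[asdf&amp;]]>&amp;<![CDATA[asdf]]>\na")));
// "a\nasdf&amp;&asdf\na"
```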

amitguptagwl commented on July 18, 2024

Sorry, I'm holding this PR as I'm rewriting the parser so that it can handle this situation and large files, reduce the time spent on separate validation, etc. I'm 60% done with the change. I will update you on the progress.

Delagen commented on July 18, 2024

Thanks for your project. I'd be glad to contribute if the project owner is interested in making its code better. )
