
Comments (7)

gilbsgilbs commented on June 14, 2024

Even worse, it stops minifying after the > token, which means it erroneously treats the first > as the end of the tag.

$ cat test.html
<tag attr="var > 1" foo bar >
$ cat test.min.html
<tag attr="var> 1" foo bar >
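A naive tag-end pattern produces exactly this symptom. The Python sketch below is only an illustration (htmlclean is written in JavaScript and its actual pattern is not shown in this thread): `NAIVE_TAG` stops at the first `>`, while the hypothetical `QUOTE_AWARE_TAG` consumes each quoted value as a single unit, so a `>` inside quotes cannot end the tag.

```python
import re

# Naive pattern: "<", then anything except ">", then ">".
# It stops at the first ">" even if that ">" is inside a quoted value.
NAIVE_TAG = re.compile(r"<[^>]*>")

# Quote-aware pattern: inside the tag, consume either one character that is
# not ">" or a quote, or a whole double- or single-quoted string at once.
QUOTE_AWARE_TAG = re.compile(r"""<(?:[^>"']|"[^"]*"|'[^']*')*>""")

source = '<tag attr="var > 1" foo bar >'

print(NAIVE_TAG.match(source).group())        # <tag attr="var >
print(QUOTE_AWARE_TAG.match(source).group())  # <tag attr="var > 1" foo bar >
```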

from htmlclean.

anseki commented on June 14, 2024

Hi @gilbsgilbs, thank you for the comment.

That is the correct behavior for htmlclean, because that HTML code is definitely invalid. htmlclean assumes valid HTML code.
Also, some parsers (such as web browsers) may misinterpret invalid HTML code.
So you should replace that > with &gt; regardless of htmlclean.
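If the markup is generated by a program, this escaping can be done at output time. A minimal Python sketch using the standard library's `html.escape`, applied to the attribute value `var > 1` from the example above:

```python
from html import escape

value = "var > 1"
# escape() replaces &, < and >; with quote=True (the default) it also turns
# " into &quot; and ' into &#x27;, so the result is safe in either quote style.
escaped = escape(value, quote=True)
print(f'<tag attr="{escaped}" foo bar>')  # <tag attr="var &gt; 1" foo bar>
```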

See: https://github.com/anseki/htmlclean#note

htmlclean is not a validator; it does not check whether the code is valid.
htmlclean should be simple, lightweight and small.


gilbsgilbs commented on June 14, 2024

@anseki Thanks for your answer. You are right. It's not valid HTML; I totally understand that. However, it's very common to write this kind of thing in Angular 1, and pretty much all browsers don't treat this as a closing token. I can't imagine having to write ng-if statements with a &gt; b; I'm not even sure it would work.

That being said, I reckon htmlclean should at least raise a warning (because minification obviously failed in this case) or be a bit more permissive, and it should avoid altering attribute values at all costs anyway. If not, the "safe" keyword should definitely be removed from the README; it's just as unsafe as htmlmin.


anseki commented on June 14, 2024

Thank you for your proposal.
That "safe" means that htmlclean never changes the structure of the document.
See: https://github.com/anseki/htmlclean#note

htmlclean assumes valid HTML code, and htmlclean doesn't understand HTML at all.
Checking whether HTML code is valid requires an HTML parser. I think htmlclean should not do that, because other tools already do.
See: https://github.com/anseki/htmlclean#see-also
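To illustrate the parser point: a real HTML tokenizer reads the original tag without trouble. A sketch using Python's standard-library `html.parser.HTMLParser`; the `StartTagCollector` subclass is a hypothetical helper written for this example. On current CPython versions it should report the full value `var > 1` and the two valueless attributes.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records (tag, attrs) for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when the
        # attribute has no "=value" part, and entities are already decoded.
        self.start_tags.append((tag, attrs))


collector = StartTagCollector()
collector.feed('<tag attr="var > 1" foo bar >')
collector.close()
print(collector.start_tags)
# [('tag', [('attr', 'var > 1'), ('foo', None), ('bar', None)])]
```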

You can use the protect or unprotect options to control which code is changed.

https://github.com/anseki/htmlclean#protect
https://github.com/anseki/htmlclean#unprotect


gilbsgilbs commented on June 14, 2024

Thanks. After a quick check of the HTML5 spec, it appears that we were both wrong: this is not invalid HTML.

8.2.4.38 Attribute value (double-quoted) state

Consume the next input character:

U+0022 QUOTATION MARK (")
    Switch to the after attribute value (quoted) state.
U+0026 AMPERSAND (&)
    Switch to the character reference in attribute value state, with the additional allowed character being U+0022 QUOTATION MARK (").
U+0000 NULL
    Parse error. Append a U+FFFD REPLACEMENT CHARACTER character to the current attribute's value.
EOF
    Parse error. Switch to the data state. Reconsume the EOF character.
Anything else
    Append the current input character to the current attribute's value. 

https://www.w3.org/TR/html5/single-page.html#attribute-value-(double-quoted)-state

This means that the only characters with a special meaning inside a double-quoted HTML attribute value are:

  • U+0022 QUOTATION MARK (")
  • U+0026 AMPERSAND (&) (only if it is ambiguous I guess)
  • U+0000 NULL => Parse error
  • EOF => Parse error

Anything else is valid and is treated as part of the attribute value.
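The quoted state above can be sketched as a simple loop. This is a simplified Python illustration of that spec text, not a full tokenizer: the function name `read_double_quoted_value` and its error strings are made up for this example, and a character reference after `&` is kept verbatim rather than decoded.

```python
REPLACEMENT_CHAR = "\ufffd"


def read_double_quoted_value(text, pos):
    """Sketch of the 'attribute value (double-quoted) state'.

    pos is the index just after the opening '"'. Returns (value, next_pos,
    errors), where next_pos is the index just after the closing '"', or
    len(text) if EOF was reached inside the value.
    """
    value = []
    errors = []
    while pos < len(text):
        ch = text[pos]
        pos += 1
        if ch == '"':
            # U+0022 QUOTATION MARK: the value ends here.
            return "".join(value), pos, errors
        if ch == "\0":
            # U+0000 NULL: parse error; append U+FFFD instead.
            errors.append("NULL in attribute value")
            value.append(REPLACEMENT_CHAR)
        else:
            # '&' would start a character reference in a full parser; this
            # sketch keeps it as-is. Everything else, '>' and '<' included,
            # is simply appended to the value.
            value.append(ch)
    # EOF inside the value: parse error.
    errors.append("EOF in attribute value")
    return "".join(value), pos, errors


text = '<tag attr="var > 1" foo bar >'
value, end, errors = read_double_quoted_value(text, text.index('"') + 1)
print(repr(value), repr(text[end:]))  # 'var > 1' ' foo bar >'
```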


anseki commented on June 14, 2024

Thank you for the important information.
I will update htmlclean to support that spec in a future version.
Anyhow, it is better to escape those characters for HTML parsers.


anseki commented on June 14, 2024

I updated the code.
Please try the new version.

For example:

INPUT:

  A  B  C  <element    attr1  =  "  value   'value'  <   >   &lt;  &gt;  &quot; "   attr2 = '  value   "value"  <   >   &lt;  &gt;  &quot; '    attrNoValue   >   D  E  F

OUTPUT:

A B C<element attr1="  value   'value'  <   >   &lt;  &gt;  &quot; " attr2='  value   "value"  <   >   &lt;  &gt;  &quot; ' attrNoValue> D E F
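The tag part of this output (whitespace between attributes collapsed, quoted values kept byte-for-byte in their original quote style) can be sketched for a single start tag in Python. `minify_start_tag` is a hypothetical helper, not htmlclean's code, and it ignores the text outside the tag (`A B C`, `D E F`).

```python
import re

# Tag name right after "<".
TAG_OPEN = re.compile(r"<([A-Za-z][^\s/>]*)")
# End of the start tag: optional whitespace, then ">" or "/>".
TAG_CLOSE = re.compile(r"\s*(/?>)")
# One attribute: a name, optionally "=" plus a double-quoted, single-quoted,
# or unquoted value. Quoted values are captured verbatim, so "<", ">" and
# runs of spaces inside them survive untouched.
ATTR = re.compile(r"""\s*([^\s"'>/=]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'=<>`]+))?""")


def minify_start_tag(tag):
    """Collapse whitespace between the attributes of one start tag (sketch)."""
    m = TAG_OPEN.match(tag)
    if m is None:
        raise ValueError("not a start tag")
    parts = ["<" + m.group(1)]
    pos = m.end()
    while True:
        close = TAG_CLOSE.match(tag, pos)
        if close:
            return " ".join(parts) + close.group(1)
        attr = ATTR.match(tag, pos)
        if attr is None:
            raise ValueError(f"cannot parse attribute at index {pos}")
        name, value = attr.groups()
        parts.append(name if value is None else f"{name}={value}")
        pos = attr.end()


print(minify_start_tag('<tag attr="var > 1" foo bar >'))  # <tag attr="var > 1" foo bar>
```

Run on the `<element ...>` tag from the INPUT above, this sketch should produce the same tag as the OUTPUT.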

