highlighter's People

Contributors

dependabot[bot], domenic, ms2ger, sideshowbarker, tabatkins

highlighter's Issues

Server doesn’t return 400 for invalid Web IDL

As far as I can tell, server.py doesn’t actually return a 400 for invalid Web IDL.

When I test with invalid Web IDL, I see IDL SYNTAX ERROR messages reported up from the widlparser code — as expected — but in spite of those syntax errors, server.py still returns a 200 instead of the expected 400.


I don’t understand the widlparser code well, but as far as I can see, it isn’t causing any exception to be raised when it finds a syntax error. And since the server.py code only returns a 400 if it has caught an exception, it just reports a 200 even if the widlparser found a syntax error.

I would try to submit a patch that does something other than catching an exception, but looking at the widlparser source, I don’t understand what mechanism it actually exposes for consuming code to otherwise programmatically check whether there have been any syntax errors.
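
One possible shape for such a check, sketched under assumptions: if the parser reports syntax errors through a callback object rather than by raising, the server could hand it a collector and pick the status code from what was recorded. Every name here (ErrorCollector, parse_idl, status_for, and the warn callback) is hypothetical, and parse_idl is only a stand-in; whether widlparser exposes a comparable hook would need to be confirmed against its source.

```python
class ErrorCollector:
    """Hypothetical callback object that records reported syntax errors."""

    def __init__(self):
        self.errors = []

    def warn(self, message):
        # Called by the parser for each problem it reports.
        self.errors.append(message)


def parse_idl(text, ui):
    """Stand-in for the real parser: reports via ui.warn and never raises."""
    if text.count("{") != text.count("}"):
        ui.warn("IDL SYNTAX ERROR: unbalanced braces")


def status_for(text):
    """Return 400 if any syntax error was reported, otherwise 200."""
    collector = ErrorCollector()
    parse_idl(text, collector)
    return 400 if collector.errors else 200
```

With this arrangement the status code no longer depends on an exception being raised, only on whether the collector saw any reports.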

Loses some escapes in pre blocks?

See whatwg/html#4661.

When I pass the input

<pre>
&lt;iframe sandbox srcdoc="&lt;p>Yeah, you can see it &lt;a href=&amp;quot;/gallery?mode=cover&amp;amp;amp;page=1&amp;quot;>in my gallery&lt;/a>.">&lt;/iframe>
</pre>

through Wattsi alone, I get the expected output, i.e. that same HTML, which renders as shown in http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=6959.

When I pass it through Wattsi + highlighter, I get

<c- p="">&lt;</c-><c- f="">iframe</c-> <c- e="">sandbox</c-> <c- e="">srcdoc</c-><c- o="">=</c-><c- s="">"&lt;p&gt;Yeah, you can see it &lt;a href="</c->/<c- e="">gallery</c->?<c- e="">mode</c-><c- o="">=</c-><c- s="">cover&amp;amp;amp;page=1"</c-><c- p="">&gt;</c->in my gallery<c- p="">&lt;/</c-><c- f="">a</c-><c- p="">&gt;</c->."&gt;<c- p="">&lt;/</c-><c- f="">iframe</c-><c- p="">&gt;</c->

which notably has a href=" instead of a href=&amp;quot;.

/cc @sideshowbarker in case this is a problem with Wattsi.
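
The difference described above can be reproduced in isolation with Python's standard html module. This is only an illustration of how one level of escaping gets lost when text is decoded once too often before being re-serialized; it is not a diagnosis of where the loss happens in Wattsi or the highlighter, and the shortened sample string is an assumption made for readability.

```python
import html

# Text content of the <pre> after the HTML parser has decoded one level
# of escaping: the srcdoc value still contains a literal "&quot;".
text = '<a href=&quot;/gallery&quot;>'

# Escaping the text as-is keeps the literal "&quot;" intact.
escaped = html.escape(text, quote=False)

# Decoding once more before escaping turns "&quot;" into a bare quote.
decoded_too_far = html.escape(html.unescape(text), quote=False)

print(escaped)
print(decoded_too_far)
```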

HTML serializer incorrectly emits <br></br> instead of just <br>

Given JSON input with a br element, and with --output html, the serializer emits <br></br>:

$ echo '["pre",{},"&lt;pre>",["br",{}],"Hello&lt;/pre>"]' \
  | ./highlighter/__init__.py --output html --just html html

<pre><c- ni>&amp;lt;</c->pre&gt;<br></br>Hello<c- ni>&amp;lt;</c->/pre&gt;</pre>

It should instead emit only <br>. The </br> end tag is a document-conformance error:

https://html.spec.whatwg.org/multipage/syntax.html#elements-2:void-elements-2

Void elements only have a start tag; end tags must not be specified for void elements.
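
A minimal sketch of a serializer that follows this rule, assuming the same JSON array shape as the input above (tag name, attribute dict, then children). It is not the highlighter's own serializer; the serialize function and the VOID_ELEMENTS set are names introduced here for illustration.

```python
import html

# Void elements per the HTML syntax: start tag only, never an end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


def serialize(node):
    """Serialize a [tag, attrs, *children] array (or a text string) to HTML."""
    if isinstance(node, str):
        return html.escape(node, quote=False)
    tag, attrs, *children = node
    attr_text = "".join(
        f' {name}="{html.escape(value, quote=True)}"'
        for name, value in attrs.items()
    )
    start = f"<{tag}{attr_text}>"
    if tag in VOID_ELEMENTS:
        # Skip both the children and the end tag for void elements.
        return start
    return start + "".join(serialize(child) for child in children) + f"</{tag}>"
```

For example, serialize(["pre", {}, "a", ["br", {}], "b"]) yields <pre>a<br>b</pre>, with no </br> end tag.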

Failures for all CSS examples from HTML spec

In the HTML spec, there are 29 examples of CSS stylesheet fragments, and processing of any of them in the highlighter fails with the same stack trace and error message:

AssertionError: token type must be simple type or callable, not Token.Comment

Example input:

$ echo '["pre",{"class": "css"},"img { width: 300px; height: 300px }\n@media (min-width: 32em) { img { width: 500px; height:300px } }\n@media (min-width: 45em) { img { width: 700px; height:400px } }"]' \
  | ./highlighter/__init__.py css

…and resulting stack trace and error message:

Traceback (most recent call last):
  File "./highlighter/__init__.py", line 47, in <module>
    cli()
  File "./highlighter/__init__.py", line 35, in cli
    html,css = highlight(input, **options)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 26, in highlight
    html = highlightEl(html, lang)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 77, in highlightEl
    coloredText = highlightWithPygments(text, lang)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 162, in highlightWithPygments
    lexer = lexerFromLang(lang)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 334, in lexerFromLang
    return customLexers[lang]()
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 13, in loadCSSLexer
    return CSSLexer()
  File "/Users/mike/workspace/html-build/highlighter/highlighter/pygments/pygments/lexer.py", line 580, in __call__
    cls._tokens = cls.process_tokendef('', cls.get_tokendefs())
  File "/Users/mike/workspace/html-build/highlighter/highlighter/pygments/pygments/lexer.py", line 519, in process_tokendef
    cls._process_state(tokendefs, processed, state)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/pygments/pygments/lexer.py", line 503, in _process_state
    token = cls._process_token(tdef[1])
  File "/Users/mike/workspace/html-build/highlighter/highlighter/pygments/pygments/lexer.py", line 432, in _process_token
    'token type must be simple type or callable, not %r' % (token,)
AssertionError: token type must be simple type or callable, not Token.Comment

Investigate possible performance/speed improvements

The normal HTML spec build time is on the order of 15 seconds — that is, without the highlighter.

As discussed on #whatwg IRC around https://freenode.logbot.info/whatwg/20180607#c1579082, running the HTML spec build with the highlighter incorporated causes the build time to increase to more than 4 minutes.

While it’s true that the highlighter is called multiple times for each <pre>, that ~4 minutes of time indicates it takes about 80 milliseconds to process each <pre> once.

We have 1121 <pre> elements in the HTML spec, so if we were to call the highlighter on each one only once, that would take about 90 seconds total.

So the highlighter integration is adding ~90 seconds to the build time (for each spec-version build).

To isolate where that time is being spent, I did some simple investigation with wattsi by replacing the call to the highlighter in the wattsi code with just a call to /bin/cat.

The resulting build time was 29 seconds. That is, it increased by just 13 seconds (from 16 to 29).

So that seems to indicate that only 13 seconds are spent executing the code I added to wattsi for extracting all the examples and IDL blocks from the spec and piping each to an external process and getting that resulting output back and writing it to the generated spec HTML.

And that suggests the bulk of the ~90 additional seconds needed to run the highlighter is spent in the highlighter code itself — and not in the I/O on the wattsi side or the internal wattsi code.
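
The arithmetic behind those estimates, written out as a short sketch. All figures are the ones quoted above; the variable names are introduced here only for illustration.

```python
# Figures quoted in this issue.
PRE_COUNT = 1121            # <pre> elements in the HTML spec
SECONDS_PER_PRE = 0.080     # estimated highlighter time per <pre>
BASELINE_SECONDS = 16       # build without any external call
WITH_CAT_SECONDS = 29       # build piping each block through /bin/cat

# Cost of running the highlighter exactly once per <pre>.
single_pass_seconds = PRE_COUNT * SECONDS_PER_PRE

# Cost of extracting, piping, and writing back, with no highlighting work.
piping_overhead_seconds = WITH_CAT_SECONDS - BASELINE_SECONDS

print(f"single pass: {single_pass_seconds:.1f} s")
print(f"piping overhead: {piping_overhead_seconds} s")
```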

Therefore it might be useful to investigate whether any performance/speed improvements can be made in the highlighter code.
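
As a starting point for that investigation, a profiling harness along these lines could show where the time goes. It uses only the standard cProfile and pstats modules; highlight_stub is a hypothetical stand-in for the real highlighting entry point, and the sample input is an assumption.

```python
import cProfile
import io
import pstats


def highlight_stub(text):
    """Hypothetical stand-in for the real highlighting entry point."""
    return "".join(f"<c->{word}</c->" for word in text.split())


def profile_calls(func, inputs, top=10):
    """Run func over inputs under cProfile and return a cumulative-time report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for item in inputs:
        func(item)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


report = profile_calls(highlight_stub, ["img { width: 300px }"] * 100)
print(report)
```

Swapping the stub for the actual entry point and feeding it the blocks extracted from the spec would indicate whether the cost is in lexer construction, tokenizing, or serialization.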

Update color scheme to meet WCAG

It appears the current color scheme in the highlighter doesn't meet WCAG color contrast requirements. I'd like to contribute updates to the colors.

In whatwg/whatwg.org#392, I worked on color contrast issues in WHATWG, and syntax highlights are one of the issues we haven't yet tackled there.

I don't have strong preferences around color schemes; my suggestion would be to use a11y-syntax-highlighting, as that meets WCAG.

WebIDL parsing failures for six WebIDL blocks from HTML spec

The highlighter fails with Exception: IDL SYNTAX ERROR when processing each of the six JSON arrays for WebIDL content from the HTML spec in the following file:

Those correspond to the following six WebIDL blocks in the HTML spec:

I assume all of the failures are likely due to those instances not being valid WebIDL, in which case we probably need to fix the source in the HTML spec to make them work as expected with the highlighter. But I’ve not actually confirmed yet whether or not they’re valid, so for now I’m filing this issue here.
