tabatkins / highlighter
A self-contained *markup-preserving* syntax highlighter, powered by Pygments. Usable from command-line or as a Python module.
License: Other
As far as I can tell, server.py doesn’t actually return a 400 for invalid Web IDL.
When I test with invalid Web IDL, I see IDL SYNTAX ERROR messages reported up from the widlparser code, as expected, but in spite of those syntax errors, server.py still returns a 200 instead of the expected 400.
I don’t understand the widlparser code well, but as far as I can see, it isn’t causing any exception to be raised when it finds a syntax error. And since the server.py code only returns a 400 if it has caught an exception, it just reports a 200 even if the widlparser found a syntax error.
I would try to submit a patch that does something other than trying to catch an exception, but looking at the widlparser source, I don’t understand what mechanism it actually exposes for consuming code to otherwise programmatically check whether there have been any syntax errors.
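One possible shape for such a check, as a minimal sketch only: it assumes the parser reports syntax errors by calling a method on a reporter object that the caller passes in. Whether widlparser exposes a hook like this is not confirmed here; `ErrorCollector`, `toy_parse`, and `status_for` are hypothetical names, and `toy_parse` is a stand-in, not the real parser.

```python
class ErrorCollector:
    """Hypothetical reporter: records messages instead of printing them."""

    def __init__(self):
        self.messages = []

    def warn(self, message):
        self.messages.append(message)


def toy_parse(text, reporter):
    """Stand-in parser (an assumption): reports problems but never raises."""
    if text.count("{") != text.count("}"):
        reporter.warn("IDL SYNTAX ERROR: unbalanced braces")


def status_for(text, parse=toy_parse):
    """Return 400 if the parser raised OR reported anything, else 200."""
    reporter = ErrorCollector()
    try:
        parse(text, reporter)
    except Exception:
        return 400
    return 400 if reporter.messages else 200


print(status_for("interface Foo {};"))
print(status_for("interface Foo {"))
```

The point of the design is that the status code depends on the collected messages as well as on exceptions, so a parser that only reports errors can still produce a 400.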
See whatwg/html#4661.
When I pass the input
<pre>
<iframe sandbox srcdoc="<p>Yeah, you can see it <a href=&quot;/gallery?mode=cover&amp;amp;page=1&quot;>in my gallery</a>."></iframe>
</pre>
through Wattsi alone, I get the expected output, i.e. that same HTML, which renders as shown in http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=6959.
When I pass it through Wattsi + highlighter, I get
<c- p=""><</c-><c- f="">iframe</c-> <c- e="">sandbox</c-> <c- e="">srcdoc</c-><c- o="">=</c-><c- s="">"<p>Yeah, you can see it <a href="</c->/<c- e="">gallery</c->?<c- e="">mode</c-><c- o="">=</c-><c- s="">cover&amp;amp;page=1"</c-><c- p="">></c->in my gallery<c- p=""></</c-><c- f="">a</c-><c- p="">></c->."><c- p=""></</c-><c- f="">iframe</c-><c- p="">></c->
which notably has a href=" instead of a href=&quot;.
/cc @sideshowbarker in case this is a problem with Wattsi.
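To illustrate why the lost `&quot;` matters, here is a small sketch of attribute serialization. It is not a diagnosis of where the escaping is dropped; `serialize_attribute` is a hypothetical helper, and it only uses the standard library's `html.escape`.

```python
from html import escape


def serialize_attribute(name, value):
    """Serialize one attribute, re-escaping quotes inside the value.

    Without quote=True, a literal " in the value would end the
    attribute early, which is the symptom described above.
    """
    return f'{name}="{escape(value, quote=True)}"'


# A decoded attribute value that itself contains double quotes.
decoded = '<a href="/gallery">in my gallery</a>'
print(serialize_attribute("srcdoc", decoded))
```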
Given JSON input with a br element, and with --output html, the serializer emits <br></br>:
$ echo '["pre",{},"<pre>",["br",{}],"Hello</pre>"]' \
| ./highlighter/__init__.py --output html --just html html
<pre><c- ni>&lt;</c->pre><br></br>Hello<c- ni>&lt;</c->/pre></pre>
It should instead emit only <br>. The </br> end tag is a document-conformance error:
https://html.spec.whatwg.org/multipage/syntax.html#elements-2:void-elements-2
Void elements only have a start tag; end tags must not be specified for void elements.
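A minimal sketch of a serializer that follows this rule, assuming the `[tag, attributes, ...children]` array shape used in the input above. `serialize` and `VOID_ELEMENTS` are illustrative names, not the highlighter's actual code, and the set lists the void elements as I understand the current HTML spec.

```python
from html import escape

# Void elements per the HTML syntax spec (assumed current list).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def serialize(node):
    """Serialize a [tag, attrs, *children] tree to HTML text."""
    if isinstance(node, str):
        return escape(node, quote=False)
    tag, attrs, *children = node
    attr_text = "".join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in attrs.items()
    )
    start_tag = f"<{tag}{attr_text}>"
    if tag in VOID_ELEMENTS:
        # Void elements get a start tag only; no children, no end tag.
        return start_tag
    return start_tag + "".join(serialize(c) for c in children) + f"</{tag}>"


print(serialize(["pre", {}, "<pre>", ["br", {}], "Hello</pre>"]))
```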
which makes it a pain to successfully install both bikeshed and bs-highlighter.
In the HTML spec, there are 29 examples of CSS stylesheet fragments, and processing of any of them in the highlighter fails with the same stack trace and error message:
AssertionError: token type must be simple type or callable, not Token.Comment
Example input:
$ echo '["pre",{"class": "css"},"img { width: 300px; height: 300px }\n@media (min-width: 32em) { img { width: 500px; height:300px } }\n@media (min-width: 45em) { img { width: 700px; height:400px } }"]' \
| ./highlighter/__init__.py css
…and resulting stack trace and error message:
Traceback (most recent call last):
  File "./highlighter/__init__.py", line 47, in <module>
    cli()
  File "./highlighter/__init__.py", line 35, in cli
    html,css = highlight(input, **options)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 26, in highlight
    html = highlightEl(html, lang)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 77, in highlightEl
    coloredText = highlightWithPygments(text, lang)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 162, in highlightWithPygments
    lexer = lexerFromLang(lang)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 334, in lexerFromLang
    return customLexers[lang]()
  File "/Users/mike/workspace/html-build/highlighter/highlighter/highlight.py", line 13, in loadCSSLexer
    return CSSLexer()
  File "/Users/mike/workspace/html-build/highlighter/highlighter/pygments/pygments/lexer.py", line 580, in __call__
    cls._tokens = cls.process_tokendef('', cls.get_tokendefs())
  File "/Users/mike/workspace/html-build/highlighter/highlighter/pygments/pygments/lexer.py", line 519, in process_tokendef
    cls._process_state(tokendefs, processed, state)
  File "/Users/mike/workspace/html-build/highlighter/highlighter/pygments/pygments/lexer.py", line 503, in _process_state
    token = cls._process_token(tdef[1])
  File "/Users/mike/workspace/html-build/highlighter/highlighter/pygments/pygments/lexer.py", line 432, in _process_token
    'token type must be simple type or callable, not %r' % (token,)
AssertionError: token type must be simple type or callable, not Token.Comment
This is a dependency of whatwg/html-build#220.
The normal HTML spec build time is on the order of 15 seconds — that is, without the highlighter.
As discussed on #whatwg IRC around https://freenode.logbot.info/whatwg/20180607#c1579082, running the HTML spec build with the highlighter incorporated causes the build time to increase to more than 4 minutes.
While it’s true that the highlighter is called multiple times for each <pre>, that ~4 minutes of time indicates it takes about 80 milliseconds to process each <pre> once.
We have 1121 <pre> elements in the HTML spec, so if we were to call the highlighter on each one only once, that would take about 90 seconds total.
So the highlighter integration is adding ~90 seconds to the build time (for each spec-version build).
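The estimate can be checked with a couple of lines of arithmetic. The two inputs are the figures quoted above (1121 blocks, about 80 milliseconds each); the function name is just for illustration.

```python
PRE_COUNT = 1121          # <pre> elements in the HTML spec (from above)
SECONDS_PER_PRE = 0.080   # approximate cost of one highlighter call


def estimated_seconds(count, seconds_each):
    """Total time if each block is highlighted exactly once."""
    return count * seconds_each


total = estimated_seconds(PRE_COUNT, SECONDS_PER_PRE)
print(f"{total:.1f} s")  # prints "89.7 s"
```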
To isolate where that time is being spent, I did some simple investigation with wattsi by replacing the call to the highlighter in the wattsi code with just a call to /bin/cat.
The resulting build time was 29 seconds. That is, it increased by just 13 seconds (from 16 to 29).
So that seems to indicate that only 13 seconds are spent executing the code I added to wattsi for extracting all the examples and IDL blocks from the spec and piping each to an external process and getting that resulting output back and writing it to the generated spec HTML.
And that suggests the bulk of the ~90 additional seconds needed to run the highlighter is spent in the highlighter code itself — and not in the I/O on the wattsi side or the internal wattsi code.
Therefore it might be useful to investigate whether any performance/speed improvements can be made in the highlighter code.
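As a starting point for that investigation, a rough timing harness along these lines could compare the per-call cost of different commands. This is a sketch under assumptions: `mean_seconds_per_call` is a hypothetical helper, and `PYTHON_NOOP` is added only as a portable baseline for bare interpreter start-up, which may or may not be a significant part of the highlighter's cost.

```python
import subprocess
import sys
import time

# Baseline: start a Python interpreter that does nothing.
PYTHON_NOOP = [sys.executable, "-c", "pass"]


def mean_seconds_per_call(command, payload, runs=5):
    """Average wall-clock time to spawn `command` and feed it `payload`."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, input=payload, capture_output=True, check=True)
    return (time.perf_counter() - start) / runs


# Example comparison (commands taken from the reports above):
#   mean_seconds_per_call(["/bin/cat"], b'["pre",{},"x"]')
#   mean_seconds_per_call(["./highlighter/__init__.py", "css"], b'["pre",{},"x"]')
print(f"interpreter start-up: {mean_seconds_per_call(PYTHON_NOOP, b''):.3f} s")
```

If the no-op baseline accounts for most of the per-call time, batching several blocks into one process would be worth exploring; if not, the time is more likely spent in the highlighting itself.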
Probably the cause of whatwg/html#4832
Some more new stuff we'd like to use in HTML has landed in https://github.com/plinss/widlparser/commits/master
It appears the current color scheme in the highlighter doesn't meet WCAG color contrast requirements. I'd like to contribute updates to the colors.
In whatwg/whatwg.org#392, I worked on color contrast issues in WHATWG, and syntax highlights are one of the issues we haven't yet tackled there.
I don't have strong preferences around color schemes, but my suggestion would be to use a11y-syntax-highlighting, as that meets WCAG.
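For checking candidate colors, the WCAG 2.x contrast ratio can be computed directly. This is a small sketch of the published formula (relative luminance of each color, then `(L1 + 0.05) / (L2 + 0.05)`); the function names are illustrative, and the 4.5:1 threshold mentioned in the comment is the AA requirement for normal-size text.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# AA for normal text requires a ratio of at least 4.5.
print(round(contrast_ratio("#000000", "#ffffff"), 2))  # prints 21.0
```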
The highlighter fails with Exception: IDL SYNTAX ERROR when processing each of the six JSON arrays for WebIDL content from the HTML spec in the following file:
Those correspond to the following six WebIDL blocks in the HTML spec:
I assume all of the failures are likely due to those instances not being valid WebIDL, and so to make them work as expected with the highlighter, we probably need to fix the source in the HTML spec. But I’ve not actually confirmed yet whether or not they’re valid, so for now I’m filing this issue here.