Comments (10)
I'm surprised that I/O accounted for so little of the time, and the highlighting itself for comparatively so much. That said, I've done absolutely zero perf-hacking on this so far, and am happy to try this out. I'll adapt the Bikeshed profiling machinery to see what I can tighten up.
from highlighter.
Sounds great — I’m happy to help with testing or in any other way needed.
So I'm running all ~1100 samples from https://gist.githubusercontent.com/sideshowbarker/8284404/raw/8efdf8fe9713839b839955c9a099ff90e2e559a1/examples.json, which should be the entirety of HTML, and printing the results to the console, and the entire thing takes about 2.3s:
real 0m2.289s
user 0m2.216s
sys 0m0.056s
This is in line with what I thought should happen, which suggests that the actual slowdown is somewhere else. I suspect that it's in freshly invoking the Python interpreter 1100 times.
Instead of running /bin/cat, could you just make an empty .py file somewhere with the following contents:
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
pass
And invoke that instead of highlighter? I'll bet that should still take the majority of the time you're seeing.
If that's the case, then the only thing we can do to improve things is batch the calls together into a single script invocation. Is that possible to do in the Wattsi pipeline?
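The interpreter-startup claim above can be sanity-checked with a quick timing sketch. This is not the original test harness; the loop count and the use of subprocess are my own choices here:

```python
import subprocess
import sys
import time

# Time N fresh interpreter launches that do nothing, to isolate
# startup cost from the cost of the highlighting work itself.
# N = 20 is an arbitrary choice for a quick measurement.
N = 20
start = time.perf_counter()
for _ in range(N):
    subprocess.run([sys.executable, "-c", "pass"], check=True)
elapsed = time.perf_counter() - start
print(f"{N} launches: {elapsed:.2f}s total, {elapsed / N * 1000:.0f} ms per launch")
```

Multiplying the per-launch figure by ~1100 invocations gives a rough lower bound on the overhead that batching would eliminate.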
And confirmed: I wrote another test that shelled out to highlight for each line, and it's taking much longer. Having trouble getting time to spit out something useful, but it's definitely at least a minute.
If that's the case, then the only thing we can do to improve things is batch the calls together into a single script invocation. Is that possible to do in the Wattsi pipeline?
No — while anything’s possible, that would not at all be practical to do in Wattsi in its current form.
Okay, I just uploaded some new code that might work. Instead of invoking highlighter/__init__.py, could you instead invoke highlighter/server.py & (the & will put it into the background), then curl -g localhost:8080/lang-goes-here?json-goes-here? It should spit out the desired HTML string.
For example, curl -g http://localhost:8080/webidl?[%22pre%22,%7B%22class%22:%22idl%22%7D,%22interface%20Foo%20{%20readonly%20attribute%20long%20bar;%20};%22] responds with <pre class='idl'><c- b>interface</c-> <c- g>Foo</c-> { <c- b>readonly</c-> <c- b>attribute</c-> <c- b>long</c-> <c- g>bar</c->; };</pre>.
Note that you'll have to url-encode the json or else the server gets angry.
Hopefully this should be much faster, since it only starts up Python once. You'll be paying some HTTP overhead, but it's all local, so it shouldn't be too expensive.
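For a client that doesn't shell out to curl, the URL construction described above can be sketched in Python. The payload shape and the port come from the example in this thread; everything else here is my assumption, not part of the highlighter code:

```python
import json
import urllib.parse

# Build the request URL for the highlight server described above.
# Payload shape (["pre", {...attrs...}, source]) and port 8080 are
# taken from the curl example in this thread.
payload = ["pre", {"class": "idl"}, "interface Foo { readonly attribute long bar; };"]

# URL-encode the JSON, since the server rejects unencoded input.
query = urllib.parse.quote(json.dumps(payload))
url = "http://localhost:8080/webidl?" + query
print(url)
```

With server.py running in the background, the request itself could then be made with urllib.request.urlopen(url) instead of invoking curl.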
Thanks extremely much. I hooked the highlight server into the Wattsi pipeline and it worked perfectly.
It’s just an initial interim integration using curl as described above — but as expected, it’s much, much faster than re-invoking the script each time.
As far as how fast it is: it takes the build down from the 5 minutes and 45 seconds it was requiring to about 1 minute and 45 seconds. That is, so far it’s reduced the time by 4 minutes.
While 105 seconds is still a big jump up from the 16 seconds the build was taking without this, it is worth remembering that the build isn’t doing the highlighting just once — instead it’s doing it 4 times.
So that means it’s taking about 23 seconds for each pass over the 1121 pre elements — or about 20 milliseconds on average for each pre.
But in this initial implementation, that comes at the cost of shell-process I/O: invoking curl to make the request to the server API and to get the response back.
It seems like it will be faster to instead replace that shell-process invocation with compiled code that does the same thing using FPC’s native fphttpclient (http://wiki.freepascal.org/fphttpclient) to handle the request and response.
So next I’ll try writing that up and see what we get.
Out of curiosity, is there any way for you to cache the highlight results across runs, so that you only actually call into highlighter for the first run, and all subsequent runs can just immediately get their highlight results back? That should be ~free for the following runs, dropping 75% of the runtime.
Out of curiosity, is there any way for you to cache the highlight results across runs
Yeah, I just now went ahead and added a simple caching mechanism to the Wattsi code. Adding that shaved an additional two seconds off the time needed for the syntax-highlighting build. So the total build time with syntax highlighting enabled is now 25 seconds in my environment — compared to 16 seconds without syntax highlighting enabled. So the performance cost is just 9 seconds total.
It’s imaginable there’s a faster way to do the caching than the way I did it. But we’re at the point where any additional speedup it might bring wouldn’t be noticeable enough in practice to matter much.
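The caching idea can be illustrated with a minimal sketch. This is not the actual Wattsi code (which is Pascal); it's just a memoization of (language, source) pairs with a hypothetical stand-in for the highlight call:

```python
# Minimal sketch of caching highlight results, keyed by (language, source).
# The real Wattsi cache is in Pascal; this only illustrates the idea.
highlight_cache = {}

def highlight_cached(lang, source, highlight):
    """Return the cached result for (lang, source), calling highlight only on a miss."""
    key = (lang, source)
    if key not in highlight_cache:
        highlight_cache[key] = highlight(lang, source)
    return highlight_cache[key]

# Usage with a hypothetical stand-in highlighter that records its calls:
calls = []
def fake_highlight(lang, source):
    calls.append(source)
    return f"<pre>{source}</pre>"

first = highlight_cached("webidl", "interface Foo {};", fake_highlight)
second = highlight_cached("webidl", "interface Foo {};", fake_highlight)
assert first == second and len(calls) == 1  # second lookup never hit the highlighter
```

Since the spec builds the same pre contents four times per run, even this simple scheme collapses three of the four passes into cache hits.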
Closing, as the server option has eliminated the performance issue.