tabatkins commented on August 18, 2024

I'm surprised that I/O accounted for so little of the time, and the highlighting itself for comparatively so much. That said, I've done absolutely zero perf-hacking on this so far, and am happy to try this out. I'll adapt the Bikeshed profiling machinery to see what I can tighten up.

sideshowbarker commented on August 18, 2024

Sounds great — I’m happy to help with testing or in any other way needed.

tabatkins commented on August 18, 2024

So I'm running all ~1100 samples from https://gist.githubusercontent.com/sideshowbarker/8284404/raw/8efdf8fe9713839b839955c9a099ff90e2e559a1/examples.json, which should be the entirety of HTML, and printing the results to the console, and the entire thing takes about 2.3s:

real	0m2.289s
user	0m2.216s
sys	0m0.056s

This is in line with what I thought should happen, which suggests that the actual slowdown is somewhere else. I suspect that it's in freshly invoking the Python interpreter 1100 times.

Instead of running /bin/cat, could you just make a trivial .py file somewhere with the following contents:

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
pass

And invoke that instead of highlighter? I'll bet that should still take the majority of the time you're seeing.

If that's the case, then the only thing we can do to improve things is batch the calls together into a single script invocation. Is that possible to do in the Wattsi pipeline?
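
To make the startup-cost comparison concrete, here's a rough, hypothetical timing sketch (not from this thread; noop.py stands in for the stub file above):

# Hypothetical sketch: pay the interpreter-startup cost once per sample,
# the way a per-invocation pipeline would. "noop.py" is the stub above.
import subprocess
import time

start = time.time()
for _ in range(1100):
    subprocess.call(["python2.7", "noop.py"])  # fresh interpreter each time
print("1100 interpreter launches: %.1fs" % (time.time() - start))

If interpreter startup is the bottleneck, this loop alone should dwarf the 2.3s batched run above.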

from highlighter.

tabatkins commented on August 18, 2024

And confirmed: I wrote another test that shelled out to the highlighter for each line, and it's taking much longer. Having trouble getting time to spit out something useful, but it's definitely at least a minute.

sideshowbarker commented on August 18, 2024

If that's the case, then the only thing we can do to improve things is batch the calls together into a single script invocation. Is that possible to do in the Wattsi pipeline?

No — while anything’s possible, that would not be at all practical to do in Wattsi in its current form.

tabatkins commented on August 18, 2024

Okay, I just uploaded some new code that might work. Instead of invoking highlighter/__init__.py, could you instead invoke highlighter/server.py & (the & will put it into the background), then curl -g localhost:8080/lang-goes-here?json-goes-here? It should spit out the desired HTML string.

For example, curl -g http://localhost:8080/webidl?[%22pre%22,%7B%22class%22:%22idl%22%7D,%22interface%20Foo%20{%20readonly%20attribute%20long%20bar;%20};%22] responds with <pre class='idl'><c- b>interface</c-> <c- g>Foo</c-> { <c- b>readonly</c-> <c- b>attribute</c-> <c- b>long</c-> <c- g>bar</c->; };</pre>.

Note that you'll have to URL-encode the JSON or else the server gets angry.

Hopefully this should be much faster, since it only starts up Python once. You'll be paying some HTTP overhead, but it's all local, so it shouldn't be too expensive.
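
For reference, here's a minimal sketch of making the same request without shelling out to curl (Python 3 standard library; it assumes the server from highlighter/server.py is already running on port 8080 as described above):

import json
import urllib.parse
import urllib.request

# Same payload as the curl example above; URL-encode the JSON before sending.
payload = ["pre", {"class": "idl"},
           "interface Foo { readonly attribute long bar; };"]
url = "http://localhost:8080/webidl?" + urllib.parse.quote(json.dumps(payload))
print(urllib.request.urlopen(url).read().decode("utf-8"))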

sideshowbarker commented on August 18, 2024

Thanks very much. I hooked the highlight server into the Wattsi pipeline and it worked perfectly.

It’s just an initial interim integration using curl as described above — but as expected, it’s much, much faster than re-calling the script each time.

As for how fast it is: it takes the build time down from the 5 minutes and 45 seconds it was requiring to about 1 minute and 45 seconds. That is, so far it’s reduced the time by 4 minutes.

While 105 seconds is still a big jump up from the 16 seconds the build was taking without this, it is worth remembering that the build isn’t doing the highlighting just once — instead it’s doing it 4 times.

So that means it’s taking about 23 seconds for each pass of the 1121 pre elements — or about 20 milliseconds on average for each pre.
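
For clarity, here's the arithmetic behind those figures (a quick sketch using only the numbers in this thread):

# All values from this thread: 105 s total with highlighting enabled,
# 16 s without, 4 highlighting passes, 1121 pre elements.
highlighting_cost = 105 - 16          # 89 s spent on highlighting overall
per_pass = highlighting_cost / 4.0    # ~22 s per pass
per_pre = per_pass / 1121 * 1000      # ~20 ms per pre element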

But this initial implementation pays the cost of shell-process I/O, invoking curl for each request to the server API and to read the response back.

It seems like it will be faster to instead replace that shell-process invocation with compiled code that does the same thing using FPC’s native fphttpclient (http://wiki.freepascal.org/fphttpclient) to handle the request and response.

So next I’ll try writing that up and see what we get.

tabatkins commented on August 18, 2024

Out of curiosity, is there any way for you to cache the highlight results across runs, so that you only actually call into highlighter for the first run, and all subsequent runs can just immediately get their highlight results back? That should be ~free for the following runs, dropping 75% of the runtime.

sideshowbarker commented on August 18, 2024

Out of curiosity, is there any way for you to cache the highlight results across runs

Yeah, I just now went ahead and added a simple caching mechanism to the Wattsi code. Adding that shaved an additional two seconds off the time needed for the syntax-highlighting build. So the total build time with syntax highlighting enabled is now 25 seconds in my environment — compared to 16 seconds without syntax highlighting enabled. So the performance cost is just 9 seconds total.

It’s imaginable there’s a faster way to do the caching than the way I did it. But we’re at the point where any additional speedup it might bring wouldn’t be noticeable enough in practice to matter much.
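
For illustration, here's a minimal sketch of one way such a cache can work (hypothetical; Wattsi's actual implementation is in Pascal, and query_highlight_server is a made-up stand-in for the HTTP round-trip):

# Hypothetical sketch: memoize highlight results keyed on the language
# plus the pre element's raw source, so repeated passes reuse the first
# pass's result instead of re-querying the server.
highlight_cache = {}

def highlight(lang, source):
    key = (lang, source)
    if key not in highlight_cache:
        # query_highlight_server is a hypothetical helper for the HTTP call.
        highlight_cache[key] = query_highlight_server(lang, source)
    return highlight_cache[key]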

sideshowbarker commented on August 18, 2024

Closing, as the server option has eliminated the performance issue.
