tabatkins commented on August 18, 2024

I'm surprised that I/O accounted for so little of the time, and the highlighting itself for comparatively so much. That said, I've done absolutely zero perf-hacking on this so far, and am happy to try this out. I'll adapt the Bikeshed profiling machinery to see what I can tighten up.

sideshowbarker commented on August 18, 2024

Sounds great — I’m happy to help with testing or in any other way needed.

tabatkins commented on August 18, 2024

So I'm running all ~1100 samples from https://gist.githubusercontent.com/sideshowbarker/8284404/raw/8efdf8fe9713839b839955c9a099ff90e2e559a1/examples.json, which should be the entirety of HTML, and printing the results to the console, and the entire thing takes about 2.3s:

real	0m2.289s
user	0m2.216s
sys	0m0.056s

This is in line with what I thought should happen, which suggests that the actual slowdown is somewhere else. I suspect that it's in freshly invoking the Python interpreter 1100 times.

Instead of running /bin/cat, could you just make a trivial .py file somewhere with the following contents:

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
pass

And invoke that instead of highlighter? I'll bet that should still take the majority of the time you're seeing.

If that's the case, then the only thing we can do to improve things is batch the calls together into a single script invocation. Is that possible to do in the Wattsi pipeline?
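
To make the startup-cost comparison concrete, here's a rough, hypothetical timing sketch (not from this thread; noop.py stands in for the stub file above):

# Hypothetical sketch: pay the interpreter-startup cost once per sample,
# the way a per-invocation pipeline would. "noop.py" is the stub above.
import subprocess
import time

start = time.time()
for _ in range(1100):
    subprocess.call(["python2.7", "noop.py"])  # fresh interpreter each time
print("1100 interpreter launches: %.1fs" % (time.time() - start))

If interpreter startup is the bottleneck, this loop alone should dwarf the 2.3s batched run above.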

from highlighter.

tabatkins commented on August 18, 2024

And confirmed: I wrote another test that shelled out to the highlighter for each line, and it's taking much longer. Having trouble getting time to spit out something useful, but it's definitely at least a minute.

sideshowbarker commented on August 18, 2024

If that's the case, then the only thing we can do to improve things is batch the calls together into a single script invocation. Is that possible to do in the Wattsi pipeline?

No — while anything’s possible, that would not be at all practical to do in Wattsi in its current form.

tabatkins commented on August 18, 2024

Okay, I just uploaded some new code that might work. Instead of invoking highlighter/__init__.py, could you instead invoke highlighter/server.py & (the & will put it into the background), then curl -g localhost:8080/lang-goes-here?json-goes-here? It should spit out the desired HTML string.

For example, curl -g http://localhost:8080/webidl?[%22pre%22,%7B%22class%22:%22idl%22%7D,%22interface%20Foo%20{%20readonly%20attribute%20long%20bar;%20};%22] responds with <pre class='idl'><c- b>interface</c-> <c- g>Foo</c-> { <c- b>readonly</c-> <c- b>attribute</c-> <c- b>long</c-> <c- g>bar</c->; };</pre>.

Note that you'll have to URL-encode the JSON or else the server gets angry.

Hopefully this should be much faster, since it only starts up Python once. You'll be paying some HTTP overhead, but it's all local, so it shouldn't be too expensive.
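
For reference, here's a minimal sketch of making the same request without shelling out to curl (Python 3 standard library; it assumes the server from highlighter/server.py is already running on port 8080 as described above):

import json
import urllib.parse
import urllib.request

# Same payload as the curl example above; URL-encode the JSON before sending.
payload = ["pre", {"class": "idl"},
           "interface Foo { readonly attribute long bar; };"]
url = "http://localhost:8080/webidl?" + urllib.parse.quote(json.dumps(payload))
print(urllib.request.urlopen(url).read().decode("utf-8"))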

sideshowbarker commented on August 18, 2024

Thanks very much. I hooked the highlight server into the Wattsi pipeline and it worked perfectly.

It’s just an initial interim integration using curl as described above — but as expected, it’s much, much faster than re-calling the script each time.

As for how fast it is: it takes the build time down from the 5 minutes and 45 seconds it was requiring to about 1 minute and 45 seconds. That is, so far it’s reduced the time by 4 minutes.

While 105 seconds is still a big jump up from the 16 seconds the build was taking without this, it is worth remembering that the build isn’t doing the highlighting just once — instead it’s doing it 4 times.

So that means it’s taking about 23 seconds for each pass of the 1121 pre elements — or about 20 milliseconds on average for each pre.
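
For clarity, here's the arithmetic behind those figures (a quick sketch using only the numbers in this thread):

# All values from this thread: 105 s total with highlighting enabled,
# 16 s without, 4 highlighting passes, 1121 pre elements.
highlighting_cost = 105 - 16          # 89 s spent on highlighting overall
per_pass = highlighting_cost / 4.0    # ~22 s per pass
per_pre = per_pass / 1121 * 1000      # ~20 ms per pre element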

But this initial implementation pays the cost of shell-process I/O, invoking curl for each request to the server API and to read the response back.

It seems like it will be faster to instead replace that shell-process invocation with compiled code that does the same thing using FPC’s native fphttpclient (http://wiki.freepascal.org/fphttpclient) to handle the request and response.

So next I’ll try writing that up and see what we get.

tabatkins commented on August 18, 2024

Out of curiosity, is there any way for you to cache the highlight results across runs, so that you only actually call into highlighter for the first run, and all subsequent runs can just immediately get their highlight results back? That should be ~free for the following runs, dropping 75% of the runtime.

sideshowbarker commented on August 18, 2024

Out of curiosity, is there any way for you to cache the highlight results across runs

Yeah, I just now went ahead and added a simple caching mechanism to the Wattsi code. Adding that shaved an additional two seconds off the time needed for the syntax-highlighting build. So the total build time with syntax highlighting enabled is now 25 seconds in my environment — compared to 16 seconds without syntax highlighting enabled. So the performance cost is just 9 seconds total.

It’s imaginable there’s a faster way to do the caching than the way I did it. But we’re at the point where any additional speedup it might bring wouldn’t be noticeable enough in practice to matter much.
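
For illustration, here's a minimal sketch of one way such a cache can work (hypothetical; Wattsi's actual implementation is in Pascal, and query_highlight_server is a made-up stand-in for the HTTP round-trip):

# Hypothetical sketch: memoize highlight results keyed on the language
# plus the pre element's raw source, so repeated passes reuse the first
# pass's result instead of re-querying the server.
highlight_cache = {}

def highlight(lang, source):
    key = (lang, source)
    if key not in highlight_cache:
        # query_highlight_server is a hypothetical helper for the HTTP call.
        highlight_cache[key] = query_highlight_server(lang, source)
    return highlight_cache[key]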

sideshowbarker commented on August 18, 2024

Closing, as the server option has eliminated the performance issue.
