Coder Social home page Coder Social logo

Zipper unselection benchmark about anti-xml HOT 7 OPEN

djspiewak avatar djspiewak commented on May 29, 2024
Zipper unselection benchmark

from anti-xml.

Comments (7)

djspiewak avatar djspiewak commented on May 29, 2024

I think this is absolutely a good idea. I put some thought into this a while ago, but I didn't act on it because, frankly, it's such a mess to encode the equivalent operations in scala.xml or jaxp. :-S Also, I was a little afraid of what the performance would turn out to be.

You're quite right that we should be measuring this empirically.

from anti-xml.

josharnold52 avatar josharnold52 commented on May 29, 2024

If no one has started on this, I'll take a crack at it. I also want to get some measurements on some of the other Zipper methods (updated, flatMap, etc.)

from anti-xml.

djspiewak avatar djspiewak commented on May 29, 2024

I haven't started on it yet (haven't found the time). It's definitely an important item though; thanks for tackling it!

from anti-xml.

josharnold52 avatar josharnold52 commented on May 29, 2024

The above pull request includes these benchmarks along with some optimizations. The new "modify" trials measure an entire select/modify/unselect cycle (or its equivalent). To run them, enter test-perf :modify from SBT. I also added benchmarks for individual zipper methods, which can be run using test-perf :zipperOps. See test-perf --help for more information on the new command line options.

Just to give a flavor for some of the new trials, I'll discuss some of the deep-modify results. These trials find an element by name and add an attribute to it. anti-xml does this using a Zipper. The other implementations use the most straightforward algorithm I could think of.

The first deep-modify trial uses the same query as deepSelectSmall and adds an attribute to each element:

[deepModifyFewSmall]        deep modify a few elements of a 7 MB tree          [modify 8 nodes]
 + anti-xml:                    min:   2 ms, max:   3 ms, average:   2 ms          [result has 378832 nodes, 133010 elems, 2780 attrs]
 + scala.xml:                   min:  57 ms, max:  61 ms, average:  58 ms          [result has 378832 nodes, 133010 elems, 2780 attrs]
 + javax.xml:                   min:  65 ms, max:  71 ms, average:  67 ms          [result has 378832 nodes, 133010 elems, 2780 attrs]

As can be seen, anti-xml significantly outperforms the other implementations. This is primarily due to the bloom filter optimization, which gives anti-xml a huge advantage in this trial.

A different extreme can be seen in the deepModifyManySmall trial. It is structurally similar, but it selects 2772 nodes rather than 8. This amounts to one element from every "doc" structure in the spending.xml file.

[deepModifyManySmall]       deep modification of many elements of a 7 MB tree  [modify 2771 nodes]
 + anti-xml:                    min:  89 ms, max: 105 ms, average:  94 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]
 + scala.xml:                   min:  51 ms, max:  55 ms, average:  52 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]
 + javax.xml:                   min:6867 ms, max:6867 ms, average:6867 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]

The first item that jumps out is the awful jaxp performance. I haven't fully analyzed this, but it seems that updating nodes found via getElementsByTagName is very slow.

Notice also that scala.xml outperforms anti-xml. To analyze this, we can look at the selection-only times for this particular query.

[deepModifyManySmallSel]    Selection-only times for [deepModifyManySmall]     [378832 nodes, 133010 elems, 2772 attrs]
 + anti-xml:                    min:  60 ms, max:  65 ms, average:  62 ms          [Selected 2771 nodes]
 + scala.xml:                   min: 170 ms, max: 188 ms, average: 175 ms          [Selected 2771 nodes]
 + javax.xml:                   min:   6 ms, max:   8 ms, average:   6 ms          [Selected 2771 nodes]
 + anti-xml - no zipper:        min:  60 ms, max:  65 ms, average:  62 ms          [Selected 2771 nodes]
 + anti-xml - no bloom:         min:  32 ms, max:  35 ms, average:  33 ms          [Selected 2771 nodes]

As can be seen, roughtly 2/3 of the anti-xml modify time was spent on selection. It turns out that the bloom filter doesn't help with this particular query, because we can't eliminate any significant branches of the tree. The anti-xml - no bloom implementation measures this effect by using a custom selector that does not use the filter. For this query, the "no-bloom" version runs in roughly half the time as the normal version.

Strangely, the scala.xml selection time is worse than its modify time! That's because the selection trial uses \\, which is apparently slower than the manual traversal I used for the modify trial.

To get an idea of what's possible, I tried writing a some modify algorithms in anti-xml that didn't use selectors or zippers. The full set of modify results is:

[deepModifyManySmall]       deep modification of many elements of a 7 MB tree  [modify 2771 nodes]
 + anti-xml:                    min:  89 ms, max: 105 ms, average:  94 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]
 + scala.xml:                   min:  51 ms, max:  55 ms, average:  52 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]
 + scala-xml, opt:              min:  39 ms, max:  40 ms, average:  39 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]
 + javax.xml:                   min:6867 ms, max:6867 ms, average:6867 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]
 + anti-xml no-zip:             min: 251 ms, max: 268 ms, average: 259 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]
 + anti-xml no-zip, opt:        min:  25 ms, max:  27 ms, average:  25 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]
 + anti-xml no-zip, update:     min:  28 ms, max:  35 ms, average:  29 ms          [result has 378832 nodes, 133010 elems, 5543 attrs]

My best time was 25ms, in the anti-xml no-zip, opt implementation. You can find the code for all of these in the modifyTrial.scala file.

from anti-xml.

josharnold52 avatar josharnold52 commented on May 29, 2024

Followup - I actually found some optimizations that significantly reduce the overhead of the bloom filter. Indeed, they actually turn it into a net positive for the deepModifyManySmallSel trial, above. They took the corresponding modify trial down from 94ms to 61ms.

from anti-xml.

djspiewak avatar djspiewak commented on May 29, 2024

I'm going to try to find some time in the near future to create some new charts on the website based on the new benchmarks. This is really amazing stuff. I was hoping to get zipper to within an order of 5 worse than manual rebuilding. The idea that it could be faster than rebuilding with scala.xml just never occurred to me!

from anti-xml.

josharnold52 avatar josharnold52 commented on May 29, 2024

:) Let me know if anything could use further explanation. Implementations like "anti-xml - no bloom" are mainly of internal interest, so I hacked it such that they don't show up at smaller run levels. If you run test-perf without arguments, then you should just get the main results without the extra noise.

from anti-xml.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.