Comments (7)
I think this is absolutely a good idea. I put some thought into this a while ago, but I didn't act on it because, frankly, it's such a mess to encode the equivalent operations in scala.xml or jaxp. :-S Also, I was a little afraid of what the performance would turn out to be.
You're quite right that we should be measuring this empirically.
from anti-xml.
If no one has started on this, I'll take a crack at it. I also want to get some measurements on some of the other Zipper methods (updated
, flatMap
, etc.)
from anti-xml.
I haven't started on it yet (haven't found the time). It's definitely an important item though; thanks for tackling it!
from anti-xml.
The above pull request includes these benchmarks along with some optimizations. The new "modify" trials measure an entire select/modify/unselect cycle (or its equivalent). To run them, enter test-perf :modify
from SBT. I also added benchmarks for individual zipper methods, which can be run using test-perf :zipperOps
. See test-perf --help
for more information on the new command line options.
Just to give a flavor for some of the new trials, I'll discuss some of the deep-modify results. These trials find an element by name and add an attribute to it. anti-xml
does this using a Zipper. The other implementations use the most straightforward algorithm I could think of.
The first deep-modify trial uses the same query as deepSelectSmall
and adds an attribute to each element:
[deepModifyFewSmall] deep modify a few elements of a 7 MB tree [modify 8 nodes]
+ anti-xml: min: 2 ms, max: 3 ms, average: 2 ms [result has 378832 nodes, 133010 elems, 2780 attrs]
+ scala.xml: min: 57 ms, max: 61 ms, average: 58 ms [result has 378832 nodes, 133010 elems, 2780 attrs]
+ javax.xml: min: 65 ms, max: 71 ms, average: 67 ms [result has 378832 nodes, 133010 elems, 2780 attrs]
As can be seen, anti-xml significantly outperforms the other implementations. This is primarily due to the bloom filter optimization, which gives anti-xml a huge advantage in this trial.
A different extreme can be seen in the deepModifyManySmall
trial. It is structurally similar, but it selects 2772 nodes rather than 8. This amounts to one element from every "doc" structure in the spending.xml
file.
[deepModifyManySmall] deep modification of many elements of a 7 MB tree [modify 2771 nodes]
+ anti-xml: min: 89 ms, max: 105 ms, average: 94 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
+ scala.xml: min: 51 ms, max: 55 ms, average: 52 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
+ javax.xml: min:6867 ms, max:6867 ms, average:6867 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
The first item that jumps out is the awful jaxp performance. I haven't fully analyzed this, but it seems that updating nodes found via getElementsByTagName
is very slow.
Notice also that scala.xml
outperforms anti-xml
. To analyze this, we can look at the selection-only times for this particular query.
[deepModifyManySmallSel] Selection-only times for [deepModifyManySmall] [378832 nodes, 133010 elems, 2772 attrs]
+ anti-xml: min: 60 ms, max: 65 ms, average: 62 ms [Selected 2771 nodes]
+ scala.xml: min: 170 ms, max: 188 ms, average: 175 ms [Selected 2771 nodes]
+ javax.xml: min: 6 ms, max: 8 ms, average: 6 ms [Selected 2771 nodes]
+ anti-xml - no zipper: min: 60 ms, max: 65 ms, average: 62 ms [Selected 2771 nodes]
+ anti-xml - no bloom: min: 32 ms, max: 35 ms, average: 33 ms [Selected 2771 nodes]
As can be seen, roughtly 2/3 of the anti-xml
modify time was spent on selection. It turns out that the bloom filter doesn't help with this particular query, because we can't eliminate any significant branches of the tree. The anti-xml - no bloom
implementation measures this effect by using a custom selector that does not use the filter. For this query, the "no-bloom" version runs in roughly half the time as the normal version.
Strangely, the scala.xml
selection time is worse than its modify time! That's because the selection trial uses \\
, which is apparently slower than the manual traversal I used for the modify trial.
To get an idea of what's possible, I tried writing a some modify algorithms in anti-xml
that didn't use selectors or zippers. The full set of modify results is:
[deepModifyManySmall] deep modification of many elements of a 7 MB tree [modify 2771 nodes]
+ anti-xml: min: 89 ms, max: 105 ms, average: 94 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
+ scala.xml: min: 51 ms, max: 55 ms, average: 52 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
+ scala-xml, opt: min: 39 ms, max: 40 ms, average: 39 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
+ javax.xml: min:6867 ms, max:6867 ms, average:6867 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
+ anti-xml no-zip: min: 251 ms, max: 268 ms, average: 259 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
+ anti-xml no-zip, opt: min: 25 ms, max: 27 ms, average: 25 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
+ anti-xml no-zip, update: min: 28 ms, max: 35 ms, average: 29 ms [result has 378832 nodes, 133010 elems, 5543 attrs]
My best time was 25ms, in the anti-xml no-zip, opt
implementation. You can find the code for all of these in the modifyTrial.scala
file.
from anti-xml.
Followup - I actually found some optimizations that significantly reduce the overhead of the bloom filter. Indeed, they actually turn it into a net positive for the deepModifyManySmallSel
trial, above. They took the corresponding modify trial down from 94ms to 61ms.
from anti-xml.
I'm going to try to find some time in the near future to create some new charts on the website based on the new benchmarks. This is really amazing stuff. I was hoping to get zipper to within an order of 5 worse than manual rebuilding. The idea that it could be faster than rebuilding with scala.xml just never occurred to me!
from anti-xml.
:) Let me know if anything could use further explanation. Implementations like "anti-xml - no bloom" are mainly of internal interest, so I hacked it such that they don't show up at smaller run levels. If you run test-perf
without arguments, then you should just get the main results without the extra noise.
from anti-xml.
Related Issues (20)
- Zipper NodeMergeStrategy Should Be Parameter on unselect HOT 1
- Merge strategies should be presented with all of a node's replacements. HOT 1
- Stack Overflow when loading files HOT 1
- anti-xml_2.8.2-0.2 HOT 1
- please update documentation HOT 1
- XML.fromString(xmlString) throws StackOverflowException HOT 3
- Improve support for URI parts in namespaces HOT 8
- Prefix handling is wrong HOT 2
- Update TODO list HOT 1
- Zipper issue replacing nested element
- Zipper issue replacing nested element with ancestor element of same name HOT 2
- Stack Overflow when parsing HTML HOT 1
- Pretty-printing? HOT 3
- Anti-XML and xhtml and empty tags HOT 1
- asInstanceOf in LinkedOrderPreservingMap throws ClassCastException
- Zipper[T].unselect has unnecessarily restrictive return type
- Elem#copy can introduce invalid characters to names/attributes/prefixes
- anti-xml for Scala 2.10 HOT 10
- just an FYI
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from anti-xml.