Comments (4)
Hi @SylvanBrocard, thanks for the bug report! This is definitely an interesting one.
Looking at your code, the first thing that occurs to me is that the pipeline you've used (iota -> filter -> map -> sum) should be using Flux's internal iteration code path, and so the shape-changing aspect of filter shouldn't be relevant -- we just end up calling the same predicate on every element of the iota sequence anyway. I'm pretty sure that the equivalent Rust pipeline is also using internal iteration.
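To make the internal-iteration point concrete, here is a minimal sketch of what such a pipeline amounts to once it is fused into a single loop. It does not use Flux, and the function name `fused_sum`, the predicate (is even) and the mapping (square) are illustrative assumptions rather than the code from the bug report.

```cpp
#include <cstdint>

// Hypothetical stand-in for an iota -> filter -> map -> sum pipeline after
// internal iteration has fused it into one loop. The predicate and mapping
// are assumptions chosen for illustration only.
inline std::int64_t fused_sum(std::int64_t first, std::int64_t last)
{
    std::int64_t total = 0;
    for (std::int64_t i = first; i < last; ++i) {
        // filter: the predicate runs on every element of the source range,
        // so the "shape-changing" nature of filter never shows up here
        if (i % 2 == 0) {
            // map + sum: transform the element, then accumulate it
            total += i * i;
        }
    }
    return total;
}
```

Calling `fused_sum(0, 10)` visits all ten values and accumulates only the even ones, which is the loop shape an auto-vectoriser would be expected to handle.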
Indeed, if we change the data source of flux_sum to be a vector instead of flux::iota, we can see that we do now get auto-vectorisation (at least with Clang, but since that uses the same optimiser as rustc it's a more apples-to-apples comparison anyway). This strongly suggests to me that the culprit in this case is iota, rather than filter.
Going back to the original code, turning on Clang's missed vectorisation reporting hints that there's something about iota's end check that it doesn't like (as a wild guess, it can't calculate the trip count?).
To investigate, I hacked together a very quick and dirty replacement for flux::iota with a specialisation of for_each_while (Flux's internal iteration customisation point). It turns out that for some reason, the Clang auto-vectoriser really wants the end cursor to be saved into a local variable -- despite the fact that it's very likely that everything is getting inlined in flux_sum at O3, so I would have thought it would be able to determine that self.end can't get modified anywhere... Anyway, after making this change, we can see that we do indeed get auto-vectorisation: https://flux.godbolt.org/z/vPsGd9qWb
So the solution in this case is for iota to gain a specialisation of for_each_while which explicitly makes a copy of the end cursor in its implementation. Fortunately, that should be pretty simple to do.
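As a rough illustration of the idea (not Flux's actual implementation), the sketch below shows an internal-iteration function on a hypothetical `int_range` type that copies its end value into a local before looping. The type, its members and the helper `sum_even_squares` are all assumptions made for this example.

```cpp
#include <cstdint>

// Hypothetical minimal half-open integer range; not Flux's real iota type.
struct int_range {
    std::int64_t start;
    std::int64_t end;

    // Internal iteration: call pred on each value until it returns false.
    // Returns the position at which iteration stopped.
    template <typename Pred>
    std::int64_t for_each_while(Pred pred) const
    {
        // Local copy of the end cursor, so the loop bound is visibly
        // invariant to the optimiser instead of being re-read from *this.
        const std::int64_t last = end;
        std::int64_t cur = start;
        for (; cur < last; ++cur) {
            if (!pred(cur)) {
                break;
            }
        }
        return cur;
    }
};

// Illustrative consumer: sums the squares of the even values in [0, n).
inline std::int64_t sum_even_squares(std::int64_t n)
{
    std::int64_t total = 0;
    int_range{0, n}.for_each_while([&](std::int64_t i) {
        if (i % 2 == 0) {
            total += i * i;
        }
        return true;  // never stop early
    });
    return total;
}
```

Whether a given compiler vectorises this loop still depends on the compiler version and flags, so the generated assembly is worth checking rather than assuming.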
from flux.
After merging #181, Clang now generates 100% identical code to Rust for your example: https://flux.godbolt.org/z/8PWx8fsG6
Rather than adding a for_each_while specialisation just for iota as mentioned in the previous comment, I actually went for a more general solution of providing a generic for_each_while specialisation for all multipass, bounded sequences. Hopefully this means that more sequences can now benefit from auto-vectorisation as well.
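The general version might look roughly like the sketch below: one function template that works for any sequence able to report both a start and an end cursor, with the end cursor computed once and held in a local. The cursor-style interface used here (`first`, `last`, `inc`, `read_at` as member functions) and the `vector_seq` wrapper are simplified assumptions for illustration, not the code merged in #181.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper exposing a std::vector<int> through a cursor-style
// interface. A stand-in for any multipass, bounded sequence.
struct vector_seq {
    const std::vector<int>& data;

    std::size_t first() const { return 0; }
    std::size_t last() const { return data.size(); }
    void inc(std::size_t& cur) const { ++cur; }
    int read_at(std::size_t cur) const { return data[cur]; }
};

// Generic internal iteration for sequences that can produce an end cursor.
// Returns the cursor at which iteration stopped.
template <typename Seq, typename Pred>
auto for_each_while(const Seq& seq, Pred pred)
{
    auto cur = seq.first();
    const auto end = seq.last();  // computed once, kept in a local
    while (cur != end) {
        if (!pred(seq.read_at(cur))) {
            break;
        }
        seq.inc(cur);
    }
    return cur;
}

// Illustrative consumers built on the generic function.
inline int sum_all(const std::vector<int>& v)
{
    int total = 0;
    for_each_while(vector_seq{v}, [&](int x) { total += x; return true; });
    return total;
}

inline std::size_t index_of_first_negative(const std::vector<int>& v)
{
    return for_each_while(vector_seq{v}, [](int x) { return x >= 0; });
}
```

Because the loop only relies on cursor equality and a cached end, the same function body serves any sequence type that provides those four operations.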
Thanks very much for the bug report @SylvanBrocard, I'm happy to have been able to improve this!
...and having said all that, I do agree that an alternative version of filter which yields optionals would be an interesting adaptor in its own right (I remember @brycelelbach talking about it on an episode of ADSP once). So we could definitely look at adding that in addition to the above fix.
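For readers unfamiliar with the idea, a shape-preserving filter could be sketched as follows: every input element produces exactly one output, which is either the element or an empty optional. This is an eager, vector-based illustration only; the names `filter_to_optionals` and `sum_present` are hypothetical and no such adaptor is claimed to exist in Flux.

```cpp
#include <optional>
#include <vector>

// Hypothetical shape-preserving filter: the output has the same length as
// the input, with rejected elements replaced by std::nullopt.
template <typename T, typename Pred>
std::vector<std::optional<T>> filter_to_optionals(const std::vector<T>& in, Pred pred)
{
    std::vector<std::optional<T>> out;
    out.reserve(in.size());
    for (const T& x : in) {
        if (pred(x)) {
            out.emplace_back(x);
        } else {
            out.emplace_back(std::nullopt);
        }
    }
    return out;
}

// Illustrative consumer: adds up only the values that survived the filter.
inline int sum_present(const std::vector<std::optional<int>>& values)
{
    int total = 0;
    for (const auto& v : values) {
        total += v.value_or(0);
    }
    return total;
}
```

Since the element count never changes, a downstream stage knows the trip count up front, which is what makes this variant interesting alongside the fix above.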
(I remember @brycelelbach talking about it on an episode of ADSP once).
Thank you, that's exactly what I was thinking about but couldn't remember the name of the podcast (it's episode 124).