
Comments (4)

tcbrindle commented on June 11, 2024

Hi @SylvanBrocard, thanks for the bug report! This is definitely an interesting one.

Looking at your code, the first thing that occurs to me is that the pipeline you've used (iota -> filter -> map -> sum) should be using Flux's internal iteration code path, and so the shape-changing aspect of filter shouldn't be relevant -- we just end up calling the same predicate on every element of the iota sequence anyway. I'm pretty sure that the equivalent Rust pipeline is also using internal iteration.
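
As a rough illustration of why filter's shape-changing doesn't matter under internal iteration, here is a sketch in plain C++. It is not Flux's implementation: `iota_for_each_while`, `filter_map_sum` and `sum_even_squares` are hypothetical names, and the even/square pipeline is only an example, not necessarily the code from the report.

```cpp
#include <cstdint>

// Hypothetical stand-in for an iota-like source driven by internal iteration:
// calls func(i) for i in [start, end) and stops early if func returns false.
template <typename Func>
void iota_for_each_while(std::int64_t start, std::int64_t end, Func func)
{
    for (std::int64_t i = start; i < end; ++i) {
        if (!func(i)) {
            return;
        }
    }
}

// filter -> map -> sum collapsed into a single callback. The "filter" step is
// just a branch inside the callback, so the predicate is still called exactly
// once per source element and no element is ever skipped by the loop itself.
template <typename Pred, typename Proj>
std::int64_t filter_map_sum(std::int64_t start, std::int64_t end,
                            Pred pred, Proj proj)
{
    std::int64_t total = 0;
    iota_for_each_while(start, end, [&](std::int64_t i) {
        if (pred(i)) {
            total += proj(i);
        }
        return true; // a sum never needs to stop early
    });
    return total;
}

// Example use (illustrative predicate and projection).
inline std::int64_t sum_even_squares(std::int64_t n)
{
    return filter_map_sum(
        0, n,
        [](std::int64_t i) { return i % 2 == 0; },
        [](std::int64_t i) { return i * i; });
}
```

With this shape, the optimiser sees one counted loop with a conditional accumulate in its body, which is the form an auto-vectoriser has the best chance of handling.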

Indeed, if we change the data source of flux_sum to be a vector instead of flux::iota, we can see that we do now get auto-vectorisation (at least with Clang, but since that uses the same optimiser as rustc it's a more apples-to-apples comparison anyway). This strongly suggests to me that the culprit in this case is iota, rather than filter.

Going back to the original code, turning on Clang's missed vectorisation reporting hints that there's something about iota's end check that it doesn't like (as a wild guess, it can't calculate the trip count?).

To investigate, I hacked together a very quick and dirty replacement for flux::iota with a specialisation of for_each_while (Flux's internal iteration customisation point). It turns out that for some reason, the Clang auto-vectoriser really wants the end cursor to be saved into a local variable -- despite the fact that it's very likely that everything is getting inlined in flux_sum at O3, so I would have thought it would be able to determine that self.end can't get modified anywhere... Anyway, after making this change, we can see that we do indeed get auto-vectorisation: https://flux.godbolt.org/z/vPsGd9qWb
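
The difference described above can be sketched in plain C++ as follows. This is a minimal illustration, not the code behind the Godbolt link: `iota_like` and its members are hypothetical, and whether the two loops actually compile differently depends on the compiler and version.

```cpp
#include <cstdint>

// Hypothetical iota-like sequence holding its bounds as data members.
struct iota_like {
    std::int64_t first;
    std::int64_t last;

    // Variant 1: the loop condition reads the member on every iteration.
    template <typename Func>
    void for_each_while_reload(Func func) const
    {
        for (std::int64_t cur = first; cur != last; ++cur) {
            if (!func(cur)) {
                return;
            }
        }
    }

    // Variant 2: the end bound is copied into a local before the loop, so
    // the loop condition only involves local variables.
    template <typename Func>
    void for_each_while_cached(Func func) const
    {
        const std::int64_t end = last; // explicit local copy of the end cursor
        for (std::int64_t cur = first; cur != end; ++cur) {
            if (!func(cur)) {
                return;
            }
        }
    }
};

// Both variants visit the same elements; only the generated code may differ.
inline std::int64_t sum_reload(const iota_like& seq)
{
    std::int64_t total = 0;
    seq.for_each_while_reload([&](std::int64_t i) { total += i; return true; });
    return total;
}

inline std::int64_t sum_cached(const iota_like& seq)
{
    std::int64_t total = 0;
    seq.for_each_while_cached([&](std::int64_t i) { total += i; return true; });
    return total;
}
```

Semantically the two variants are identical; the local copy only makes it obvious to the optimiser that the bound is loop-invariant.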

So the solution in this case is for iota to gain a specialisation of for_each_while which explicitly makes a copy of the end cursor in its implementation. Fortunately, that should be pretty simple to do.

from flux.

tcbrindle commented on June 11, 2024

After merging #181, Clang now generates 100% identical code to Rust for your example: https://flux.godbolt.org/z/8PWx8fsG6

Rather than adding a for_each_while specialisation just for iota as mentioned in the previous comment, I actually went for a more general solution of providing a generic for_each_while specialisation for all multipass, bounded sequences. Hopefully this means that more sequences can now benefit from auto-vectorisation as well.
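
To give an idea of the shape of such a generic specialisation, here is a self-contained sketch. It is not the code merged in #181: the member-function protocol (`first`, `last`, `inc`, `read_at`) is only loosely modelled on a cursor-based design, and `vector_seq`, `bounded_for_each_while` and `sum_all` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal multipass, bounded sequence over a vector of ints.
struct vector_seq {
    const std::vector<int>* data;

    std::size_t first() const { return 0; }
    std::size_t last() const { return data->size(); } // "bounded": end is known up front
    void inc(std::size_t& cur) const { ++cur; }
    int read_at(std::size_t cur) const { return (*data)[cur]; }
};

// Generic internal iteration for any type offering the protocol above.
// The end cursor is computed once and kept in a local, so the loop condition
// compares two locals. Returns the cursor at which iteration stopped.
template <typename Seq, typename Func>
auto bounded_for_each_while(const Seq& seq, Func func)
{
    auto cur = seq.first();
    const auto end = seq.last();
    while (cur != end) {
        if (!func(seq.read_at(cur))) {
            break;
        }
        seq.inc(cur);
    }
    return cur;
}

// Example use: summing every element.
inline int sum_all(const std::vector<int>& values)
{
    int total = 0;
    bounded_for_each_while(vector_seq{&values}, [&](int x) {
        total += x;
        return true;
    });
    return total;
}
```

Because the loop is written once against the protocol, any sequence that can supply an end cursor up front gets the same loop shape without needing its own specialisation.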

Thanks very much for the bug report @SylvanBrocard, I'm happy to have been able to improve this!


tcbrindle commented on June 11, 2024

...and having said all that, I do agree that an alternative version of filter which yields optionals would be an interesting adaptor in its own right (I remember @brycelelbach talking about it on an episode of ADSP once). So we could definitely look at adding that in addition to the above fix.
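
For anyone unfamiliar with the idea, a filter that yields optionals keeps the output the same length as the input and marks rejected elements as empty. The sketch below is an eager, vector-based illustration in C++17 rather than a lazy adaptor, and `filter_to_optionals` and `sum_present` are hypothetical names, not an existing or proposed Flux API.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Shape-preserving "filter": one output slot per input element.
// Elements that fail the predicate become std::nullopt instead of vanishing.
template <typename T, typename Pred>
std::vector<std::optional<T>> filter_to_optionals(const std::vector<T>& in, Pred pred)
{
    std::vector<std::optional<T>> out;
    out.reserve(in.size());
    for (const T& x : in) {
        if (pred(x)) {
            out.push_back(x);
        } else {
            out.push_back(std::nullopt);
        }
    }
    return out;
}

// Downstream stages decide what an empty slot means; here it contributes 0.
inline int sum_present(const std::vector<std::optional<int>>& values)
{
    int total = 0;
    for (const auto& v : values) {
        total += v.value_or(0);
    }
    return total;
}

// Example use: keep even numbers, then sum what is left.
inline int sum_even(const std::vector<int>& values)
{
    return sum_present(
        filter_to_optionals(values, [](int x) { return x % 2 == 0; }));
}
```

The appeal is that the element count stays fixed through the pipeline, at the cost of carrying an "is present" flag alongside every value.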


SylvanBrocard commented on June 11, 2024

> (I remember @brycelelbach talking about it on an episode of ADSP once).

Thank you, that's exactly what I was thinking about but couldn't remember the name of the podcast (it's episode 124).
