
Comments (12)

fulmicoton commented on June 3, 2024

(@BurntSushi I quickly tested it and confirmed that this solved my perf problem!)

from bitpacking.

fulmicoton commented on June 3, 2024

Thanks for your comment. Feedback like this is invaluable when the crate is young and has no clients other than tantivy.

Let me check that I understand the philosophy behind the best practice for SIMD in Rust: the compilation of those instructions is independent of the target-cpu compilation flag. Whatever the target-cpu, the instructions I requested will be compiled as is. The idea is to produce binaries that embed optimized code using SIMD instructions, along with a scalar polyfill implementation. At runtime, the best implementation can then be selected after sniffing the features available on the CPU.
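A minimal sketch of that pattern, with hypothetical function names (`sum`, `sum_scalar`, `sum_avx2` are illustrative, not this crate's API): both code paths are compiled unconditionally, and the caller picks one at runtime based on CPU detection.

```rust
// Hypothetical sketch of compile-everything, dispatch-at-runtime.
pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe: we just verified the running CPU supports AVX2.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs) // portable fallback, always compiled in
}

fn sum_scalar(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    // The attribute lets LLVM emit AVX2 here (e.g. by auto-vectorizing
    // this loop); a real kernel would use core::arch::x86_64 intrinsics.
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}
```

The SIMD variant only ever runs behind the detection check, so the binary stays safe on older CPUs.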

I have a couple of issues with this approach, some of which are specific to my use case, and I would like to know your opinion on the subject.

In my case, the different implementations of the bitpackers create formats that are incompatible with one another. That's because the bitpacking happens at the SIMD-register level. For instance, when bitpacking over 4 bits using SSE3, the first byte will contain integer 0 and integer 5 (instead of integer 0 and integer 1).

It is possible to create a bitpacker compatible with the SSE3 format using only scalar instructions, but it is likely to be slower than the vanilla scalar bitpacker.
If someone is targeting a platform whose clients won't have SSE3 (e.g. cellphones), they will most likely prefer the scalar bitpacker format over the slower SSE3 polyfill.
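To make the incompatibility concrete, here is a hypothetical illustration (not this crate's actual layout): the functions below return, for each output position, the index of the source integer stored there under the two orderings.

```rust
// A scalar packer emits integers sequentially; a lane-striped 4-lane SIMD
// packer fills lane i with integers i, i+4, i+8, ..., so under this
// striping the first packed byte of lane 0 mixes integer 0 with integer 4
// rather than with integer 1. The exact offsets depend on the layout, but
// the two byte streams are never interchangeable.
fn scalar_order(n: usize) -> Vec<usize> {
    (0..n).collect()
}

fn lane_striped_order(n: usize, lanes: usize) -> Vec<usize> {
    // Emit everything stored in lane 0 first, then lane 1, and so on.
    (0..lanes)
        .flat_map(|lane| (lane..n).step_by(lanes))
        .collect()
}
```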

There is even a use case for using the ScalarBitPacker and the SSE3BitPacker in the same application: while this is not implemented yet, the scalar bitpacker makes it possible to get random access to a range of values.

Another thing that concerns me is that many Rustaceans are not aware of the target-cpu flag. In practice it is typically not a game changer... except that when one uses SIMD instructions, compiling for a target-cpu that does not support those SIMD instructions spectacularly destroys performance.
I can only hope that people read the README and compile programs using tantivy with the right compilation flags.
In the end, I wonder if it wouldn't be less tricky for users to just have an sse3 feature flag and prevent them from compiling with a target-cpu that does not support sse3... but I believe there is no way to return a clean-looking error message today.

After reading those concerns, what do you think would be the best solution here?


BurntSushi commented on June 3, 2024

Let me check that I understand the philosophy behind the best practice for SIMD in Rust: the compilation of those instructions is independent of the target-cpu compilation flag. Whatever the target-cpu, the instructions I requested will be compiled as is. The idea is to produce binaries that embed optimized code using SIMD instructions, along with a scalar polyfill implementation. At runtime, the best implementation can then be selected after sniffing the features available on the CPU.

You got it!

In my case, the different implementations of the bitpackers create formats that are incompatible with one another. That's because the bitpacking happens at the SIMD-register level. For instance, when bitpacking over 4 bits using SSE3, the first byte will contain integer 0 and integer 5 (instead of integer 0 and integer 1).

Oh... Yikes. That's a noodle scratcher. This does seem orthogonal to runtime dispatch to me though, no?

I do appreciate that this means you can't just expose one set of methods and have the crate "pick" the right one. You'll need to give callers a choice. But with runtime dispatch, you can give callers every choice available. e.g.,

// always succeeds
fn create_scalar_compressor() -> ScalarCompressor { ... }

// only succeeds when current CPU supports SSSE3
fn create_ssse3_compressor() -> Option<SSSE3Compressor> { ... }

// only succeeds when current CPU supports AVX2
fn create_avx2_compressor() -> Option<AVX2Compressor> { ... }

The corresponding types you get back would then have the methods suitable for doing compression (or decompression too, whatever).
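Fleshed out slightly (still a sketch: the empty struct bodies and the detection logic are assumptions, not the crate's real code), those constructors could look like:

```rust
// Hypothetical constructors: the scalar one always succeeds, the SIMD
// one only when the running CPU advertises the required feature.
pub struct ScalarCompressor;

#[cfg(target_arch = "x86_64")]
pub struct SSSE3Compressor;

pub fn create_scalar_compressor() -> ScalarCompressor {
    ScalarCompressor
}

#[cfg(target_arch = "x86_64")]
pub fn create_ssse3_compressor() -> Option<SSSE3Compressor> {
    if is_x86_feature_detected!("ssse3") {
        Some(SSSE3Compressor) // methods on this type may use SSSE3 freely
    } else {
        None // caller must fall back to the scalar format
    }
}
```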

It is then up to the caller to figure out which one to use. It's probable that the caller should return an error if its CPU doesn't have the appropriate capabilities to read the compressed data. e.g., creating an index on an AVX2 CPU and then trying to read it on a CPU that doesn't support AVX2 should, I guess, produce an error. It seems likely that you want/need this even if you stick with compile time options. Otherwise, if I run a tantivy that happened to be compiled using different options and end up sacrificing the integrity of my data, then that's really bad.

Note that the ssse3 and avx2 types are also platform specific and should only be available on x86. You have some flexibility here in how you deal with it, but the simplest is probably to just promote the platform specificity into your own public API, though you could technically abstract over it (and I'd subsequently rename the types so that they didn't include ISA extension names in them).

(I probably still might not expose a trait for this since I try to keep APIs small, but I can definitely see the motivation given multiple distinct alternatives. And I could see callers would want to use the trait. You could also declare that the caller should define their own trait.)

Another thing that concerns me is that many Rustaceans are not aware of the target-cpu flag. In practice it is typically not a game changer... except that when one uses SIMD instructions, compiling for a target-cpu that does not support those SIMD instructions spectacularly destroys performance.
I can only hope that people read the README and compile programs using tantivy with the right compilation flags.

In the end, I wonder if it wouldn't be less tricky for users to just have an sse3 feature flag and prevent them from compiling with a target-cpu that does not support sse3... but I believe there is no way to return a clean-looking error message today.

Yeah this is exactly why I think people should be using runtime feature detection by default unless there is a good reason not to do it. With runtime feature detection, you shouldn't ever need to worry about compile time flags. This impacts every dependent of tantivy as well. For example, if I build an application that uses tantivy internally and start distributing binaries on GitHub, it would be awesome if those binaries took advantage of SIMD if it were available automatically. Doing this for compile time feature configuration is a huge burden; you need to ship a different binary for each permutation of target feature you want to support. Doing this for runtime feature detection is free: as the application writer, I don't even need to be aware of the fact that tantivy uses CPU specific features. It will Just Work. That is an enormous benefit, and IMO, it is worth doing almost everything possible to get it.


BurntSushi commented on June 3, 2024

Also, I think the fact that the different methods here produce distinct incompatible bitpackings should be documented in huge bold letters. I didn't realize that until you told me. :-)


BurntSushi commented on June 3, 2024

@fulmicoton I will also say this: it may be the case that I wind up needing to use a crate like this, and if you'd like to defer the work to address this issue, then I'd probably be happy to take it on assuming you're comfortable with major refactorings. I don't know when it will happen (it could be a year or more, or it could be a month, I have no clue).


fulmicoton commented on June 3, 2024

@BurntSushi
The Option<> seems like the right thing to do.

Just to make sure of what should be done, let me be very specific here:

  • Fix the README
  • Force the user to request a Compressor by calling a get_scalar_compressor function
  • Keep a BitPacker trait but make the static methods instance methods
  • Optionally, create a scalar polyfill for the SSSE3 format
  • Guard the SSSE3 / AVX stuff behind a #[cfg(target_arch)] gate.
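A sketch of what the second and third bullets might look like. This is a toy with hypothetical signatures, and the compress body is a placeholder copy rather than real bit-level packing:

```rust
// Hypothetical shape for the refactored API: a constructor returns a
// concrete packer, and the trait exposes instance methods instead of
// static ones.
pub trait BitPacker {
    /// Packs `input` into `output`, returning the number of bytes written.
    fn compress(&self, input: &[u32], output: &mut [u8]) -> usize;
}

pub struct ScalarBitPacker;

pub fn get_scalar_compressor() -> ScalarBitPacker {
    ScalarBitPacker
}

impl BitPacker for ScalarBitPacker {
    fn compress(&self, input: &[u32], output: &mut [u8]) -> usize {
        // Toy stand-in: plain little-endian copy, no real bit packing.
        let mut written = 0;
        for &v in input {
            output[written..written + 4].copy_from_slice(&v.to_le_bytes());
            written += 4;
        }
        written
    }
}
```

Taking `&self` on every method means a future SSSE3 packer can implement the same trait while its constructor stays gated behind runtime detection.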

With runtime feature detection, you shouldn't ever need to worry about compile time flags...
it would be awesome if those binaries took advantage of SIMD if it were available automatically.

Maybe I am misunderstanding your point, but unfortunately, in my experience it does not work that way.
Right now, if you use SIMD instructions "dynamically" in a binary that was not compiled with the right compilation flag, your performance will suck badly. For instance, the performance of cargo bench on this crate is terrible if you omit the target-cpu flag.
I read somewhere that this was because the presence of SIMD instructions was preventing inlining.

I will also say this: it may be the case that I wind up needing to use a crate like this, and if you'd like to defer the work to address this issue, then I'd probably be happy to take it on assuming you're comfortable with major refactorings. I don't know when it will happen (it could be a year or more, or it could be a month, I have no clue).

I am open to this refactoring and I would love it if you could take it over.
I may have some bandwidth at the end of the month, but I just changed jobs and it might require some paperwork for me to resume working on open source projects.
Let's do it this way: ping me when you feel you will have some time to work on this, so that I can assign the ticket to you. If I start working on it earlier, I will assign it to myself.


BurntSushi commented on June 3, 2024

Maybe I am misunderstanding your point, but unfortunately, in my experience it does not work that way.
Right now, if you use SIMD instructions "dynamically" in a binary that was not compiled with the right compilation flag, your performance will suck badly. For instance, the performance of cargo bench on this crate is terrible if you omit the target-cpu flag.
I read somewhere that this was because the presence of SIMD instructions was preventing inlining.

Oh I see, yeah, this is definitely not the intended behavior! I ported the regex crate over to runtime feature detection and everything works as expected. The most likely reason why function inlining isn't working for you is because of target feature mismatches. If you read the dynamic CPU feature detection section in the docs, you'll see an example of how to do it the right way, which requires use of the #[target_feature] attribute.
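The usual pitfall is a feature mismatch between caller and callee: a helper compiled without the feature cannot be inlined into a #[target_feature] function, so every call in the hot loop pays full call overhead. A hypothetical sketch of the fix, which annotates the whole hot path with the same feature set:

```rust
// Illustrative only: both functions carry the same target_feature set,
// so LLVM is allowed to inline `helper` into `kernel`. Dropping the
// attribute from `helper` would block that inlining.
#[cfg(target_arch = "x86_64")]
pub mod simd {
    #[inline]
    #[target_feature(enable = "ssse3")]
    unsafe fn helper(x: u32) -> u32 {
        x.rotate_left(3)
    }

    #[target_feature(enable = "ssse3")]
    pub unsafe fn kernel(xs: &[u32]) -> u32 {
        let mut acc = 0u32;
        for &x in xs {
            // Matching feature sets keep this call inlinable.
            acc = acc.wrapping_add(helper(x));
        }
        acc
    }
}
```

The caller then reaches `kernel` only through a runtime `is_x86_feature_detected!("ssse3")` check, as in the dispatch pattern discussed earlier.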



BurntSushi commented on June 3, 2024

I am open to this refactoring and I would love it if you could take it over.
I may have some bandwidth at the end of the month, but I just changed jobs and it might require some paperwork for me to resume working on open source projects.
Let's do it this way: ping me when you feel you will have some time to work on this, so that I can assign the ticket to you. If I start working on it earlier, I will assign it to myself.

Sounds good. I have no timeline, but just wanted to put this out there!


fulmicoton commented on June 3, 2024

If you read the dynamic CPU feature detection section in the docs, you'll see an example of how to do it the right way, which requires use of the #[target_feature] attribute.

stdsimd#401 is exactly the problem I have. I will try and see if #[target_feature] helps! Thank you so much.


fulmicoton commented on June 3, 2024

I think I should be able to take care of this issue this week. If that's OK, I'll send you a CR @BurntSushi


BurntSushi commented on June 3, 2024

Sounds good! No rush at all. :-)


fulmicoton commented on June 3, 2024

PR available here: #5

Can you have a look at it? @BurntSushi

It seems like I cannot mark you as a reviewer because you don't belong to the tantivy-search GitHub organization. (I'd gladly add you to it if you want to.)

from bitpacking.

