Comments (6)

Yawning avatar Yawning commented on June 8, 2024

Before I ever get tempted to do this again, I am not sure if it is possible to do the AVX2 backend, without it being an extreme exercise in masochism. The reasoning is as follows:

  • The intrinsic code has the benefit of the compiler handling register allocation, and spilling, while this will need to be done by hand. While this isn't usually a big deal, square_and_negate_D and mul in particular use a rather large number of variables.

  • It may not be possible to port the code over in a way that is easy to maintain. The original code's nice split between the vector of field elements and the point operations probably cannot be preserved without trashing performance. For example, paying for a function call and for moving data to/from memory just to do 5 VPBLENDDs is clearly extremely inefficient.
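To make the second point concrete, here is a minimal pure-Go model of what a single VPBLENDD does on one 256-bit register. The `vec8` type and `blendD` function are hypothetical names used only for illustration and are not taken from curve25519-voi; the point is just that each blend is a trivial per-lane select, so wrapping a handful of them in a separate assembly routine would likely be dominated by call and load/store overhead.

```go
package main

import "fmt"

// vec8 models one 256-bit AVX2 register as eight 32-bit lanes.
// (Hypothetical type for illustration only.)
type vec8 [8]uint32

// blendD models the lane selection performed by VPBLENDD:
// lane i is taken from b if bit i of imm is set, otherwise from a.
func blendD(a, b vec8, imm uint8) vec8 {
	var out vec8
	for i := range out {
		if imm&(1<<uint(i)) != 0 {
			out[i] = b[i]
		} else {
			out[i] = a[i]
		}
	}
	return out
}

func main() {
	a := vec8{0, 1, 2, 3, 4, 5, 6, 7}
	b := vec8{10, 11, 12, 13, 14, 15, 16, 17}
	// Take the odd lanes from b and the even lanes from a.
	fmt.Println(blendD(a, b, 0b10101010)) // prints [0 11 2 13 4 15 6 17]
}
```

In real assembly each of these is a single instruction operating on registers, which is why keeping the field-element layer and the point layer as separate callable units is expensive.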

The AVX-512 IFMA backend looks more promising with respect to the former concern (since I get double the registers to work with, and the limbs are larger). The latter, while still an issue, might be tolerable.

That said, I currently do not have access to a CPU that supports AVX-512 IFMA, so as of now it is a rather moot point.

from curve25519-voi.

Yawning avatar Yawning commented on June 8, 2024

I got tempted to do this again, and the results are in #19. I'm not sure how much I'll hate maintaining this, but it does go really fast.

Yawning avatar Yawning commented on June 8, 2024

While the goal of this project is to produce something that is relatively easy to maintain, and I have lingering doubts over how much I will enjoy maintaining the AVX2 code, I went ahead and merged it because it provides a substantial improvement to verification performance.

I can't think of a nice way to make this also support AVX-512 without it being a total shitshow, so my tentative plan when I eventually get a system that supports it is to yank the AVX2 code out and replace it with AVX-512, though that is unlikely to happen for a while.

Yawning avatar Yawning commented on June 8, 2024

I do not see this happening in the short to medium term.

While it is currently possible to do the development with a consumer oriented Rocket Lake system, that involves someone actually buying a Rocket Lake processor. As Rocket Lake is better left off as silicon in the form of the sand in a cat shit filled public park sandbox, I will not be doing so.

While Alder Lake is (hopefully) going to be an improvement, and actually worth using, Intel is reversing their recent trend of bringing AVX-512 to the mass market, with the consumer and desktop SKUs having the unit disabled by fuse, indicating their desire to keep the instruction set as a server-only (Sapphire Rapids) thing.

Closing till the AVX-512 availability situation changes.

Note: If someone wants to supply me with hardware with an AVX-512 unit, that will sit in my apartment, to get this done and to maintain it, I'm open to options.

Yawning avatar Yawning commented on June 8, 2024

Apparently AVX-512 is available on Alder Lake if the E-cores are disabled, so a consumer oriented system is still suitable for development. I also do have access to a Tiger Lake i7.

I'm vaguely tempted to reconsider this, especially since license-based downclocking appears to be a non-issue in Ice Lake/Rocket Lake (there is still a voltage transition performance penalty), but I'm not all that enthusiastic about maintaining two separate assembly implementations (since AVX-512 is not nearly as ubiquitous as AVX2 is).

See: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html

Yawning avatar Yawning commented on June 8, 2024

Welp, never mind. Intel is killing AVX-512 on ADL with a ME update.
