
SIMD (issue about mold, closed)

rui314 commented on July 18, 2024
SIMD

from mold.

Comments (3)

rui314 commented on July 18, 2024

mold does not use SIMD instructions explicitly, but SIMD is used in many places in mold because many library functions are implemented with SIMD. For example, glibc's strlen is implemented using SSE 4.2 instructions, I believe. Another example is xxhash3. There might be other places where I could use SIMD to improve mold's performance, but I can't come up with anything right now.

As to cryptographic hashing, I actually tried BLAKE3. We currently use SHA-256 for Identical Comdat Folding and Build-ID computation. For the former, we need to compute a cryptographic hash of small data (typically less than 100 bytes). For the latter, we compute a SHA-256 of the entire output file, which can be multiple gigabytes in size.

It looks like BLAKE3 is slower than SHA-256 for small data, at least on my machine. This is perhaps due to high initialization and finalization costs. For large data, BLAKE3 is indeed faster than SHA-256 by a factor of two. However, if we have enough cores, build-id computation is bounded by memory bandwidth even with SHA-256, so I don't see an immediate need to switch to BLAKE3.
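The comment does not say how the hash of a large file is spread over several cores. One common arrangement, assumed here purely for illustration, is to hash fixed-size shards independently and then hash the list of shard digests. In the sketch below, `placeholder_hash` is a non-cryptographic FNV-1a hash that only stands in for SHA-256 so that no external library is needed, and `sharded_hash` and `shard_size` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder 64-bit FNV-1a hash. NOT cryptographic; it only stands in for
// SHA-256 so that this sketch compiles without an external library.
std::uint64_t placeholder_hash(const std::uint8_t* data, std::size_t n) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (std::size_t i = 0; i < n; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;  // FNV-1a prime
    }
    return h;
}

// Hash a buffer as independent shards, then hash the list of shard digests.
// shard_size is a hypothetical tuning parameter.
std::uint64_t sharded_hash(const std::vector<std::uint8_t>& buf,
                           std::size_t shard_size) {
    assert(shard_size > 0);
    std::vector<std::uint64_t> digests;
    // Each iteration reads only its own shard, so the iterations could be
    // handed to separate threads. They run sequentially here for simplicity.
    for (std::size_t off = 0; off < buf.size(); off += shard_size) {
        std::size_t len = std::min(shard_size, buf.size() - off);
        digests.push_back(placeholder_hash(buf.data() + off, len));
    }
    // A final pass over the small digest list ties the shards together.
    return placeholder_hash(
        reinterpret_cast<const std::uint8_t*>(digests.data()),
        digests.size() * sizeof(std::uint64_t));
}
```

With this layout the per-shard work scales with the number of cores until reading the input becomes the limit, which is consistent with the memory-bandwidth remark above. Note that the result differs from a plain hash over the whole buffer, so the shard size becomes part of the definition of the digest.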

Alcaro commented on July 18, 2024

SIMD can speed up large loops where each iteration does the same thing (no input-dependent branches), and each iteration does not depend on the previous one. For example, SIMD can speed up most kinds of non-cryptographic checksum calculation.
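As a minimal sketch of such a loop, the following computes a simple byte-sum checksum, once in scalar form and once 16 bytes at a time. It assumes an x86 target with SSE2 and falls back to the scalar loop elsewhere. The function names are made up for this illustration and are not taken from mold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Scalar reference: add up every byte of the buffer.
std::uint64_t byte_sum_scalar(const std::uint8_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Same checksum, processing 16 bytes per iteration where SSE2 is available.
std::uint64_t byte_sum_simd(const std::uint8_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
    std::uint64_t sum = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();  // two 64-bit partial sums
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(data + i));
        // _mm_sad_epu8 against zero sums each group of 8 bytes
        // into one 64-bit lane.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    std::uint64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    sum = lanes[0] + lanes[1];
#endif
    // Remaining tail bytes (or the whole buffer when SSE2 is unavailable).
    for (; i < n; ++i) sum += data[i];
    return sum;
}
```

Both functions should return the same value for any buffer. The loop qualifies because no iteration branches on the data and the partial sums can be combined in any order.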

SIMD cannot speed up anything else, including branchy code, recursion, loops that are too short, non-loops, and most data structure operations (arrays are fine, few other things are). Sometimes it's possible to rewrite a function into a SIMD-friendlier form, but it's rare.
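To illustrate what a rewrite into a SIMD-friendlier form can look like, here is a small sketch that counts occurrences of a byte, first with a data-dependent branch and then with the comparison result added unconditionally. The function names are hypothetical. Whether a given compiler vectorizes either loop depends on the compiler and its flags, and some compilers perform this conversion on their own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Branchy form: the increment is guarded by a data-dependent `if`.
std::size_t count_byte_branchy(const std::uint8_t* data, std::size_t n,
                               std::uint8_t target) {
    assert(data != nullptr || n == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] == target) ++count;
    }
    return count;
}

// Branch-free form: the comparison yields 0 or 1, which is always added.
// Every iteration now performs the same operations regardless of the data.
std::size_t count_byte_branchless(const std::uint8_t* data, std::size_t n,
                                  std::uint8_t target) {
    assert(data != nullptr || n == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += static_cast<std::size_t>(data[i] == target);
    }
    return count;
}
```

The two functions return the same count. The second form maps directly onto a vector compare followed by an add, which is why this kind of rewrite helps when it is possible at all.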

I think mold spends most of its time in TBB hashmaps. The hash calculation may be SIMDable, unless TBB already does that; the rest of the hashmap is, as far as I know, not SIMDable.

SIMD can also only speed up your own code. Disk access belongs to the kernel, not to mold.

kirawi commented on July 18, 2024

I agree with you; I was referring more to the symbol resolution step. simdjson uses some clever SIMD tricks, discussed in their paper, to avoid branching while still being able to resolve symbols, but I'm not certain how applicable they would be here.

As for hashing, BLAKE3 (https://github.com/BLAKE3-team/BLAKE3; 6.8 GiB/s versus SHA-1's 1 GiB/s in their benchmark) would be a good candidate, but I imagine @rui314 isn't keen on adding more dependencies unless they're absolutely necessary.
