Coder Social home page Coder Social logo

Comments (8)

straight-shoota avatar straight-shoota commented on July 4, 2024 2

First of all, this is a great find! Memory allocations are a huge performance factor in many Crystal applications. Speeding that up is a great win.

I'd like to discuss this independently of the proposed solution #14679 which IMO goes too far in one go.

I believe we should take smaller steps:

  1. Implement memory clearing for gc_none (#14678). This is an obvious bug because the documentation clearly states that GC.malloc should clear the memory.
  2. That allows dropping the memset intrinsic in the compiler, getting rid of double clearing. This resolves this bug report.

Adding support for allocating non-zeroed memory as a performance improvement is a separate feature request.
Let's discuss that in #14687.

from crystal.

HertzDevil avatar HertzDevil commented on July 4, 2024 2

In LLVM 19 there is now an initializes attribute which supposedly aids dead store elimination of the memset in use cases like this: llvm/llvm-project#84803

from crystal.

ysbaddaden avatar ysbaddaden commented on July 4, 2024 1

Copied from #14687 (I mixed up the issues 🤦):

Investigating memory allocations in codegen, there are only a few calls:

  • Codegen#malloc is called by Codegen#malloc_closure and Codegen#malloc_aggregate.
  • Codegen#malloc_atomic is only called from Codegen#allocate_aggregate.
  • Codegen#array_malloc and Codegen#array_malloc_atomic are only used to implement Pointer.malloc.

Of these use cases:

  • Codegen#malloc_closure: I don't know if it needs the memory to be initialized to zero (it currently does or doesn't because of #14678); I believe this is for #14687 to decide;

  • Codegen#allocate_aggregate: it needs the memory to be initialized to zero, and may do it twice => 🐛

  • Pointer(T).malloc: the memory must be initialized to zero (whatever if T has inner pointers) and may do it twice => 🐛 (thanks @BlobCodes for correcting me).

Proposal:

Having the four Codegen#[array_]malloc[_atomic] methods always return initialized memory would harmonize the malloc methods behavior, then fixing #malloc_aggregate #codegen_primitive_pointer_malloc to avoid a pointless memset should be enough to fix this issue simply & quickly.

Then #14687 could go and squeeze even more performance.

from crystal.

ysbaddaden avatar ysbaddaden commented on July 4, 2024 1

The only issue with #14710 is that it changes the public GC.malloc_atomic to now clear the memory, and we use it to allocate buffers in stdlib (searching github, there are only a couple usages of GC.malloc_atomic)... when I thought to fix the codegen helpers only.

This is a much better move to harmonize it all and you were right all along: we'll want a GC.unsafe_malloc_atomic or GC.uninitialized_malloc_atomic (just with reversed naming). I'd prefer an arg, but that wouldn't work with LLVM allockind attribute, right?

Maybe do this in three steps?

  1. fix GC.malloc for gc/none (bugfix);
  2. fix Codegen#[array_]malloc_atomic to return initialized memory + fix double clears;
  3. add GC.unsafe_malloc_atomic (or GC.uninitialized_malloc_atomic), use it to allocate buffers in stdlib, make GC.malloc_atomic to always initialize memory, and finally update Codegen#malloc_atomic (memory is already initialized).

I believe 1. and 2. would be quickly merged (mere bugfixes), while we can discuss 3. (feature change of the public API).

I understand this is getting very boring to deal with. We're splitting our own PRs in a myriad of smaller ones on a daily basis 😫

Note: this is only my personal opinion. Let's wait for the others.

from crystal.

straight-shoota avatar straight-shoota commented on July 4, 2024

I recalled another discussion which showed an issue with zeroing out memory in the initialization. It makes it impossible to lazily allocate memory via mmap, so that it only assigns the memory that is actually written to. If the runtime explicitly writes the entire memory slice full of zeros, lazy allocation doesn't work (and I don't think this can be optimized away).
So it makes a lot of sense to delegate zeroing to the allocator. It can make the best decisions about how to implement it most efficiently. For example this can be the first time that the memory is actually accessed (i.e. written to).

Ref #11572 (comment) ff.

from crystal.

straight-shoota avatar straight-shoota commented on July 4, 2024

What's the point in having GC.malloc_atomic return zeroed memory?
The main reason for zeroing is to prevent the GC from misidentifying garbage data as pointers. So this is only necessary for memory in which the GC looks for pointers.
GC.malloc_atomic is explicitly for pointer-free memory. So I don't see why it would need to clear the memory.

It is a public API method. Changing its semantics has performance implications for a lot of user code. We should only do that if there's a really good reason for it.

from crystal.

BlobCodes avatar BlobCodes commented on July 4, 2024

Dirty memory can always cause hard-to-fix bugs, so having clean memory be the default could be a gold idea.

The main reason is to be able to provide a consistent API together with support for #14687.

If we later want to have a non-clearing variant of malloc and a clearing variant of malloc_atomic, the inconsistent clearing behaviour between the two existing methods will probably cause an ugly API.

The way I see it, we currently have three options when wanting to implement #14687:

  • Retrofit the old methods to have consistent clearing behaviour and add new _unsafe or _clean variants consistently.
  • Have an inconsistent/bad API (malloc_dirty + malloc_atomic_clean
  • Deprecate current methods and re-do it

We definitely want some way of allocating clean pointer-free memory because the compiler expects clean memory for reference types and Pointer.malloc - and the allocator will be more efficient at it than the compiler.

from crystal.

straight-shoota avatar straight-shoota commented on July 4, 2024

Having a clean default sounds good, of course. But it has a performance cost.
Feature correctness should always come before performance, but this isn't about correctness, it's about preventing a light foot gun. But this is more like a gun that you cannot point down and you have to lift your foot in order to hit it.

AFAIK GC.malloc_atomic has always been returning non-zeroed memory and I'm not aware of any particular issues with that. So I believe it's reasonable to assume the risk is relatively small.

The issue we're addressing here is that the same memory is cleared twice. We should focus on that and not introduce any other changes in the same step.

Also, I don't want to introduce an avoidable change without offering an alternative to retain current performance.
We need to continue the discussion about changing the API for memory allocations.

from crystal.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.