Bug Report When allocating non-atomic memory using the <code class

Copied from <a class="issue-link js-issue-link" data-error-text="Failed to load ti

The only issue with <a class="issue-link js-issue-link" data-error-text="Failed to loa

Non-atomic memory is cleared twice about crystal HOT 8 OPEN

BlobCodes commented on July 4, 2024

Non-atomic memory is cleared twice

from crystal.

Comments (8)

straight-shoota commented on July 4, 2024 2

First of all, this is a great find! Memory allocations are a huge performance factor in many Crystal applications. Speeding that up is a great win.

I'd like to discuss this independently of the proposed solution #14679 which IMO goes too far in one go.

I believe we should take smaller steps:

Implement memory clearing for gc_none (#14678). This is an obvious bug because the documentation clearly states that GC.malloc should clear the memory.
That allows dropping the memset intrinsic in the compiler, getting rid of double clearing. This resolves this bug report.

Adding support for allocating non-zeroed memory as a performance improvement is a separate feature request.
Let's discuss that in #14687.

from crystal.

HertzDevil commented on July 4, 2024 2

In LLVM 19 there is now an initializes attribute which supposedly aids dead store elimination of the memset in use cases like this: llvm/llvm-project#84803

from crystal.

ysbaddaden commented on July 4, 2024 1

Copied from #14687 (I mixed up the issues 🤦):

Investigating memory allocations in codegen, there are only a few calls:

Codegen#malloc is called by Codegen#malloc_closure and Codegen#malloc_aggregate.
Codegen#malloc_atomic is only called from Codegen#allocate_aggregate.
Codegen#array_malloc and Codegen#array_malloc_atomic are only used to implement Pointer.malloc.

Of these use cases:

Codegen#malloc_closure: I don't know if it needs the memory to be initialized to zero (it currently does or doesn't because of #14678); I believe this is for #14687 to decide;
Codegen#allocate_aggregate: it needs the memory to be initialized to zero, and may do it twice => 🐛
Pointer(T).malloc: the memory must be initialized to zero (whatever if T has inner pointers) and may do it twice => 🐛 (thanks @BlobCodes for correcting me).

Proposal:

Having the four Codegen#[array_]malloc[_atomic] methods always return initialized memory would harmonize the malloc methods behavior, then fixing #malloc_aggregate #codegen_primitive_pointer_malloc to avoid a pointless memset should be enough to fix this issue simply & quickly.

Then #14687 could go and squeeze even more performance.

from crystal.

ysbaddaden commented on July 4, 2024 1

The only issue with #14710 is that it changes the public GC.malloc_atomic to now clear the memory, and we use it to allocate buffers in stdlib (searching github, there are only a couple usages of GC.malloc_atomic)... when I thought to fix the codegen helpers only.

This is a much better move to harmonize it all and you were right all along: we'll want a GC.unsafe_malloc_atomic or GC.uninitialized_malloc_atomic (just with reversed naming). I'd prefer an arg, but that wouldn't work with LLVM allockind attribute, right?

Maybe do this in three steps?

fix GC.malloc for gc/none (bugfix);
fix Codegen#[array_]malloc_atomic to return initialized memory + fix double clears;
add GC.unsafe_malloc_atomic (or GC.uninitialized_malloc_atomic), use it to allocate buffers in stdlib, make GC.malloc_atomic to always initialize memory, and finally update Codegen#malloc_atomic (memory is already initialized).

I believe 1. and 2. would be quickly merged (mere bugfixes), while we can discuss 3. (feature change of the public API).

I understand this is getting very boring to deal with. We're splitting our own PRs in a myriad of smaller ones on a daily basis 😫

Note: this is only my personal opinion. Let's wait for the others.

from crystal.

straight-shoota commented on July 4, 2024

I recalled another discussion which showed an issue with zeroing out memory in the initialization. It makes it impossible to lazily allocate memory via mmap, so that it only assigns the memory that is actually written to. If the runtime explicitly writes the entire memory slice full of zeros, lazy allocation doesn't work (and I don't think this can be optimized away).
So it makes a lot of sense to delegate zeroing to the allocator. It can make the best decisions about how to implement it most efficiently. For example this can be the first time that the memory is actually accessed (i.e. written to).

Ref #11572 (comment) ff.

from crystal.

straight-shoota commented on July 4, 2024

What's the point in having GC.malloc_atomic return zeroed memory?
The main reason for zeroing is to prevent the GC from misidentifying garbage data as pointers. So this is only necessary for memory in which the GC looks for pointers.
GC.malloc_atomic is explicitly for pointer-free memory. So I don't see why it would need to clear the memory.

It is a public API method. Changing its semantics has performance implications for a lot of user code. We should only do that if there's a really good reason for it.

from crystal.

BlobCodes commented on July 4, 2024

Dirty memory can always cause hard-to-fix bugs, so having clean memory be the default could be a gold idea.

The main reason is to be able to provide a consistent API together with support for #14687.

If we later want to have a non-clearing variant of malloc and a clearing variant of malloc_atomic, the inconsistent clearing behaviour between the two existing methods will probably cause an ugly API.

The way I see it, we currently have three options when wanting to implement #14687:

Retrofit the old methods to have consistent clearing behaviour and add new _unsafe or _clean variants consistently.
Have an inconsistent/bad API (malloc_dirty + malloc_atomic_clean
Deprecate current methods and re-do it

We definitely want some way of allocating clean pointer-free memory because the compiler expects clean memory for reference types and Pointer.malloc - and the allocator will be more efficient at it than the compiler.

from crystal.

straight-shoota commented on July 4, 2024

Having a clean default sounds good, of course. But it has a performance cost.
Feature correctness should always come before performance, but this isn't about correctness, it's about preventing a light foot gun. But this is more like a gun that you cannot point down and you have to lift your foot in order to hit it.

AFAIK GC.malloc_atomic has always been returning non-zeroed memory and I'm not aware of any particular issues with that. So I believe it's reasonable to assume the risk is relatively small.

The issue we're addressing here is that the same memory is cleared twice. We should focus on that and not introduce any other changes in the same step.

Also, I don't want to introduce an avoidable change without offering an alternative to retain current performance.
We need to continue the discussion about changing the API for memory allocations.

from crystal.

Non-atomic memory is cleared twice about crystal HOT 8 OPEN

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent