Comments (8)
First of all, this is a great find! Memory allocations are a huge performance factor in many Crystal applications. Speeding that up is a great win.
I'd like to discuss this independently of the proposed solution #14679 which IMO goes too far in one go.
I believe we should take smaller steps:
- Implement memory clearing for
gc_none
(#14678). This is an obvious bug because the documentation clearly states thatGC.malloc
should clear the memory. - That allows dropping the memset intrinsic in the compiler, getting rid of double clearing. This resolves this bug report.
Adding support for allocating non-zeroed memory as a performance improvement is a separate feature request.
Let's discuss that in #14687.
from crystal.
In LLVM 19 there is now an initializes
attribute which supposedly aids dead store elimination of the memset
in use cases like this: llvm/llvm-project#84803
from crystal.
Copied from #14687 (I mixed up the issues 🤦):
Investigating memory allocations in codegen, there are only a few calls:
Codegen#malloc
is called byCodegen#malloc_closure
andCodegen#malloc_aggregate
.Codegen#malloc_atomic
is only called fromCodegen#allocate_aggregate
.Codegen#array_malloc
andCodegen#array_malloc_atomic
are only used to implementPointer.malloc
.
Of these use cases:
-
Codegen#malloc_closure
: I don't know if it needs the memory to be initialized to zero (it currently does or doesn't because of #14678); I believe this is for #14687 to decide; -
Codegen#allocate_aggregate
: it needs the memory to be initialized to zero, and may do it twice => 🐛 -
Pointer(T).malloc
: the memory must be initialized to zero (whatever ifT
has inner pointers) and may do it twice => 🐛 (thanks @BlobCodes for correcting me).
Proposal:
Having the four Codegen#[array_]malloc[_atomic]
methods always return initialized memory would harmonize the malloc methods behavior, then fixing #malloc_aggregate
#codegen_primitive_pointer_malloc
to avoid a pointless memset should be enough to fix this issue simply & quickly.
Then #14687 could go and squeeze even more performance.
from crystal.
The only issue with #14710 is that it changes the public GC.malloc_atomic
to now clear the memory, and we use it to allocate buffers in stdlib (searching github, there are only a couple usages of GC.malloc_atomic
)... when I thought to fix the codegen helpers only.
This is a much better move to harmonize it all and you were right all along: we'll want a GC.unsafe_malloc_atomic
or GC.uninitialized_malloc_atomic
(just with reversed naming). I'd prefer an arg, but that wouldn't work with LLVM allockind
attribute, right?
Maybe do this in three steps?
- fix
GC.malloc
for gc/none (bugfix); - fix
Codegen#[array_]malloc_atomic
to return initialized memory + fix double clears; - add
GC.unsafe_malloc_atomic
(orGC.uninitialized_malloc_atomic
), use it to allocate buffers in stdlib, makeGC.malloc_atomic
to always initialize memory, and finally updateCodegen#malloc_atomic
(memory is already initialized).
I believe 1. and 2. would be quickly merged (mere bugfixes), while we can discuss 3. (feature change of the public API).
I understand this is getting very boring to deal with. We're splitting our own PRs in a myriad of smaller ones on a daily basis 😫
Note: this is only my personal opinion. Let's wait for the others.
from crystal.
I recalled another discussion which showed an issue with zeroing out memory in the initialization. It makes it impossible to lazily allocate memory via mmap
, so that it only assigns the memory that is actually written to. If the runtime explicitly writes the entire memory slice full of zeros, lazy allocation doesn't work (and I don't think this can be optimized away).
So it makes a lot of sense to delegate zeroing to the allocator. It can make the best decisions about how to implement it most efficiently. For example this can be the first time that the memory is actually accessed (i.e. written to).
Ref #11572 (comment) ff.
from crystal.
What's the point in having GC.malloc_atomic
return zeroed memory?
The main reason for zeroing is to prevent the GC from misidentifying garbage data as pointers. So this is only necessary for memory in which the GC looks for pointers.
GC.malloc_atomic
is explicitly for pointer-free memory. So I don't see why it would need to clear the memory.
It is a public API method. Changing its semantics has performance implications for a lot of user code. We should only do that if there's a really good reason for it.
from crystal.
Dirty memory can always cause hard-to-fix bugs, so having clean memory be the default could be a gold idea.
The main reason is to be able to provide a consistent API together with support for #14687.
If we later want to have a non-clearing variant of malloc
and a clearing variant of malloc_atomic
, the inconsistent clearing behaviour between the two existing methods will probably cause an ugly API.
The way I see it, we currently have three options when wanting to implement #14687:
- Retrofit the old methods to have consistent clearing behaviour and add new
_unsafe
or_clean
variants consistently. - Have an inconsistent/bad API (
malloc_dirty
+malloc_atomic_clean
- Deprecate current methods and re-do it
We definitely want some way of allocating clean pointer-free memory because the compiler expects clean memory for reference types and Pointer.malloc
- and the allocator will be more efficient at it than the compiler.
from crystal.
Having a clean default sounds good, of course. But it has a performance cost.
Feature correctness should always come before performance, but this isn't about correctness, it's about preventing a light foot gun. But this is more like a gun that you cannot point down and you have to lift your foot in order to hit it.
AFAIK GC.malloc_atomic
has always been returning non-zeroed memory and I'm not aware of any particular issues with that. So I believe it's reasonable to assume the risk is relatively small.
The issue we're addressing here is that the same memory is cleared twice. We should focus on that and not introduce any other changes in the same step.
Also, I don't want to introduce an avoidable change without offering an alternative to retain current performance.
We need to continue the discussion about changing the API for memory allocations.
from crystal.
Related Issues (20)
- Internal error when using `sizeof` as generic type argument in inferred ivar type
- ECR escape sequences do not work with `-`
- Customizing or hiding `Benchmark.ips`'s output format HOT 3
- Adding a Difference method to the Math module HOT 3
- Visit the Time.local in the macro. HOT 3
- Add Makefile support `--mcpu=native` as override FLAGS to permit build crystal compiler can enable this option optional for a better performance. HOT 4
- Compiler should Emit Warning/Notes when Deduced Type Differs from Annotated Type. HOT 5
- API breakage with `IO::Evented#evented_read` HOT 14
- Index out of bounds when uploading files through iOS HOT 8
- Breaking change to `Crystal::System::Socket#system_connect` timeout type HOT 5
- `Benchmark.ips` does not accept `Time::Span` for calculation and warmup times
- Force named arguments for spec methods
- LLVM codegen error can stall the compiler
- Playground TLS Cert Expired HOT 2
- Formatter inserts unexpected whitespace in `if` with comments after `macro`
- `super` with implicit block argument will not call `initialize`
- Interpreter: "BUG: missing local var migration from Nil to (String | Nil) (Exception)"
- Interpreter: LLVM ERROR: Invalid size request on a scalable vector. Aborted (core dumped)
- Representation of `Process` arguments HOT 1
- Discrepancy in `#[](start, count)` with negative start in array-like collections HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from crystal.