“Better” is a very subjective term here, as has already been explained in the very thread you linked. There is more to the story than “just” dropping in `calloc` as a replacement for `malloc` and being done with it.
This benchmark is misleading because the memory is not actually zeroed until `sum` is called - the kernel zeroes the pages lazily, on first touch, so the cost simply doesn’t show up in the allocation timing. Please don’t sweep that big caveat of using `calloc` under the rug and present it as a pure win in performance - it just isn’t. I’ve already benchmarked this in the other thread extensively.
No - this only “works” for types where a zero bitpattern also happens to coincide with the `zero` of that type, as I mentioned in the linked thread:
In which case, using `calloc` by default means the memory is initialized twice: once by the kernel when it hands you the zero-filled pages, and once when Julia inevitably has to call `zero` to initialize the data properly. This will lead to users being confused about why `zeros` is slower than it needs to be, and should thus be avoided.
All of these are internal implementation details you cannot rely on. Do not assume these to be true.
The interface for allocation I would prefer to all options presented here looks like this:
"""
Allocate a single object of type `T`, using memory managed by `Allocator`.
Return a `T`, throws an OutOfMemory exception when it fails to allocate memory.
"""
allocate(::Allocator, ::T)
"""
Allocate `n` objects of type `T`, using memory managed by `Allocator`.
Return a collection of `T`, throws an OutOfMemory exception when it fails to allocate memory.
"""
allocate(::Allocator, ::T, n)
This is how Zig does it, and for good reason: you don’t need more. Rust has some more fanciness for deallocation. Both of these can be used by the compiler to hoist allocations to the stack if need be. The second one semantically returns an `Array` (or a fixed-size equivalent) when passed the default GC.
That’s because “splitting the memory up” to create other objects out of it is not a safe operation: if you allocate a large array and use it as the backing memory for multiple objects, you’re well on the way to reimplementing a memory manager yourself. The regular Julia GC would have to keep that big chunk alive for as long as even a single object carved out of it (which may use only a tiny portion of the memory) is still reachable. You really need the allocator passed in the API to handle that part of the operation for you.