Use and define free_aligned_sized, aligned_alloc (and free_sized)

E.g. for Libc.

Currently Julia itself (sometimes) uses posix_memalign (except on Windows) together with the plain free, not a special free for aligned memory as is needed on Windows. We likely should use a dedicated aligned free everywhere:

Several heap allocators (hereafter, “allocators”) expose as extensions variants of free that accept, in addition to the address of the allocation, an additional argument “reminding” the allocator of the size of that allocation. These extensions can reduce deallocation cost by 30%, allow extra security-hardening functionality, and currently ship in several implementations.

Julia bypasses Libc.malloc and has its own pool system, which is likely redundant and was a problem for me when I tried to test mimalloc.

I’m not sure we would get a speedup or improved safety for most allocations unless we disable Julia’s pools (that can be a separate decision, and a discussion for later). But there are other allocations in Julia that go straight to malloc, malloc_s, or some other Julia wrapper. For thread-local storage, I see an aligned malloc followed by memset, so in effect an aligned calloc, and I considered changing it to calloc. But first off, this is likely not speed-critical (only done once per thread?), and as far as I found there is no aligned counterpart of calloc in the standard…
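To make the "aligned malloc, then memset" pattern concrete, here is a minimal sketch in C. The names aligned_zalloc and aligned_zfree are made up for illustration and are not Julia or libc functions; the sketch assumes posix_memalign off Windows and _aligned_malloc/_aligned_free on Windows.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <malloc.h>               /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical helper: aligned allocation followed by memset,
 * i.e. what an "aligned calloc" would do if the standard had one. */
void *aligned_zalloc(size_t alignment, size_t size)
{
    /* posix_memalign wants a power of two that is a multiple of sizeof(void *) */
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);
    void *p = NULL;
#ifdef _WIN32
    p = _aligned_malloc(size, alignment);
#else
    if (posix_memalign(&p, alignment, size) != 0)
        p = NULL;
#endif
    if (p != NULL)
        memset(p, 0, size);       /* explicit zeroing, as calloc would provide */
    return p;
}

/* Matching deallocator: plain free() is only correct off Windows. */
void aligned_zfree(void *p)
{
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Keeping the allocation and its matching free behind one pair of functions is what makes the Windows case manageable: callers never have to remember which free belongs to which allocation.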

The problem with free_aligned_sized is that it’s only defined in C23, but that need not be a problem. I was confused at first to see that it only works for memory from aligned_alloc, and free_sized only for other allocations, but in fact a legal implementation of both is to just call regular free.

So for Libc we could fall back to that when Julia is not compiled with C23. We should compile with C23 by default, but not make it a requirement, keeping the plain-free fallback for libc libraries/allocators that lack these functions. Users of Julia’s Libc library can then opt into the new allocators and deallocators gradually.
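A rough sketch of what such a fallback could look like in C. HAVE_C23_SIZED_FREE is a hypothetical build-time flag (not an existing Julia or libc macro), and the compat_ names are invented for illustration. The flag is meant to be set only when the libc in use really provides the C23 functions, since compiling in C23 mode alone does not guarantee that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* HAVE_C23_SIZED_FREE: hypothetical flag, defined by the build system only
 * when the libc provides free_sized and free_aligned_sized. */

/* For memory obtained from malloc, calloc or realloc. */
void compat_free_sized(void *ptr, size_t size)
{
#ifdef HAVE_C23_SIZED_FREE
    free_sized(ptr, size);
#else
    (void)size;              /* the size hint is simply dropped */
    free(ptr);
#endif
}

/* For memory obtained from aligned_alloc with the same alignment and size. */
void compat_free_aligned_sized(void *ptr, size_t alignment, size_t size)
{
    /* cheap debug check on the hint; a hardened allocator could verify more */
    assert(ptr == NULL || (alignment != 0 && (alignment & (alignment - 1)) == 0));
#ifdef HAVE_C23_SIZED_FREE
    free_aligned_sized(ptr, alignment, size);
#else
    (void)alignment;
    (void)size;
    free(ptr);
#endif
}
```

Callers that already pass the size and alignment get the benefit automatically once the flag is turned on, and lose nothing while it is off. This sketch assumes a libc where aligned_alloc memory can go to plain free; Windows would still need its own aligned free.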

FYI:

is_sufficiently_aligned (C++26) checks whether the pointer points to an object whose alignment has at least the given value

assume_aligned (C++20) informs the implementation that the object ptr points to is aligned to at least N. The implementation may use this information to generate more efficient code, but it might only make this assumption if the object is accessed via the return value of assume_aligned.

N must be a power of 2.

Types for composite class design (since C++26), defined in header <memory>:

indirect (C++26, class template): a wrapper containing a dynamically-allocated object with value-like semantics

polymorphic (C++26, class template): a polymorphic wrapper containing a dynamically-allocated object with value-like semantics

Apart from alignment, there is the issue of on-demand paging: calloc can be either malloc+memset, or it can be an mmap, at the discretion of your allocator’s internal state.

Often mmap and lazy paging are preferable.

However, specifically for thread initialization, laziness is super scary:

Some thread might be in a spinlock critical section when the page fault hits. And Julia spinlocks really spin forever, instead of falling back to a futex after some time. (^This is something we should fix. True spinlocks are almost never the answer, because performance tanks if the thread holding the critical section gets preempted by the OS or takes a page fault.)
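To illustrate the "don't spin forever" idea, here is a minimal sketch in C11. It is not Julia's lock implementation, and it swaps the futex for a plain sched_yield after a bounded number of spins, just to stay portable across POSIX systems. SPIN_LIMIT and all the spin_yield_ names are made up for the example.

```c
#include <assert.h>
#include <sched.h>        /* sched_yield (POSIX) */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SPIN_LIMIT 1000   /* hypothetical tuning constant */

typedef struct { atomic_flag held; } spin_yield_lock;

/* Returns true if the lock was free and is now held by the caller. */
bool spin_yield_trylock(spin_yield_lock *l)
{
    /* test_and_set returns the previous value: false means we took the lock */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void spin_yield_lock_acquire(spin_yield_lock *l)
{
    assert(l != NULL);
    int spins = 0;
    while (!spin_yield_trylock(l)) {
        if (++spins >= SPIN_LIMIT) {
            /* stop burning CPU: let a preempted or page-faulting holder run */
            sched_yield();
            spins = 0;
        }
    }
}

void spin_yield_lock_release(spin_yield_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock is initialized with `spin_yield_lock l = { ATOMIC_FLAG_INIT };`. A real fix would more likely park the waiter in the kernel (futex or equivalent) so that it is woken on release rather than rescheduled blindly; the yield here only shows the bounded-spin structure.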
