I am wondering about the actual advantages over allocating a temporary array. That is, how much do you save? The temporary is short-lived, so one would naively hope that the generational GC manages to reclaim the allocation quickly. Have you tested?
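A minimal sketch of how one could test this, using only Base. The kernel (with_temp / with_buffer!) and the problem size are made up for illustration, so the numbers only mean something once you substitute your real function:

```julia
# Hypothetical kernel: squares x into scratch space, then sums it.
function with_temp(x::Vector{Float64})
    tmp = Vector{Float64}(undef, length(x))  # fresh temporary on every call
    tmp .= x .* x
    return sum(tmp)
end

# Same computation, but the scratch space is supplied by the caller.
function with_buffer!(buf::Vector{Float64}, x::Vector{Float64})
    buf .= x .* x
    return sum(buf)
end

function compare(x, buf; reps = 1_000)
    with_temp(x); with_buffer!(buf, x)        # warm up so compilation is excluded
    bytes_temp = @allocated with_temp(x)
    bytes_buf = @allocated with_buffer!(buf, x)
    t_temp = @elapsed for _ in 1:reps; with_temp(x); end
    t_buf = @elapsed for _ in 1:reps; with_buffer!(buf, x); end
    return (; bytes_temp, bytes_buf, t_temp, t_buf)
end

x = rand(10_000)                              # assumed block size
result = compare(x, similar(x))
@show result
```

If t_temp and t_buf come out close for your block sizes, the preallocated buffer is probably not worth the complexity.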
Otherwise, I’d guess an init function (because of precompilation) that mallocs the buffer would be an alternative. If your blocks are reasonably large and you want thread-safety, then you could also mmap the buffer during initialization (one buffer for every thread). This way, you limit memory consumption: the buffers only eat a handful of bytes in the kernel until they are actually used and the kernel faults them in; and if your function is never called from other threads, then you only fault in a single buffer.
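A rough sketch of the per-thread variant, assuming the init function is the module's `__init__` and using anonymous mappings from the Mmap stdlib. The module name, BLOCK_LEN and scratch are hypothetical; indexing by Threads.threadid() additionally assumes that the calling task stays on its thread and that only the default thread pool is in use:

```julia
module ScratchBuffers            # hypothetical module name

using Mmap

const BLOCK_LEN = 1 << 20        # elements per buffer; an assumed size
const BUFFERS = Vector{Vector{Float64}}()

function __init__()
    # Runs when the module is loaded (not when it is precompiled),
    # so every process gets its own fresh mappings.
    empty!(BUFFERS)
    for _ in 1:Threads.nthreads()
        # Anonymous mapping: zero-filled, pages are faulted in on first touch.
        push!(BUFFERS, Mmap.mmap(Vector{Float64}, BLOCK_LEN))
    end
end

# Scratch space for the current thread (see the assumptions above).
scratch(tid::Integer = Threads.threadid()) = BUFFERS[tid]

end # module

buf = ScratchBuffers.scratch()
buf[1] = 1.0                     # first touch faults in only this page
```

Buffers that belong to threads which never call scratch() stay untouched and cost essentially only address space.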
By using a julia-allocated undef-array you rely on array.c and libc heuristics/thresholds for whether the memory consumption is lazy (good: almost free until faulted in) or eager (bad: julia/libc might decide during initialization/compilation that there is a juicy spot of already faulted-in memory, and then your function never gets called at runtime and you wasted all this sweet memory; terrible to reproduce, because it depends on operating system version and load/init/compile/compute order). Additionally, an explicit mmap gives you full control over offsets (false sharing is annoying; reproducible offsets at least make it reproducible) and you can share buffers between Float32 and ComplexF64 operations.
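To illustrate the buffer-sharing point, here is a small sketch that maps one anonymous byte buffer and views it as both element types via reinterpret. The size NBYTES and the variable names are assumptions for the example; it relies on the mapping being page-aligned, so alignment is fine for both types:

```julia
using Mmap

const NBYTES = 64                          # assumed size, a multiple of 16
raw = Mmap.mmap(Vector{UInt8}, NBYTES)     # anonymous, zero-filled mapping

as_f32 = reinterpret(Float32, raw)         # 16 Float32 values
as_c64 = reinterpret(ComplexF64, raw)      # 4 ComplexF64 values, same bytes

as_c64[1] = 1.0 + 2.0im                    # write through one view ...
first_bytes_as_f32 = as_f32[1:4]           # ... read the same 16 bytes through the other
```

Only one of the two views can hold meaningful data at a time, so this fits the case where the Float32 and ComplexF64 operations never overlap in time.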
PS. The invocations could be:
julia> using Mmap
julia> fd=ccall(:memfd_create, Cint, (Cstring, Cuint), "foo", 0) # memfd_create returns a C int
17
julia> ccall(:ftruncate, Cint, (Cint, Csize_t), fd, 1000)
0
julia> _handle=Base.OS_HANDLE(fd)
RawFD(0x00000011)
julia> _io=open(_handle)
IOStream(<fd 17>)
julia> a1=Mmap.mmap(_io, Matrix{Int}, (4, 4)); a2 = Mmap.mmap(_io, Matrix{Float64}, (4, 4));
julia> a1[1]=1; a2[2]=3.0; @show pointer(a1), pointer(a2);
(pointer(a1), pointer(a2)) = (Ptr{Int64} @0x00007f4581673000, Ptr{Float64} @0x00007f4581672000)
julia> a1
4×4 Array{Int64,2}:
1 0 0 0
4613937818241073152 0 0 0
0 0 0 0
0 0 0 0
julia> a2
4×4 Array{Float64,2}:
4.94066e-324 0.0 0.0 0.0
3.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
Alternatively, one can use a single mapping and use unsafe_wrap. The above bypasses aliasing detection (different virtual memory addresses that are mapped to the same page). Memory mapping games are useful for persistent data structures: you can create new copy-on-write mappings to the underlying fd. Unfortunately it is very hard on Linux to create new COW mappings to some range of virtual memory, cf. https://github.com/JuliaLang/julia/pull/31630.
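A small sketch of a copy-on-write mapping to an fd, assuming Mmap.mmap's shared=false keyword (a private mapping) and a temporary file as the backing store instead of a memfd. The names shared_view and cow_view are made up for the example:

```julia
using Mmap

path, io = mktemp()                        # temporary backing file, opened read/write
write(io, zeros(Int, 16))
flush(io)

# Shared mapping: writes go through to the file.
shared_view = Mmap.mmap(io, Vector{Int}, (16,), 0)
# Private mapping: the first write to a page copies it, the file stays untouched.
cow_view = Mmap.mmap(io, Vector{Int}, (16,), 0; shared = false)

cow_view[1] = 42                           # lands in a private copy of the page
@show cow_view[1] shared_view[1]
```

Whether later writes through shared_view show up in pages of cow_view that have not been copied yet is up to the operating system, so I would not rely on it either way.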
If you use the unsafe_wrap route, then you can also use Mmap.Anonymous(). For some reason, mmap(::Mmap.Anonymous, args...) needs explicit offset=0; I should probably file a bug for that.
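For completeness, a sketch of the unsafe_wrap route on top of a single anonymous mapping. It assumes the offset is passed as the fourth positional argument, and it mirrors the 4×4 Int/Float64 example from the session above; the variable names are made up:

```julia
using Mmap

nbytes = 4 * 4 * sizeof(Float64)
# One anonymous mapping; the offset (0) is passed explicitly.
raw = Mmap.mmap(Mmap.Anonymous(), Vector{UInt8}, (nbytes,), 0)

# raw must stay alive for as long as the wrapped arrays are used.
GC.@preserve raw begin
    p = pointer(raw)
    ints = unsafe_wrap(Array, Ptr{Int}(p), (4, 4))
    floats = unsafe_wrap(Array, Ptr{Float64}(p), (4, 4))
    ints[1] = 1
    floats[2] = 3.0
    same_address = UInt(pointer(ints)) == UInt(pointer(floats))
    @show same_address ints[2] floats[1]
end
```

Unlike the two-mapping version, both arrays here share one virtual address, so the aliasing is at least visible when comparing pointers.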