I’m currently writing some performance-critical code and got a bit confused about stack and heap allocation in Julia. I thought that immutables are usually stack-allocated, but apparently that is not the case. I’ve made a simple example for reproduction. I’d be very glad if someone could explain why it works the way it does, as it might help me with my actual, more complex problem.
function test_alloc(size, dim)
    acc = CartesianIndex(ntuple(x -> 1, dim))
    for i in CartesianIndices(ntuple(x -> size, dim))
        acc += i
    end
    acc
end
So what I don’t understand is why the Cartesian indices are allocated on the heap in the second case (size = 10, dim = 4) while they are not in the first (size = 100, dim = 3). The way I see it, the function should return much faster in the second case because there are only 10⁴ indices to accumulate as opposed to 100³ in the first, but the memory allocation slows it down.
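The two cases can be reproduced roughly like this (a sketch; BenchmarkTools and the exact size/dim arguments are my reading of "first" and "second" case above):

```julia
using BenchmarkTools

# Definition from the post above, repeated so the snippet is self-contained.
function test_alloc(size, dim)
    acc = CartesianIndex(ntuple(x -> 1, dim))
    for i in CartesianIndices(ntuple(x -> size, dim))
        acc += i
    end
    acc
end

@btime test_alloc(100, 3)  # first case: 100^3 indices, no allocations observed
@btime test_alloc(10, 4)   # second case: only 10^4 indices, yet it allocates
```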
I don’t mean to derail the thread, but I’m a little surprised there are any instances where this does not allocate. dim is runtime information, so ntuple(x -> 1, dim) is not type-stable, which typically gets you allocations. There must be some secret sauce in the ntuple function, but the docstring only alludes to this obliquely.
Perhaps someone well versed in the Julia internals has some insight.
I think you found it already; that’s probably it, thanks. But you pointed out something interesting: ntuple(x -> 1, dim) is indeed not type-stable. I created a new example where the type can be inferred at compile time.
function test_alloc_dim(size, dim::Val{N}) where N
    acc = CartesianIndex(ntuple(x -> 1, N))
    for i in CartesianIndices(ntuple(x -> size, N))
        acc += i
    end
    acc
end
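The difference the Val makes can be probed with Base.return_types (a sketch; one_fn is just a named stand-in I'm using for the anonymous x -> 1 above):

```julia
one_fn(x) = 1

# Runtime length: inference can only conclude "a tuple of Ints of some length",
# which is not a concrete type.
rt_dyn = Base.return_types(ntuple, Tuple{typeof(one_fn), Int})

# Val-encoded length: the result is inferred as the concrete NTuple{4, Int},
# so no heap allocation is needed to represent it.
rt_val = Base.return_types(ntuple, Tuple{typeof(one_fn), Val{4}})
```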
To come back to my question: so immutables are usually allocated on the stack if their type can be inferred at compile time? I guess it also depends on how they are shared after instantiation, but in the case where they are created inside a function and not shared with anyone but the function’s caller, they should be allocated on the stack, right?
It’s all about inference, yes. Even some mutables can be stack-allocated, if they do not escape:
julia> using StaticArrays
julia> function f(::Val{N}) where {N}
           x = zero(MVector{N,Float64})
           x[1] = 1.0
           return SVector(x)
       end
f (generic function with 1 method)
julia> @btime f($(Val(3)))
2.527 ns (0 allocations: 0 bytes)
3-element SVector{3, Float64} with indices SOneTo(3):
1.0
0.0
0.0
The root of the problem, I guess, is that a Vector’s size is not encoded in its type signature.
So even if the compiler can prove it does not escape, that does not automatically mean it knows how much space is needed to allocate it on the stack.
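You can see this contrast directly in the type parameters (a sketch using StaticArrays, as in the example above):

```julia
using StaticArrays

v = [1.0, 2.0, 3.0]         # Vector{Float64}: the length lives in a runtime field
s = SVector(1.0, 2.0, 3.0)  # SVector{3, Float64}: the length is a type parameter

typeof(v)   # Vector{Float64} -- the same type for any length
typeof(s)   # SVector{3, Float64} -- a 4-element SVector is a different type
sizeof(s)   # 24 bytes, computable from the type alone, so the compiler
            # knows exactly how much stack space the value needs
```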
we probably want a size limit on putting Memory on the stack to avoid stack overflow. For larger ones, we could potentially inline the free so the Memory doesn’t have to be swept by the GC, etc.
Do you know a likely limit on how large an object you would want to stack-allocate? Not that it should be relied upon. And, as you say, even inlining the free when it’s larger (somewhat similar to the optimization you get with Bumper.jl):
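The Bumper.jl pattern alluded to above looks roughly like this (a sketch from memory of Bumper's README; treat the details as assumptions rather than a definitive usage guide):

```julia
using Bumper

# Sum of squares using a bump-allocated scratch buffer. The buffer is
# reclaimed when the @no_escape block exits, so the GC never has to sweep it.
function sum_squares(n)
    @no_escape begin
        buf = @alloc(Float64, n)
        for i in 1:n
            buf[i] = Float64(i)^2
        end
        sum(buf)
    end
end
```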
Very interesting, but I note:
Note: FixedSizeArrays are not guaranteed to be stack-allocated, in fact they will more likely not be stack-allocated. However, in some extremely simple cases the compiler may be able to completely elide their allocations:
So it’s up to the compiler. You can (always?) get a pointer to a heap-allocated object, and when an object is allocated on the stack you could in theory get a pointer to it too, though you likely do NOT want to (taking that pointer would spell trouble, so I guess the compiler takes it into account and then doesn’t stack-allocate).
There’s a third case: when objects are small enough that they can be put in registers, it is not even theoretically possible to get a pointer.
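A tiny immutable illustrating that third case (a hypothetical sketch; whether values actually live in registers depends on the platform ABI and the compiler):

```julia
# A 16-byte immutable; values this small are typically passed and
# returned in CPU registers rather than through memory.
struct Vec2
    x::Float64
    y::Float64
end

add(a::Vec2, b::Vec2) = Vec2(a.x + b.x, a.y + b.y)

# Inspecting the generated code, e.g. @code_llvm add(Vec2(1, 2), Vec2(3, 4)),
# typically shows no heap allocation and no stack slot for the result.
```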