I’m currently writing some performance-critical code and got a bit confused about stack and heap allocation in Julia. I thought that immutables are usually stack-allocated, but apparently that is not the case. I’ve made a simple example for reproduction. I’d be very glad if someone could explain why it works the way it does, as it might help me with my actual, more complex problem.
function test_alloc(size, dim)
    acc = CartesianIndex(ntuple(x -> 1, dim))
    for i in CartesianIndices(ntuple(x -> size, dim))
        acc += i
    end
    acc
end
So what I don’t understand is why the Cartesian indices are allocated on the heap in the second case (size = 10, dim = 4) while they are not in the first (size = 100, dim = 3). The way I see it, the function should return much faster in the second case because there are only 10⁴ indices to accumulate as opposed to 100³ in the first, but the memory allocation slows it down.
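The two cases can be reproduced roughly like this (a sketch; BenchmarkTools and the exact size/dim arguments are my reading of "first" and "second" case above):

```julia
using BenchmarkTools

# Definition from the post above, repeated so the snippet is self-contained.
function test_alloc(size, dim)
    acc = CartesianIndex(ntuple(x -> 1, dim))
    for i in CartesianIndices(ntuple(x -> size, dim))
        acc += i
    end
    acc
end

@btime test_alloc(100, 3)  # first case: 100^3 indices, no allocations observed
@btime test_alloc(10, 4)   # second case: only 10^4 indices, yet it allocates
```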
I don’t mean to derail the thread, but I’m a little surprised there are any instances where this does not allocate. dim is runtime information, so ntuple(x -> 1, dim) is not type-stable, which typically gets you allocations. There must be some secret sauce in the ntuple function, but the docstring only alludes to this obliquely.
Perhaps someone well versed in the Julia internals has some insight.
I think you found it already; that’s probably it, thanks. But you pointed out something interesting: ntuple(x -> 1, dim) is indeed not type-stable. I created a new example where the type can be inferred at compile time.
function test_alloc_dim(size, dim::Val{N}) where N
    acc = CartesianIndex(ntuple(x -> 1, N))
    for i in CartesianIndices(ntuple(x -> size, N))
        acc += i
    end
    acc
end
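The difference the Val makes can be probed with Base.return_types (a sketch; one_fn is just a named stand-in I'm using for the anonymous x -> 1 above):

```julia
one_fn(x) = 1

# Runtime length: inference can only conclude "a tuple of Ints of some length",
# which is not a concrete type.
rt_dyn = Base.return_types(ntuple, Tuple{typeof(one_fn), Int})

# Val-encoded length: the result is inferred as the concrete NTuple{4, Int},
# so no heap allocation is needed to represent it.
rt_val = Base.return_types(ntuple, Tuple{typeof(one_fn), Val{4}})
```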
To come back to my question: so immutables are usually allocated on the stack if their type can be inferred at compile time? I guess it also depends on how they are shared after instantiation, but in the case where they are created inside a function and not shared with anyone but the function’s caller, they should be allocated on the stack, right?
It’s all about inference, yes. Even some mutables can be stack-allocated, if they do not escape:
julia> using StaticArrays
julia> function f(::Val{N}) where {N}
           x = zero(MVector{N,Float64})
           x[1] = 1.0
           return SVector(x)
       end
f (generic function with 1 method)
julia> @btime f($(Val(3)))
2.527 ns (0 allocations: 0 bytes)
3-element SVector{3, Float64} with indices SOneTo(3):
1.0
0.0
0.0
The root of the problem, I guess, is that a Vector’s size is not encoded in its type signature.
So even if the compiler can prove it does not escape, that does not automatically mean it knows how much space is needed to allocate it on the stack.
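You can see this contrast directly in the type parameters (a sketch using StaticArrays, as in the example above):

```julia
using StaticArrays

v = [1.0, 2.0, 3.0]         # Vector{Float64}: the length lives in a runtime field
s = SVector(1.0, 2.0, 3.0)  # SVector{3, Float64}: the length is a type parameter

typeof(v)   # Vector{Float64} -- the same type for any length
typeof(s)   # SVector{3, Float64} -- a 4-element SVector is a different type
sizeof(s)   # 24 bytes, computable from the type alone, so the compiler
            # knows exactly how much stack space the value needs
```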
we probably want a size limit on putting Memory on the stack to avoid stack overflow. For larger ones, we could potentially inline the free so the Memory doesn’t have to be swept by the GC, etc.
Do you know a likely limit on how large an object you would want to stack-allocate? Not that it should be relied upon. And, as you say, even inlining the free when it’s larger (somewhat similar to the optimization you get with Bumper.jl):
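The Bumper.jl pattern alluded to above looks roughly like this (a sketch from memory of Bumper's README; treat the details as assumptions rather than a definitive usage guide):

```julia
using Bumper

# Sum of squares using a bump-allocated scratch buffer. The buffer is
# reclaimed when the @no_escape block exits, so the GC never has to sweep it.
function sum_squares(n)
    @no_escape begin
        buf = @alloc(Float64, n)
        for i in 1:n
            buf[i] = Float64(i)^2
        end
        sum(buf)
    end
end
```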
Very interesting, but I note:
Note: FixedSizeArrays are not guaranteed to be stack-allocated, in fact they will more likely not be stack-allocated. However, in some extremely simple cases the compiler may be able to completely elide their allocations:
So it’s up to the compiler. You can (always?) get a pointer to a heap-allocated object, and when an object is allocated on the stack you could in theory get a pointer to it too, though you likely do NOT want to (taking that pointer would spell trouble, so I guess the compiler takes it into account and then doesn’t stack-allocate).
There’s a third case: when objects are small enough that they can be put in registers, it is not even theoretically possible to get a pointer.
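A tiny immutable illustrating that third case (a hypothetical sketch; whether values actually live in registers depends on the platform ABI and the compiler):

```julia
# A 16-byte immutable; values this small are typically passed and
# returned in CPU registers rather than through memory.
struct Vec2
    x::Float64
    y::Float64
end

add(a::Vec2, b::Vec2) = Vec2(a.x + b.x, a.y + b.y)

# Inspecting the generated code, e.g. @code_llvm add(Vec2(1, 2), Vec2(3, 4)),
# typically shows no heap allocation and no stack slot for the result.
```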