How to know when objects are stack or heap allocated

I’m currently writing some performance critical code and got a bit confused about stack and heap allocation in Julia. I thought that immutables are usually stack allocated but apparently that is not the case. I’ve made a simple example for reproduction. I’d very glad if someone could explain why it works the ways it does as it might help me with my actual more complex problem.

function test_alloc(size, dim)
    acc = CartesianIndex(ntuple(x -> 1, dim))

    for i in CartesianIndices(ntuple(x -> size, dim))
        acc += i
    end

    acc
end
julia> @time test_alloc(100, 3);
  0.000484 seconds

julia> @time test_alloc(10, 4);
  0.000652 seconds (40.00 k allocations: 2.136 MiB)

So what I don’t understand is why are the Cartesian indices allocated on the heap in the second case while they are not in the first case. The way I see it, the function should return much faster in the second case because there are only 10⁴ indices to accumulate as opposed to 100³ in the first case, but the memory allocation slows it down.

I don’t mean to derail the thread, but I’m a little surprised there are any instances where this does not allocate. dim is runtime information, so nutple( x -> 1 , dim ) is not type stable, which typically gets you allocations. There must be some secret sauce in the ntuple function, but the docstring only alludes to this obliquely.

Perhaps someone well versed in the julia internals has some insight.

3 Likes

I think you found it already, that’s probably it, thanks. But you pointed out something interesting nutple( x -> 1 , dim) is not type stable indeed. I created a new example where the type can be inferred at compile time.

function test_alloc_dim(size, dim::Val{N}) where N
    acc = CartesianIndex(ntuple(x -> 1, N))

    for i in CartesianIndices(ntuple(x -> size, N))
        acc += i
    end

    acc
end
julia> @time test_alloc_dim(100, Val(3));
  0.000401 seconds

julia> @time test_alloc_dim(10, Val(4));
  0.000007 seconds

julia> @time test_alloc_dim(10, Val(5));
  0.000055 seconds

Edit: The secret sauce seems to be this, found in ntuple.jl from line 46 to 50:

# inferable ntuple (enough for bootstrapping)
ntuple(f, ::Val{0}) = ()
ntuple(f, ::Val{1}) = (@inline; (f(1),))
ntuple(f, ::Val{2}) = (@inline; (f(1), f(2)))
ntuple(f, ::Val{3}) = (@inline; (f(1), f(2), f(3)))

To come back to my question, so immutables are usually allocated on the stack if their type can be inferred at compile time? I guess it also depends on how they are shared after instantiation, but in the case where they are created inside a function and not shared with anyone but the function’s caller, they should be allocated on the stack, right?

3 Likes

Its all about inference, yes. Even some mutables can be stack allocated, if they do not escape:

julia> using StaticArrays

julia> function f(::Val{N}) where {N}
           x = zero(MVector{N,Float64})
           x[1] = 1.0
           return SVector(x)
       end
f (generic function with 1 method)

julia> @btime f($(Val(3)))
  2.527 ns (0 allocations: 0 bytes)
3-element SVector{3, Float64} with indices SOneTo(3):
 1.0
 0.0
 0.0
3 Likes