Understanding source of allocations when profiling

The (A,SA) construct came out of the discussion in another thread: it’s useful for me to operate on the data as a matrix with a particular alignment, but also to have the StructArray view for convenience. To avoid repeated allocations, my thought was to keep both views of the data around as a tuple. In my actual application, V then just holds a front and a back buffer for operations, again to minimize allocations. Basically, I do one “operation” that takes V[1] as input and writes the output to V[2], then I swap the entries in V so that the current state is in the front buffer at the end of the operation.
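To make the front/back buffer idea concrete, here is a minimal sketch using plain matrices instead of the (A,SA) tuples. The name step! and the small 3x2 test data are made up for illustration; only circshift! and the swap come from the description above.

```julia
# Hypothetical helper: one "operation" reads the front buffer V[1],
# writes its result into the back buffer V[2], then swaps the two entries.
function step!(V::Vector{Matrix{Float64}})
    circshift!(V[2], V[1], (1, 0))   # back buffer <- front buffer shifted by one row
    V[1], V[2] = V[2], V[1]          # swap the references only; no array data is copied
    return V
end

front = reshape(collect(1.0:6.0), 3, 2)   # illustrative data, not the real application
back = zeros(3, 2)
V = [front, back]

step!(V)
# V[1] is now the array that used to be the back buffer and holds the shifted data;
# V[2] is the old front buffer, free to be overwritten by the next step.
```

The swap only exchanges the two entries stored in V, so the buffers themselves are reused on every step.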

So, in that case, I’d generally expect to take V as my argument to a function. If I make an innertestfunc that just takes V and does the circshift on it, then I get the following profiling output:

        - using StructArrays
        -
        - struct S
        -     a::Float64
        -     b::Float64
        - end
        -
        - function testfunc()
        0     A = rand(1001,2,3);
        -
    48144     B = zeros(1001,2,3);
        -
       16     SA = StructArray{S}(A, dims=3);
        -
       16     SB = StructArray{S}(B, dims=3);
        -
      224     V = [(A,SA),(B,SB)];
        -
       64     innertestfunc(V)
        - end
        -
        - function innertestfunc(V)
        0     circshift!(V[2][2].a, V[1][2].a, (1,0));
        - end

Moreover, @code_warntype for the inner function gives

julia> A = rand(1001,2,3);

julia> B = zeros(1001,2,3);

julia> SA = StructArray{S}(A, dims=3);

julia> SB = StructArray{S}(B, dims=3);

julia> V = [(A,SA),(B,SB)];

julia> @code_warntype innertestfunc(V)
Variables
  #self#::Core.Const(innertestfunc)
  V::Vector{Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}}

Body::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
1 ─ %1 = Base.getindex(V, 2)::Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}
│   %2 = Base.getindex(%1, 2)::StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}
│   %3 = Base.getproperty(%2, :a)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
│   %4 = Base.getindex(V, 1)::Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}
│   %5 = Base.getindex(%4, 2)::StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}
│   %6 = Base.getproperty(%5, :a)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
│   %7 = Core.tuple(1, 0)::Core.Const((1, 0))
│   %8 = Main.circshift!(%3, %6, %7)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
└──      return %8

The good news is this seems to have got rid of the extra allocations. I guess this leaves me with two questions:

  1. I can see in the REPL how I’ve got a concrete V that can be used for type inference on innertestfunc. However, I don’t see how wrapping the inner step in a separate function changes the interpretation of the code at compile time when I call it from testfunc(). I would have thought (perhaps still thinking in C/C++ idiom) that this couldn’t improve type inference at compile time, since anything the compiler knows when it calls innertestfunc() could just as well be known if the call is manually inlined back into testfunc().

  2. I’m still unclear why A = rand(1001,2,3); appears to not be associated with any allocation.
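For the second question, one way I could cross-check the per-line numbers is to measure the two constructor calls directly with @allocated. This is only a sketch: the variable names are made up, and the exact byte counts will depend on the Julia version, so I'm not quoting any expected output.

```julia
# Warm up both calls first so that compilation is not counted in the measurement.
rand(1001, 2, 3)
zeros(1001, 2, 3)

# Assigning the result keeps each array alive, so the allocation has to happen.
bytes_rand = @allocated (R = rand(1001, 2, 3))
bytes_zeros = @allocated (Z = zeros(1001, 2, 3))

# Raw element data alone: 1001 * 2 * 3 Float64 values.
payload = 1001 * 2 * 3 * sizeof(Float64)

println("rand:    ", bytes_rand, " bytes")
println("zeros:   ", bytes_zeros, " bytes")
println("payload: ", payload, " bytes")
```

If both calls report at least the payload size here, then the 0 next to the rand line in the profile would be about how the allocation is attributed to source lines rather than about rand not allocating, though that is a guess on my part.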