The (A,SA) construct arose from the discussion in another thread, where it’s useful for me to operate on the data as a matrix with a particular alignment but also have the StructArray view for convenience. To avoid repeated allocations, my thought was just to keep both views of the data around as a tuple. The use of V in my actual application is then just to keep a front and back buffer for operations, again to minimize allocations. Basically, I do one “operation” that takes V[1] as input and writes the output to V[2], then I just swap the entries in V so that the current state is in the front buffer at the end of the operation.
So, in that case, I’d generally expect to take V as my argument to a function. If I make an innertestfunc that just takes V and does the circshift on it, then I get the following profiling output:
        using StructArrays

        struct S
            a::Float64
            b::Float64
        end

        function testfunc()
    0       A = rand(1001,2,3);
48144       B = zeros(1001,2,3);
   16       SA = StructArray{S}(A, dims=3);
   16       SB = StructArray{S}(B, dims=3);
  224       V = [(A,SA),(B,SB)];
   64       innertestfunc(V)
        end

        function innertestfunc(V)
    0       circshift!(V[2][2].a, V[1][2].a, (1,0));
        end
Moreover, @code_warntype for the inner function gives
julia> A = rand(1001,2,3);
julia> B = zeros(1001,2,3);
julia> SA = StructArray{S}(A, dims=3);
julia> SB = StructArray{S}(B, dims=3);
julia> V = [(A,SA),(B,SB)];
julia> @code_warntype innertestfunc(V)
Variables
#self#::Core.Const(innertestfunc)
V::Vector{Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}}
Body::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
1 ─ %1 = Base.getindex(V, 2)::Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}
│ %2 = Base.getindex(%1, 2)::StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}
│ %3 = Base.getproperty(%2, :a)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
│ %4 = Base.getindex(V, 1)::Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}
│ %5 = Base.getindex(%4, 2)::StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}
│ %6 = Base.getproperty(%5, :a)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
│ %7 = Core.tuple(1, 0)::Core.Const((1, 0))
│ %8 = Main.circshift!(%3, %6, %7)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
└── return %8
The good news is this seems to have got rid of the extra allocations. I guess this leaves me with two questions:

I can see in the REPL how I’ve got a concrete V that can be used for type inference on innertestfunc. However, I don’t see how wrapping the inner step in a separate function changes the interpretation of the code at compile time when I call it from testfunc(). I would have thought (perhaps still thinking in C/C++ idiom) that this couldn’t improve type inference at compile time, since anything the compiler knows when it calls innertestfunc() could just as well be known if it’s manually inlined back into testfunc().
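One way to compare the two variants directly, rather than through the line-by-line profile, might be to measure each with @allocated after a warm-up call. The sketch below is only an approximation of my setup: plain views stand in for the StructArray columns, and inner_shift!, inlined_variant and barrier_variant are hypothetical names.

```julia
# Hypothetical stand-in for innertestfunc: plain views replace the StructArray columns.
function inner_shift!(V)
    circshift!(V[2][2], V[1][2], (1, 0))
    return nothing
end

# Variant with the shift written inline, right after V is built.
function inlined_variant(A, B)
    V = [(A, view(A, :, :, 1)), (B, view(B, :, :, 1))]
    circshift!(V[2][2], V[1][2], (1, 0))
    return nothing
end

# Variant that passes V through a separate function.
function barrier_variant(A, B)
    V = [(A, view(A, :, :, 1)), (B, view(B, :, :, 1))]
    inner_shift!(V)
    return nothing
end

A = rand(1001, 2, 3)
B = zeros(1001, 2, 3)

# Warm-up calls so that compilation is not counted in the measurement.
inlined_variant(A, B)
barrier_variant(A, B)

bytes_inlined = @allocated inlined_variant(A, B)
bytes_barrier = @allocated barrier_variant(A, B)
println("inlined: ", bytes_inlined, " bytes, barrier: ", bytes_barrier, " bytes")
```

The numbers will depend on the Julia version and on whether the views behave like the StructArray columns, so this is a way to check the difference rather than an explanation of it.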

I’m still unclear why A = rand(1001,2,3); appears not to be associated with any allocation.