# Understanding the source of allocations when profiling

I’m a little confused about how to attribute some of the allocations I see, since they differ depending on how I profile the code. When I profile the code below using `julia --track-allocation=user` (running `testfunc()` once, clearing the counters with `Profile.clear_malloc_data()`, running `testfunc()` again, and then exiting the REPL, as suggested here), I get the following:

``````
        - using StructArrays
        -
        - struct S
        -     a::Float64
        -     b::Float64
        - end
        -
        - function testfunc()
        0     A = rand(1001,2,3);
        -
    48144     B = zeros(1001,2,3);
        -
       16     SA = StructArray{S}(A, dims=3);
        -
       16     SB = StructArray{S}(B, dims=3);
        -
      224     V = [(A,SA),(B,SB)];
        -
      640     circshift!(V[2][2].a, V[1][2].a, (1,0));
        - end
``````

This confuses me both because I expect `A = rand(1001,2,3);` to allocate memory, and because I expect `circshift!(V[2][2].a, V[1][2].a, (1,0));` to have no allocations.
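The figure on the `B = zeros(1001,2,3)` line can be sanity-checked by hand: the array holds 1001 × 2 × 3 = 6006 `Float64` values at 8 bytes each, i.e. 48048 bytes, so the reported 48144 is the data plus 96 bytes that are presumably array bookkeeping (that breakdown is an inference, not something the profiler reports). A minimal cross-check that does not rely on `--track-allocation` is to measure each call with Base's `@allocated` inside a function, after a warm-up call so that compilation is not counted. `alloc_rand` and `alloc_zeros` are hypothetical helper names.

```julia
# Raw data size of a 1001×2×3 Float64 array: 8 bytes per element.
nbytes = 1001 * 2 * 3 * sizeof(Float64)   # 48048

# Hypothetical helpers: measure a single call inside a function.
alloc_rand()  = @allocated rand(1001, 2, 3)
alloc_zeros() = @allocated zeros(1001, 2, 3)

alloc_rand(); alloc_zeros()   # warm-up calls, so compilation isn't measured

println("data bytes: ", nbytes)
println("rand(...):  ", alloc_rand())
println("zeros(...): ", alloc_zeros())
```

If both helpers report at least `nbytes`, that would suggest `rand` does allocate, and the `0` in the `.mem` file is a question of where the allocation is attributed rather than whether it happens.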

In addition, I tried to use `@btime` (from BenchmarkTools) to make sense of this with a simplified example in the REPL:

``````
julia> using StructArrays, BenchmarkTools

julia> struct S
           a::Float64
           b::Float64
       end

julia> A = rand(1001,2,3);

julia> B = zeros(1001,2,3);

julia> SA = StructArray{S}(A, dims=3);

julia> SB = StructArray{S}(B, dims=3);

julia> V = [(A,SA),(B,SB)];

julia> @btime circshift!($V[2][2].a, $V[1][2].a, (1,0));
445.631 ns (0 allocations: 0 bytes)

julia> @btime circshift!($(V)[2][2].a, $(V)[1][2].a, (1,0));
449.279 ns (0 allocations: 0 bytes)

julia> @btime circshift!(V[2][2].a, V[1][2].a, (1,0));
1.501 μs (11 allocations: 992 bytes)
``````

However, this doesn’t really clear it up for me. Do the allocations I see when I don’t interpolate the variables come from the same source as the ones reported by the allocation-tracking run, or are the two unrelated?

The allocations in the uninterpolated `@btime` call come from accessing the non-constant global variable `V` inside the benchmark, not from `circshift!` itself.
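The same effect can be reproduced without BenchmarkTools. The sketch below (all names are hypothetical) compares summing a non-constant global from inside a function with passing the same array in as an argument, which is roughly what `$` interpolation does for `@btime`. Allocations are measured with Base's `@allocated` inside small wrapper functions, after a warm-up call.

```julia
# A non-constant global: its type may change at any time, so functions that
# read it cannot be specialized on its type.
data = fill(1.0, 1000)

sum_global() = sum(data)   # reads the untyped global inside the function
sum_arg(x) = sum(x)        # receives the value as an argument instead

# Measure inside functions so that only the call itself is counted.
measure_global() = @allocated sum_global()
measure_arg(x) = @allocated sum_arg(x)

measure_global(); measure_arg(data)   # warm up (compile) first
println((measure_global(), measure_arg(data)))
```

On a typical run `measure_arg(data)` should report 0 bytes, while `measure_global()` reports a small nonzero amount from the dynamic dispatch and the boxed return value; exact numbers depend on the Julia version.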

What does `@code_warntype` say about your `testfunc()`? Could it be type-unstable because of the `dims=3` in the creation of the StructArrays, since the compiler doesn’t know that the accesses to `V` in the last line are in bounds?

`@code_warntype` gives the following output.

``````
julia> @code_warntype testfunc()
Variables
#self#::Core.Const(testfunc)
V::Vector{_A} where _A
SB::StructArray{S, _A, _B, _C} where {_A, _B<:Union{Tuple, NamedTuple}, _C}
SA::StructArray{S, _A, _B, _C} where {_A, _B<:Union{Tuple, NamedTuple}, _C}
B::Array{Float64, 3}
A::Array{Float64, 3}

Body::AbstractArray
1 ─       (A = Main.rand(1001, 2, 3))
│         (B = Main.zeros(1001, 2, 3))
│   %3  = Core.apply_type(Main.StructArray, Main.S)::Core.Const(StructArray{S, N, C, I} where {N, C<:Union{Tuple, NamedTuple}, I})
│   %4  = (:dims,)::Core.Const((:dims,))
│   %5  = Core.apply_type(Core.NamedTuple, %4)::Core.Const(NamedTuple{(:dims,), T} where T<:Tuple)
│   %6  = Core.tuple(3)::Core.Const((3,))
│   %7  = (%5)(%6)::Core.Const((dims = 3,))
│   %8  = Core.kwfunc(%3)::Core.Const(Core.var"#Type##kw"())
│         (SA = (%8)(%7, %3, A))
│   %10 = Core.apply_type(Main.StructArray, Main.S)::Core.Const(StructArray{S, N, C, I} where {N, C<:Union{Tuple, NamedTuple}, I})
│   %11 = (:dims,)::Core.Const((:dims,))
│   %12 = Core.apply_type(Core.NamedTuple, %11)::Core.Const(NamedTuple{(:dims,), T} where T<:Tuple)
│   %13 = Core.tuple(3)::Core.Const((3,))
│   %14 = (%12)(%13)::Core.Const((dims = 3,))
│   %15 = Core.kwfunc(%10)::Core.Const(Core.var"#Type##kw"())
│         (SB = (%15)(%14, %10, B))
│   %17 = Core.tuple(A, SA)::Tuple{Array{Float64, 3}, StructArray{S, _A, _B, _C} where {_A, _B<:Union{Tuple, NamedTuple}, _C}}
│   %18 = Core.tuple(B, SB)::Tuple{Array{Float64, 3}, StructArray{S, _A, _B, _C} where {_A, _B<:Union{Tuple, NamedTuple}, _C}}
│         (V = Base.vect(%17, %18))
│   %20 = Base.getindex(V, 2)::Any
│   %21 = Base.getindex(%20, 2)::Any
│   %22 = Base.getproperty(%21, :a)::Any
│   %23 = Base.getindex(V, 1)::Any
│   %24 = Base.getindex(%23, 2)::Any
│   %25 = Base.getproperty(%24, :a)::Any
│   %26 = Core.tuple(1, 0)::Core.Const((1, 0))
│   %27 = Main.circshift!(%22, %25, %26)::AbstractArray
└──       return %27
``````

I’m still too new to Julia to really understand this output. If I read this right, though, it seems that `V` is not recognized as a Vector-of-Tuples. Is this something I can annotate in the code, and would that help performance?

All those `Any`s mean that the compiler can’t figure out what type comes out of indexing into `V`. The core problem lies in

``````
%17 = Core.tuple(A, SA)::Tuple{Array{Float64, 3}, StructArray{S, _A, _B, _C} where {_A, _B<:Union{Tuple, NamedTuple}, _C}}
%18 = Core.tuple(B, SB)::Tuple{Array{Float64, 3}, StructArray{S, _A, _B, _C} where {_A, _B<:Union{Tuple, NamedTuple}, _C}}
``````

because it’s here that the compiler doesn’t know enough about what kind of `StructArray` you’re creating. I’m kind of confused why you’d create a `StructArray{S}` from an `Array{Float64,3}` in the first place, and it seems the compiler agrees. It tries to create a common type for

``````
V = [(A,SA),(B,SB)]
``````

and can only infer `Vector{_A} where _A`, a vector with an unknown element type, so every element read back out of `V` is typed `Any`. As far as the compiler knows, each entry is just a tuple of some array and some `StructArray` with unknown type parameters.
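A small sketch of this inference problem using only Base: `HIDDEN` is a hypothetical `Any`-typed vector standing in for the `StructArray` values whose concrete type the compiler can’t determine inside `testfunc`. The array literal built from them has a concrete element type at run time, but the type inferred for `build()` does not. `Base.return_types` is used to ask the compiler what it inferred.

```julia
# Hypothetical stand-in for values whose concrete types the compiler can't see
# (like SA/SB inside testfunc): reads from an Any-typed vector.
const HIDDEN = Any[zeros(3), ones(3)]

function build()
    a = HIDDEN[1]               # inferred as Any
    b = HIDDEN[2]               # inferred as Any
    return [(1, a), (2, b)]     # mirrors V = [(A,SA),(B,SB)]
end

runtime_T  = eltype(build())                     # what the array actually holds
inferred_T = only(Base.return_types(build, ()))  # what the compiler predicted

println("run time: ", runtime_T, "  concrete? ", isconcretetype(runtime_T))
println("inferred: ", inferred_T, "  concrete? ", isconcretetype(inferred_T))
```

This mirrors the `V::Vector{_A} where _A` line in the `@code_warntype` output: at run time `V` is an ordinary concretely typed vector, but code compiled for `testfunc` cannot rely on that.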

One way around this problem is to create a function barrier: put everything after the creation of your StructArrays into its own function. Alternatively, create the StructArrays outside of your `testfunc` and pass them in, which is a little more Julian and fits nicely with the common pattern of preallocating your arrays.
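A minimal sketch of the function-barrier idea, assuming the uninferable value is modelled by a hypothetical `Any`-typed global `STORE` rather than a `StructArray`: `no_barrier` works with the value directly, while `with_barrier` hands it to a separate `kernel` function, which Julia compiles for the concrete type it actually receives. Exact byte counts depend on the Julia version; only the relative size matters here.

```julia
# Hypothetical stand-in for "a value the compiler can't infer".
const STORE = Any[fill(1.0, 1000)]

# Without a barrier: x is inferred as Any, so every operation in the loop
# is dynamically dispatched and intermediate Float64 values get boxed.
function no_barrier()
    x = STORE[1]
    s = 0.0
    for i in eachindex(x)
        s += x[i]
    end
    return s
end

# The barrier: compiled separately for each concrete argument type.
function kernel(x)
    s = 0.0
    for i in eachindex(x)
        s += x[i]
    end
    return s
end

# One dynamic dispatch on the runtime type of STORE[1], after which
# kernel(::Vector{Float64}) runs fully specialized.
with_barrier() = kernel(STORE[1])

no_barrier(); with_barrier()          # compile both before measuring
a_none    = @allocated no_barrier()
a_barrier = @allocated with_barrier()
println((a_none, a_barrier))
```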

The `(A,SA)` construct arose from the discussion in another thread where it’s useful for me to operate on the data as a matrix with a particular alignment, but also have the StructArray view for convenience. To avoid repeated allocations, my thought was just to keep both views of the data around as a tuple. The use of `V` in my actual application is then just to keep a front- and back-buffer for operations, again, to minimize allocations. Basically, I do one “operation” that takes `V[1]` as input and writes the output to `V[2]`, then I just swap the entries in `V` so that the current state is in the front buffer at the end of the operation.
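The front/back-buffer scheme described above might look roughly like the sketch below. It is a simplification under stated assumptions: plain matrices stand in for the `(Array, StructArray)` tuples, `circshift!` stands in for the real operation, and `step!` is a hypothetical name. Swapping the entries of `V` only exchanges references, so no array data is copied.

```julia
# Hypothetical double-buffer step: V[1] is the front buffer (current state),
# V[2] is the back buffer that receives the result.
function step!(V)
    src, dst = V[1], V[2]
    circshift!(dst, src, (1, 0))   # placeholder operation: write shifted src into dst
    V[1], V[2] = dst, src          # swap so the new state is at the front
    return V
end

V = [rand(4, 2), zeros(4, 2)]      # concrete eltype: Vector{Matrix{Float64}}
state0 = copy(V[1])
step!(V)                           # V[1] now holds the shifted state
```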

So, in that case, I’d generally expect to pass `V` as an argument to a function. If I make an `innertestfunc` that just takes `V` and does the `circshift!` on it, I get the following profiling output:

``````
        - using StructArrays
        -
        - struct S
        -     a::Float64
        -     b::Float64
        - end
        -
        - function testfunc()
        0     A = rand(1001,2,3);
        -
    48144     B = zeros(1001,2,3);
        -
       16     SA = StructArray{S}(A, dims=3);
        -
       16     SB = StructArray{S}(B, dims=3);
        -
      224     V = [(A,SA),(B,SB)];
        -
       64     innertestfunc(V)
        - end
        -
        - function innertestfunc(V)
        0     circshift!(V[2][2].a, V[1][2].a, (1,0));
        - end
``````

Moreover, `@code_warntype` for the inner function gives

``````
julia> A = rand(1001,2,3);

julia> B = zeros(1001,2,3);

julia> SA = StructArray{S}(A, dims=3);
julia> SB = StructArray{S}(B, dims=3);

julia> V = [(A,SA),(B,SB)];

julia> @code_warntype innertestfunc(V)
Variables
#self#::Core.Const(innertestfunc)
V::Vector{Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}}

Body::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
1 ─ %1 = Base.getindex(V, 2)::Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}
│   %2 = Base.getindex(%1, 2)::StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}
│   %3 = Base.getproperty(%2, :a)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
│   %4 = Base.getindex(V, 1)::Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}
│   %5 = Base.getindex(%4, 2)::StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}
│   %6 = Base.getproperty(%5, :a)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
│   %7 = Core.tuple(1, 0)::Core.Const((1, 0))
│   %8 = Main.circshift!(%3, %6, %7)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
└──      return %8
``````
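Instead of reading `@code_warntype` output by eye, type stability can also be checked programmatically with `@inferred` from the `Test` standard library, which throws an `ErrorException` when the runtime return type doesn’t match the inferred one. A minimal sketch with two hypothetical functions:

```julia
using Test

# Hypothetical example functions.
unstable(x) = x > 0 ? x : "negative"   # for an Int argument: Union{Int64, String}
stable(x) = abs(x)                     # return type depends only on the type of x

@inferred stable(3)                                # passes and returns 3
@test_throws ErrorException @inferred unstable(3)  # inferred type is a Union, not Int64
```

For the code in this thread, a line such as `@inferred innertestfunc(V)` could be added to a test suite in the same way.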

The good news is this seems to have got rid of the extra allocations. I guess this leaves me with two questions:

1. I can see in the REPL how I’ve got a concrete `V` that can be used for type inference on `innertestfunc`. However, I don’t see how wrapping the inner step in a separate function changes how the code is compiled when I call it from `testfunc()`. I would have thought (perhaps still thinking in C/C++ idiom) that this couldn’t improve type inference at compile time, since anything the compiler knows when `testfunc()` calls `innertestfunc()` could just as well be known if the call were manually inlined back into `testfunc()`.

2. I’m still unclear why `A = rand(1001,2,3);` appears to not be associated with any allocation.