Understanding source of allocations when profiling

mdtisdall · August 4, 2021, 9:01pm

The (A,SA) construct came out of the discussion in another thread: it’s useful for me to operate on the data as a matrix with a particular alignment, but also to have the StructArray view for convenience. To avoid repeated allocations, my idea was simply to keep both views of the data around as a tuple. In my actual application, V then just serves as a front and back buffer for operations, again to minimize allocations. Basically, I do one “operation” that takes V[1] as input and writes the output to V[2], then I swap the entries of V so that the current state is in the front buffer at the end of the operation.
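
A minimal sketch of the front/back-buffer swap described above, assuming plain 3-D arrays in place of the (A, SA) tuples. The name step! and the use of circshift! as the “operation” are hypothetical choices for illustration:

```julia
# Hypothetical "operation": reads the front buffer V[1], writes the back buffer V[2],
# then swaps the two entries so the new state ends up in V[1].
function step!(V)
    circshift!(V[2], V[1], (1, 0, 0))   # write the result into the back buffer
    V[1], V[2] = V[2], V[1]             # swap the entries; no array data is copied
    return V
end

A = rand(1001, 2, 3)
B = zeros(1001, 2, 3)
V = [A, B]      # Vector{Array{Float64, 3}}: concrete element type

step!(V)        # afterwards V[1] is B (holding the shifted data) and V[2] is A
```

The same swap works unchanged when each entry is an (A, SA) tuple, since swapping the entries does not copy any of the underlying array data.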

So, in practice, I’d generally pass V as the argument to a function. If I write an innertestfunc that just takes V and does the circshift! on it, I get the following allocation-tracking output:

        - using StructArrays
        -
        - struct S
        -     a::Float64
        -     b::Float64
        - end
        -
        - function testfunc()
        0     A = rand(1001,2,3);
        -
    48144     B = zeros(1001,2,3);
        -
       16     SA = StructArray{S}(A, dims=3);
        -
       16     SB = StructArray{S}(B, dims=3);
        -
      224     V = [(A,SA),(B,SB)];
        -
       64     innertestfunc(V)
        - end
        -
        - function innertestfunc(V)
        0     circshift!(V[2][2].a, V[1][2].a, (1,0));
        - end
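
The byte counts above are attributed line by line, so one way to cross-check the 0 next to rand(1001,2,3) is to measure the two constructors in isolation. A minimal sketch using Base’s @timed; the helper names make_rand, make_zeros and measure are hypothetical, and each helper returns its array so the allocation stays live:

```julia
# Hypothetical helpers: each builds the array and returns it, so the
# allocation stays live and cannot be optimized away.
make_rand()  = rand(1001, 2, 3)
make_zeros() = zeros(1001, 2, 3)

# @timed returns a NamedTuple that includes the value and the bytes allocated.
function measure(make)
    stats = @timed make()
    return stats.bytes, stats.value
end

measure(make_rand); measure(make_zeros)   # warm-up calls so compilation is not counted

r, A = measure(make_rand)
z, B = measure(make_zeros)
println("rand:  ", r, " bytes")
println("zeros: ", z, " bytes")
println("data alone: ", length(A) * sizeof(Float64), " bytes")
```

Comparing the two printed numbers directly sidesteps the per-line attribution of the tracker output.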

Moreover, @code_warntype for the inner function gives

julia> A = rand(1001,2,3);

julia> B = zeros(1001,2,3);

julia> SA = StructArray{S}(A, dims=3);
julia> SB = StructArray{S}(B, dims=3);

julia> V = [(A,SA),(B,SB)];

julia> @code_warntype innertestfunc(V)
Variables
  #self#::Core.Const(innertestfunc)
  V::Vector{Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}}

Body::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
1 ─ %1 = Base.getindex(V, 2)::Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}
│   %2 = Base.getindex(%1, 2)::StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}
│   %3 = Base.getproperty(%2, :a)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
│   %4 = Base.getindex(V, 1)::Tuple{Array{Float64, 3}, StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}}
│   %5 = Base.getindex(%4, 2)::StructArray{S, 2, NamedTuple{(:a, :b), Tuple{SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}}}, Int64}
│   %6 = Base.getproperty(%5, :a)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
│   %7 = Core.tuple(1, 0)::Core.Const((1, 0))
│   %8 = Main.circshift!(%3, %6, %7)::SubArray{Float64, 2, Array{Float64, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}
└──      return %8
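
A programmatic counterpart to reading @code_warntype output is Test.@inferred, which throws if the inferred return type does not match the type of the value actually returned. A minimal sketch, assuming plain views stand in for the StructArray fields (view(A, :, :, 1) has the same SubArray type as SA.a in the output above); innertest_views is a hypothetical analogue of innertestfunc:

```julia
using Test   # provides @inferred

A = rand(1001, 2, 3)
B = zeros(1001, 2, 3)
# Stand-in for V: the second entry of each tuple is a view of slice 1,
# playing the role of SA.a (an assumption; the post uses StructArrays).
V = [(A, view(A, :, :, 1)), (B, view(B, :, :, 1))]

# Hypothetical plain-array analogue of innertestfunc.
innertest_views(V) = circshift!(V[2][2], V[1][2], (1, 0))

# @inferred throws if the inferred return type does not match the type of
# the value actually returned; otherwise it just returns that value.
result = @inferred innertest_views(V)
```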

The good news is that this seems to have got rid of the extra allocations. I guess that leaves me with two questions:

1. I can see how, in the REPL, I’ve got a concrete V that can be used for type inference on innertestfunc. However, I don’t see how wrapping the inner step in a separate function changes the interpretation of the code at compile time when I call it from testfunc(). I would have thought (perhaps still thinking in C/C++ idiom) that this couldn’t improve type inference at compile time, since anything the compiler knows when testfunc() calls innertestfunc() could just as well be known if the call were manually inlined back into testfunc().
2. I’m still unclear why A = rand(1001,2,3); appears not to be associated with any allocation.
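
The first question concerns what the Julia performance tips call a function barrier. A minimal, generic sketch of that pattern, not specific to StructArrays: an Any-typed container stands in for a value whose concrete type the caller cannot infer, and the names sum_inline, kernel and sum_barrier are hypothetical. Whether this is what happens in testfunc is an assumption, not something the sketch establishes:

```julia
# `bufs` has element type Any, so inside `sum_inline` the compiler cannot
# know the concrete type of `x`, and each loop iteration is handled dynamically.
bufs = Any[rand(1000)]

function sum_inline(bufs)
    x = bufs[1]            # inferred as Any
    s = 0.0
    for v in x             # iterating a value of unknown type
        s += v
    end
    return s
end

# Function barrier: the same loop moved into its own function. The call
# kernel(bufs[1]) dispatches once on the runtime type (Vector{Float64}),
# and kernel is compiled specifically for that type.
function kernel(x)
    s = 0.0
    for v in x
        s += v
    end
    return s
end
sum_barrier(bufs) = kernel(bufs[1])

sum_inline(bufs); sum_barrier(bufs)      # warm-up (compilation)
bytes_inline  = @allocated sum_inline(bufs)
bytes_barrier = @allocated sum_barrier(bufs)
println("inline: ", bytes_inline, " bytes, barrier: ", bytes_barrier, " bytes")
```

In a setup like this the inline version typically allocates inside the loop because s and v are not concretely typed, while the kernel behind the barrier does not; the printed numbers will vary with the Julia version.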
