Spurious memory allocations within function

Hi everyone,

I am trying to reduce memory allocations in a function through preallocations. However, implementing the Perfomance Tips still leaves a significant amount of probably unnecessary allocations, which I can circumvent if I write the code outside of the function. I have reduced the code to this trivial example:

julia> n = 10;

julia> T = UInt32;

julia> M = rand(T, (n,n));

julia> V = zeros(T, n);

julia> function f(T, M::Matrix, n::Int64)
           V = zeros(T, n);
           @views @inbounds for i = 1:n
               V .= view(M,:, i);
           end
       end;

julia> @btime f($T,$M,$n)

  400.000 ns (11 allocations: 576 bytes)

julia> @btime for i = 1:$n
           copyto!($V, view($M,:, $i));
       end
  48.927 ns (0 allocations: 0 bytes)

The same behaviour is achieved when replacing the .= by copyto!. An even simpler example is the following


julia> f(M::Matrix) = [view(M,:,i) for i = 1:n];

julia> @btime f(M);
  143.457 ns (4 allocations: 576 bytes)

julia> @btime [view($M,:,$i) for i = 1:$n];
  59.440 ns (1 allocation: 496 bytes)

What’s the origin of the additional allocations and how can they be avoided?

Since this is quite a basic question, I assume there is some documentation or discussion on this, but I haven’t been lucky so far.

In the first function you have two things going on:

You actually explicitly create a vector inside the function. That’s an allocation right there. If you want to avoid this, create the vector outside, and pass it into the function.

The second thing is the use of the parameter T. The function will not specialize to the type UInt32. T is a value, not a type in this case, and the compiler only sees it as a DataType. If you want specialization, write

function f(::Type{T}, M::Matrix, n::Int64) 
2 Likes

Here, f accesses the global variable n. Send n as an input argument.

1 Like

Thanks so much! I am aware of the explicit allocation, however I didn’t expect so many different allocations to happen. In my actual use case, I am bound by the allocations that occur in every iteration of the loop. So maybe this example is better suited to demonstrate my issue:


julia> function f(::Type{T}, M::Matrix, n::Int64, k::Int64)
           V = zeros(T, n);
           @inbounds for _ = 1:k
               copyto!(V, view(M,:, 1));
           end
       end;

julia> n1 = 1000;

julia> n2= 100_000;

julia> @btime f($T,$M,$n,n1)
  13.041 ÎĽs (1001 allocations: 46.97 KiB)

julia> @btime f($T,$M,$n,n2)
  1.311 ms (100001 allocations: 4.58 MiB)

julia> @btime @inbounds for _ = 1:$n1
           copyto!($V, view($M,:, 1));
       end
  4.744 ÎĽs (0 allocations: 0 bytes)

julia> @btime @inbounds for _ = 1:$n2
           copyto!($V, view($M,:, 1));
       end
  473.041 ÎĽs (0 allocations: 0 bytes)

I’m not at my computer, and must admit I don’t immediately see the cause. I notice you didn’t interpolate n1 and n2, but I doubt that is the cause.

(Aside: does it not make more sense to pass T as part of M’s signature?)

Yes of course, you’re right that solved the issue in the trivial example. Based on that, I have also tried to narrow it down and outsourced the allocation of V and pass it as an argument instead:

julia> n = 10;

julia> T = UInt32;

julia> M = rand(T, (n,n));

julia> V = zeros(T, n);

julia> function f1(::Type{T}, M::Matrix{T}, n::Int64, k::Int64)
           V::Vector{T} = zeros(T, n);
           for _ = 1:k
               copyto!(V, view(M,:, 1));
           end
       end;

julia> function f2(::Type{T}, M::Matrix{T}, V::Vector{T}, n::Int64, k::Int64)
           for _ = 1:k
               copyto!(V, view(M,:, 1));
           end
       end;

julia> n2= 100_000;

julia> @btime f1($T,$M,$n,$n2)
  1.286 ms (100002 allocations: 4.58 MiB)

julia> @btime f2($T,$M,$V,$n,$n2)
  473.208 ÎĽs (1 allocation: 16 bytes)

So it appears that V gets reallocated in every iteration of the loop, but I don’t understand why? It is probably related to scoping but shouldn’t V be still in scope?

You forgot to add where {T} after your function signature. The signature should be

function f3(::Type{T}, M::Matrix{T}, n::Int64, k::Int64) where {T}

(though frankly, you don’t need ::Type{T}, since T is already in Matrix{T}).

You may have to restart your REPL to see the effect, though.

Also, don’t do this:

Remove the type assertion, just write V = zeros(T, n).

1 Like

Thanks so much, that’s the solution! Not sure I understand what’s going on here though :sweat_smile:

I think I have a rough idea. Normally

f3(::Type{T}, M::Matrix{T}, n::Int64, k::Int64)

would not be allowed, and you would get ERROR: UndefVarError: T not defined. You need to tell Julia that T is a type variable or parameter, which you do by appending where {T}, which basically means “where T is a type parameter”.

If you don’t have the where clause, Julia will try to look up what T means, and in this case it was already defined earlier in your code as T = UInt32. This will then be used, and your slow function defnition is (almost):

function f3(::Type{UInt32}, M::Matrix{UInt32}, n::Int64, k::Int64)

except the type parameter is stored in the variable T, which happens to be a non-const global variable. So, firstly, your definition will only work for UInt32, and, secondly, it will be slow.

Adding where {T} tells Julia that T is a parameter, not a particular global value.

3 Likes

The where { T } bit can also be written as where { T <: Any }. You’re essentially telling the Julia compiler that you want to allow a different version of the function to be compiled for any element type of the incoming matrix.