Why these two functions have different allocations?

I want to avoid any excess allocation. The f1! below needs 2m+1 allocations while the f2! needs m+1 allocations.
is ito possible to make this kind of functionality 1 allocation only?

function f1!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    for i = 1:m
        A, B, C, D, E = u[i, :]
        rAA .= [0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E]
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end
function f2!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    for i = 1:m
        A, B, C, D, E = u[i, :]
        rAA[1] = 0.1*A
        rAA[2] = 0.2*B
        rAA[3] = 0.1*C
        rAA[4] = 0.2*D
        rAA[5] = 0.1*E
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end

this needs allocation

It seems like you can just annotate @views in the for loop:

function f3!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    @views for i = 1:m
        A, B, C, D, E = u[i, :]
        rAA[1] = 0.1*A
        rAA[2] = 0.2*B
        rAA[3] = 0.1*C
        rAA[4] = 0.2*D
        rAA[5] = 0.1*E
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end
1 Like

you can replace that by a tuple, and that should be non-allocating:

rAA .= (0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E)

Thank you! This creats what I want. But… can you give some hints so I can figure out why? Thanks a lot!

Thanks also! Then i don’t have to write excess code to assign values for each elements.

Yes I prefer the notation of @lmiq too. The u[i,:] allocates, so annotating the view on that is enough to remove it.

function f4!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    for i = 1:m
        A, B, C, D, E = @view u[i, :]
        rAA .= (0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E)
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end

Also, note that it may be faster to permute u so that when you extract A, B, C, D, E it happens in column major order (i.e. you’ll have u[:,i]).

u[i, :] allocates a new vector, so avoid that as well.

function f2!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    for i = 1:m
        rAA[1] = 0.1*u[i, 1]
        rAA[2] = 0.2*u[i, 2]
        rAA[3] = 0.1*u[i, 3]
        rAA[4] = 0.2*u[i, 4]
        rAA[5] = 0.1*u[i, 5]
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end

Some more notes if you care about actual speed and not just allocations:

  • Julia arrays are column-major, so u[i,:] and du[i,:] is not the preferred order of accessing these (if they are actual Arrays).
  • The last two lines in the loop operate on the complete array rAA even though only the first 5 elements of rAA ever change. This could be easily avoided. It looks like rAA only has five elements (I looked at f2! where that isn’t necessarily the case).
1 Like

Thank you! I understand Julia is column-major I have to use like this because in my code each line is the concentration of different species at one location. The calculation of rAA is not as simple as the code here but requires the concentration of different species.
But do you think coding like

A = @view u[1, :]
B = @view u[2, :]
...

and call A[i] B[i] in the loop will be better?

I wouldn’t expect that to perform any better. It could actually be worse with u in it’s current shape because of the memory access order, but you won’t know until you try it.

Same with permuting u. Maybe outside this function you need u in the shape it is now, but depending on its dimensions, the function might be faster for your application if inside the function you permutedims(u). This makes a copy (costs allocations and time) but then extracting the elements will be faster - this may or may not be faster overall. I tried it here with u sized (300, 5) and it wasn’t worth it to permutedims(u) inside the function f4!.

The best you can do there is probably to use static arrays:

julia> using StaticArrays
       function f5!(du, u::Vector{T}) where T
           r = T(0.1,0.2,0.1,0.2,0.1)
           for i = 1:length(u)
               du[i] = 0.5*(r .* u[i])
           end
       end
f5! (generic function with 2 methods)

julia> u = [ 1 .+ zero(SVector{5,Float64}) for i in 1:5 ];

julia> du = copy(u);

julia> @btime f5!($u2,du2)
  17.075 ns (0 allocations: 0 bytes)


(probably the code needs some adaptation to actually fit your real use case)

or, if you want to keep u and du as matrices:

julia> function f4!(du, u)
           m,n = size(u)
           for i = 1:m
               A, B, C, D, E = @view u[i, :]
               rAA = SVector{5,Float64}(0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E)
               du[i,:] .= 0.5*rAA
           end
       end
f4! (generic function with 1 method)

julia> @btime f4!($du,$u)
  27.456 ns (0 allocations: 0 bytes)

The issue in this last example is that the 5 must be known at compile time. Thus, if it varies, you must find a way to put it in the type information of the input variables of the function.

Thank you! this also gives me new insight!