Why these two functions have different allocations?

Fred_He · July 12, 2021, 4:00pm

I want to avoid any excess allocation. The f1! below needs 2m+1 allocations while the f2! needs m+1 allocations.
is ito possible to make this kind of functionality 1 allocation only?

function f1!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    for i = 1:m
        A, B, C, D, E = u[i, :]
        rAA .= [0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E]
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end

function f2!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    for i = 1:m
        A, B, C, D, E = u[i, :]
        rAA[1] = 0.1*A
        rAA[2] = 0.2*B
        rAA[3] = 0.1*C
        rAA[4] = 0.2*D
        rAA[5] = 0.1*E
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end

jling · July 12, 2021, 4:02pm

this needs allocation

EP-Guy · July 12, 2021, 4:23pm

It seems like you can just annotate @views in the for loop:

function f3!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    @views for i = 1:m
        A, B, C, D, E = u[i, :]
        rAA[1] = 0.1*A
        rAA[2] = 0.2*B
        rAA[3] = 0.1*C
        rAA[4] = 0.2*D
        rAA[5] = 0.1*E
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end

lmiq · July 12, 2021, 4:29pm

you can replace that by a tuple, and that should be non-allocating:

rAA .= (0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E)

Fred_He · July 12, 2021, 4:30pm

Thank you! This creats what I want. But… can you give some hints so I can figure out why? Thanks a lot!

Fred_He · July 12, 2021, 4:33pm

Thanks also! Then i don’t have to write excess code to assign values for each elements.

EP-Guy · July 12, 2021, 4:36pm

Yes I prefer the notation of @lmiq too. The u[i,:] allocates, so annotating the view on that is enough to remove it.

function f4!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    for i = 1:m
        A, B, C, D, E = @view u[i, :]
        rAA .= (0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E)
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end

Also, note that it may be faster to permute u so that when you extract A, B, C, D, E it happens in column major order (i.e. you’ll have u[:,i]).

sostock · July 12, 2021, 4:36pm

u[i, :] allocates a new vector, so avoid that as well.

function f2!(du, u)
    m,n = size(u)
    rAA = zeros(n)
    for i = 1:m
        rAA[1] = 0.1*u[i, 1]
        rAA[2] = 0.2*u[i, 2]
        rAA[3] = 0.1*u[i, 3]
        rAA[4] = 0.2*u[i, 4]
        rAA[5] = 0.1*u[i, 5]
        lmul!(0.5, rAA)
        du[i,:] .= rAA
    end
end

Some more notes if you care about actual speed and not just allocations:

Julia arrays are column-major, so u[i,:] and du[i,:] is not the preferred order of accessing these (if they are actual Arrays).
~~The last two lines in the loop operate on the complete array rAA even though only the first 5 elements of rAA ever change. This could be easily avoided.~~ It looks like rAA only has five elements (I looked at f2! where that isn’t necessarily the case).

Fred_He · July 12, 2021, 4:47pm

Thank you! I understand Julia is column-major I have to use like this because in my code each line is the concentration of different species at one location. The calculation of rAA is not as simple as the code here but requires the concentration of different species.
But do you think coding like

A = @view u[1, :]
B = @view u[2, :]
...

and call A[i] B[i] in the loop will be better?

EP-Guy · July 12, 2021, 5:02pm

I wouldn’t expect that to perform any better. It could actually be worse with u in it’s current shape because of the memory access order, but you won’t know until you try it.

Same with permuting u. Maybe outside this function you need u in the shape it is now, but depending on its dimensions, the function might be faster for your application if inside the function you permutedims(u). This makes a copy (costs allocations and time) but then extracting the elements will be faster - this may or may not be faster overall. I tried it here with u sized (300, 5) and it wasn’t worth it to permutedims(u) inside the function f4!.

lmiq · July 12, 2021, 5:07pm

The best you can do there is probably to use static arrays:

julia> using StaticArrays
       function f5!(du, u::Vector{T}) where T
           r = T(0.1,0.2,0.1,0.2,0.1)
           for i = 1:length(u)
               du[i] = 0.5*(r .* u[i])
           end
       end
f5! (generic function with 2 methods)

julia> u = [ 1 .+ zero(SVector{5,Float64}) for i in 1:5 ];

julia> du = copy(u);

julia> @btime f5!($u2,du2)
  17.075 ns (0 allocations: 0 bytes)

(probably the code needs some adaptation to actually fit your real use case)

or, if you want to keep u and du as matrices:

julia> function f4!(du, u)
           m,n = size(u)
           for i = 1:m
               A, B, C, D, E = @view u[i, :]
               rAA = SVector{5,Float64}(0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E)
               du[i,:] .= 0.5*rAA
           end
       end
f4! (generic function with 1 method)

julia> @btime f4!($du,$u)
  27.456 ns (0 allocations: 0 bytes)

The issue in this last example is that the 5 must be known at compile time. Thus, if it varies, you must find a way to put it in the type information of the input variables of the function.

Fred_He · July 13, 2021, 8:19am

Thank you! this also gives me new insight!

Topic		Replies	Views
Passing views to function without allocation New to Julia	14	1532	December 18, 2020
Reducing allocations and run time General Usage	8	393	February 12, 2021
Minimizing Allocations - How to? Resources? New to Julia	5	772	August 19, 2020
Problem with creating view of vectors in vector of vectors (it keeps allocating) Performance memory-allocation , vector	2	129	July 9, 2024
Common allocation mistakes Performance memory-allocation	47	7133	August 21, 2023

Why these two functions have different allocations?

Related topics