I want to avoid any excess allocation. The f1! below needs 2m+1 allocations while the f2! needs m+1 allocations.
is ito possible to make this kind of functionality 1 allocation only?
function f1!(du, u)
m,n = size(u)
rAA = zeros(n)
for i = 1:m
A, B, C, D, E = u[i, :]
rAA .= [0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E]
lmul!(0.5, rAA)
du[i,:] .= rAA
end
end
function f2!(du, u)
m,n = size(u)
rAA = zeros(n)
for i = 1:m
A, B, C, D, E = u[i, :]
rAA[1] = 0.1*A
rAA[2] = 0.2*B
rAA[3] = 0.1*C
rAA[4] = 0.2*D
rAA[5] = 0.1*E
lmul!(0.5, rAA)
du[i,:] .= rAA
end
end
u[i, :] allocates a new vector, so avoid that as well.
function f2!(du, u)
m,n = size(u)
rAA = zeros(n)
for i = 1:m
rAA[1] = 0.1*u[i, 1]
rAA[2] = 0.2*u[i, 2]
rAA[3] = 0.1*u[i, 3]
rAA[4] = 0.2*u[i, 4]
rAA[5] = 0.1*u[i, 5]
lmul!(0.5, rAA)
du[i,:] .= rAA
end
end
Some more notes if you care about actual speed and not just allocations:
Julia arrays are column-major, so u[i,:] and du[i,:] is not the preferred order of accessing these (if they are actual Arrays).
The last two lines in the loop operate on the complete array rAA even though only the first 5 elements of rAA ever change. This could be easily avoided. It looks like rAA only has five elements (I looked at f2! where that isn’t necessarily the case).
Thank you! I understand Julia is column-major I have to use like this because in my code each line is the concentration of different species at one location. The calculation of rAA is not as simple as the code here but requires the concentration of different species.
But do you think coding like
I wouldn’t expect that to perform any better. It could actually be worse with u in it’s current shape because of the memory access order, but you won’t know until you try it.
Same with permuting u. Maybe outside this function you need u in the shape it is now, but depending on its dimensions, the function might be faster for your application if inside the function you permutedims(u). This makes a copy (costs allocations and time) but then extracting the elements will be faster - this may or may not be faster overall. I tried it here with u sized (300, 5) and it wasn’t worth it to permutedims(u) inside the function f4!.
The best you can do there is probably to use static arrays:
julia> using StaticArrays
function f5!(du, u::Vector{T}) where T
r = T(0.1,0.2,0.1,0.2,0.1)
for i = 1:length(u)
du[i] = 0.5*(r .* u[i])
end
end
f5! (generic function with 2 methods)
julia> u = [ 1 .+ zero(SVector{5,Float64}) for i in 1:5 ];
julia> du = copy(u);
julia> @btime f5!($u2,du2)
17.075 ns (0 allocations: 0 bytes)
(probably the code needs some adaptation to actually fit your real use case)
or, if you want to keep u and du as matrices:
julia> function f4!(du, u)
m,n = size(u)
for i = 1:m
A, B, C, D, E = @view u[i, :]
rAA = SVector{5,Float64}(0.1*A, 0.2*B, 0.1*C, 0.2*D, 0.1*E)
du[i,:] .= 0.5*rAA
end
end
f4! (generic function with 1 method)
julia> @btime f4!($du,$u)
27.456 ns (0 allocations: 0 bytes)
The issue in this last example is that the 5 must be known at compile time. Thus, if it varies, you must find a way to put it in the type information of the input variables of the function.