For performance reasons I want to use pre-allocated vectors. This often gives a 10 to 35 times perfomance boost.
Simple example:
using Parameters
@with_kw struct Solver{P}
vec1::Vector{Float64} = zeros(P)
vec2::Vector{Float64} = zeros(P)
vec3::Vector{Float64} = zeros(P)
end
function first!(s::Solver, output, input)
s.vec1 .= 0
for i in 1:100
s.vec1 +.= input
end
output .= s.vec1
nothing
end
function second!(s::Solver, output, input)
s.vec2 .= 0
s.vec3 .= 0
for i in 1:100
s.vec3 .= sin.(input)
s.vec2 *.= first!(s, s.vec3, input)
end
output .= s.vec2
nothing
end
The problem I have with this approach is that all pre-allocated vectors are visible in all functions.
How can I have private, pre-allocated vectors in each of the functions?
I know that full privacy is not possible in Julia, but with code like shown here it is way too easy to modify vectors that should be private to function first!()
by accident in function second!()
.
Furthermore, it is not clear from reading the code which of the members of the struct Solver
are supposed to be private and which don’t.
How would you re-write these lines, using a lazy cache?
gamma = copy(gamma_new)
abs_gamma_new = copy(gamma_new)
induced_velocity_all = zeros(n_panels, 3)
relative_velocity_array = similar(va_array)
relative_velocity_crossz = similar(relative_velocity_array)
v_acrossz_array = similar(va_array)
cl_array = zeros(n_panels)
damp = zeros(length(gamma))
v_normal_array = zeros(n_panels)
v_tangential_array = zeros(n_panels)
Function signature:
function gamma_loop(
solver::Solver,
body_aero::BodyAerodynamics,
gamma_new::Vector{Float64},
va_array::Matrix{Float64},
chord_array::Vector{Float64},
x_airf_array::Matrix{Float64},
y_airf_array::Matrix{Float64},
z_airf_array::Matrix{Float64},
panels::Vector{Panel},
relaxation_factor::Float64;
log::Bool = true
)
the hard way :
function gamma_loop(
gamma_new::Vector{Float64},
va_array::Matrix{Float64},
cache
)
gamma = cache[gamma_new]
abs_gamma_new = cache[gamma_new]
induced_velocity_all = cache[va_array]
relative_velocity_array = cache[va_array]
relative_velocity_crossz = cache[relative_velocity_array]
v_acrossz_array = cache[va_array]
cl_array = cache[gamma_new]
damp = cache[gamma_new]
v_normal_array = cache[gamma_new]
v_tangential_array = cache[gamma_new]
va_array .+= v_tangential_array .+ v_normal_array .+ damp .+ cl_array .+ v_acrossz_array .+ relative_velocity_array .+ relative_velocity_array .+ relative_velocity_crossz .+ induced_velocity_all .+gamma .+ abs_gamma_new
return nothing
end
n_panels = 100
gamma_new = zeros(n_panels)
va_array = zeros(n_panels, 3)
cache = PreallocationTools.LazyBufferCache()
@time gamma_loop(gamma_new, va_array, cache)
but obviously you should still reuse cache when you can instead of creating so many but each will be its own cache even between function. Note that anything can be in a cache (like with similar). but here it is after
using BenchmarkTools
@btime gamma_loop($gamma_new, $va_array, $cache)
1.100 μs (0 allocations: 0 bytes)
And do I need these two lines:
gamma = cache[gamma_new]
gamma .= gamma_new
Or is gamma already initialized after the first line?
And do I need one cache per variable? Or one cache per function? Or one cache per type?
I currently create the cache with:
const cache = LazyBufferCache()
Think of it as a global dictionary if the key exists you get the data otherwise it’s set ( it’s more complicated than that for type stability but you can think of it like that). I think you have to do it yes try to print it.
So I guess I need two caches for these two lines:
gamma = copy(gamma_new)
abs_gamma_new = copy(gamma_new)
Because they share the same key.
Like:
gamma = cache1[gamma_new]
abs_gamma_new = cache2[gamma_new]
yes,
using PreallocationTools
function mul2(arr,cache)
arrcopy1 = cache[arr]
arrcopy2 = cache[arr]
@info arrcopy1 === arrcopy2
end
arr = ones(10)
cache = PreallocationTools.LazyBufferCache()
mul2(arr,cache)
[ Info: true
I also tested arrcopy1 is not equal to arr here