How can I use pre-allocated vectors in a private way?

For performance reasons I want to use pre-allocated vectors. This often gives a 10 to 35 times perfomance boost.

Simple example:

using Parameters

@with_kw struct Solver{P}
    vec1::Vector{Float64} = zeros(P)
    vec2::Vector{Float64} = zeros(P)
    vec3::Vector{Float64} = zeros(P)
end

function first!(s::Solver, output, input)
    s.vec1 .= 0
    for i in 1:100
        s.vec1 +.= input
    end
    output .= s.vec1
    nothing
end

function second!(s::Solver, output, input)
    s.vec2 .= 0
    s.vec3 .= 0
    for i in 1:100
        s.vec3 .= sin.(input)
        s.vec2 *.= first!(s, s.vec3, input)
    end
    output .= s.vec2
    nothing
end

The problem I have with this approach is that all pre-allocated vectors are visible in all functions.

How can I have private, pre-allocated vectors in each of the functions?

I know that full privacy is not possible in Julia, but with code like shown here it is way too easy to modify vectors that should be private to function first!() by accident in function second!().

Furthermore, it is not clear from reading the code which of the members of the struct Solver are supposed to be private and which don’t.

You could use GitHub - SciML/PreallocationTools.jl: Tools for building non-allocating pre-cached functions in Julia, allowing for GC-free usage of automatic differentiation in complex codes and use a different lazy cache for each function.

1 Like

How would you re-write these lines, using a lazy cache?

    gamma = copy(gamma_new)
    abs_gamma_new = copy(gamma_new)
    induced_velocity_all = zeros(n_panels, 3)
    relative_velocity_array = similar(va_array)
    relative_velocity_crossz = similar(relative_velocity_array)
    v_acrossz_array = similar(va_array)
    cl_array = zeros(n_panels)
    damp = zeros(length(gamma))
    v_normal_array = zeros(n_panels)
    v_tangential_array = zeros(n_panels)

Function signature:

function gamma_loop(
    solver::Solver,
    body_aero::BodyAerodynamics,
    gamma_new::Vector{Float64},
    va_array::Matrix{Float64},
    chord_array::Vector{Float64},
    x_airf_array::Matrix{Float64},
    y_airf_array::Matrix{Float64},
    z_airf_array::Matrix{Float64},
    panels::Vector{Panel},
    relaxation_factor::Float64;
    log::Bool = true
)

the hard way :

function gamma_loop(
    gamma_new::Vector{Float64},
    va_array::Matrix{Float64},
    cache
)
    gamma = cache[gamma_new]
    abs_gamma_new = cache[gamma_new]
    induced_velocity_all = cache[va_array]
    relative_velocity_array = cache[va_array]
    relative_velocity_crossz = cache[relative_velocity_array]
    v_acrossz_array = cache[va_array]
    cl_array = cache[gamma_new]
    damp = cache[gamma_new]
    v_normal_array = cache[gamma_new]
    v_tangential_array = cache[gamma_new]    
    va_array .+= v_tangential_array .+ v_normal_array .+ damp .+ cl_array .+ v_acrossz_array .+ relative_velocity_array .+ relative_velocity_array .+ relative_velocity_crossz .+ induced_velocity_all .+gamma .+ abs_gamma_new
    return nothing
end
n_panels = 100
gamma_new = zeros(n_panels)
va_array = zeros(n_panels, 3)
cache = PreallocationTools.LazyBufferCache()

@time gamma_loop(gamma_new, va_array, cache)

but obviously you should still reuse cache when you can instead of creating so many but each will be its own cache even between function. Note that anything can be in a cache (like with similar). but here it is after

using BenchmarkTools
@btime gamma_loop($gamma_new, $va_array, $cache)
  1.100 μs (0 allocations: 0 bytes)

And do I need these two lines:

    gamma = cache[gamma_new]
    gamma .= gamma_new

Or is gamma already initialized after the first line?

And do I need one cache per variable? Or one cache per function? Or one cache per type?

I currently create the cache with:

const cache = LazyBufferCache()

Think of it as a global dictionary if the key exists you get the data otherwise it’s set ( it’s more complicated than that for type stability but you can think of it like that). I think you have to do it yes try to print it.

So I guess I need two caches for these two lines:

    gamma = copy(gamma_new)
    abs_gamma_new = copy(gamma_new)

Because they share the same key.

Like:

    gamma = cache1[gamma_new]
    abs_gamma_new = cache2[gamma_new]

yes,

using PreallocationTools

function mul2(arr,cache)
    arrcopy1 = cache[arr]
    arrcopy2 = cache[arr]
    @info arrcopy1 === arrcopy2
end
arr = ones(10)
cache = PreallocationTools.LazyBufferCache()
mul2(arr,cache)
[ Info: true

I also tested arrcopy1 is not equal to arr here