Static Array and optimization

I am working on optimizing a 3x3 solver in rheology. My question concerns the memory that is allocated when a function pulls its parameters out of a dictionary. Here is an MWE:

using DifferentialEquations
using StaticArrays
#using InteractiveUtils

function dudt_giesekus(u, p, t, gradv)
    η0, τ, α = @view p[:]

    # Governing equations are for components of the 3x3 stress tensor
    σ = SA_F32[u[1] u[4] 0.; u[4] 0. 0.; 0. 0. u[3]]
    ∇v = SA_F32[0. 0. 0. ; gradv(t) 0. 0. ; 0. 0. 0.]
    D = 0.5 .* (∇v .+ transpose(∇v))

    T1 = (η0/τ) .* D
    T2 = (transpose(∇v) * σ) + (σ * ∇v)

    coef = α / (τ * η0)
    F = coef * (σ * σ)
    du = -σ / τ .+ T1 .+ T2  .- F .+ T2  # 9 equations (static matrix)
end

function run()
    du = [0.  0.  0.; 0.  0.  0.; 0.  0.  0.]
    u = [0.  0.  0.; 0.  0.  0.; 0.  0.  0.]
    u0 = SA[0.  0.  0.; 0.  0.  0.; 0.  0.  0.]
    t = 0.
    p = SA[1.,1.,1.]

    dct = Dict()
    dct[:γ_protoc] = convert(Vector{Float32}, [1, 2, 1, 2, 1, 2, 1, 2])
    dct[:ω_protoc] = convert(Vector{Float32}, [1, 1, 0.5, 0.5, 2., 2., 1/3., 1/3.])

    tspan = (0., 5.)
    σ0 = SA[0.  0.  0.; 0.  0.  0.; 0.  0.  0.]

    #v21_protoc = [ (t) -> 2. .* cos(3*t) for i in 1:8]    # Requires no or almost no memory allocation

    # Memory allocation is 12k per call to solve(). WHY?  <<<<<<
    v21_protoc = (t) -> dct[:γ_protoc][1]*cos(dct[:ω_protoc][1]*t)

    #fct = (t) -> 2f0*cos(1.5*t)  # Very efficient. No memory allocation

    dudt(u,p,t) = dudt_giesekus(u, p, t, v21_protoc)
    prob_giesekus = ODEProblem(dudt, σ0, tspan, p)

    sol_giesekus = solve(prob_giesekus, Tsit5())
    sol_giesekus = solve(prob_giesekus, Tsit5())
end

run()

When I run the above as is, the line

 v21_protoc = (t) -> dct[:γ_protoc][1]*cos(dct[:ω_protoc][1]*t)

allocates about 12k of memory every time solve is called.

First question: why doesn't the allocation occur only once? After all, the function is not changing. It is as if the line executes inside dudt_giesekus.

Second question: why 12k for such a simple statement?

Third question: memory allocation goes to zero when I define fct according to:

    fct = (t) -> 2f0*cos(1.5*t)  # Very efficient. No memory allocation

which is what I would expect.

I use a dictionary because I want the flexibility to update the parameters of the function from a driver script.

Here are some additional interesting tidbits that I’d like to understand better. The memory allocation map when the function is defined via the dictionary (v21_protoc above) and solve is called twice is:

    20640   ∇v = SA_F32[0. 0. 0. ; gradv(t) 0. 0. ; 0. 0. 0.]
    33024   D = 0.5 .* (∇v .+ transpose(∇v))
        -
    24768   T1 = (η0/τ) .* D
    74160   T2 = (transpose(∇v) * σ) + (σ * ∇v)
        -
        0   coef = α / (τ * η0)
        0   F = coef * (σ * σ)
    61920   du = -σ / τ .+ T1 .+ T2  .- F .+ T2  # 9 equations (static matrix)

However, the memory allocation goes to zero when defining the function without dictionaries. In both cases, the function is anonymous.
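One way to see where the instability originates, assuming dct and the v21_protoc closure are defined at top level rather than inside run(), is to inspect the closure with @code_warntype:

    using InteractiveUtils

    # The lookups into the untyped Dict (and the untyped capture itself) are
    # inferred as ::Any, which forces boxing and dynamic dispatch on every call.
    @code_warntype v21_protoc(0.5)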

Thanks.

Your dictionary is a Dict{Any, Any}, so you will have boxing everywhere.

To elaborate on the previous answer: use Dict{Symbol,Vector{Float32}}() instead.

Alternatively, you can write the above as

dct = Dict(:γ_protoc => Float32[1, 2, 1, 2, 1, 2, 1, 2],
           :ω_protoc => Float32[1, 1, 0.5, 0.5, 2., 2., 1/3., 1/3.])

and the types will be inferred automatically.
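To make the difference concrete, here is a small inference sketch (the getγ helper and the two dictionaries below are illustrative, not part of the MWE):

    getγ(d) = d[:γ_protoc]   # hypothetical helper, only used to query inference

    dct_any   = Dict{Any,Any}(:γ_protoc => Float32[1, 2, 1, 2, 1, 2, 1, 2])
    dct_typed = Dict{Symbol,Vector{Float32}}(:γ_protoc => Float32[1, 2, 1, 2, 1, 2, 1, 2])

    Base.return_types(getγ, (typeof(dct_any),))    # [Any]: boxing and dynamic dispatch
    Base.return_types(getγ, (typeof(dct_typed),))  # [Vector{Float32}]: static dispatch is possible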

It worked! Since my dictionary contains several different types, I can also do:

    γ_protoc = dct[:γ_protoc]  # the type should now be correctly inferred on the LHS
    ω_protoc = dct[:ω_protoc]

with no memory allocated in the RHS of the solver.

One question remains: why is memory allocated on that line repeatedly, that is, every time solve is called? How can that be? Thanks.

Every time code reaches into an untyped container to pull out an element, it can’t know what type of object it will get. Even if it keeps grabbing the same object, it doesn’t know that no one has changed it since the last time it looked. Since it doesn’t know what to expect, dynamic dispatch is necessary and you get the extra slowdown and allocations.

Often, you can sidestep the “every time” issue using a function barrier so that dynamic dispatch only happens once. This isn’t as good as static dispatch, but can be much better than repeatedly requiring dynamic dispatch.
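A minimal function-barrier sketch along those lines (setup_and_run and run_solver are illustrative names, not from the MWE):

    # Pull the values out of the (possibly untyped) Dict once, then hand
    # concretely typed arguments to an inner function. Dynamic dispatch
    # happens only at the call to run_solver; everything inside it is
    # compiled for Vector{Float32}.
    function setup_and_run(dct::Dict)
        γ = dct[:γ_protoc]::Vector{Float32}   # type assertion narrows the lookup
        ω = dct[:ω_protoc]::Vector{Float32}
        return run_solver(γ, ω)               # the function barrier
    end

    function run_solver(γ::Vector{Float32}, ω::Vector{Float32})
        v21 = t -> γ[1] * cos(ω[1] * t)       # closure now captures concretely typed arrays
        # ... build the ODEProblem with v21 and call solve, as in the MWE ...
        return v21
    end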

When accessing a typed container, the compiler always knows what type it will get out, so static dispatch is possible.

Thanks!