Using preallocated arrays of unknown type

I have an optimization problem in which a function f(x) is handed to the optimizer (I use Optim.jl with the LBFGS() method). Inside f, another function g(x, y) returns an array v.

function f(x)
  ...
  v = g(x, y) # x is the vector optimized over, y is some other parameter vector
  ...
  return -res
end

I want to swap out g(x, y) for a function g!(x, y, v) that writes its result into the preallocated array v. The array v should in turn be passed to f as f(x, v), so that I can call the optimizer like this:

result = Optim.optimize( x -> f(x, v), .... , autodiff=:forward) 

My problem is that I don’t know the element type of v beforehand if I want to use autodiff. Is there any way I can provide a memory blob for a Vector or FixedSizeVector to use, or is there a dual number type general enough for me to initialize v with?
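To illustrate the failure mode, here is a minimal sketch (with a hypothetical g!; the real one is more involved):

using ForwardDiff

# Hypothetical stand-in for the real g!: writes its result into v in place.
function g!(x, y, v)
    @. v = y * x^2
    return v
end

y = [1.0, 2.0, 3.0]
v = zeros(3)                                 # plain Float64 buffer
f(x, v) = (g!(x, y, v); sum(v))

f(ones(3), v)                                # fine for Float64 input
ForwardDiff.gradient(x -> f(x, v), ones(3))  # errors: a Dual cannot be stored in a Vector{Float64}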

Check out PreallocationTools.jl.
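A minimal sketch of the pattern, again with a hypothetical g!: a DiffCache holds both the original Float64 buffer and a Dual-sized copy, and get_tmp picks the one matching eltype(x), so the same closure works for plain evaluations and under ForwardDiff:

using Optim, PreallocationTools

# Hypothetical in-place kernel standing in for the real g!.
function g!(x, y, v)
    @. v = y * x^2
    return v
end

y = [1.0, 2.0, 3.0]
cache = DiffCache(zeros(3))    # wraps the preallocated buffer

function f(x, cache)
    v = get_tmp(cache, x)      # returns the buffer matching eltype(x)
    g!(x, y, v)
    return sum(abs2, v)
end

result = Optim.optimize(x -> f(x, cache), ones(3), Optim.LBFGS(); autodiff = :forward)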


Thanks, that works like a charm, yielding a tenfold speedup! I still see a lot of allocations, though; I guess that is just dual numbers being allocated here and there.


Can you give an MWE? Just to see whether some of these allocations can be removed.

Trying to make an MWE seems to make the problem disappear, so there must be something wrong with the way I am doing it in the actual code. The function I am using is below:

using LinearAlgebra, Optim, PreallocationTools

# Packs the state and the control into the first M entries of vec, then
# appends element-wise squares and cubes, so vec must have length 3M.
function state_control_to_poly_vector!(state, control, vec)
    N = length(state.K)
    M = 5 + 3 * N
    vec[1] = state.MAT
    vec[2] = state.MU
    vec[3] = state.ML
    vec[4] = state.TE
    vec[5] = state.TL
    for n = 0:N-1
        vec[6+n] = state.K[n+1]
    end
    for n = 0:2N-1
        vec[6+N+n] = control[n+1]
    end
    for m = 1:M
        vec[M+m] = vec[m]^2
        vec[2*M+m] = vec[m]^3
    end
    return vec
end


function fast_optimize_control(
    nt::Int,
    state,
    valfuncs,
    params,
    scalars,
    ricedata,
    simparams,
)
    # Constraints and initial values
    valfunc = valfuncs[nt]

    lower = vcat(0.01 * ones(simparams.N), 0.7 * ones(simparams.N))
    upper = vcat(0.3 * ones(simparams.N), 0.9 * ones(simparams.N))
    initial_x = vcat(0.1 * ones(simparams.N), 0.8 * ones(simparams.N))

    # Precompute terms needed for the very fast gross_output
    D = @. (1.0 + ricedata.A1 / params.NN * (state.TE / 2.5)^ricedata.B2)
    αKL = @. params.TFP[nt] * state.K^scalars.GAMMA * params.L[nt]^(1.0 - scalars.GAMMA) / D
    αBBK = @. params.B3[nt] * ricedata.B1 * αKL

    # Objective function
    function f(x, vtmp)
        r = zero(eltype(x))
        for n = 1:simparams.N
            r +=
                params.DELTA[n] *
                params.L[nt][n] *
                log(
                    (αKL[n] - αBBK[n] * x[n]^ricedata.B2[n]) * x[simparams.N+n] /
                    params.L[nt][n],
                )
        end
        r *= scalars.Q
        v = get_tmp(vtmp, x)
        state_control_to_poly_vector!(state, x, v)
        v .-= valfunc.Amean
        v ./= valfunc.Astd
        ddot = dot(valfunc.val, v)
        ddot = ddot*valfunc.ystd + valfunc.ymean

        vhat = r + ddot
        return -vhat
    end

    inner_optimizer = Optim.LBFGS()
    v = zeros(3 * 41)  # 3M entries, with M = 5 + 3N = 41
    vtmp = DiffCache(v)
    result = Optim.optimize(
        x -> f(x, vtmp),
        lower,
        upper,
        initial_x,
        Fminbox(inner_optimizer),
        Optim.Options(store_trace = false, show_trace = false);
        autodiff = :forward,
    )
    x = Optim.minimizer(result)
    MIU = x[1:simparams.N]
    ALPHA = x[(simparams.N+1):end]

    return Control(MIU, ALPHA)
end

As you can see, I am capturing quite a few variables in the closure f, but this is not the issue (I have also tried passing all the extra parameters as function arguments).
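One way to rule out boxed captures (a common source of closure allocations) is to inspect the closure directly; any Core.Box in the output would point to a problematic capture:

using InteractiveUtils  # provides @code_warntype outside the REPL

# Run inside fast_optimize_control, after f is defined; Core.Box or
# abstract types for the captured variables would explain the allocations.
@code_warntype f(initial_x, vtmp)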

Running this function under Profile.Allocs.@profile and then inspecting the result with PProf.Allocs.pprof(), I get the following:

  Total:           0       6549 (flat, cum) 75.52%
    220            .          .                               (αKL[n] - αBBK[n] * x[n]^ricedata.B2[n]) * x[simparams.N+n] / 
    221            .          .                               params.L[nt][n], 
    222            .          .                           ) 
    223            .          .                   end 
    224            .          .                   r *= scalars.Q 
    225            .        553                   v = get_tmp(vtmp, x) 
    226            .        554                   state_control_to_poly_vector!(state, x, v) 
    227            .        566                   v .-= valfunc.Amean 
    228            .        510                   v ./= valfunc.Astd 
    229            .        530                   ddot = dot(valfunc.val, v) 
    230            .       2135                   ddot = ddot*valfunc.ystd + valfunc.ymean 
    231            .          .            
    232            .       1122                   vhat = r + ddot 
    233            .        579                   return -vhat 
    234            .          .               end 
    235            .          .            

From what I understand, the only variables with a type other than Float64 here should be x, r, and v, and consequently ddot and vhat.

What stands out to me are all the allocations on scalar operations, e.g. line 230. Also, it seems that r and ddot have different types, as line 232 should not incur any allocations if they were the same. The calculations involving just r do not seem to incur any allocations.
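A crude way to check whether r and ddot really have different types would be a temporary probe inside f, e.g.

# Debug line placed just before `vhat = r + ddot`; if everything is
# type-stable under ForwardDiff, all three should report the same Dual type.
@show typeof(r) typeof(ddot) eltype(v)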

Thanks, could you give the complete code with the necessary imports and variable definitions? I’d like to run profiling locally. Typically, allocations on scalar operations like dot are a sign of type instabilities, which you might be able to see better using @profview from Julia’s VS Code extension.
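For example, with your setup loaded in VS Code:

# Opens a flame graph; red bars mark runtime dispatch, which usually
# coincides with the allocating lines in your profile.
@profview fast_optimize_control(nt, state, valfuncs, params, scalars, ricedata, simparams)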

Unfortunately, the parameter vectors are based on some non-trivial computations using data read from Excel spreadsheets and so on, and a lot of calculation happens before this step.

In general I can say that I use a mix of structs with typed members (of course) and named tuples initialized with scalars and some typed vectors. I have tested the functions extensively with JET to catch type instabilities (this naturally excludes the optimization itself, as Optim seems to contain quite a few type instabilities).
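For reference, the JET checks were along the lines of

using JET

# Walks the call graph and reports runtime dispatch and other
# optimization problems.
@report_opt state_control_to_poly_vector!(state, initial_x, v)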

I have made some observations that may be of help, though. Rearranging my objective function like this

    function f(x, vtmp)
        v = get_tmp(vtmp, x)
        r = zero(eltype(v))
        for n = 1:simparams.N
            r +=
                params.DELTA[n] *
                params.L[nt][n] *
                log(
                    (αKL[n] - αBBK[n] * x[n]^ricedata.B2[n]) * x[simparams.N+n] /
                    params.L[nt][n],
                )
        end
        r *= scalars.Q
        state_control_to_poly_vector!(state, x, v)
        v .-= valfunc.Amean
        v ./= valfunc.Astd
        ddot = dot(valfunc.val, v)
        ddot = ddot*valfunc.ystd + valfunc.ymean

        vhat = r + ddot
        return -vhat
    end

leads to much worse behavior with respect to allocations:

Total:           0      20541 (flat, cum) 90.93%
    212            .          .               # Objective function 
    213            .          .               function f(x, vtmp) 
    214            .        544                   v = get_tmp(vtmp, x) 
    215            .          .                   r = zero(eltype(v)) 
    216            .          .                   for n = 1:simparams.N 
    217            .      13307                       r += 
    218            .          .                           params.DELTA[n] * 
    219            .          .                           params.L[nt][n] * 
    220            .          .                           log( 
    221            .          .                               (αKL[n] - αBBK[n] * x[n]^ricedata.B2[n]) * x[simparams.N+n] / 
    222            .          .                               params.L[nt][n], 
    223            .          .                           ) 
    224            .          .                   end 
    225            .       1112                   r *= scalars.Q 
    226            .        555                   state_control_to_poly_vector!(state, x, v) 
    227            .        534                   v .-= valfunc.Amean 
    228            .        554                   v ./= valfunc.Astd 
    229            .        571                   ddot = dot(valfunc.val, v) 
    230            .       2228                   ddot = ddot*valfunc.ystd + valfunc.ymean 
    231            .          .            
    232            .        571                   vhat = r + ddot 
    233            .        565                   return -vhat 
    234            .          .               end 

In other words: taking the element type from get_tmp(vtmp, x) instead of directly from x leads to more allocations (the code still runs fine and yields the same results as before).
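A standalone probe (with an arbitrary small cache) shows what get_tmp actually returns under ForwardDiff, which may be related:

using ForwardDiff, PreallocationTools

cache = DiffCache(zeros(4))
ForwardDiff.gradient(ones(4)) do x
    v = get_tmp(cache, x)
    # typeof(v) is typically a reinterpreted view here, not a plain Vector
    @show eltype(x) eltype(v) typeof(v)
    sum(x)
end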

I’ll do a bit more digging tonight; maybe I can then produce a slightly more realistic MWE that reproduces this behavior.
