Using preallocated arrays of unknown type

I have an optimization problem in which a function f(x) is handed to the optimizer (I use Optim.jl with the LBFGS() method). Inside f, another function g(x, y) returns an array v.

function f(x)
  ...
  v = g(x, y) # x is the vector optimized over, y is some other parameter vector
  ...
  return -res
end

I want to swap out g(x, y) for a function g!(x, y, v) that writes its result into the preallocated array v. The array v should in turn be passed to f as f(x, v), so that I can call the optimizer like this:

result = Optim.optimize( x -> f(x, v), .... , autodiff=:forward) 

My problem is that I don’t know the element type of v beforehand if I want to use autodiff. Is there any way I can provide a memory blob for a Vector or FixedSizeVector to use, or is there a dual number type general enough for me to initialize v with?
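To illustrate the failure mode, here is a minimal sketch (with a hypothetical g!; the real one is more involved):

using ForwardDiff

# Hypothetical stand-in for the real g!: writes its result into v in place.
function g!(x, y, v)
    @. v = y * x^2
    return v
end

y = [1.0, 2.0, 3.0]
v = zeros(3)                                 # plain Float64 buffer
f(x, v) = (g!(x, y, v); sum(v))

f(ones(3), v)                                # fine for Float64 input
ForwardDiff.gradient(x -> f(x, v), ones(3))  # errors: a Dual cannot be stored in a Vector{Float64}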

Check out PreallocationTools.jl.
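A minimal sketch of the pattern, again with a hypothetical g!: a DiffCache holds both the original Float64 buffer and a Dual-sized copy, and get_tmp picks the one matching eltype(x), so the same closure works for plain evaluations and under ForwardDiff:

using Optim, PreallocationTools

# Hypothetical in-place kernel standing in for the real g!.
function g!(x, y, v)
    @. v = y * x^2
    return v
end

y = [1.0, 2.0, 3.0]
cache = DiffCache(zeros(3))    # wraps the preallocated buffer

function f(x, cache)
    v = get_tmp(cache, x)      # returns the buffer matching eltype(x)
    g!(x, y, v)
    return sum(abs2, v)
end

result = Optim.optimize(x -> f(x, cache), ones(3), Optim.LBFGS(); autodiff = :forward)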


Thanks, that works like a charm, yielding a tenfold speedup! I still see a lot of allocations, though; I guess that is just dual numbers being allocated here and there.


Can you give an MWE? Just to see whether some of these allocations can be removed.

Trying to make an MWE seems to make the problem disappear, so there must be something wrong with the way I am doing it in the actual code. The function I am using is below:

using LinearAlgebra, Optim, PreallocationTools

# Packs the state and the control into the first M entries of vec, then
# appends element-wise squares and cubes, so vec must have length 3M.
function state_control_to_poly_vector!(state, control, vec)
    N = length(state.K)
    M = 5 + 3 * N
    vec[1] = state.MAT
    vec[2] = state.MU
    vec[3] = state.ML
    vec[4] = state.TE
    vec[5] = state.TL
    for n = 0:N-1
        vec[6+n] = state.K[n+1]
    end
    for n = 0:2N-1
        vec[6+N+n] = control[n+1]
    end
    for m = 1:M
        vec[M+m] = vec[m]^2
        vec[2*M+m] = vec[m]^3
    end
    return vec
end


function fast_optimize_control(
    nt::Int,
    state,
    valfuncs,
    params,
    scalars,
    ricedata,
    simparams,
)
    # Constraints and initial values
    valfunc = valfuncs[nt]

    lower = vcat(0.01 * ones(simparams.N), 0.7 * ones(simparams.N))
    upper = vcat(0.3 * ones(simparams.N), 0.9 * ones(simparams.N))
    initial_x = vcat(0.1 * ones(simparams.N), 0.8 * ones(simparams.N))

    # Precompute terms needed for the very fast gross_output
    D = @. (1.0 + ricedata.A1 / params.NN * (state.TE / 2.5)^ricedata.B2)
    αKL = @. params.TFP[nt] * state.K^scalars.GAMMA * params.L[nt]^(1.0 - scalars.GAMMA) / D
    αBBK = @. params.B3[nt] * ricedata.B1 * αKL

    # Objective function
    function f(x, vtmp)
        r = zero(eltype(x))
        for n = 1:simparams.N
            r +=
                params.DELTA[n] *
                params.L[nt][n] *
                log(
                    (αKL[n] - αBBK[n] * x[n]^ricedata.B2[n]) * x[simparams.N+n] /
                    params.L[nt][n],
                )
        end
        r *= scalars.Q
        v = get_tmp(vtmp, x)
        state_control_to_poly_vector!(state, x, v)
        v .-= valfunc.Amean
        v ./= valfunc.Astd
        ddot = dot(valfunc.val, v)
        ddot = ddot*valfunc.ystd + valfunc.ymean

        vhat = r + ddot
        return -vhat
    end

    inner_optimizer = Optim.LBFGS()
    v = zeros(3 * 41)  # 3M entries, with M = 5 + 3N = 41
    vtmp = DiffCache(v)
    result = Optim.optimize(
        x -> f(x, vtmp),
        lower,
        upper,
        initial_x,
        Fminbox(inner_optimizer),
        Optim.Options(store_trace = false, show_trace = false);
        autodiff = :forward,
    )
    x = Optim.minimizer(result)
    MIU = x[1:simparams.N]
    ALPHA = x[(simparams.N+1):end]

    return Control(MIU, ALPHA)
end

As you can see, I am capturing quite a few variables in the closure f, but this is not the issue (I have also tried passing all the extra parameters as function arguments).
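One way to rule out boxed captures (a common source of closure allocations) is to inspect the closure directly; any Core.Box in the output would point to a problematic capture:

using InteractiveUtils  # provides @code_warntype outside the REPL

# Run inside fast_optimize_control, after f is defined; Core.Box or
# abstract types for the captured variables would explain the allocations.
@code_warntype f(initial_x, vtmp)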

Running this function under Profile.Allocs.@profile and then inspecting the result with PProf.Allocs.pprof(), I get the following:

  Total:           0       6549 (flat, cum) 75.52%
    220            .          .                               (αKL[n] - αBBK[n] * x[n]^ricedata.B2[n]) * x[simparams.N+n] / 
    221            .          .                               params.L[nt][n], 
    222            .          .                           ) 
    223            .          .                   end 
    224            .          .                   r *= scalars.Q 
    225            .        553                   v = get_tmp(vtmp, x) 
    226            .        554                   state_control_to_poly_vector!(state, x, v) 
    227            .        566                   v .-= valfunc.Amean 
    228            .        510                   v ./= valfunc.Astd 
    229            .        530                   ddot = dot(valfunc.val, v) 
    230            .       2135                   ddot = ddot*valfunc.ystd + valfunc.ymean 
    231            .          .            
    232            .       1122                   vhat = r + ddot 
    233            .        579                   return -vhat 
    234            .          .               end 
    235            .          .            

From what I understand, the only variables with a type other than Float64 here should be x, r, and v, and consequently ddot and vhat.

What stands out to me are all the allocations on scalar operations, e.g. line 230. Also, it seems that r and ddot have different types, as line 232 should not incur any allocations if they were the same. The calculations involving just r do not seem to incur any allocations.
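A crude way to check whether r and ddot really have different types would be a temporary probe inside f, e.g.

# Debug line placed just before `vhat = r + ddot`; if everything is
# type-stable under ForwardDiff, all three should report the same Dual type.
@show typeof(r) typeof(ddot) eltype(v)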

Thanks, could you give the complete code with the necessary imports and variable definitions? I’d like to run profiling locally. Typically, allocations on scalar operations like dot are a sign of type instabilities, which you might be able to see better using @profview from Julia’s VS Code extension.
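For example, with your setup loaded in VS Code:

# Opens a flame graph; red bars mark runtime dispatch, which usually
# coincides with the allocating lines in your profile.
@profview fast_optimize_control(nt, state, valfuncs, params, scalars, ricedata, simparams)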

Unfortunately, the parameter vectors are based on some non-trivial computations using data read from Excel spreadsheets and so on, and a lot of calculation happens before this step.

In general I can say that I use a mix of structs with typed members (of course) and named tuples initialized with scalars and some typed vectors. I have tested the functions extensively with JET to catch type instabilities (this naturally excludes the optimization itself, as Optim seems to contain quite a few type instabilities).
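For reference, the JET checks were along the lines of

using JET

# Walks the call graph and reports runtime dispatch and other
# optimization problems.
@report_opt state_control_to_poly_vector!(state, initial_x, v)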

I have made some observations that may be of help, though. Rearranging my objective function like this

    function f(x, vtmp)
        v = get_tmp(vtmp, x)
        r = zero(eltype(v))
        for n = 1:simparams.N
            r +=
                params.DELTA[n] *
                params.L[nt][n] *
                log(
                    (αKL[n] - αBBK[n] * x[n]^ricedata.B2[n]) * x[simparams.N+n] /
                    params.L[nt][n],
                )
        end
        r *= scalars.Q
        state_control_to_poly_vector!(state, x, v)
        v .-= valfunc.Amean
        v ./= valfunc.Astd
        ddot = dot(valfunc.val, v)
        ddot = ddot*valfunc.ystd + valfunc.ymean

        vhat = r + ddot
        return -vhat
    end

leads to much worse behavior with respect to allocations:

Total:           0      20541 (flat, cum) 90.93%
    212            .          .               # Objective function 
    213            .          .               function f(x, vtmp) 
    214            .        544                   v = get_tmp(vtmp, x) 
    215            .          .                   r = zero(eltype(v)) 
    216            .          .                   for n = 1:simparams.N 
    217            .      13307                       r += 
    218            .          .                           params.DELTA[n] * 
    219            .          .                           params.L[nt][n] * 
    220            .          .                           log( 
    221            .          .                               (αKL[n] - αBBK[n] * x[n]^ricedata.B2[n]) * x[simparams.N+n] / 
    222            .          .                               params.L[nt][n], 
    223            .          .                           ) 
    224            .          .                   end 
    225            .       1112                   r *= scalars.Q 
    226            .        555                   state_control_to_poly_vector!(state, x, v) 
    227            .        534                   v .-= valfunc.Amean 
    228            .        554                   v ./= valfunc.Astd 
    229            .        571                   ddot = dot(valfunc.val, v) 
    230            .       2228                   ddot = ddot*valfunc.ystd + valfunc.ymean 
    231            .          .            
    232            .        571                   vhat = r + ddot 
    233            .        565                   return -vhat 
    234            .          .               end 

In other words: taking the element type from get_tmp(vtmp, x) instead of directly from x leads to more allocations (the code still runs fine and yields the same results as before).
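A standalone probe (with an arbitrary small cache) shows what get_tmp actually returns under ForwardDiff, which may be related:

using ForwardDiff, PreallocationTools

cache = DiffCache(zeros(4))
ForwardDiff.gradient(ones(4)) do x
    v = get_tmp(cache, x)
    # typeof(v) is typically a reinterpreted view here, not a plain Vector
    @show eltype(x) eltype(v) typeof(v)
    sum(x)
end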

I’ll do a bit more digging tonight; maybe I can then produce a slightly more realistic MWE that reproduces this behavior.
