Autodifferentiation of model with fixed and mutable parameters

Hello all,

I am currently trying to build a model where some of the model-parameters need to be optimized while others have to remain fixed. The current approach works rather well, but I have some trouble with auto-differentiation. It is best explained with a simplified, working example:

using ForwardDiff
using LinearAlgebra
using Optim

# Given parameters, can be arbitrarily long
parameter_values = ones(5)

# Element-wise square of the parameters, for example
function objective_function(params)
    return params'*params
end

println("Evaluation: $(objective_function(parameter_values))")
solution_all = optimize(objective_function, parameter_values, LBFGS(), autodiff=:forward)
println("Solution: $(solution_all.minimizer) with minimum value $(solution_all.minimum)")

The output I am getting is desirable, since the optimizer is allowed to modify all of the parameters:

Evaluation: 5.0
Solution: [0.0, 0.0, 0.0, 0.0, 0.0] with minimum value 0.0

Now I would like to optimize only over a subset of the parameters, namely those marked as mutable:

# User-provided bit-vector, same length as parameter_values
mutable_parameters = [false, false, false, true, true]

In order to fix some of the parameters, I create a new, modified version of the objective_function:

# Separate fixed from mutable parameters
function separator(obj_func, selector_array, all_params, mutable_params)
    all_params[selector_array] = mutable_params
    return obj_func(all_params)
end

new_objective_function(params) = separator(objective_function, mutable_parameters, parameter_values, params)

This also works as intended (the first three parameters are fixed to 1.0, the last two are optimized towards 0.0). However, in this example I cannot apply auto-differentiation:

println("Evaluation: $(new_objective_function(parameter_values[mutable_parameters]))")
# Note: No autodiff=:forward as above
solution_selected = optimize(new_objective_function, parameter_values[mutable_parameters], LBFGS()) 
println("Solution: $(solution_selected.minimizer) with minimum value $(solution_selected.minimum)")

Evaluation: 5.0
Solution: [1.0504708214398306e-11, -2.6163737842921364e-11] with minimum value 3.0

Question: What do I need to change in order to get auto-differentiation to work again?

The best case would be some tweaking of the current approach, but other approaches with the same functionality would also be more than welcome!

Kindly,
EminentCoder

PS: If I just try to take the gradient of the new_objective_function over the mutable parameters, this is what happens:

julia> ForwardDiff.gradient(new_objective_function, parameter_values[mutable_parameters])
ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(new_objective_function), Float64}, Float64, 2})
The type `Float64` exists, but no method is defined for this combination of argument types when trying to construct it.

Closest candidates are:
  (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat
   @ Base rounding.jl:265
  (::Type{T})(::T) where T<:Number
   @ Core boot.jl:900
  Float64(::IrrationalConstants.Invsqrt2π)
   @ IrrationalConstants ~/.julia/packages/IrrationalConstants/vp5v4/src/macro.jl:112
  ...

Stacktrace:
  [1] convert(::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{typeof(new_objective_function), Float64}, Float64, 2})
    @ Base ./number.jl:7
  [2] setindex!(A::Vector{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 2}, i::Int64)
    @ Base ./array.jl:976
  [3] macro expansion
    @ ./multidimensional.jl:981 [inlined]
  [4] macro expansion
    @ ./cartesian.jl:64 [inlined]
  [5] _unsafe_setindex!(::IndexLinear, A::Vector{…}, x::Vector{…}, I::Base.LogicalIndex{…})
    @ Base ./multidimensional.jl:979
  [6] _setindex!
    @ ./multidimensional.jl:967 [inlined]
  [7] setindex!
    @ ./abstractarray.jl:1413 [inlined]
  [8] separator(obj_func::typeof(objective_function), selector_array::Vector{…}, all_params::Vector{…}, mutable_params::Vector{…})
    @ Main ~/continuoustimesem/diff_problem.jl:20
  [9] new_objective_function(parameters::Vector{ForwardDiff.Dual{ForwardDiff.Tag{…}, Float64, 2}})
    @ Main ~/continuoustimesem/diff_problem.jl:24
 [10] vector_mode_dual_eval!
    @ ~/.julia/packages/ForwardDiff/UBbGT/src/apiutils.jl:24 [inlined]
 [11] vector_mode_gradient(f::typeof(new_objective_function), x::Vector{…}, cfg::ForwardDiff.GradientConfig{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/UBbGT/src/gradient.jl:91
 [12] gradient
    @ ~/.julia/packages/ForwardDiff/UBbGT/src/gradient.jl:20 [inlined]
 [13] gradient(f::typeof(new_objective_function), x::Vector{…}, cfg::ForwardDiff.GradientConfig{…})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/UBbGT/src/gradient.jl:17
 [14] gradient(f::typeof(new_objective_function), x::Vector{Float64})
    @ ForwardDiff ~/.julia/packages/ForwardDiff/UBbGT/src/gradient.jl:17
 [15] top-level scope
    @ REPL[4]:1
Some type information was truncated. Use `show(err)` to see complete types.

Hello,

You need to adjust the way you handle the parameters inside your objective function to make sure that the modification step is compatible with auto-differentiation.

Here’s how you can achieve this:

Avoid in-place modification with Dual values: instead of directly modifying the parameter vector (all_params[selector_array] = mutable_params), you should create a new vector where the fixed parameters remain fixed and the mutable parameters can be updated.

Use a more autodiff-friendly way of separating the parameters: “copy” the parameters in such a way that they retain their Dual-type compatibility, avoiding in-place modification.

using ForwardDiff
using LinearAlgebra
using Optim

# Given parameters, can be arbitrarily long
parameter_values = ones(5)

# Element-wise square of the parameters, for example
function objective_function(params)
    return params' * params
end

println("Evaluation: $(objective_function(parameter_values))")

# User-provided bit-vector, same length as parameter_values
mutable_parameters = [false, false, false, true, true]

# Construct a new objective function where only mutable parameters are updated
function new_objective_function(mutable_params)
    # Copy the fixed parameters to create the full parameter vector
    all_params = copy(parameter_values)
    all_params[mutable_parameters] .= mutable_params  # broadcasting assignment
    return objective_function(all_params)
end

println("Evaluation: $(new_objective_function(parameter_values[mutable_parameters]))")

# Optimize only over the mutable parameters
solution_selected = optimize(new_objective_function, parameter_values[mutable_parameters], LBFGS(), autodiff=:forward)
println("Solution: $(solution_selected.minimizer) with minimum value $(solution_selected.minimum)")

Hi @EminentCoder, welcome to the community!

What the answer above (ChatGPT?) means to say is that your vector all_params is initialized to be full of Float64. Thus, ForwardDiff cannot use it in differentiation, because it needs to work on Dual numbers (see the limitations of ForwardDiff).
I think something along these lines would work, where you create a new vector of global parameters with the same element type as the variable you're differentiating with respect to:

function separator(obj_func, selector_array, all_params, mutable_params)
    new_all_params = similar(all_params, eltype(mutable_params))  # can accommodate Dual numbers if necessary
    new_all_params[selector_array] .= mutable_params
    new_all_params[.!selector_array] = all_params[.!selector_array]
    return obj_func(all_params)
end

If that code is too slow for you, you may also want to take a look at the performance tips. In particular, the use of global variables that are not passed as function arguments is often a bad idea performance-wise.
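For example, a minimal sketch along these lines wraps everything in a function so that the parameters are passed as arguments instead of being read from globals (the helper name optimize_subset is purely illustrative):

function optimize_subset(obj_func, all_params, selector_array)
    # Capture the fixed entries once, outside the inner closure
    fixed_values = all_params[.!selector_array]
    function restricted(mutable_params)
        # Build the full parameter vector with the element type of mutable_params,
        # so it can hold Dual numbers during differentiation
        full_params = similar(all_params, eltype(mutable_params))
        full_params[selector_array] .= mutable_params
        full_params[.!selector_array] .= fixed_values
        return obj_func(full_params)
    end
    return optimize(restricted, all_params[selector_array], LBFGS(), autodiff=:forward)
end

# Usage, with the definitions from your post:
# optimize_subset(objective_function, parameter_values, mutable_parameters)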

1 Like

Hello @gdalle,

With a minor correction (changing return obj_func(all_params) to return obj_func(new_all_params)), your approach works perfectly fine for me. Thank you for the quick reply and the elegant solution!
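For completeness, here is the corrected version I am now running, combining your separator with the change above and the optimize call from my first post:

# Separate fixed from mutable parameters, autodiff-friendly version
function separator(obj_func, selector_array, all_params, mutable_params)
    new_all_params = similar(all_params, eltype(mutable_params))  # can hold Dual numbers
    new_all_params[selector_array] .= mutable_params
    new_all_params[.!selector_array] = all_params[.!selector_array]
    return obj_func(new_all_params)
end

new_objective_function(params) = separator(objective_function, mutable_parameters, parameter_values, params)

solution_selected = optimize(new_objective_function, parameter_values[mutable_parameters], LBFGS(), autodiff=:forward)
println("Solution: $(solution_selected.minimizer) with minimum value $(solution_selected.minimum)")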

Kindly,
EminentCoder

2 Likes