Help with Tag type parameter in ForwardDiff.jl (+ Optim.jl)


I understand from the documentation of ForwardDiff.jl that the package uses a Tag type parameter to avoid something called ‘perturbation confusion’.
From experimenting, I have noticed that the Tag parameter is somehow used to identify the function in which the dual values are used.
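
For reference, the usual illustration of perturbation confusion is a derivative nested inside another derivative. The following is a minimal sketch, assuming ForwardDiff.jl is installed; the function names `inner` and `outer` are only for illustration. The tag is what lets the inner and outer perturbations be told apart:

```julia
using ForwardDiff

# Inner derivative: d/dy (x + y) = 1, regardless of x.
# Outer derivative: d/dx (x * 1) = 1.
# If the two perturbations were confused, the inner derivative would also
# pick up the outer perturbation carried by x and the result would be wrong.
inner(x) = ForwardDiff.derivative(y -> x + y, 1.0)
outer = ForwardDiff.derivative(x -> x * inner(x), 1.0)
```

Here `outer` should come out as 1, which is the mathematically correct value.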

Suppose I have roughly a function like this:

function construct_objective(model::SomeModel, data::SomeData, params::Parameter{Float64})
   params_dual = ForwardDiff.Dual{*what goes here?*}(params);
   function obj!(x::AbstractArray{T, 1}) where T <: Real
         set_parameter_values!(x, params_dual, Val(:estimated)); # Set estimated parameters to match x.
         compute_objective_value(model, y, data, params_dual);
   end
   Optim.TwiceDifferentiable(obj!, construct_array(params_dual); autodiff = :forward);
end

which creates an objective I can optimize using Optim.jl. Here the argument ‘params’ contains model parameters, some of which I want to estimate.

On the line where I convert my parameters to Parameter{Dual{…}}, is it possible for me to specify the Tag parameter myself, rather than having ForwardDiff set it internally?
There is no explanation for this in the documentation for ForwardDiff.jl.
The reason I would like to specify it myself (and have everything work correctly) is that I could then choose at run time which of the parameters in ‘params’ are estimated and which are left at fixed values.

My idea is that I would like to write code like this:

par = get_initial_params(...); # Get initial parameter values for all parameters (fixed and estimated).
fixed_names = (:a, :b); # Specify which parameters will be left fixed, i.e. optimization won't modify them.
set_fixed!(par, fixed_names);
obj = construct_objective(model, data, par); # Construct objective estimating only estimated parameters, and compatible with ForwardDiff.
res = Optim.optimize(obj, construct_array(par), ...);  # Optimise.

Note that without autodifferentiation the above can be made to work by removing the ‘params_dual = …’ line from the code snippet and changing ‘params_dual’ to ‘params’.

All help is greatly appreciated.

Edit: added backticks for readability, thanks @Tamas_Papp

Please quote your code with backticks.

If you just want to optimize some variables, I would use a closure. E.g.

f(a, b, c) = a^2 + b^2 + c^2
fixab(a, b) = c -> f(a, b, c)
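
To illustrate how such a closure would be used: once `a` and `b` are fixed, the result is an ordinary one-argument function that an optimizer or ForwardDiff can treat like any other. A small sketch with arbitrary values (the name `g` is just for illustration):

```julia
f(a, b, c) = a^2 + b^2 + c^2
fixab(a, b) = c -> f(a, b, c)

g = fixab(1.0, 2.0)   # a and b are captured; only c remains free
g(3.0)                # evaluates f(1.0, 2.0, 3.0) = 14.0
```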

Thanks for your reply. ForwardDiff could definitely be used with your example.

However, as I said, I would like a solution in which the fixed (and hence the estimated) parameters can be selected at run time, without modifying the objective function. As I have a lot of parameters, having a separate argument for each parameter is not an option either.

I wonder if ForwardDiff is flexible enough to allow this.

You could use a closure/callable that mixes the fixed and the free (estimated) arguments into a vector. This is a more general and probably easier solution than somehow rigging ForwardDiff.

This is super-crude and can be optimized further, but

struct FixedMix{P,F}
    positions::P   # indices of the fixed entries in the full vector
    fixed::F       # values of the fixed entries, in the same order
end

function (m::FixedMix)(x::AbstractVector)
    xlen = length(x)
    ylen = xlen + length(m.positions)
    y = Vector{promote_type(eltype(x), eltype(m.fixed))}(undef, ylen)
    for (p, f) in zip(m.positions, m.fixed)
        y[p] = f
    end
    for (i, elt) in zip(setdiff(1:ylen, m.positions), x)
        y[i] = elt
    end
    y
end

m = FixedMix([3,1], [-3.0, -5.0])


julia> (identity ∘ m)([7])
3-element Array{Float64,1}:
 -5.0
  7.0
 -3.0
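
As a rough sketch of how this could cover the "select fixed parameters at run time" requirement, the fixed positions can be looked up by name before building the `FixedMix`. The names `param_names`, `initial`, `objective` and `reduced` below are hypothetical and not from the thread, and `FixedMix` is repeated so the snippet runs on its own:

```julia
# Same FixedMix as above, repeated so the snippet is self-contained.
struct FixedMix{P,F}
    positions::P
    fixed::F
end

function (m::FixedMix)(x::AbstractVector)
    ylen = length(x) + length(m.positions)
    y = Vector{promote_type(eltype(x), eltype(m.fixed))}(undef, ylen)
    for (p, f) in zip(m.positions, m.fixed)
        y[p] = f
    end
    for (i, elt) in zip(setdiff(1:ylen, m.positions), x)
        y[i] = elt
    end
    return y
end

# Hypothetical parameter names and initial values.
param_names = [:a, :b, :c]
initial = [-5.0, 7.0, -3.0]

# Choose the fixed parameters at run time, by name.
fixed_names = (:a, :c)
positions = [findfirst(isequal(n), param_names) for n in fixed_names]
m = FixedMix(positions, initial[positions])

# Any objective that takes the full parameter vector.
objective(v) = sum(abs2, v)

# The composed function takes only the free parameters.
reduced = objective ∘ m
reduced([7.0])          # same as objective([-5.0, 7.0, -3.0])

# The element type of the result follows the input, which is what lets
# dual numbers (or any other number type) pass through unchanged.
eltype(m([big"7.0"]))   # BigFloat
```

Because the output vector's element type is promoted from the input, `reduced` should be usable with forward-mode autodiff without touching the Tag parameter at all.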

Thank you. I think this solves the problem.
It is probably impossible to do without allocating, sadly.

Not sure about that, it depends on the type of input your function takes. If it is a vector, you can preallocate that in the above structure. If it is a tuple, you can use advanced techniques for inference and for mixing the two vectors.
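
A rough sketch of the preallocated vector case, under the assumption that the element type is known up front. `PreallocMix` and its fields are hypothetical names. Note that the buffer has a fixed element type, so dual-number inputs from ForwardDiff would need a buffer with a matching element type rather than a `Float64` one:

```julia
# Hypothetical variant of FixedMix that reuses a single buffer.
struct PreallocMix{T}
    free::Vector{Int}    # positions filled from x on every call
    buffer::Vector{T}    # fixed values are written once, at construction
end

function PreallocMix(positions::AbstractVector{<:Integer},
                     fixed::AbstractVector{T}, n::Integer) where {T}
    buffer = Vector{T}(undef, n)
    buffer[positions] = fixed
    return PreallocMix{T}(setdiff(1:n, positions), buffer)
end

function (m::PreallocMix)(x::AbstractVector)
    for (i, elt) in zip(m.free, x)
        m.buffer[i] = elt
    end
    return m.buffer   # the same array on every call; copy it if you keep it
end

pm = PreallocMix([3, 1], [-3.0, -5.0], 3)
pm([7.0])   # [-5.0, 7.0, -3.0]
```

The trade-off is that every call returns the same array, so the caller must not hold on to a previous result.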

That said, is allocation a bottleneck in your code? If you are doing anything nontrivial inside the function body, it should have a negligible cost.

It is a custom type, so I cannot preallocate it.
I will have to try out your solution and see what the performance looks like. It might well be that the cost of allocating is not significant.