ForwarDiff of a function depending on a mask vector

Hi all,

this is my first post here!

I’m still very new to Julia and I’m enjoying a lot learning about it. I have a simple question for which I couldn’t find an answer, I hope it wasn’t posted already somewhere else.

I’ll make a simple example that shows the problem (the actual code is much more complex, but this is the basic idea): I’m trying to differentiate the following function wrt to x

function my_fun(x,xvals,ids)

out = copy(xvals)
out[ids] = x[ids]

return out


Basically, the function returns a vector, some element of that vector are constant (xvals) while others (defined by the mask ids) depend on the input x.

If I have, for example, x = [0 0 0 0], xvals = [1 2 3 4] and ids=[1 2], the function seems to work correctly, but when I try to call ForwarDiff to differentiate it wrt x

zz = ForwardDiff.jacobian(x → my_fun(x,xvals,ids),x)

I get MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#5#6", Int64}, Int64, 4})

I can’t really understand what the problem is, since if I try with another function that only returns the selected components:

function my_fun(x,ids)
out = x[ids]
return out


the differentiation works without problems.

Can someone explain me what’s wrong?


Not at the computer right now, but I think the problem is that you create your array “out” as containing floats and then you want to push dual numbers into it.

Try adding
out = convert.(eltype(x), out)
In between the lines of your function.

Hi vettert,

your reply helped me a lot. It was still not exact, as the type mismatch happens directly when I assign the values x[ids] into a vector of duals out, so I figured out the conversion has to happen directly at the creation of the vector out.

For further reference, this function can be correctly differentiated wrt x:

function my_fun(x,xvals,ids)
out = convert.(eltype(x),xvals)
out[ids] = x[ids]
return out

I find it a bit ugly, however. Anyone has a suggestion for improving its?


Welcome to the Julia forum, Lorenzo!

I think this is as pretty as it gets. The only way to improve on this is to look at the places where you use this function. Semantically, it seems a bit odd that you may want to copy x values into xvals if you cannot be sure that x and xvals have the same eltype.

Hi Lorenzo,
Great. It was a little late when I typed my post, but you managed to arrive at the right solution from it. Anyhow, I agree with ettersi that it looks a little odd to do this at all. Maybe context on why you are doing this would be illuminating?

Hi ettersi and vettert,

thanks to both of you.

As I was saying, this is a much simplified function that shows the problem I was getting.
What I’m actually doing is re-writing in Julia my old Matlab code, which solves optimal control problems using the Direct Finite elements transcription method.
In my old code, I had the possibility of specifying boundary conditions either as simple constant values (i.e. final velocity=0) and as general nonlinear functions of the boundary states (ex xf[2] = xi[1]^3/xf[4]).
Since I don’t want to have redundant variables for the boundary states that are imposed (the first case), my solution vector does not include them, and I pass the constant imposed terms as part of a user defined type that holds several other information (the transcription).
Thus, when I want to impose the general nonlinear boundary constraints, I need to reconstruct the boundary states: some of those components are constants and come from the constant user defined type, while others are stored inside the solution vector.
To solve the problem I then need the Jacobian of that nonlinear boundary terms wrt the solution vector.

Hope this makes it a bit clearer.

If I understand you correctly, your problem is somewhat like the following:

# Dummy input
c = 1
f = x->x^2
n = 5
idx = rand(1:5,3)

# Assemble the boundary condition
b = fill(c,n)
b[idx] .= f.(idx)

If this is approximately representative, then I would recommend to allocate b with the correct eltype right from the start. Perhaps the best way to do this would be the following:

T = promote_type(typeof(c), typeof(f(idx[1])))
b = fill(convert(T,c),n)

If @ettersi’s example is correct, you may also want to think about doing this without mutating b, simply make it correctly the first time. Something like b = T[i ∈ idx ? f(i) : c for i in 1:n] – unfortunately without the T you get different promotion behaviour, which will tend to make an abstract type. (E.g. with f = sqrt the dummy input will make Vector{Real}, which will be slow.) If f is cheap, you could also do something like b = [ifelse(i ∈ idx, promote(f(i), c)...) for i in 1:n].

The situation is more like

b = fill(c,n)
b[idx] = x[idx]
f(b) (which is nonlinear and accepts a vector, actually multiple vectors built as the above two lines)

If I understand correctly you’re getting the type x will be converted to, and generate directly out with the correct type. I guess this has a performance benefit to my a posteriori conversion, right?

OK, then a non-mutating solution might look like mask = eachindex(x) .∈ Ref(idx); b = ifelse.(mask, x, oftype.(x,c)). Not super pretty, though. Maybe defining a function like promote_ifelse(c, x, y) = ifelse(c, promote(x,y)...) and then broadcasting that promote_ifelse.(mask, x, c) is tidier? Or define some utility like this (here complex x would have the same difficulty of not fitting into b=fill(13, 5))

julia> choose(ind, x::AbstractVector, y::Number) = map(eachindex(x)) do i
         xp, yp = promote(x[i],y)
         i in ind ? xp : yp

julia> choose([1,3], rand(5).+im, 13)
5-element Vector{ComplexF64}:
 0.23760044132852687 + 1.0im
                13.0 + 0.0im
 0.25750298249695736 + 1.0im
                13.0 + 0.0im
                13.0 + 0.0im

Your non-mutating version is O(n * length(idx)) while my mutating version is O(n + length(idx)). I’m assuming the reason why you want to avoid mutation is because this may lead to issues with reverse mode differentiation?

That’s true. None of my variants end up very neat; having to call promote_type etc. yourself always feels a bit low-level, so I just wondered if there was some tidier high-level way… but perhaps there isn’t.

I tried with another approach, much simpler than the ones you suggested, but in the same spirit of creating the variable with the right type straight away

function my_fun(x,xvals,ids,ids2)
out = zeros(eltype(x),length(xvals))
out[ids] = x[ids]
out[ids2] = xvals[ids2]
return out

Not sure if it’s more readable than my option1, but it probably has better performance since it creates the right variable straight away without converting it