@ToucheSir – thanks for your comments so far; I now have one more question and would be very grateful for your input. An important piece of background is that I am not planning to use standard ML models through Flux: I'm working on a fairly specialized class of models related to equivariant message passing, and am writing all components from scratch. However, I can see that in the future I might leverage some of GeometricFlux, or even use a few Flux components to give my models additional flexibility.
My alternative plan to using the implicit parameters of Flux / Zygote is to specify parameters through a nested NamedTuple, e.g. like this:
```julia
mutable struct ACE{T, TB}
    basis::TB       # nested sub-model with its own parameters
    c::Vector{T}    # coefficients owned by this level
end

# Collect parameters into a nested NamedTuple mirroring the model structure.
params(ace::ACE) = (basis = params(ace.basis), c = ace.c)

# Write a parameter NamedTuple of the same structure back into the model.
function set_params!(ace::ACE, p::NamedTuple)
    set_params!(ace.basis, p.basis)
    ace.c = p.c
    return ace
end
```
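For completeness, the recursion bottoms out at leaf components that own plain arrays; the following is only an illustrative sketch, with `SomeBasis` and its `coeffs` field being hypothetical placeholders:

```julia
# Hypothetical leaf component, only to show where the recursion stops.
mutable struct SomeBasis{T}
    coeffs::Matrix{T}
end

# A leaf returns its raw parameter arrays ...
params(b::SomeBasis) = (coeffs = b.coeffs,)

# ... and set_params! copies them back in.
function set_params!(b::SomeBasis, p::NamedTuple)
    b.coeffs = p.coeffs
    return b
end
```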
In this case, a Zygote call such as

```julia
p0 = params(ace)
g = Zygote.gradient(p -> somefunction(set_params!(ace, p), X), p0)[1]
```

will return a gradient `g::NamedTuple` with precisely the same structure as the `p0` params NamedTuple. I then wrap the parameters in a type that treats them as abstract vectors, so that I can add, multiply, etc. and run an optimizer.
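To be concrete about what that wrapping needs to provide, the essential operations are recursive addition and scaling over the nested NamedTuple. A minimal sketch (the names `_add` and `_scale` are purely illustrative, not my actual wrapper API):

```julia
# Minimal sketch of vector-like operations on nested NamedTuples;
# `_add` / `_scale` are illustrative names only.
_add(a::NamedTuple, b::NamedTuple) = map(_add, a, b)   # recurse over matching keys
_add(a, b) = a .+ b                                    # leaves: arrays or numbers

_scale(λ::Number, a::NamedTuple) = map(x -> _scale(λ, x), a)
_scale(λ::Number, a) = λ .* a

# e.g. a single plain gradient-descent step with step size η:
# p1 = _add(p0, _scale(-η, g))
```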
I’ve implemented a prototype for this and it works quite well. Going a step further, I’ve also experimented with specifying rrules in which the gradient w.r.t. the model is simply the gradient w.r.t. its parameters; this has worked equally well. All of this carries a little more overhead than the implicit parameters, but since I have to write custom rrules for all my model components anyway, I’m still considering it.
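For illustration, such an rrule could look roughly like the sketch below; `evaluate` and `grad_params` are hypothetical placeholders for one of my components and its hand-written parameter gradient, and the only point is that the pullback reports the model gradient in the same NamedTuple structure as `params(ace)`:

```julia
using ChainRulesCore

# Sketch only: `evaluate` and `grad_params` are hypothetical placeholders.
function ChainRulesCore.rrule(::typeof(evaluate), ace::ACE, X)
    y = evaluate(ace, X)
    function evaluate_pullback(ȳ)
        # ∂p has the same keys as params(ace) = (basis = ..., c = ...);
        # since these coincide with the fields of ACE, it also serves as
        # the gradient w.r.t. the model itself.
        ∂p = grad_params(ace, X, ȳ)
        return NoTangent(), Tangent{typeof(ace)}(; ∂p...), NoTangent()
    end
    return y, evaluate_pullback
end
```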
Arguably one advantage of this approach is that I’m not limited to using only Arrays for storing parameters but have much more flexibility.
Do you have any thoughts on this approach and how well it might interact with the future plans for Flux, or would your advice still be to stick with implicit parameters for now and transition once Flux transitions?