Flux params restrictive

I’m working on integrating the Zygote.Params idea into my code to simplify parameter wrangling, and I was struck by the fact that, according to the method

params!(p::Params, x::AbstractArray{<:Number}, seen = IdSet()) = push!(p, x)

only arrays of numbers are allowed as parameter fields, not arrays of arrays. In particular, I’m interested in using e.g. Vector{SVector{N, T}}. I get that this is equivalent to a matrix, but it is just an example. In fact, I’m even more interested in Array{T, N} where T is an arbitrary struct type such as struct MyT a::Float64; b::SVector{3, Float64} end

Is there a strong reason to only allow x::AbstractArray{<:Number}? What are the pitfalls if I try to extend this behaviour?

Hm … I was just told in a Flux issue that Flux is in fact trying to drop implicit parameters altogether. Is there a discussion somewhere of why, and of what they will be replaced with?

I’m not sure of the original intent, but one thing this allows is the use of Vectors for holding a list of layers. If it treated all arrays as parameters, the AbstractVector version of Chain wouldn’t work, for example.
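To make that concrete, here is a toy re-creation of the dispatch idea in plain Julia. It is not Flux's actual implementation; ToyDense and collect_params! are hypothetical names made up for this sketch. Arrays of numbers are treated as leaves, while any other array is treated as a container and recursed into, which is what lets a Vector of layers work.

```julia
# Toy sketch only: ToyDense and collect_params! are hypothetical, not Flux API.
struct ToyDense
    W::Matrix{Float64}
    b::Vector{Float64}
end

# Leaf case: an array of numbers is itself a parameter.
function collect_params!(ps, x::AbstractArray{<:Number})
    push!(ps, x)
    return ps
end

# Container case: any other array (e.g. a Vector of layers) is recursed into.
function collect_params!(ps, xs::AbstractArray)
    foreach(x -> collect_params!(ps, x), xs)
    return ps
end

# A layer contributes its fields.
function collect_params!(ps, d::ToyDense)
    collect_params!(ps, d.W)
    collect_params!(ps, d.b)
    return ps
end

layers = [ToyDense(rand(2, 3), rand(2)), ToyDense(rand(1, 2), rand(1))]
ps = collect_params!(Any[], layers)
```

If the container method were removed and every array were pushed as a parameter, `layers` itself would end up in `ps` instead of the four weight and bias arrays.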

As an easy workaround, you can add params yourself to the collection by push!ing them manually.

OK, but I don’t want to start using a paradigm that’s on the way out anyhow. What should I do instead? How should I structure my models?

“On the way out” is not going to happen for some months yet, and I imagine in the meantime you’ll want to get work done :wink: . We’ll make sure to provide a migration path when the time comes.

Thanks for answering me in two places at once. I want to run one more test and will then maybe have one more clarifying question.

@ToucheSir – thanks for your comments so far; I now have one more question and would be very grateful for your input. An important piece of background information here is that I am not planning to use standard ML models through Flux, but I’m working on a fairly specialized class of models related to equivariant message passing, and am really writing all components from scratch. However, I can see that in the future I might leverage some of GeometricFlux or even use a few Flux components to give additional flexibility to my models.

My alternative plan to using the implicit parameters of Flux / Zygote is to specify parameters through a nested NamedTuple, e.g., like this:

mutable struct ACE{T, TB}
   basis::TB 
   c::Vector{T}
end

params(ace::ACE) = (basis = params(ace.basis), c = ace.c)

function set_params!(ace::ACE, p::NamedTuple)
   set_params!(ace.basis, p.basis)
   ace.c = p.c
   return ace 
end 

In this case, a Zygote call such as

p0 = params(ace)
g = Zygote.gradient( p -> somefunction( set_params!(ace, p), X ), p0 )

will return a gradient g::NamedTuple with precisely the same structure as the p0 params NamedTuple. I then wrap them into a type that treats them as abstract vectors so I can add, multiply etc and do optimization.
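For readers wondering what the "treat them as vectors" step could look like, here is a minimal sketch in plain Julia. The helpers flatten and unflatten are hypothetical names, not from any package, and the sketch assumes every leaf of the NamedTuple is an array of numbers (it does not handle, say, Vector{SVector}).

```julia
# Hypothetical helpers: flatten a nested NamedTuple of numeric arrays to one vector.
flatten(x::AbstractArray{<:Number}) = vec(x)
flatten(nt::NamedTuple) = reduce(vcat, map(flatten, values(nt)))

# Rebuild the nested structure of `template` from the flat vector `v`.
# Returns the rebuilt value and the number of entries consumed so far.
function unflatten(template::AbstractArray{<:Number}, v, offset = 0)
    n = length(template)
    return reshape(v[offset+1:offset+n], size(template)), offset + n
end

function unflatten(template::NamedTuple, v, offset = 0)
    vals = []
    for t in values(template)
        val, offset = unflatten(t, v, offset)
        push!(vals, val)
    end
    return NamedTuple{keys(template)}(Tuple(vals)), offset
end

# Parameters and a gradient with the same nested structure.
p = (basis = (W = [1.0 2.0; 3.0 4.0],), c = [5.0, 6.0])
g = (basis = (W = ones(2, 2),), c = [0.5, 0.5])

# One gradient-descent step carried out on the flat vectors.
v = flatten(p) - 0.1 * flatten(g)
p_new, _ = unflatten(p, v)
```

The point of the design is that the optimizer only ever sees a plain vector, while the model code keeps working with named, nested fields.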

I’ve implemented a prototype for this and it works quite well. Going a step further, I’ve also experimented with specifying rrules where the gradient w.r.t. the model is the gradient w.r.t. parameters. This has worked equally well. All this takes a little more overhead than the implicit parameters, but since I have to write custom rrules for all my model components anyhow I’m still considering it.
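As an illustration of the rrule variant, here is a minimal sketch using ChainRulesCore. LinearModel and evaluate are hypothetical stand-ins for a real model component; the only assumption is that the model's tangent is a structural Tangent with one entry per parameter field.

```julia
using ChainRulesCore

# Hypothetical toy model; stands in for a real component such as ACE.
struct LinearModel{T}
    c::Vector{T}
end

evaluate(m::LinearModel, x::AbstractVector) = sum(m.c .* x)

function ChainRulesCore.rrule(::typeof(evaluate), m::LinearModel, x::AbstractVector)
    y = evaluate(m, x)
    function evaluate_pullback(Δ)
        Δ = unthunk(Δ)
        # The gradient w.r.t. the model is the gradient w.r.t. its parameters.
        ∂m = Tangent{typeof(m)}(; c = Δ .* x)
        ∂x = Δ .* m.c
        return NoTangent(), ∂m, ∂x
    end
    return y, evaluate_pullback
end

m = LinearModel([1.0, 2.0])
x = [3.0, 4.0]
y, pb = rrule(evaluate, m, x)
_, ∂m, ∂x = pb(1.0)
```

Here `∂m.c` holds the parameter gradient, so a field-by-field update of the model can be driven directly from the tangent.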

Arguably one advantage of this approach is that I’m not limited to using only Arrays for storing parameters but have much more flexibility.

Do you have any thoughts on this approach and how well it might interact with the future plans for Flux, or is your advice still to stick with implicit parameters for now and then transition once Flux transitions?

I would recommend against overriding params directly since that causes confusion with Flux’s definition. Case in point, you’re actually using “explicit” params here since p0 is a NamedTuple and not a Params. So assuming you’re getting the gradients you want, this should be reasonably future-proof.
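A minimal sketch of what that could look like, assuming Zygote is available. The names Lin, get_params, with_params and predict are hypothetical; get_params is simply a name that does not collide with Flux.params, and with_params is a non-mutating alternative to set_params! that builds a new model instead.

```julia
using Zygote

# Hypothetical toy model.
struct Lin
    c::Vector{Float64}
end

# Named differently from `params` so it cannot be confused with Flux.params.
get_params(m::Lin) = (c = m.c,)

# Build a new model from a parameter NamedTuple instead of mutating in place.
with_params(::Lin, p::NamedTuple) = Lin(p.c)

predict(m::Lin, x) = sum(m.c .* x)

m = Lin([1.0, 2.0, 3.0])
x = [0.5, -1.0, 2.0]
p0 = get_params(m)

# Explicit style: the gradient is taken w.r.t. the NamedTuple argument itself.
g = Zygote.gradient(p -> predict(with_params(m, p), x), p0)[1]
```

The result `g` is a NamedTuple with the same field names as `p0`; no Params object is involved.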

Thank you!

Just to make sure I understand correctly: you don’t see a problem with the NamedTuple-style explicit parameters, and this might be more future-proof than going back to implicit Flux parameters?

But you are suggesting that I name params something else so it doesn’t interfere with Flux.

Thank you again - you’ve been most helpful!