Flux params restrictive

cortner · March 4, 2022, 10:06pm

I’m working on integrating the Zygote.Params idea into my codes to simplify parameter wrangling, and I was struck by the fact that

params!(p::Params, x::AbstractArray{<:Number}, seen = IdSet()) = push!(p, x)

only arrays of numbers are allowed as parameter fields, but not arrays of arrays. In particular, I’m interested in using e.g. Vector{SVector{N, T}}. I get that this is equivalent to a matrix, but it is just an example. In fact, I’m even more interested in Array{T, N} where T is a struct type such as struct MyT a::Float64; b::SVector{3, Float64} end

Is there a strong reason to only allow x::AbstractArray{<:Number}? What are the pitfalls if I try to extend this behaviour?
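To make the restriction concrete, here is a minimal sketch of the type test implied by that method signature. It needs no packages: is_param_array is a hypothetical helper (not part of Flux or Zygote), and an NTuple stands in for SVector so the snippet runs without StaticArrays.

```julia
# Hypothetical helper mirroring the dispatch constraint in the signature above.
is_param_array(x) = x isa AbstractArray{<:Number}

# Stand-in for the struct from the post; NTuple replaces SVector{3, Float64}.
struct MyT
    a::Float64
    b::NTuple{3, Float64}
end

m  = rand(3, 4)                       # plain matrix of numbers
vs = [rand(3) for _ in 1:4]           # vector of vectors
ts = [MyT(1.0, (0.0, 0.0, 0.0))]      # vector of structs

is_param_array(m)    # true:  Matrix{Float64} <: AbstractArray{<:Number}
is_param_array(vs)   # false: the element type is Vector{Float64}, not a Number
is_param_array(ts)   # false: the element type is MyT

# The "equivalent matrix" form of vs does satisfy the constraint.
vs_as_matrix = reduce(hcat, vs)       # 3×4 Matrix{Float64}
is_param_array(vs_as_matrix)          # true
```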

cortner · March 5, 2022, 12:06am

hm … I was just told in a Flux issue that Flux is in fact trying to drop implicit parameters altogether. Is there a discussion somewhere of why, and of what they will be replaced with?

ToucheSir · March 5, 2022, 1:20am

I’m not sure of the original intent, but one thing this allows is the use of Vectors for holding a list of layers. If it treated all arrays as parameters, the AbstractVector version of Chain wouldn’t work, for example.
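The point about vectors of layers can be illustrated with a toy traversal. This is only a sketch of the idea, not Flux’s actual implementation: collect_params!, Dense2 and VChain are hypothetical names. Numeric arrays are treated as leaves, while any other array is treated as a container and recursed into.

```julia
# Leaf case: arrays of numbers are collected as parameters.
collect_params!(ps, x::AbstractArray{<:Number}) = push!(ps, x)

# Container case: other arrays (e.g. a Vector of layers) are walked element by element.
function collect_params!(ps, x::AbstractArray)
    foreach(c -> collect_params!(ps, c), x)
    return ps
end

# Fallback: walk the fields of a struct; types without fields contribute nothing.
function collect_params!(ps, x)
    for f in fieldnames(typeof(x))
        collect_params!(ps, getfield(x, f))
    end
    return ps
end

# Hypothetical layer and container types, for illustration only.
struct Dense2; W::Matrix{Float64}; b::Vector{Float64}; end
struct VChain; layers::Vector{Any}; end

model = VChain(Any[Dense2(randn(2, 3), zeros(2)), Dense2(randn(1, 2), zeros(1))])
ps = collect_params!(Any[], model)
length(ps)   # 4: two weight matrices and two bias vectors
```

If the leaf method accepted every AbstractArray, the layers vector itself would be pushed as a single parameter and the traversal would never reach W and b.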

As an easy workaround, you can add params yourself to the collection by push!ing them manually.

cortner · March 5, 2022, 1:25am

Ok, but I don’t want to start using a paradigm that’s on the way out anyhow. What should I do instead? How should I structure my models?

ToucheSir · March 5, 2022, 1:28am

“On the way out” is not going to happen for some months yet, and I imagine in the meantime you’ll want to get work done. We’ll make sure to provide a migration path when the time comes.

cortner · March 5, 2022, 3:14am

thanks for answering me in two places at once. I want to run one more test and will then maybe have one more clarifying question.

cortner · March 8, 2022, 6:07am

@ToucheSir, thanks for your comments so far; I now have one more question and would be very grateful for your input. An important piece of background information here is that I am not planning to use standard ML models through Flux. I’m working on a fairly specialized class of models related to equivariant message passing, and am really writing all components from scratch. However, I can see that in the future I might leverage some of GeometricFlux or even use a few Flux components to give additional flexibility to my models.

My alternative plan to using the implicit parameters of Flux / Zygote is to specify parameters through a nested NamedTuple, e.g., like this:

mutable struct ACE{T, TB}
   basis::TB 
   c::Vector{T}
end

params(ace::ACE) = (basis = params(ace.basis), c = ace.c)

function set_params!(ace::ACE, p::NamedTuple)
   set_params!(ace.basis, p.basis)
   ace.c = p.c
   return ace 
end

In this case, a Zygote call such as

p0 = params(ace)
# Zygote.gradient returns a tuple with one entry per argument, hence the [1]
g = Zygote.gradient( p -> somefunction( set_params!(ace, p), X ), p0 )[1]

will return a gradient g::NamedTuple with precisely the same structure as the p0 params NamedTuple. I then wrap them into a type that treats them as abstract vectors so I can add, multiply etc and do optimization.
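As a rough illustration of doing arithmetic on such parameter NamedTuples, here is a sketch that uses plain flattening rather than the wrapper type described above (a simpler, swapped-in technique). The helpers flatten and unflatten are hypothetical names, and the values of p0 and g are made up for the example.

```julia
# Flatten a nested NamedTuple of numbers / numeric arrays into one Float64 vector.
flatten(x::Number) = [float(x)]
flatten(x::AbstractArray{<:Number}) = vec(float.(x))
flatten(nt::NamedTuple) = reduce(vcat, map(flatten, values(nt)); init = Float64[])

# Rebuild a structure shaped like `t` from the entries of v starting at index i.
# Each method returns (restored value, next unread index).
_restore(t::Number, v, i) = (v[i], i + 1)
function _restore(t::AbstractArray{<:Number}, v, i)
    n = length(t)
    return reshape(v[i:i+n-1], size(t)), i + n
end
function _restore(t::NamedTuple, v, i)
    vals = []
    for x in values(t)
        val, i = _restore(x, v, i)
        push!(vals, val)
    end
    return NamedTuple{keys(t)}(Tuple(vals)), i
end
unflatten(template, v) = first(_restore(template, v, 1))

# Made-up parameters and a gradient with the same structure.
p0 = (basis = (A = [1.0 2.0; 3.0 4.0],), c = [5.0, 6.0])
g  = (basis = (A = ones(2, 2),), c = [0.5, 0.5])

# One gradient-descent step with step size 0.1, done on the flat vectors.
p1 = unflatten(p0, flatten(p0) .- 0.1 .* flatten(g))
```

The result p1 has the same nested layout as p0, so it can be passed straight back to a setter such as set_params!.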

I’ve implemented a prototype for this and it works quite well. Going a step further, I’ve also experimented with specifying rrules where the gradient w.r.t. the model is the gradient w.r.t. parameters. This has worked equally well. All this takes a little more overhead than the implicit parameters, but since I have to write custom rrules for all my model components anyhow I’m still considering it.

Arguably one advantage of this approach is that I’m not limited to using only Arrays for storing parameters but have much more flexibility.

Do you have any thoughts on this approach and how well it might interact with the future plans for Flux, or is your advice still to stick with implicit parameters for now and then transition once Flux transitions?

ToucheSir · March 8, 2022, 5:06pm

I would recommend against overriding params directly since that causes confusion with Flux’s definition. Case in point, you’re actually using “explicit” params here since p0 is a NamedTuple and not a Params. So assuming you’re getting the gradients you want, this should be reasonably future-proof.
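A minimal sketch of what the renamed accessor could look like, with no dependency on Flux or Zygote. The name get_params and the PolyBasis type are hypothetical and chosen only for illustration; the ACE struct and set_params! follow the post above.

```julia
mutable struct ACE{T, TB}
    basis::TB
    c::Vector{T}
end

# Hypothetical basis type standing in for a real model component.
mutable struct PolyBasis
    w::Vector{Float64}
end

# Named get_params rather than params, so nothing clashes with Flux.params.
get_params(b::PolyBasis) = (w = b.w,)
get_params(ace::ACE) = (basis = get_params(ace.basis), c = ace.c)

set_params!(b::PolyBasis, p::NamedTuple) = (b.w = p.w; b)
function set_params!(ace::ACE, p::NamedTuple)
    set_params!(ace.basis, p.basis)
    ace.c = p.c
    return ace
end

ace = ACE(PolyBasis([1.0, 2.0]), [3.0])
p0 = get_params(ace)                        # (basis = (w = [1.0, 2.0],), c = [3.0])
set_params!(ace, (basis = (w = [0.0, 0.0],), c = [9.0]))
```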

cortner · March 8, 2022, 8:27pm

Thank you!

Just to make sure I understand correctly: you don’t see a problem with the NamedTuple-style explicit parameters, and this might be more future-proof than going back to implicit Flux parameters?

But you are suggesting that I name params something else so it doesn’t interfere with Flux.

cortner · March 9, 2022, 2:45pm

Thank you again - you’ve been most helpful!
