Getting ChainRules to work with custom Dense and Chain regularization functions

user22 · April 11, 2022, 8:09am

I have run into a few issues while implementing a regularizer function and its pullback with ChainRules for a mildly customized Flux Dense layer and Chain. For simplicity and testing purposes, I wrote a basic L2 regularizer for the weights of a standard Dense layer:

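using ChainRulesCore
using ChainRulesTestUtils  # provides test_rrule, used below
using Flux
using Random

# L2 penalty on the weight matrix of a single Dense layer
function weightregularization(nn::Dense)
    return sum((nn.weight).^2.0)
end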

function ChainRulesCore.rrule(::typeof(weightregularization), nn::Dense)  
    y = weightregularization(nn)
    project_w = ProjectTo(nn.weight)
    function weightregularization_pullback(ȳ)
        pullb = Tangent{Dense}(weight=project_w(ȳ * 2.0*nn.weight),   bias=ZeroTangent(), σ= NoTangent())
        return NoTangent(), pullb
    end
    return y, weightregularization_pullback
end

It seems to work only partially. Calling gradient works fine, but something is still wrong in the pullback definition, because ChainRulesTestUtils crashes with the custom pullback. It seems to try to compute the pullback of the σ field of the Dense struct, even though the pullback definition returns NoTangent() for it. If I have a custom layer with more fields (which should be treated as constants that define the model), how can I make sure that ChainRulesTestUtils does not try to evaluate their pullback?

nn = Dense(randn(1,2), randn(1), tanh)
gr = gradient(weightregularization, nn) # Works
test_rrule(weightregularization,nn) # Crashes with MethodError: no method matching zero(::typeof(tanh))

test_rrule also crashes with TypeError: in new, expected Vector{Float32}, got a value of type Vector{Float64}, if the Dense layer weights are Float32s instead of Float64s, although I have ProjectTo in the pullback function.
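
For reference, here is a minimal sketch of the same layer-level pattern that depends only on ChainRulesCore. ToyDense and l2penalty are hypothetical stand-ins invented for this example (they are not Flux or ChainRules names), and the tangent is typed with the concrete typeof(layer) rather than the bare struct name. This is not a confirmed fix for the test_rrule crashes above; it only shows that, when the pullback is called by hand, a function field can carry NoTangent() and ProjectTo brings a Float64 cotangent back to the Float32 element type of the weights.

```julia
using ChainRulesCore

# Hypothetical stand-in for a Dense-like layer, so the sketch runs without Flux.
struct ToyDense{F,M<:AbstractMatrix,V}
    weight::M
    bias::V
    σ::F
end

# L2 penalty on the weights only.
l2penalty(layer::ToyDense) = sum(abs2, layer.weight)

function ChainRulesCore.rrule(::typeof(l2penalty), layer::ToyDense)
    y = l2penalty(layer)
    project_w = ProjectTo(layer.weight)
    function l2penalty_pullback(ȳ)
        # d/dW sum(W.^2) = 2W, scaled by the incoming cotangent
        dweight = project_w(2.0 .* unthunk(ȳ) .* layer.weight)
        # Structural tangent keyed by field name; σ is a function, so NoTangent()
        dlayer = Tangent{typeof(layer)}(; weight=dweight, bias=ZeroTangent(), σ=NoTangent())
        return NoTangent(), dlayer
    end
    return y, l2penalty_pullback
end

# Call the rule by hand on a Float32 layer with a Float64 cotangent.
layer = ToyDense(randn(Float32, 1, 2), randn(Float32, 1), tanh)
y, back = rrule(l2penalty, layer)
_, dlayer = back(1.0)
```

Calling the rule directly like this exercises only the pullback itself, so it can help separate mistakes in the rule from limitations of the testing tool.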

Finally, the following custom regularization and pullback for a Chain crash both the gradient and test_rrule:


function totalregularization(ch::Chain{T}) where T<:Tuple{Vararg{Dense}}
    a = 0.0
    for i in ch
        a = a + sum(i.weight.^2.0)
    end
    return a

end

function ChainRulesCore.rrule(::typeof(totalregularization), ch::Chain{T})  where T<:Tuple{Vararg{Dense}}
    y = totalregularization(ch)
    function totalregularization_pullback(ȳ)
        totalpullback = [] 
        N = length(ch)
        for i = 1:N
            project_w = ProjectTo(nn.weight)
            push!(totalpullback, Tangent{Dense}(weight= project_w(ȳ * 2.0*ch[i].weight), bias = ZeroTangent(), σ= NoTangent()))
        end
        
        pullb = Tangent{Chain{T}}(totalpullback...)
        return NoTangent(), pullb
    end
    return y, totalregularization_pullback
end


l1 = Dense(randn(2,2), randn(2), tanh)
l2 = Dense(randn(1,2), randn(1), tanh)
ch = Chain(l1,l2)
gr = gradient(totalregularization, ch)  # Crashes with

MethodError: no method matching canonicalize(::Tangent{Chain{Tuple{Dense{typeof(tanh), Matrix{Float64}, Vector{Float64}}, Dense{typeof(tanh), Matrix{Float64}, Vector{Float64}}}}, Tuple{Tangent{Dense, NamedTuple{(:weight, :bias, :σ), Tuple{Matrix{Float32}, ZeroTangent, NoTangent}}}, Tangent{Dense, NamedTuple{(:weight, :bias, :σ), Tuple{Matrix{Float32}, ZeroTangent, NoTangent}}}}})

test_rrule(totalregularization,ch) # Crashes with 

Got exception outside of a @test
  return type Tuple{NoTangent, Tangent{Chain{Tuple{Dense{typeof(tanh), Matrix{Float64}, Vector{Float64}}, Dense{typeof(tanh), Matrix{Float64}, Vector{Float64}}}}, Tuple{Tangent{Dense, NamedTuple{(:weight, :bias, :σ), Tuple{Matrix{Float32}, ZeroTangent, NoTangent}}}, Tangent{Dense, NamedTuple{(:weight, :bias, :σ), Tuple{Matrix{Float32}, ZeroTangent, NoTangent}}}}}} does not match inferred return type Tuple{NoTangent, Tangent{Chain{Tuple{Dense{typeof(tanh), Matrix{Float64}, Vector{Float64}}, Dense{typeof(tanh), Matrix{Float64}, Vector{Float64}}}}}}

What am I doing wrong? As I understand it, a custom reverse rule for a regularization function of a Chain needs to be defined through a structural tangent. That is, the outermost type must be something like Tangent{Chain{T}}. Similarly, the tangents of the layers must have a type like Tangent{Dense{S}}.
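
To make the nesting concrete, here is a minimal sketch using only ChainRulesCore. ToyDense, ToyChain and totalpenalty are hypothetical stand-ins invented for this example; ToyChain mimics the fact that Flux's Chain keeps its layers in a single field named layers. The assumption being illustrated is that the tangent of the wrapper struct is keyed by that field name, and the tangent of the inner tuple is a separate Tangent whose type parameter is the tuple type. Whether this layout is what gradient and test_rrule expect for the real Flux types is not verified here.

```julia
using ChainRulesCore

# Hypothetical stand-ins, so the sketch runs without Flux.
struct ToyDense{F,M<:AbstractMatrix,V}
    weight::M
    bias::V
    σ::F
end

struct ToyChain{T<:Tuple}
    layers::T   # same field name that Flux's Chain uses
end

# Sum of the L2 penalties of all layers.
totalpenalty(ch::ToyChain) = sum(l -> sum(abs2, l.weight), ch.layers)

function ChainRulesCore.rrule(::typeof(totalpenalty), ch::ToyChain)
    y = totalpenalty(ch)
    function totalpenalty_pullback(ȳ)
        Δ = unthunk(ȳ)
        # One structural tangent per layer, each typed with that layer's concrete type
        dlayers = map(ch.layers) do l
            dweight = ProjectTo(l.weight)(2.0 .* Δ .* l.weight)
            Tangent{typeof(l)}(; weight=dweight, bias=ZeroTangent(), σ=NoTangent())
        end
        # Tuple-backed tangent for the `layers` field ...
        dtuple = Tangent{typeof(ch.layers)}(dlayers...)
        # ... wrapped in a NamedTuple-backed tangent for the struct itself
        return NoTangent(), Tangent{typeof(ch)}(; layers=dtuple)
    end
    return y, totalpenalty_pullback
end

# Call the rule by hand.
l1 = ToyDense(randn(2, 2), randn(2), tanh)
l2 = ToyDense(randn(1, 2), randn(1), tanh)
ch = ToyChain((l1, l2))
y, back = rrule(totalpenalty, ch)
_, dch = back(1.0)
```

The difference from the rule above is that the layer tangents are not splatted directly into the tangent of the chain type; they go into a tuple tangent that sits under the layers field.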

user22 · April 20, 2022, 5:11pm

There seem to be some discussions of similar issues with implementing custom rules for Dense and Chain, such as https://github.com/JuliaDiff/ChainRulesCore.jl/issues/4 , but I cannot tell whether this is currently straightforward to achieve.

ToucheSir · April 20, 2022, 7:46pm

That issue is about callable structs, which does not apply to your example. I would say that if the rrule works outside of test_rrule, this is likely a limitation in ChainRulesTestUtils and may be deserving of a GH issue.
