I’m new to Flux/Zygote and I’m trying to train a model with dense layers whose weight matrices are constrained to be lower triangular. It seems like the way to do this is to write a custom train! function and zero the gradients of the “unwanted” entries of each weight matrix, e.g. via broadcast multiplication by an appropriate mask. I can’t quite figure out how to accomplish this, though, since the Zygote gradients seem to be immutable. Here is my current code:
```julia
using Flux
using Flux.Optimise: update!
using Zygote: pullback, Params

function custom_train!(loss, ps, data, opt; cb = () -> ())
    ps = Params(ps)
    for d in data
        train_loss, back = pullback(() -> loss(d...), ps)
        gs = back(one(train_loss))
        # Zero all gradients outside the lower triangle of each dense layer's
        # weight matrix (weights sit at odd indices, biases at even).
        for i in 1:2:length(ps)
            nrows, ncols = size(ps[i])
            mask = [x >= y ? 1.0 : 0.0 for x in 1:nrows, y in 1:ncols]
            gs[ps[i]] .*= nograd(mask)  # nograd: my attempt to keep the mask out of AD
        end
        update!(opt, ps, gs)
        cb()
    end
end
```
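To make the intent concrete, here is the masking step in isolation on a single Dense layer, which is roughly what I expect to work (a minimal sketch, not my full training setup, using tril from LinearAlgebra to build the mask):

```julia
using Flux, Zygote
using LinearAlgebra: tril

m = Dense(4, 4)                      # square layer, so "lower triangular" makes sense
ps = Flux.params(m)
x = randn(Float32, 4, 8)

gs = gradient(() -> sum(abs2, m(x)), ps)  # implicit-params style; returns a Zygote.Grads
W = first(ps)                             # the Dense layer's weight matrix

mask = tril(ones(Float32, size(W)))  # ones on and below the diagonal, zeros above
gs[W] .*= mask                       # mask the gradient array in place
```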
This throws the error: "ERROR: *Only reference types can be differentiated with Params*".
Is there some way to “detach” the gradients (e.g. by converting them to a plain array type), mutate them via the mask, and then convert them back into a Zygote.Grads object?
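From skimming Zygote’s source, Grads appears to store the raw gradient arrays in an IdDict field called grads, so the kind of in-place mutation I have in mind would look something like the following (mask_for is a hypothetical helper, not a Zygote function):

```julia
# Hedged sketch: assuming gs.grads is an IdDict of ordinary mutable Arrays,
# the mask could be applied in place with no type conversion at all.
# mask_for(p) is a hypothetical helper returning the 0/1 mask for parameter p.
for p in ps
    gs.grads[p] .*= mask_for(p)
end
```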
Any advice is very much appreciated, especially if there is a better way to impose arbitrary constraints on weight matrices. I’ve been down a rabbit hole for hours trying to figure out how to make this work…
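For concreteness, one alternative I’ve been wondering about is baking the mask into the forward pass, so the masked-out entries never receive a gradient in the first place and no gradient surgery is needed. A rough, untested sketch (MaskedDense is my own illustrative name, not a Flux layer):

```julia
using Flux
using LinearAlgebra: tril

# Sketch of a Dense-like layer whose effective weight is W .* mask.
# Gradients for entries where mask == 0 come out zero automatically.
struct MaskedDense{M<:AbstractMatrix,B,F}
    W::M
    mask::M   # fixed 0/1 lower-triangular mask, not trained
    b::B
    σ::F
end

Flux.@functor MaskedDense (W, b)  # only W and b are trainable

function MaskedDense(n::Integer, σ = identity)
    MaskedDense(randn(Float32, n, n), tril(ones(Float32, n, n)),
                zeros(Float32, n), σ)
end

(l::MaskedDense)(x) = l.σ.((l.W .* l.mask) * x .+ l.b)
```

Since the effective weight is W .* mask, the gradient with respect to W is already zero wherever the mask is zero, so params and update! would work unmodified. I’m not sure whether this is considered idiomatic in Flux, though.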