Flux.jl: Restrict gradients to the non-zero values in a sparse layer

Hi, I’m trying to make a custom network layer with sparse weights.
I only want to train the non-zero values of the layer; however, when I try to take a gradient, it returns nothing.

Here is an MWE:

using Flux, SparseArrays

struct sparsewrapper{T}
	weights::SparseMatrixCSC{T, Int64}
end

# Mark only the stored (non-zero) values as trainable
Flux.trainable(l::sparsewrapper) = (l.weights.nzval,)

layer = sparsewrapper(SparseArrays.sprand(100, 100, 0.2))
model(x) = layer.weights * x
loss(x, y) = Flux.mse(model(x), y)

g = Flux.gradient(() -> loss(rand(100), rand(100)), Flux.params(layer))
println(g[layer.weights.nzval])   # prints `nothing` instead of a gradient

How do I use Flux.trainable to pick out the non-zero values?

Thanks in advance!
James


I’m guessing here that the AD (Zygote) does not differentiate through the internals of the sparse matrix multiplication, and therefore never sees that layer.weights.nzval plays any part.
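If you really do need to restrict updates to the existing sparsity pattern yourself, one possible workaround (a sketch only; project_onto_pattern is a hypothetical helper, not Flux API) is to take the gradient with respect to the whole weights matrix and then keep just the stored entries:

using SparseArrays

# Hypothetical helper: keep only the entries of a dense gradient Δ
# that fall on the sparsity pattern of W, returning a sparse matrix.
function project_onto_pattern(W::SparseMatrixCSC, Δ::AbstractMatrix)
    I, J, _ = findnz(W)                     # row/col indices of stored entries
    V = [Δ[i, j] for (i, j) in zip(I, J)]   # gradient values at those entries
    return sparse(I, J, V, size(W)...)
end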

If you instead point at the whole weights array as the trainable field, Zygote seems to be smart enough to give you a sparse gradient, so what you want may already work without any extra effort:

julia> Flux.trainable(l::sparsewrapper) = (l.weights,)

julia> g = Flux.gradient(()->loss(rand(100),rand(100)), Flux.params(layer))
Grads(...)

julia> g[layer.weights]
100×100 SparseMatrixCSC{Float64, Int64} with 1922 stored entries:
⋮ (braille sparsity-pattern display truncated)
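For completeness, here is a minimal sketch of applying that sparse gradient with the implicit-parameters API (assuming Flux 0.12.x, and reusing layer and loss from the MWE above; whether the weights stay sparse through the in-place update may depend on your SparseArrays version):

opt = Descent(0.01)
ps = Flux.params(layer)
g = Flux.gradient(() -> loss(rand(100), rand(100)), ps)
Flux.Optimise.update!(opt, ps, g)   # subtracts opt.eta .* gradient from the weights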

Also, prefer Flux.@functor over overloading Flux.trainable when you simply want all fields to be trainable.
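For reference, here is what that looks like for the wrapper above. Flux.@functor registers the struct so that Flux.params (and gpu/cpu movement) pick up its fields automatically (a sketch assuming you want the whole weights array trainable, as in the reply above):

using Flux, SparseArrays

struct sparsewrapper{T}
	weights::SparseMatrixCSC{T, Int64}
end

Flux.@functor sparsewrapper   # all fields become trainable parameters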

Thanks for the reply. That’s odd: I tried the same thing, but
g[layer.weights] returns a dense matrix. (This is with Flux v0.11.3.)

That’s a pretty old version of Flux. I believe only the more recent 0.12.x versions return non-dense gradients for sparse arrays.


Oh yes, upgrading the Flux version fixed this. Thanks, everyone!