Flux.jl: Restrict gradients to the non-zero values in a sparse layer

Hi, I’m trying to make a custom network layer with sparse weights.
I only want to train the non-zero values of the layer; however, when I try to take a gradient, it returns nothing.

Here is an MWE:

using Flux, SparseArrays

struct sparsewrapper{T}
	weights::SparseMatrixCSC{T, Int64}
end

# Mark only the stored (non-zero) values as trainable
Flux.trainable(l::sparsewrapper) = (l.weights.nzval,)

layer = sparsewrapper(SparseArrays.sprand(100, 100, 0.2))
model(x) = layer.weights * x
loss(x, y) = Flux.mse(model(x), y)

g = Flux.gradient(() -> loss(rand(100), rand(100)), Flux.params(layer))
println(g[layer.weights.nzval])   # prints `nothing` instead of a gradient

How do I use Flux.trainable to pick out the non-zero values?

Thanks in advance!
James


I’m guessing here that the AD (Zygote) does not differentiate through the internals of the sparse matrix multiplication, and therefore never sees that layer.weights.nzval plays any part.
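If you really do need to restrict updates to the existing sparsity pattern yourself, one possible workaround (a sketch only; project_onto_pattern is a hypothetical helper, not Flux API) is to take the gradient with respect to the whole weights matrix and then keep just the stored entries:

using SparseArrays

# Hypothetical helper: keep only the entries of a dense gradient Δ
# that fall on the sparsity pattern of W, returning a sparse matrix.
function project_onto_pattern(W::SparseMatrixCSC, Δ::AbstractMatrix)
    I, J, _ = findnz(W)                     # row/col indices of stored entries
    V = [Δ[i, j] for (i, j) in zip(I, J)]   # gradient values at those entries
    return sparse(I, J, V, size(W)...)
end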

If you instead point at the whole weights array as the trainable field, Zygote seems to be smart enough to give you a sparse gradient, so what you want may already work without any extra effort:

julia> Flux.trainable(l::sparsewrapper) = (l.weights,)

julia> g = Flux.gradient(()->loss(rand(100),rand(100)), Flux.params(layer))
Grads(...)

julia> g[layer.weights]
100×100 SparseMatrixCSC{Float64, Int64} with 1922 stored entries:
⋮ (braille sparsity-pattern display truncated)
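For completeness, here is a minimal sketch of applying that sparse gradient with the implicit-parameters API (assuming Flux 0.12.x, and reusing layer and loss from the MWE above; whether the weights stay sparse through the in-place update may depend on your SparseArrays version):

opt = Descent(0.01)
ps = Flux.params(layer)
g = Flux.gradient(() -> loss(rand(100), rand(100)), ps)
Flux.Optimise.update!(opt, ps, g)   # subtracts opt.eta .* gradient from the weights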

Also, prefer Flux.@functor over overloading Flux.trainable when you simply want all fields to be trainable.
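For reference, here is what that looks like for the wrapper above. Flux.@functor registers the struct so that Flux.params (and gpu/cpu movement) pick up its fields automatically (a sketch assuming you want the whole weights array trainable, as in the reply above):

using Flux, SparseArrays

struct sparsewrapper{T}
	weights::SparseMatrixCSC{T, Int64}
end

Flux.@functor sparsewrapper   # all fields become trainable parameters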

Thanks for the reply. That’s odd: I tried the same thing, but
g[layer.weights] returns a dense matrix. (This is with Flux v0.11.3.)

That’s a pretty old version of Flux. I believe only the more recent 0.12.x versions return non-dense gradients for sparse arrays.


Oh yes, upgrading the Flux version fixed this. Thanks, everyone!