Sparse Feed Forward NN

olszewskip · January 10, 2020, 3:43pm

Hi! I have a question about (hypothetical) sparse layers in feedforward neural networks:

How would I go about using sparse arrays in place of the W(eights) and b(iases) in Flux’s Dense layer? What I imagine is that I have sparse inputs and sparse layers, and that I could obtain sparse gradients (either including gradients for zero weights or not) via backpropagation in Julia. I’ve tried the following (Julia 1.3.1, Flux 0.10.0):

using Flux
using Flux: Chain, crossentropy, gradient
using SparseArrays
struct SprAffine{F,S,T}
    W::S
    b::T
    σ::F
end
SprAffine(in::Integer, out::Integer, σ::Function) = SprAffine(sprandn(out, in, 0.5), sprandn(out, 0.5), σ)
(m::SprAffine)(x) = m.σ.(m.W * x .+ m.b)
Flux.@functor SprAffine
loss(x, y) = crossentropy(m(x), y)
m = Chain(SprAffine(4, 3, σ), SprAffine(3, 2, σ), softmax);

Calling loss(sparse(1:4), sparse(1:2)) works (it returns a Float64), but

gradient(params(m)) do
    loss(sparse(1:4), sparse(1:2))
end

gives me an error: MethodError: no method matching zero(::Type{Tuple{Float64,Zygote.var"#916#back#380"{Zygote.var"#378#379"{Float64}}}}).

Having downgraded to Flux v0.9.0 and using Tracker, I do get technically sparse gradients. They are filled with explicit zeros once tracking starts, but I can call dropzeros! by hand on all gradients, and they seem to remain sparse (i.e. without explicit zeros) after calling train!. Does Tracker really behave differently than Zygote when handling sparse weights, or am I just making some mistake?
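
For readers unfamiliar with the distinction drawn above between explicitly stored zeros and structural zeros, here is a minimal sketch using only SparseArrays. The matrix is made up for illustration and is not taken from the Tracker run described in the post.

```julia
using SparseArrays

# Made-up 2×2 matrix: the value 0.0 at (1, 1) is stored explicitly,
# because sparse(I, J, V) keeps numerical zeros passed in V.
A = sparse([1, 2], [1, 2], [0.0, 1.0])

n_before = nnz(A)   # 2: nnz counts stored entries, including the explicit zero
dropzeros!(A)       # removes explicitly stored zeros in place
n_after = nnz(A)    # 1: only the truly nonzero entry is still stored
```

This is the same dropzeros! call the post applies by hand to each gradient.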

Just to add some concreteness to what this is supposed to be for: I was wondering how hard it would be to implement in Julia something similar (if not necessarily identical) to the experiments done in arXiv:1711.05136v5 or arXiv:1901.09181. The first paper actually does not compute gradients of sparse zeros, whereas the second one does, as far as I understand.

Any explanations or directions would be appreciated!

olszewskip · January 21, 2020, 11:12pm

Just a small correction. The following works without the quoted error message:

using SparseArrays
using Zygote
using Flux: Chain, σ, crossentropy, softmax

struct SprAffine{F,S,T}
    W::S
    b::T
    σ::F
end
SprAffine(in::Integer, out::Integer, σ::Function) = SprAffine(sprandn(out, in, 0.5), sprandn(out, 0.5), σ)
(m::SprAffine)(x) = m.σ.(m.W * x .+ m.b)

loss(model, x, y) = crossentropy(Array(model(x)), y)
model = Chain(SprAffine(4, 3, σ), SprAffine(3, 2, σ), softmax);

x_sparse = sparse(1:4)
y = [1, 2]   # column vector, so it matches the 2-element model output
grads =
gradient(model) do m
    loss(m, x_sparse, y)
end[1]

typeof(grads[1][1].W)   # SparseMatrixCSC{Float64,Int64}

So the returned gradients can be sparse, which is awesome. But, unsurprisingly, they are the full correct gradients that one would obtain using dense arrays, only in the sparse format.

I would also love to learn how to tell the gradient function to ignore the zeros in the sparse matrix and not compute gradients for them. My guess is that I should redefine the adjoint for sparse-matrix-times-vector multiplication.
As before, any explanations or directions would be appreciated!
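
One possible way to follow up on the adjoint idea above is sketched below. It is not a confirmed answer from this thread. The helper name spmul is hypothetical, the input x is assumed to be a dense vector, and the weight gradient is simply restricted to the entries that are already stored in W, using Zygote.@adjoint.

```julia
using SparseArrays
using Zygote

# Hypothetical helper (not part of Flux or Zygote): same forward result as
# W * x, but with a custom pullback defined below.
spmul(W::SparseMatrixCSC, x::AbstractVector) = W * x

Zygote.@adjoint function spmul(W::SparseMatrixCSC, x::AbstractVector)
    y = W * x
    function back(Δ)
        # Rows and columns of the entries stored in W.
        I, J, _ = findnz(W)
        # The dense gradient would be Δ * x'; here it is evaluated only at
        # the stored positions, so dW has the same sparsity pattern as W.
        dW = sparse(I, J, Δ[I] .* x[J], size(W)...)
        # The gradient with respect to x is the usual W' * Δ.
        dx = W' * Δ
        return (dW, dx)
    end
    return y, back
end

# Made-up example data: a 2×3 matrix with two stored entries.
W = sparse([1, 2], [1, 3], [2.0, -1.0], 2, 3)
x = [1.0, 2.0, 3.0]

gW, gx = Zygote.gradient((W, x) -> sum(abs2, spmul(W, x)), W, x)
```

In the SprAffine layer from the post, this would amount to calling spmul(m.W, x) in place of m.W * x. A sparse x could be converted with Vector(x) first, since the sketch assumes dense input.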

Phil_Tomson · March 30, 2021, 4:01pm

Wondering if you figured out how to do this?

stecrotti · July 25, 2021, 1:27pm

This seems to do the job. Haven’t tried it though
