Sparse loss with Flux

niltsz · December 9, 2024, 3:32pm

Hello, I want to train a neural network that improves the pixels of an image, but with a sparse loss function: the true pixel values are known only at a sparse set of positions, and those positions differ for each training example.
How do I write the loss function and the optimiser so that only the right weights are updated?

A minimal 1D example that I think does not do what I want it to do:

using Flux, MLUtils

#create some random data, including a mask that tells where the target is known
X = rand(Float32, 64, 1000);
Y = rand(Float32, 64, 1000);
#the mask defines the positions in which the label exists. 
#Only for these pixels the corresponding gradients should be updated
MASK = rand(Bool, 64, 1000);

train_loader = DataLoader((X, Y, MASK), batchsize = 100)

#setup some very basic network
model = Chain(Dense(64, 128), Dense(128, 64))
opt_state = Flux.setup(Flux.Adam(), model)


for (x, y, mask) in train_loader
    loss, grads = Flux.withgradient(model) do m
        y_hat = m(x)
        Flux.Losses.mse(y_hat[mask], y[mask])
    end
    #I assume that the gradients are computed for the entire network
    #and not just for the weights that affect mask
    Flux.update!(opt_state, model, grads[1])
end

mcabbott · December 9, 2024, 4:02pm

Is your complaint about the present code that it masks the output, not the input? Then you might just want y_hat = m(x .* mask).

Or can you clarify what “does not do what I want it to do” means? I believe what’s written will only update some entries of model[2].weight, because the loss does not depend on all of them.
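
As a quick sanity check of that claim, here is a minimal sketch with toy values chosen only for illustration. It differentiates the masked loss with respect to the prediction itself and confirms that the gradient is exactly zero wherever the mask is false, so nothing flows back from unlabelled pixels.

```julia
using Flux

# Toy prediction, target and mask (illustrative values, not from the original post)
y_hat = Float32[0.1, 0.2, 0.3, 0.4]
y     = Float32[1.0, 1.0, 1.0, 1.0]
mask  = [true, false, true, false]

# Gradient of the masked loss with respect to the prediction
g = Flux.gradient(p -> Flux.Losses.mse(p[mask], y[mask]), y_hat)[1]

# Entries at masked-out positions are exactly zero; the others are not
println(g)
```

The mask does not need to be visible to the AD in any special way: indexing with it simply means the unselected entries never enter the loss.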

niltsz · December 9, 2024, 4:46pm

Thank you for your reply. I want to mask the output, not the input. I have the full 2D image that I want to postprocess, but I have the ground truth for only a few pixels.
If my code actually updates only a few of the weights because the loss depends on only a few of them, my problem is solved… I thought that Zygote would not see the mask in my code because it was hidden away, but apparently it is smarter than I had thought. I will test this more and then mark your answer as the solution.
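
For reference, the same masked loss can also be written without logical indexing, by multiplying the error with the mask and dividing by the number of labelled pixels. The sketch below assumes a hypothetical helper named masked_mse (not part of Flux) and checks that it agrees with the indexed version on random data.

```julia
using Flux

# Hypothetical helper: mean squared error over the positions where mask is true.
# max(..., 1) guards against division by zero when a batch has no labelled pixels.
masked_mse(y_hat, y, mask) = sum(abs2, (y_hat .- y) .* mask) / max(sum(mask), 1)

y_hat = rand(Float32, 64, 100)
y     = rand(Float32, 64, 100)
mask  = rand(Bool, 64, 100)

a = masked_mse(y_hat, y, mask)              # multiplicative form
b = Flux.Losses.mse(y_hat[mask], y[mask])   # indexed form from the example above
println(a ≈ b)
```

Both forms give zero gradient at unlabelled positions; the multiplicative one keeps all arrays at a fixed shape, which some people find easier to reason about.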

mcabbott · December 9, 2024, 6:24pm

I guess it does update everything, as you have a different mask per image in the batch. If you use one mask for all, then you can see that only some rows of the weight matrix are updated. The batch just does many such updates at the same time.

using Flux

X = rand(Float32, 6, 10);  # batch of 10 "images" in 1D
Y = rand(Float32, 6, 10);

MASK = [true, false, true, false, true, false]  # same mask for all images
train_loader = Flux.DataLoader((X, Y), batchsize = 1)

model = Chain(Dense(6, 12; init=ones32), Dense(12, 6; init=ones32))
opt_state = Flux.setup(Flux.Adam(), model)

for (x, y) in train_loader
    _, grads = Flux.withgradient(model) do m
        y_hat = m(x)
        Flux.Losses.mse(y_hat[MASK], y[MASK])
    end
    Flux.update!(opt_state, model, grads[1])
end

model[2].weight

#=

julia> model[2].weight  # only some rows changed
6×12 Matrix{Float32}:
 0.990377  0.990377  0.990377  0.990377  0.990377  …  0.990377  0.990377  0.990377  0.990377
 1.0       1.0       1.0       1.0       1.0          1.0       1.0       1.0       1.0
 0.990374  0.990374  0.990374  0.990374  0.990374     0.990374  0.990374  0.990374  0.990374
 1.0       1.0       1.0       1.0       1.0          1.0       1.0       1.0       1.0
 0.990375  0.990375  0.990375  0.990375  0.990375     0.990375  0.990375  0.990375  0.990375
 1.0       1.0       1.0       1.0       1.0       …  1.0       1.0       1.0       1.0
 
julia> model[1].weight  # all changed
12×6 Matrix{Float32}:
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847
 0.990828  0.99101  0.9915  0.991691  0.99049  0.990847

=#
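
The per-image case can be inspected the same way by looking at the gradient rather than the updated weights. The following sketch uses a made-up 6×3 mask in which pixel 2 is unlabelled in every sample of the batch; under that assumption only row 2 of the last layer's weight gradient should be zero, while every other row receives a contribution from at least one sample.

```julia
using Flux

model = Chain(Dense(6, 12), Dense(12, 6))
x = rand(Float32, 6, 3)   # batch of 3 "images" in 1D
y = rand(Float32, 6, 3)

# Illustrative per-sample masks (one column per image); row 2 is false everywhere
mask = Bool[1 0 1;
            0 0 0;
            1 1 0;
            0 1 1;
            1 0 0;
            0 1 1]

grads = Flux.gradient(m -> Flux.Losses.mse(m(x)[mask], y[mask]), model)[1]
gW = grads.layers[2].weight   # gradient for the last layer's weight matrix

# One flag per output row: is the whole row of the gradient zero?
println([all(iszero, gW[i, :]) for i in 1:6])
```

Note that a zero gradient in one step does not by itself guarantee an unchanged weight with Adam: the optimiser keeps a running average of past gradients, so a row that received nonzero gradients in earlier batches can still move in a step where its current gradient is zero.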
