I’m fairly new to ML, but my understanding is that inner layers should generally use relu, and the final layer should use whatever function constrains the output to the range you want; in my case, sigmoid, to keep the output between 0 and 1. But that sigmoid seems to be causing trouble.
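Concretely, the pattern I have in mind looks roughly like this (a generic Dense sketch just to illustrate the convention, not my actual network):

using Flux

# relu on the hidden layers, σ (sigmoid) on the output to squash it into (0, 1)
model = Chain(
    Dense(16 => 32, relu),
    Dense(32 => 32, relu),
    Dense(32 => 1, σ),
)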
I threw together something simple: a 40×40 array with pixels set to 0.25, and a couple set to 0.75. The target to learn is the same array with those values replaced by 0 and 1, respectively.
Without any activation functions in the toy network (it’s smaller than the network I actually want to train, but it’s the one I’m using to troubleshoot), the output is pretty close to the target after about 1000 iterations.
If I add relu to the first and second layers and leave the last as identity, it reaches the same loss after about 400 iterations.
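For reference, those two variants are just the same convolutions with the activations changed, sketched below (the names are only for illustration; the sigmoid version is the full code further down):

# variant 1: no activation functions at all
network_linear = Chain(
    Conv((3, 3), 1 => 48; pad = (1, 1)),
    Conv((3, 3), 48 => 48; pad = (1, 1)),
    Conv((1, 1), 48 => 1),
)

# variant 2: relu on the first two layers, identity on the last
network_relu = Chain(
    Conv((3, 3), 1 => 48, relu; pad = (1, 1)),
    Conv((3, 3), 48 => 48, relu; pad = (1, 1)),
    Conv((1, 1), 48 => 1),
)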
If I then add sigmoid to the last layer, training completely breaks down: I end up with hot pixels at each corner of the network output, and the loss plateaus early.
What is happening here? And are there any tools I should know about that would make it easy to inspect what’s happening and understand it myself? Here’s the toy code:
using Flux
using Plots
function train_test_network(; niters = 100)
    # Toy input: a constant 0.25 background with two brighter pixels
    a = fill(0.25f0, 40, 40, 1, 1)
    a[4, 4, 1, 1] = 0.75f0
    a[10, 10, 1, 1] = 0.75f0

    # Target: the same pattern, with the values pushed to 0 and 1
    target = zeros(Float32, size(a)...)
    target[4, 4, 1, 1] = 1f0
    target[10, 10, 1, 1] = 1f0

    network = Chain(
        Conv((3, 3), 1 => 48, relu; pad = (1, 1)),
        Conv((3, 3), 48 => 48, relu; pad = (1, 1)),
        Conv((1, 1), 48 => 1, σ)    # final sigmoid to constrain the output to (0, 1)
    )

    opt = Adam()
    opt_state = Flux.setup(opt, network)

    for i ∈ 1:niters
        Flux.train!(network, ((a, target),), opt_state) do m, x, y
            y1 = m(x)
            loss = Flux.mse(y1, y)
            return loss
        end

        # Every 50 iterations, plot the current output and report the loss
        if (i - 1) % 50 == 0
            y1 = network(a)
            loss = Flux.mse(y1, target)
            display(heatmap(y1[:, :, 1, 1], aspectratio = 1))
            @info loss
        end
    end

    return network
end
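In case it helps, this is how I call it, and the only “inspection” I know how to do so far is pulling raw gradients out with Flux.gradient; I’m not sure that’s the right tool, and I don’t know what I should be looking for in them:

network = train_test_network(niters = 1000)

# rebuild the same toy input/target so I can probe gradients outside the function
a = fill(0.25f0, 40, 40, 1, 1)
a[4, 4, 1, 1] = 0.75f0
a[10, 10, 1, 1] = 0.75f0
target = zeros(Float32, size(a)...)
target[4, 4, 1, 1] = 1f0
target[10, 10, 1, 1] = 1f0

grads = Flux.gradient(m -> Flux.mse(m(a), target), network)[1]
@info extrema(grads.layers[3].weight)    # gradient range for the final (σ) layer's weights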