I’m fairly new to ML, but my understanding is that inner layers should generally use relu, and the final layer should use whatever activation constrains the output to the range you want; in my case, sigmoid to keep values between 0 and 1. But that sigmoid seems to be causing trouble.
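For reference, σ in the code below is the logistic sigmoid that Flux re-exports from NNlib; it squashes any real input into (0, 1):

using Flux
σ.([-10f0, 0f0, 10f0])  # ≈ [4.5f-5, 0.5, 0.99995], always strictly between 0 and 1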
I threw together something simple: an array filled with 0.25, with a couple of pixels set to 0.75. The target to learn is the same array with values of 0 and 1, respectively.
Without any activation functions in the toy network (a smaller stand-in for the more complex network I actually want to train and am trying to troubleshoot), the output gets pretty close to the target after about 1000 iterations.
If I add relu to the first and second layers and leave the last as identity, it reaches the same loss after 400 iterations.
If I then add sigmoid to the last layer, training completely breaks down: I end up with hot pixels in each corner of the network output, and the loss plateaus early.
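For concreteness, the three variants I compared look roughly like this (only the third appears in the full script below; the first two are the no-activation and relu-plus-identity versions described above):

# Variant 1: no activation functions at all (identity everywhere)
Chain(
    Conv((3, 3), 1 => 48; pad = (1, 1)),
    Conv((3, 3), 48 => 48; pad = (1, 1)),
    Conv((1, 1), 48 => 1)
)

# Variant 2: relu on the first two layers, identity on the last
Chain(
    Conv((3, 3), 1 => 48, relu; pad = (1, 1)),
    Conv((3, 3), 48 => 48, relu; pad = (1, 1)),
    Conv((1, 1), 48 => 1)
)

# Variant 3: relu on the first two layers, sigmoid on the last -- this is the one that breaks down
Chain(
    Conv((3, 3), 1 => 48, relu; pad = (1, 1)),
    Conv((3, 3), 48 => 48, relu; pad = (1, 1)),
    Conv((1, 1), 48 => 1, σ)
)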
What is happening here? And are there any tools I should be aware of that would make it easy for me to inspect what’s happening and understand it myself?
using Flux
using Plots
function train_test_network(; niters = 100)
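    # Toy input: a 40x40 single-channel image filled with 0.25, with two brighter pixels at 0.75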
    a = fill(0.25f0, 40, 40, 1, 1)
    a[4, 4, 1, 1] = 0.75f0
    a[10, 10, 1, 1] = 0.75f0
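    # Target: zeros everywhere except 1 at the two bright-pixel locations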
    target = zeros(Float32, size(a)...)
    target[4, 4, 1, 1] = 1f0
    target[10, 10, 1, 1] = 1f0
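    # Small conv network: relu on the two 3x3 layers, sigmoid (σ) on the final 1x1 layer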
    network = Chain(
        Conv((3, 3), 1 => 48, relu; pad = (1, 1)),
        Conv((3, 3), 48 => 48, relu; pad = (1, 1)),
        Conv((1, 1), 48 => 1, σ)
    )
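    # Adam with Flux's explicit optimiser-state API (Flux.setup)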
    opt = Adam()
    opt_state = Flux.setup(opt, network)
    for i ∈ 1:niters
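        # One gradient step on the single (input, target) pair, minimising pixelwise MSE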
        Flux.train!(network, ((a, target),), opt_state) do m, x, y
            y1 = m(x)
            loss = Flux.mse(y1, y)
            return loss
        end
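        # Every 50 iterations, plot the current output and log the loss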
        if (i - 1) % 50 == 0
            y1 = network(a)
            loss = Flux.mse(y1, target)
            display(heatmap(y1[:, :, 1, 1], aspectratio = 1))
            @info loss
        end
    end
    return network
end
