I have a neural network with a single binary input node, one hidden layer, and one output node. When I try to train the network so that it outputs the value x_1 when the input is 0 and x_2 when the input is 1, it sometimes works and sometimes does not. I use the Flux library in Julia:
using Flux
using Flux.Optimise: update!
# Train v to map input 0 -> target1 and input 1 -> target2, with a learning
# rate that decays linearly to zero over 1000 steps.
function train_v(v, initial_lr, target1, target2)
    for i in 1:1000
        lr = initial_lr * (1 - i/1000)
        update_v(v, [0], target1, lr)
        update_v(v, [1], target2, lr)
    end
end

# Take one gradient-descent step on the mean absolute error for a single input.
function update_v(v, input, target, lr)
    ps = params(v)
    gs = gradient(ps) do
        Flux.Losses.mae(v(input), target)
    end
    update!(Descent(lr), ps, gs)
end
# Build a 1 -> nodes -> 1 network, train it, and optionally print and/or return it.
function test(target1, target2, nodes=4, initial_lr=0.1, print=true, return_v=false)
    v = Chain(Dense(1, nodes, relu), Dense(nodes, 1))
    if print
        println("Before training: v(0):", v([0])[1], " / v(1):", v([1])[1])
        #for i in 1:3
        #    println(params(v)[i])
        #end
    end
    train_v(v, initial_lr, target1, target2)
    if print
        println("After training: v(0):", v([0])[1], " / v(1):", v([1])[1])
        #for i in 1:3
        #    println(params(v)[i])
        #end
    end
    if return_v
        return v
    end
end
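To make an individual run repeatable when comparing settings, one can seed Julia's global RNG before constructing the chain, since Flux's default weight initialization draws from it. test_seeded below is just an illustrative wrapper of my own, not part of the code above:

using Random

# Illustrative helper (hypothetical name): fixing the seed makes the random
# initial weights, and therefore a given success or failure, reproducible.
function test_seeded(seed, target1, target2, nodes=4, initial_lr=0.1)
    Random.seed!(seed)
    test(target1, target2, nodes, initial_lr)
end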
Running, for example, test(1,2)
will sometimes work just fine and produce
julia> test(1,2)
Before training: v(0):0.0 / v(1):-0.35872597
After training: v(0):0.99979866 / v(1):1.9998001
and sometimes fail, producing the same output value for both inputs
julia> test(1,2)
Before training: v(0):0.0 / v(1):-0.33133784
After training: v(0):1.1958001 / v(1):1.1958001
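Since the failing runs produce a constant output, one can inspect the hidden layer of a failed network directly. The sketch below is my own diagnostic, not part of the code above; it uses return_v=true and checks whether the hidden ReLU layer produces the same activations for both inputs, which would force a constant output:

v = test(1, 2, 4, 0.1, false, true)    # train silently and return the chain
hidden = v[1]                          # the Dense(1, nodes, relu) layer
println("hidden(0) = ", hidden([0]))   # if both prints show all zeros, every
println("hidden(1) = ", hidden([1]))   # ReLU unit is inactive for both inputs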
Observations I made:
It works properly more often if…
- the absolute values of x_1 and x_2 are rather small.
- the difference between x_1 and x_2 is rather small.
- I use more nodes in the hidden layer.
Using many nodes in particular seems to make it work consistently. This, however, I find especially counterintuitive, because I thought that for simple functions it is better to use few nodes. Even with just one node in the hidden layer it is easy to find weights such that the function is exact. Can someone explain this behaviour to me? The function I used for testing different settings was:
# Estimate how often training hits both targets to within epsilon,
# over n independently initialized runs.
function get_success_probability(target1, target2, nodes=4, initial_lr=0.1, n=100,
                                 epsilon=abs(target1-target2)*0.1)
    success_count = 0
    for i in 1:n
        println(i)    # progress indicator
        v = test(target1, target2, nodes, initial_lr, false, true)
        if abs(v([0])[1] - target1) < epsilon && abs(v([1])[1] - target2) < epsilon
            success_count += 1
        end
    end
    return success_count/n
end
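For example, to compare few versus many hidden nodes (illustrative calls only, results omitted):

get_success_probability(1, 2, 1)    # one hidden node
get_success_probability(1, 2, 16)   # many hidden nodes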
I am using Julia v1.6.2.