You are right; running the same code with the same version of Flux gives me:
```julia
julia> train_loader = Flux.Data.DataLoader((a,b), batchsize=10)
Flux.Data.DataLoader{Tuple{Array{Int64,2},Array{Int64,2}}}(([9 4 … 4 7; 7 7 … 4 7; … ; 2 4 … 9 6; 2 2 … 3 3], [2 4 … 0 2]), 10, 50000, true, 50000, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 49991, 49992, 49993, 49994, 49995, 49996, 49997, 49998, 49999, 50000], false)

julia> trainmode!(model, true)
Chain(Dense(10, 1, σ))

julia> loss(x, y) = Flux.mae(model(x), y)
loss (generic function with 1 method)

julia> opt = ADAM(0.5)
ADAM(0.5, (0.9, 0.999), IdDict{Any,Any}())

julia> ps = params(model)
Params([Float32[0.53760564 0.18841723 … 0.22766067 0.6651586], Float32[0.0]])

julia> ps
Params([Float32[0.53760564 0.18841723 … 0.22766067 0.6651586], Float32[0.0]])

julia> Flux.train!(loss, ps, train_loader, opt)

julia> ps
Params([Float32[3.540263 3.1911578 … 3.2303967 3.667896], Float32[3.002677]])

julia> Flux.train!(loss, ps, train_loader, opt)

julia> ps
Params([Float32[3.540263 3.1911578 … 3.2303967 3.667896], Float32[3.002677]])
```
As you can see, `ps` is updated after the first epoch of training, but it stops updating after that. Does anyone have another idea?
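One possible explanation, just a guess from the numbers above: with weights around 3.5 and non-negative integer inputs, the σ output saturates at 1, so its derivative (and hence the MAE gradient) is effectively zero and ADAM has nothing to apply. A quick sketch to check this, assuming the same session and the same Flux version as in the transcript:

```julia
using Flux  # continuing the session above

x, y = first(train_loader)            # grab one batch
gs = gradient(() -> loss(x, y), ps)   # gradients w.r.t. the tracked params
sum(abs, gs[first(ps)])               # ≈ 0 if σ has saturated
```

If that sum is (numerically) zero, the parameters are frozen because the gradient has vanished, not because `train!` is broken.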
PS: I suggest trying to learn a real function (even a simple one) rather than a random one.
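For example, here is a minimal sketch along those lines, with hypothetical synthetic data and a smaller learning rate, assuming the same Flux version as in the transcript:

```julia
using Flux
using Flux: params, train!

# A learnable target: y is the sum of the inputs
X = rand(Float32, 10, 1000)
Y = sum(X, dims=1)

model = Dense(10, 1)                   # no σ: the target is unbounded
loss(x, y) = Flux.mae(model(x), y)
opt = ADAM(0.01)                       # a step size of 0.5 easily overshoots and saturates σ
ps = params(model)
loader = Flux.Data.DataLoader((X, Y), batchsize=10)

for epoch in 1:10
    train!(loss, ps, loader, opt)
    @show loss(X, Y)                   # should decrease steadily
end
```

With a sensible target and learning rate you can watch the loss go down each epoch, which makes it much easier to tell a genuine bug from a saturated model.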