Hi all, I’m having trouble using Flux to learn a non-linear function of two independent variables, x1 and x2. Everything is running, but I have a feeling that the parameters aren’t actually being updated and every time that I train, I’m starting off from the initialized values again.
I’ve made up a function that is kind of similar to my real data, and it has the same issues. Notice that in the plots of the results the shape is only kind of there and the scale is way off. Also, the results more or less stay the same as what is predicted after the first training iteration. I had an earlier version which actually did manage to capture the shape well, but where the predicted values should have spanned [0, 1], the predicted range was more like [0.225, 0.235], with no way to make it budge.
I’m not sure if this has to do with the configuration of the NN itself, the activation functions, the batch sizes, how the parameters are updated in `train!`, or something else.
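One way I could test the suspicion that the parameters aren’t being updated is to snapshot them before a single `train!` call and compare afterwards. This is only a minimal sketch, assuming the implicit-parameter API of the Flux version listed at the end of this post; `m_check`, `X_check`, `Y_check`, `loss_check` and `ps_check` are hypothetical stand-ins, not my real model or data.

```julia
using Flux

# Hypothetical toy model and one random minibatch (not the real data)
m_check = Chain(Dense(2, 5, tanh), Dense(5, 1))
X_check = rand(Float32, 2, 32)   # features × batch
Y_check = rand(Float32, 1, 32)   # same layout as the model output

loss_check(X, y) = Flux.Losses.mse(m_check(X), y)
ps_check = Flux.params(m_check)

# Snapshot the parameters by value; ps_check only holds references
before = [copy(p) for p in ps_check]

# One gradient step on the single minibatch
Flux.train!(loss_check, ps_check, [(X_check, Y_check)], Descent())

# Largest absolute change in each parameter array
changes = [maximum(abs.(p .- b)) for (p, b) in zip(ps_check, before)]
println(changes)
```

If every entry of `changes` is zero, the optimizer step is not reaching the model. Non-zero entries would mean the parameters are moving and the problem lies elsewhere.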
I’d tried to use the DataLoader but was having issues, so I just tried to roll my own. Similarly, I wasn’t sure if the `@epochs` macro was causing problems, so I just made my own iteration loop.
Any suggestions are welcome even if they don’t pertain to the particular problem. Thanks in advance!
```julia
using DataFrames, Plots, Flux
using Flux: @epochs

x1 = DataFrame!(x1 = [0.2, 0.5, 1, 2, 5, 10, 15, 25, 50, 100, 200, 300, 400,
                      500, 700, 900, 1000, 1250, 1500, 1800, 2000, 3000, 5000])
x2 = DataFrame!(x2 = range(0., 90., length=30) |> collect)
df = crossjoin(x1, x2)

# faking up some data based on a nonlinear function that should be somewhat
# similar to mine. My actual data is a little more complex but I can't share it
df[:, :y] .= 0.
for row in eachrow(df)
    row.y = (row.x1)^(1/2) * (row.x2)^2
end

stats = describe(df, :min, :max, :mean, :std)

# Normalize the data (rows of stats are x1, x2, y in that order)
x1 = (df.x1 .- stats.min[1]) / (stats.max[1] - stats.min[1])
x2 = (df.x2 .- stats.min[2]) / (stats.max[2] - stats.min[2])
y = (df.y .- stats.min[3]) / (stats.max[3] - stats.min[3])

scatter(df.x1, df.x2, df.y, lab="True Values- unscaled")
scatter(x1, x2, y, lab="True Values- normalized")
```
Unscaled scatter plot
Normalized scatter plot
```julia
z = 5
m = Chain(
    Dense(2, z),
    Dense(z, z, tanh),
    Dense(z, z, σ),
    Dense(z, 1)
)

ps = params(m)
opt = Descent()
loss(X, y) = Flux.Losses.mse(m(X), y)

n = 1000                 # how many batches I want
batches = 1:1:n          # range to iterate on the batches
batch_size = 32          # number of random data in each batch
num_epochs = 250         # number of times to train on each batch
num_datum = size(df)[1]  # getting total number of data

for batch in batches
    # making a random index to make random minibatches
    rd_idx = []  # empty list
    # randomly select the batch size number of points within the range of data
    for i in 1:1:batch_size
        new = rand(1:num_datum)
        push!(rd_idx, new)
    end

    # creating new X and Y minibatch based on the random index selected
    x1_minibatch = x1[rd_idx, :]
    x2_minibatch = x2[rd_idx, :]
    y_minibatch = y[rd_idx, :]

    # putting X in correct dimensions
    X = transpose(hcat(x1_minibatch, x2_minibatch)) |> Array
    Y = y_minibatch |> Array
    data = [(X, Y)]

    for epoch in 1:1:num_epochs
        Flux.train!(loss, ps, data, opt)
    end
end

# Plotting Results
x1_test = 0.0:0.1:1.0
x2_test = 0.0:0.1:1.0
ŷ(x1_test, x2_test) = m([x1_test, x2_test])
plot(x1_test, x2_test, ŷ, st=:surface)
```
After one complete training iteration
After many many iterations
```
(jgpr) pkg> st
Status `C:\Users\~\jgpr\Project.toml`
  [336ed68f] CSV v0.7.7
  [052768ef] CUDA v1.3.3
  [a93c6f00] DataFrames v0.21.7
  [587475ba] Flux v0.11.1
  [91a5bcdd] Plots v1.6.7
  [08abe8d2] PrettyTables v0.9.1
  [bd369af6] Tables v1.0.5
  [37e2e46d] LinearAlgebra
```