#### Problem Description

Hi all, I’m having trouble using Flux to learn a non-linear function of two independent variables, x1 and x2. The code runs, but I suspect that the parameters aren’t actually being updated, and that every time I train I’m starting again from the initialized values.
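One way to test that suspicion directly is to snapshot the parameters, run a single `train!` call, and compare. This is only a minimal sketch, assuming the implicit-parameter API of the Flux version listed under Environment; `model`, `check_loss`, and the random `X`/`Y` are hypothetical stand-ins, not the code from this post.

```julia
using Flux

# Hypothetical stand-in model and data; names and sizes are illustrative only.
model = Chain(Dense(2, 5, tanh), Dense(5, 1))
X = rand(Float32, 2, 32)   # 2 features × 32 samples
Y = rand(Float32, 1, 32)   # 1 output × 32 samples

ps = Flux.params(model)
check_loss(x, y) = Flux.Losses.mse(model(x), y)

# Snapshot a copy of every parameter array before training.
before = Any[]
for p in ps
    push!(before, copy(p))
end

loss_before = check_loss(X, Y)
Flux.train!(check_loss, ps, [(X, Y)], Descent(0.1))
loss_after = check_loss(X, Y)

# Compare each parameter array with its snapshot.
changed = Bool[]
for (old, cur) in zip(before, ps)
    push!(changed, old != cur)
end

println("parameters changed: ", any(changed))
println("loss before: ", loss_before, ", after: ", loss_after)
```

If the arrays differ after the call, the optimiser is updating the model in place, and the problem more likely lies in the loss or the data layout than in the update step.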

I’ve made up a function that is roughly similar to my real data, and it has the same issues. In the plots of the results, the shape is only roughly right and the scale is way off. The results also stay more or less the same as what is predicted after the first training iteration. An earlier version did manage to capture the shape well, but where the predicted values should have spanned [0, 1], the predicted range was more like [0.225, 0.235], with no way to make it budge.

I’m not sure whether this has to do with the configuration of the NN itself, the activation functions, the batch sizes, how the parameters are updated in `train!`, or something else.

I’d tried to use the DataLoader but was having issues, so I just tried to roll my own. Similarly, I wasn’t sure if the `@epochs` macro was causing problems, so I just made my own iteration loop.

Any suggestions are welcome even if they don’t pertain to the particular problem. Thanks in advance!

#### Code

```
using DataFrames, Plots, Flux
using Flux: @epochs
x1 = DataFrame!(x1 = [0.2, 0.5, 1, 2, 5, 10, 15, 25, 50, 100, 200, 300, 400,
                      500, 700, 900, 1000, 1250, 1500, 1800, 2000, 3000, 5000])
x2 = DataFrame!(x2 = range(0., 90., length=30) |> collect)
df = crossjoin(x1, x2)
# faking up some data based on a nonlinear function that should be somewhat
# similar to mine. My actual data is a little more complex but I can't share it
df[:, :y] .= 0.
for row in eachrow(df)
    row.y = (row.x1)^(1/2) * (row.x2)^2
end
stats = describe(df, :min, :max, :mean, :std)
# Normalize the data
x1 = (df.x1 .- stats.min[1]) / (stats.max[1] - stats.min[1])
x2 = (df.x2 .- stats.min[2]) / (stats.max[2] - stats.min[2])
y = (df.y .- stats.min[3]) / (stats.max[3] - stats.min[3])
scatter(df.x1, df.x2, df.y, lab="True Values- unscaled")
scatter(x1, x2, y, lab="True Values- normalized")
```

#### Unscaled scatter plot

#### Normalized scatter plot

```
z = 5
m = Chain(
    Dense(2, z)
    , Dense(z, z, tanh)
    , Dense(z, z, σ)
    , Dense(z, 1)
)
ps = params(m)
opt = Descent()
loss(X, y) = Flux.Losses.mse(m(X)[1], y)
n = 1000 # how many batches I want
batches = 1:1:n # range to iterate on the batches
batch_size = 32 # number of random data in each batch
num_epochs = 250 # number of times to train on each batch
num_datum = size(df)[1] # getting total number of data
for batch in batches
    # making a random index to make random minibatches
    rd_idx = [] # empty list
    # randomly select the batch size number of points within the range of data
    for i in 1:1:batch_size
        new = rand(1:num_datum)
        push!(rd_idx, new)
    end
    # creating new X and Y minibatch based on the random index selected
    x1_minibatch = x1[rd_idx, :]
    x2_minibatch = x2[rd_idx, :]
    y_minibatch = y[rd_idx, :]
    # putting X in correct dimensions
    X = transpose(hcat(x1_minibatch, x2_minibatch)) |> Array
    Y = y_minibatch |> Array
    data = [(X, Y)]
    for epoch in 1:1:num_epochs
        Flux.train!(loss, ps, data, opt)
    end
end
# Plotting Results
x1_test = 0.0:0.1:1.0
x2_test = 0.0:0.1:1.0
ŷ(x1_test, x2_test) = m([x1_test, x2_test])[1]
plot(x1_test, x2_test, ŷ, st=:surface)
```
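One thing that may be worth checking in the loss above is the shape of what `mse` receives. With `X` laid out as features × batch, the model returns one prediction per column, so `m(X)[1]` selects only the first sample's prediction and broadcasts that single number against every target in the batch. The sketch below illustrates this with a hypothetical stand-in model `net` and random data shaped like `X` and `Y` in the training loop; `loss_first` and `loss_all` are illustrative names, and this may or may not be the whole cause of the behaviour described.

```julia
using Flux

net = Chain(Dense(2, 5, tanh), Dense(5, 1))   # hypothetical stand-in for m
X = rand(Float32, 2, 32)    # features × batch, laid out like X above
Y = rand(Float32, 32, 1)    # batch × 1, laid out like Y above

out = net(X)
println(size(out))          # (1, 32): one prediction per column
println(out[1])             # a single number: the first sample's prediction

# Loss as written in the post: one scalar prediction is broadcast
# against all 32 targets.
loss_first(x, y) = Flux.Losses.mse(net(x)[1], y)

# Alternative sketch: flatten both sides so that each prediction is
# compared with its own target.
loss_all(x, y) = Flux.Losses.mse(vec(net(x)), vec(y))

println("first-element loss: ", loss_first(X, Y))
println("per-sample loss:    ", loss_all(X, Y))
```

A loss built on a single prediction rewards the network for pushing that one output toward the batch mean of the targets, which would be consistent with a narrow predicted range, though that is an assumption rather than something confirmed here.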

#### After one complete training iteration

#### After several

#### After many many iterations

#### Environment

```
(jgpr) pkg> st
Status `C:\Users\~\jgpr\Project.toml`
[336ed68f] CSV v0.7.7
[052768ef] CUDA v1.3.3
[a93c6f00] DataFrames v0.21.7
[587475ba] Flux v0.11.1
[91a5bcdd] Plots v1.6.7
[08abe8d2] PrettyTables v0.9.1
[bd369af6] Tables v1.0.5
[37e2e46d] LinearAlgebra
```