My own Feedforward Neural Network library :-)

@sylvaticus Sorry, script below:


xtrain = pi*rand(10000,5);
ytrain = sin.(xtrain) + 0.5 * cos.(xtrain);

xtest = pi*rand(1,5);
ytest = sin.(xtest) + 0.5 * cos.(xtest);

# dimension of input
Nod₀ = size(xtrain,2);
# dimension of output
Nodₗ = size(ytrain,2);
# number of layers
Nₗ = 4;
# layers dimensions;
lᵨ      = ceil.(Int64, (Nₗ*2)*rand(Nₗ) .+ 1);
lᵨ[1]   = Nod₀;
lᵨ[end] = Nodₗ;
all_layers = Layer[];
for j = 2:Nₗ;
    # Populate layer j
    l1 = BuildLayer(relu,drelu,lᵨ[j],lᵨ[j-1]);
    # Push populated layers into one array.
    push!(all_layers,l1);
end

# Assemble the NN structure.
myfnn = BuildNetwork(all_layers,cost,dcost);

Train!(myfnn,xtrain,ytrain,epochs=20,η=0.001,rshuffle=true);  # train with (shuffled) SGD
Errors(myfnn,xtest,ytest)       # error on the test point
y_pred = Predict(myfnn,xtest)   # prediction for the test point


using Plots   # needed for gr(), scatter and font below
gr() # We will continue onward using the GR backend
p1=scatter(xtrain,
    ytrain,
    markershape = :hexagon,
    markersize = 2,
    markeralpha = 0.6,
    markercolor = :green,
    markerstrokewidth = 2,
    markerstrokealpha = 0.2,
    markerstrokecolor = :green,
    markerstrokestyle = :dot,
    xlabel = "x data",
    ylabel = "y data",
    xlims = (0,pi),
    xticks = (0:pi/4:pi,["0", "\\pi/4", "\\pi/2", "3\\pi/4", "\\pi"]),
    xflip = false,
    ylims = (0,1.2),
    yticks = (0:0.2:1.2,string.(collect(0:0.2:1.2))),
    yflip = false,
    xtickfont = font(10, "Courier"),
    ytickfont = font(10, "Courier"),
    label = "",
    title = "My NN example")
scatter!(p1,xtest, ytest,
    markershape = :hexagon,
    markersize = 10,
    markeralpha = 0.6,
    markercolor = :red,
    markerstrokewidth = 2,
    markerstrokealpha = 0.2,
    markerstrokecolor = :red,
    markerstrokestyle = :dot,
    label = "",
    color=:red)
scatter!(p1,xtest, y_pred,
        markershape = :hexagon,
        markersize = 10,
        markeralpha = 0.6,
        markercolor = :blue,
        markerstrokewidth = 2,
        markerstrokealpha = 0.2,
        markerstrokecolor = :blue,
        markerstrokestyle = :dot,
        label = "",
        color=:blue)


The problem is not the new/old version; it's that you are using lots of parameters, and with the simple stochastic gradient descent algorithm that is currently implemented you don't obtain a good model at all (the output is always the same… some weights in the chain must be going to zero). Also, you have trigonometric shapes that simple relu nodes have trouble dealing with…
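
To make the "weights go to zero" point concrete, here is a tiny stand-alone illustration of the dying-relu effect I suspect is happening (plain Julia, nothing library-specific):

relu(x)  = max(0.0, x)          # the activation used in your script
drelu(x) = x > 0 ? 1.0 : 0.0    # its derivative
z = -0.3                        # a pre-activation that has drifted negative during training
relu(z), drelu(z)               # (0.0, 0.0): the node outputs zero AND receives zero gradient,
                                # so plain SGD can never recover it; chains of such dead nodes
                                # make the network output constant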

Here is an attempt at your example with somewhat better results (BetaML v0.1.0):

using BetaML.Nn, Plots, Random
Random.seed!(123)
xtrain = pi*rand(10000,5)
ytrain = sin.(xtrain) + 0.5 * cos.(xtrain)
xtest = pi*rand(3,5)
ytest = sin.(xtest) + 0.5 * cos.(xtest)
all_layers = [DenseLayer(5,7,f=relu,df=drelu),
              DenseLayer(7,7,f=tanh,df=dtanh),
              DenseNoBiasLayer(7,5,f=identity,df=didentity)]         
# Assemble the NN structure.
myfnn = buildNetwork(all_layers,squaredCost,dcf=dSquaredCost);
scaleFactors = getScaleFactors(xtrain)
train!(myfnn,scale(xtrain,scaleFactors),ytrain,epochs=100,batchSize=8,optAlg=SGD(λ=0.1))
y_pred = predict(myfnn,scale(xtest,scaleFactors))

(The scaling factors are needed because here the test set is only three 5-dimensional records; with many test records you can just use scale(x) in both training and prediction.)
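
In other words (a sketch of the two call forms, using the same names as above):

scaleFactors = getScaleFactors(xtrain)   # factors derived from the training set
scale(xtrain, scaleFactors)              # scale the training data with them
scale(xtest,  scaleFactors)              # reuse the SAME factors on the tiny test set
scale(xtest)                             # would instead derive factors from just 3 records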

The results are ok but not superb:

julia> ytest
3×5 Array{Float64,2}:
 0.544999  0.574536  0.839166  -0.105177  -0.0254647
 1.09706   1.08789   1.0958    -0.464873   1.10077
 1.1178    0.364736  1.0987     1.05349    0.67492
julia> y_pred
3×5 Array{Float64,2}:
 0.625131  0.528122  0.845183  -0.12004   -0.0569592
 0.680878  1.00831   0.985886  -0.399866   0.988038
 0.6323    0.325575  0.91063    0.934274   0.873246

The main problem here is that there are 5 pseudo-independent dimensions (as much as the generator is pseudo-random, at least).
Now I am not sure what this model \mathbb{R}^5 \to \mathbb{R}^5 is "catching", or how to interpret it.
So I ran the same "model" but with the dimensions "really" independent, i.e. using a function \mathbb{R} \to \mathbb{R}.

This was my first attempt:

xtrain = pi*rand(1000)
ytrain = sin.(xtrain) + 0.5 * cos.(xtrain)
xtest = pi*rand(200)
ytest = sin.(xtest) + 0.5 * cos.(xtest)

all_layers = [DenseLayer(1,3,f=relu,df=drelu),
              DenseLayer(3,1,f=tanh,df=dtanh)]
myfnn = buildNetwork(all_layers,squaredCost,dcf=dSquaredCost);
train!(myfnn,scale(xtrain),ytrain,epochs=100,batchSize=8)
y_pred = predict(myfnn,scale(xtest))
sortIdx = sortperm(ytest)
sortedYtest = ytest[sortIdx]
sortedYpred = y_pred[sortIdx]
plot(1:size(ytest,1),[sortedYtest  sortedYpred],label=["ytest" "y_pred"])

[Figure: test values (sorted) vs predictions for this first attempt]

Note the ending of the chart… the model seems unable to match the test data when these have high values… then I realised that of course it is so: the last layer is a tanh, which can't go above 1.
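
A quick sanity check of that (plain Julia):

maximum(ytrain)           # ≈ 1.118 = sqrt(1.25), reached around x = atan(2) ≈ 1.107
tanh(10.0), tanh(100.0)   # ≈ (1.0, 1.0): a tanh output layer is bounded in (-1,1),
                          # so it can never reach the top of the target range

So I inverted the layers, and this is the result: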

Much better, although one may want to investigate why the error ends up being autocorrelated only after a certain point (at around ytest = 0.5).
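
One quick way to look at that, reusing the variables from the snippet above:

residuals = sortedYtest .- sortedYpred
plot(sortedYtest, residuals, label="residual", xlabel="ytest (sorted)")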

So… as I am exploring this myself, it seems that neural networks are not a panacea in the sense of one model fitting all: you still need to know your data and use some expertise to build a "template model" out of it, although not in as much detail as with a classical statistical model.

Even better, with the new scaling/rescaling of the y (on the master branch):

using BetaML.Nn, Plots, Random
Random.seed!(123)
xtrain = pi*rand(1000)
ytrain = sin.(xtrain) + 0.5 * cos.(xtrain)
xtest = pi*rand(200)
ytest = sin.(xtest) + 0.5 * cos.(xtest)
all_layers = [DenseLayer(1,3,f=tanh,df=dtanh),
              DenseLayer(3,1,f=identity,df=didentity)]
myfnn = buildNetwork(all_layers,squaredCost,dcf=dSquaredCost);
xScaleFactors = getScaleFactors(xtrain)
yScaleFactors = getScaleFactors(ytrain)
train!(myfnn,scale(xtrain),scale(ytrain),epochs=100,batchSize=8)
y_pred = scale(predict(myfnn,scale(xtest,xScaleFactors)),yScaleFactors,rev=true)
sortIdx = sortperm(ytest)
sortedYtest = ytest[sortIdx]
sortedYpred = y_pred[sortIdx]
plot(1:size(ytest,1),[sortedYtest  sortedYpred],label=["ytest" "y_pred"])

:slight_smile: :slight_smile: :slight_smile: :slight_smile:

(PS: using two-sided scaling also improves the original 5-dimensional problem, giving an average relative (l-1) error of 8%.)
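
Roughly, for the original 5-dimensional problem that looks like this (a sketch; myfnn is the 5→7→7→5 network and xtrain/ytrain the 10000x5 data defined earlier in the thread):

xScaleFactors = getScaleFactors(xtrain)
yScaleFactors = getScaleFactors(ytrain)
train!(myfnn, scale(xtrain,xScaleFactors), scale(ytrain,yScaleFactors), epochs=100, batchSize=8)
y_pred = scale(predict(myfnn, scale(xtest,xScaleFactors)), yScaleFactors, rev=true)
sum(abs.(y_pred .- ytest)) / sum(abs.(ytest))   # one way to compute an average relative (l-1) error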

Ok, @sylvaticus, that looks really, really nice. I now see some of the issues that I was having. Thanks a lot for the help. However, you mentioned that there was an R^5 to R^5 relation that was difficult to interpret.
That is because I was trying to simulate an experiment that takes 5 measures of some variable at 5 different hours of the day, and to reproduce that experiment 10000 times. That's why I wrote:

xtrain = pi*rand(10000,5); # time series
ytrain = sin.(xtrain) + 0.5 * cos.(xtrain); # measures

I am testing this NN for a further experiment that would require predicting a relation of the kind
R^{10} (some data series of 10 measurements) to R^6 (a symmetric 3x3 matrix), based on previous training with a big data set of the same kind. What do you think? Can you tell me of any evident limitation regarding the use of a NN of this type on a data set of this kind?

Thanks a lot in advance!!! You really are helping me a lot!

There shouldn't be particular problems… but if your observations have a temporal dimension, I think you would be better off interpreting them as sequences and using some recurrent neural network. I haven't implemented them yet, but they are available in both Flux and Knet…

@sylvaticus In fact, there are no temporal dependencies. My real problem consists of data that can be arranged as a vector of (say) 10 numbers, many of which represent a coordinate, an amplitude, or some other attribute that I have yet to define, which then maps to 6 numbers (as I told you, a symmetric 3x3 matrix). But the temporal order of those inputs is not important, so I think an RNN would complicate things unnecessarily. However, I will take the advice and investigate them a little bit! Thanks again!
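
Just to make the target format concrete, by "6 numbers" I mean the upper triangle of the symmetric matrix, something like this (the helper names sym_to_vec / vec_to_sym are just placeholders I made up):

using LinearAlgebra

# keep the 6 independent entries (upper triangle) of a symmetric 3x3 matrix
sym_to_vec(S) = [S[i,j] for i in 1:3 for j in i:3]

# rebuild the full symmetric matrix from those 6 numbers
function vec_to_sym(v)
    S = zeros(3,3)
    k = 1
    for i in 1:3, j in i:3
        S[i,j] = v[k]; S[j,i] = v[k]; k += 1
    end
    return S
end

S = Symmetric(rand(3,3))   # a toy symmetric target
v = sym_to_vec(S)          # the 6-element vector the NN would actually predict
S ≈ vec_to_sym(v)          # true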

@sylvaticus Hi there,
Running the last of your examples is giving me this error (related to the scale function and the rev keyword):

julia> y_pred = scale(predict(myfnn,scale(xtest,xScaleFactors)),yScaleFactors,rev=true)
ERROR: MethodError: no method matching scale(::Array{Float64,2}, ::Tuple{Array{Float64,1},Array{Float64,1}}; rev=true)
Closest candidates are:
scale(::Any, ::Any) at /home/gbrunini/.julia/packages/BetaML/b0CH8/src/Utils.jl:147 got unsupported keyword argument “rev”
scale(::Any) at /home/gbrunini/.julia/packages/BetaML/b0CH8/src/Utils.jl:147 got unsupported keyword argument “rev”
Stacktrace:
[1] top-level scope at REPL[75]:1

What could be happening?
Thank you for your time, as always!

I wrote that you need master; I haven't yet created a release of the package with the reverse scaling, as it was only a minor point…
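
For reference, getting the master branch is something like:

using Pkg
Pkg.add(PackageSpec(name="BetaML", rev="master"))   # or, in the Pkg REPL:  add BetaML#master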

@sylvaticus
Oh, so that was what you meant by "master". Sorry, my bad; I am really not well versed in GitHub (or package development) terminology! I am looking forward to this feature being incorporated into the package! Thanks, and keep up the awesome work!