Hi everyone, I have a question related to training a neural network with the sciml_train function of DiffEqFlux.jl.
Context:
I want to train a neural stochastic differential equation (neural SDE) where the loss depends explicitly on the neural network inside the SDE. In one part of the loss, a particular weight of the network is not used at all, and I believe this is what causes the error when I run my code. Below is an MWE that produces a similar error.
Code:
# Importing the used packages
using Flux, DiffEqFlux, LinearAlgebra
# Defining some constants for creating the Neural Network
input_size = 2
output_size = 2
# Creating a simple neural network consisting of a single linear (Dense) layer
nn_initial = Chain(Dense(input_size, output_size))
# Using destructure to obtain the flat parameter vector Θ and the function nn that rebuilds the model
Θ, nn = Flux.destructure(nn_initial)
# Defining the loss function
function loss_intermediate(NN, data_x, data_y)
    #diff = (NN(data_x) - data_y)
    loss_1 = 0 #0.5*dot(diff, diff)
    bias = NN.layers[1].bias
    loss_2 = 0.5*dot(bias, bias)
    return loss_1 + 0.1*loss_2 #+ 0.5*dot(NN.layers[1].W, NN.layers[1].W)
end
function loss(nn, Θ, data_x, data_y)
    NN = nn(Θ)
    return loss_intermediate(NN, data_x, data_y)
end
# Create a function that generates data from a linear function and adds noise
function data_generator(n)
    data_x = randn(input_size, n)
    data_y = data_x + 0.01*randn(input_size, n)
    return data_x, data_y
end
# Getting the data
data_x, data_y = data_generator(50)
# Testing the loss function
testLossVal = loss(nn, Θ, data_x, data_y)
println("The value of the loss function at the start of training is: $testLossVal \n")
# Training everything using sciml_train
result = DiffEqFlux.sciml_train((p) -> loss(nn, p, data_x, data_y), Θ,
                                ADAM(0.01), maxiters = 10)
println("The final weights are: \n")
println(result.minimizer)
Error:
LoadError: DimensionMismatch("array could not be broadcast to match destination")
Stacktrace:
[1] check_broadcast_shape at .\broadcast.jl:520 [inlined]
[2] check_broadcast_axes at .\broadcast.jl:523 [inlined]
[3] check_broadcast_axes at .\broadcast.jl:527 [inlined]
[4] instantiate at .\broadcast.jl:269 [inlined]
[5] materialize! at .\broadcast.jl:848 [inlined]
[6] materialize! at .\broadcast.jl:845 [inlined]
[7] apply!(::ADAM, ::Array{Float32,1}, ::Array{Float64,1}) at C:\Users\Sven\.julia\packages\Flux\qp1gc\src\optimise\optimisers.jl:179
[8] update!(::ADAM, ::Array{Float32,1}, ::Array{Float64,1}) at C:\Users\Sven\.julia\packages\Flux\qp1gc\src\optimise\train.jl:23
[9] update!(::ADAM, ::Zygote.Params, ::Zygote.Grads) at C:\Users\Sven\.julia\packages\Flux\qp1gc\src\optimise\train.jl:29
[10] __solve(::SciMLBase.OptimizationProblem{false,SciMLBase.OptimizationFunction{false,GalacticOptim.AutoZygote,SciMLBase.OptimizationFunction{true,GalacticOptim.AutoZygote,DiffEqFlux.var"#69#70"{var"#29#30"},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},GalacticOptim.var"#148#158"{GalacticOptim.var"#147#157"{SciMLBase.OptimizationFunction{true,GalacticOptim.AutoZygote,DiffEqFlux.var"#69#70"{var"#29#30"},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Nothing}},GalacticOptim.var"#151#161"{GalacticOptim.var"#147#157"{SciMLBase.OptimizationFunction{true,GalacticOptim.AutoZygote,DiffEqFlux.var"#69#70"{var"#29#30"},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing},Nothing}},GalacticOptim.var"#156#166",Nothing,Nothing,Nothing},Array{Float32,1},SciMLBase.NullParameters,Nothing,Nothing,Nothing,Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:maxiters,),Tuple{Int64}}}}, ::ADAM, ::Base.Iterators.Cycle{Tuple{GalacticOptim.NullData}}; maxiters::Int64, cb::Function, progress::Bool, save_best::Bool, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\Sven\.julia\packages\GalacticOptim\Ts0Bu\src\solve.jl:103
[11] #solve#468 at C:\Users\Sven\.julia\packages\SciMLBase\Z1NtH\src\solve.jl:3 [inlined]
[12] sciml_train(::var"#29#30", ::Array{Float32,1}, ::ADAM, ::GalacticOptim.AutoZygote; lower_bounds::Nothing, upper_bounds::Nothing, kwargs::Base.Iterators.Pairs{Symbol,Int64,Tuple{Symbol},NamedTuple{(:maxiters,),Tuple{Int64}}}) at C:\Users\Sven\.julia\packages\DiffEqFlux\GQl0U\src\train.jl:6
[13] top-level scope at C:\Users\Sven\Documents\Master\Master_thesis\Julia\Code\TestRepository\TestFileDiscourse_question3.jl:43
[14] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088
in expression starting at C:\Users\Sven\Documents\Master\Master_thesis\Julia\Code\TestRepository\TestFileDiscourse_question3.jl:43
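If I read the stack trace correctly, the failure happens inside Flux's apply! for ADAM, when the gradient array is shorter than the parameter array it is supposed to update. A minimal sketch that I believe reproduces the same DimensionMismatch in isolation (the lengths 6 and 2 are my guesses: Θ has 6 entries, the bias has 2):
using Flux
# Parameter vector of length 6, "gradient" of length 2 (hypothetical sizes)
Flux.Optimise.apply!(ADAM(0.01), zeros(Float32, 6), zeros(2))  # throws DimensionMismatch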
Possible cause of the error:
The code presented above does not work, but with small adjustments it does. For instance, if one adds + 0.5*dot(NN.layers[1].W, NN.layers[1].W) to the loss function (and changes nothing else), the code runs. Moreover, when changing loss_1 from 0 to 0.5*dot(diff, diff), where diff = (NN(data_x) - data_y), the code works as well. This makes me believe that the error is related to the fact that not ALL the parameters of the NN appear in the loss function.
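To check this, a diagnostic I intend to try is to compute the gradient directly with Zygote (which, as far as I understand, is what sciml_train uses via GalacticOptim.AutoZygote) and compare its length to length(Θ). This is only a sketch; the expectation in the comments is my guess:
using Zygote
# Gradient of the loss with respect to the flat parameter vector Θ
g = Zygote.gradient(p -> loss(nn, p, data_x, data_y), Θ)[1]
# If the hypothesis is right, this prints a length smaller than length(Θ)
# (or `nothing`), which would explain the broadcast mismatch inside ADAM's apply!
println("length(Θ) = ", length(Θ))
println("gradient: ", g === nothing ? "nothing" : "length $(length(g))")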
Question:
How can I resolve the issue described above?
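For completeness, the only workaround I see so far, based on the observation above, is to make every parameter of NN appear in the loss, e.g. by adding a weight term with a zero coefficient so that the loss value itself is unchanged. This is only a sketch of the idea (I have not verified that a zero coefficient is enough to make the gradient cover W), and I would prefer a cleaner solution:
# Same as loss_intermediate, but with an extra zero-coefficient weight term
function loss_intermediate_workaround(NN, data_x, data_y)
    loss_1 = 0
    bias = NN.layers[1].bias
    loss_2 = 0.5*dot(bias, bias)
    # The zero coefficient keeps the loss value unchanged; the only purpose
    # of this term is to make the weight matrix part of the gradient
    W = NN.layers[1].W
    return loss_1 + 0.1*loss_2 + 0.0*dot(W, W)
end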