Thanks a lot for the reply. This is interesting: showing the parameters and their gradients only seems to work with implicit parameters, as you said, and it fails when passing explicit parameters. The documentation recommends avoiding implicit parameters unless they are really needed, and it seems to offer two alternatives based on explicit parameters. How come that doesn't work in our case?
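Just to make sure we are talking about the same thing, this is roughly how I understand the two styles (the argument list of loss here is only a placeholder for my actual call):

# Implicit parameters: gradients come back as a Zygote.Grads keyed by the parameter arrays
ps_UA = Flux.params(UA)
∇_UA = gradient(() -> loss(UA, H, p, t, t₁), ps_UA)

# Explicit parameters: differentiate with respect to the model itself, which, as far as
# I understand, returns a nested NamedTuple mirroring the Chain structure
∇_explicit, = gradient(m -> loss(m, H, p, t, t₁), UA)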
Concerning the sizes of the parameters and the gradients, funnily enough, when I display them like this:
for ps in ps_UA
@show ps, ∇_UA[ps]
println("size ps: ", size(ps))
println("size ∇_UA[p]: ", size(∇_UA[ps]))
println("type ps: ", typeof(ps))
println("type ∇_UA[p]: ", typeof(∇_UA[ps]))
end
All the sizes seem to match, except for the first parameter, which has shape (10, 1) while its gradient has shape (10,). Nonetheless, looking up the gradient of each parameter with ∇_UA[ps] does seem to work. There also seems to be a type mismatch, though: the parameters are Float32 while the gradients are Float64 (I sketch a possible workaround after the output below):
(ps, ∇_UA[ps]) = (Float32[0.30798444; 0.52923024; -0.38974404; -0.44488695; -0.007705779; -0.46781763; 0.30756167; -0.73545146; -0.60995233; -0.12095741], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
size ps: (10, 1)
size ∇_UA[p]: (10,)
type ps: Matrix{Float32}
type ∇_UA[p]: Vector{Float64}
(ps, ∇_UA[ps]) = (Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
size ps: (10,)
size ∇_UA[p]: (10,)
type ps: Vector{Float32}
type ∇_UA[p]: Vector{Float64}
(ps, ∇_UA[ps]) = (Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
size ps: (10,)
size ∇_UA[p]: (10,)
type ps: Vector{Float32}
type ∇_UA[p]: Vector{Float64}
(ps, ∇_UA[ps]) = (Float32[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
size ps: (10,)
size ∇_UA[p]: (10,)
type ps: Vector{Float32}
type ∇_UA[p]: Vector{Float64}
(ps, ∇_UA[ps]) = (Float32[-0.49900624 0.1739867 -0.50695115 -0.012244531 0.47494888 0.3235831 0.021464685 -0.48237133 -0.586465 -0.38168398; -0.40725645 -0.26300266 -0.14521688 0.020233944 -0.07136398 -0.56981426 -0.05533645 0.16115816 -0.4485389 -0.56794554; -0.37900218 -0.08815088 0.10154217 0.558363 -0.22744176 0.12258495 0.18857977 -0.16126387 -0.45260283 -0.54091734; -0.47956002 -0.27310026 -0.43743765 0.032916818 0.095131814 -0.6059501 -0.40490097 0.43668085 -0.31058735 -0.21437271; -0.031416014 0.21674222 0.485597 -0.3657828 -0.24838457 0.52909964 0.44705272 0.16652822 0.5047817 -0.5061942], [0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0; 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0])
size ps: (5, 10)
size ∇_UA[p]: (5, 10)
type ps: Matrix{Float32}
type ∇_UA[p]: Matrix{Float64}
(ps, ∇_UA[ps]) = (Float32[0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0])
size ps: (5,)
size ∇_UA[p]: (5,)
type ps: Vector{Float32}
type ∇_UA[p]: Vector{Float64}
(ps, ∇_UA[ps]) = (Float32[0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0])
size ps: (5,)
size ∇_UA[p]: (5,)
type ps: Vector{Float32}
type ∇_UA[p]: Vector{Float64}
(ps, ∇_UA[ps]) = (Float32[1.0, 1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0])
size ps: (5,)
size ∇_UA[p]: (5,)
type ps: Vector{Float32}
type ∇_UA[p]: Vector{Float64}
(ps, ∇_UA[ps]) = (Float32[-0.65684795 0.9604726 -0.8283665 0.7575054 0.6725607], [0.0 0.0 0.0 0.0 0.0])
size ps: (1, 5)
size ∇_UA[p]: (1, 5)
type ps: Matrix{Float32}
type ∇_UA[p]: Matrix{Float64}
(ps, ∇_UA[ps]) = (Float32[0.0], [0.0])
size ps: (1,)
size ∇_UA[p]: (1,)
type ps: Vector{Float32}
type ∇_UA[p]: Vector{Float64}
Hit `@infiltrate` in hybrid_train!(loss::typeof(loss), UA::Chain{Tuple{Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, BatchNorm{var"#leakyrelu#180", Vector{Float32}, Float32, Vector{Float32}}, Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}, BatchNorm{var"#leakyrelu#180", Vector{Float32}, Float32, Vector{Float32}}, Dense{var"#relu_A#181", Matrix{Float32}, Vector{Float32}}}}, opt::ADAM, H::Matrix{Float32}, p::Tuple{Int64, Int64, Float64, Float64, Matrix{Float32}, Array{Float64, 3}, Array{Float32, 3}, Vector{Any}, Float64, Int64}, t::Int64, t₁::Float64) at iceflow.jl:79:
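One workaround I was considering (untested, so I am not sure it is the right approach) is to coerce each gradient to the shape and element type of its parameter before updating parameter by parameter:

# Untested idea: force each gradient to match its parameter's shape and eltype,
# then update one parameter at a time
for ps in ps_UA
    g = ∇_UA[ps]
    g === nothing && continue                          # skip parameters without a gradient
    Flux.update!(opt, ps, reshape(Float32.(g), size(ps)))
end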
Still, when I do Flux.update!(opt, ps_UA, ∇_UA) I get the same error as mentioned previously:
ERROR: DimensionMismatch("cannot broadcast array to have fewer dimensions")
Stacktrace:
[1] check_broadcast_shape(#unused#::Tuple{}, Ashp::Tuple{Base.OneTo{Int64}})
@ Base.Broadcast ./broadcast.jl:518
[2] check_broadcast_shape(shp::Tuple{Base.OneTo{Int64}}, Ashp::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}})
@ Base.Broadcast ./broadcast.jl:521
[3] check_broadcast_axes
@ ./broadcast.jl:523 [inlined]
[4] check_broadcast_axes
@ ./broadcast.jl:526 [inlined]
[5] instantiate
@ ./broadcast.jl:269 [inlined]
[6] materialize!
@ ./broadcast.jl:894 [inlined]
[7] materialize!
@ ./broadcast.jl:891 [inlined]
[8] apply!(o::ADAM, x::Matrix{Float32}, Δ::Vector{Float64})
@ Flux.Optimise ~/.julia/packages/Flux/qp1gc/src/optimise/optimisers.jl:181
[9] update!(opt::ADAM, x::Matrix{Float32}, x̄::Vector{Float64})
@ Flux.Optimise ~/.julia/packages/Flux/qp1gc/src/optimise/train.jl:23
[10] update!(opt::ADAM, xs::Params, gs::Zygote.Grads)
@ Flux.Optimise ~/.julia/packages/Flux/qp1gc/src/optimise/train.jl:29
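For what it is worth, I think the failing step boils down to this in-place broadcast (values made up, only the shapes matter):

# Minimal reproduction of the DimensionMismatch, outside of Flux:
x = zeros(Float32, 10, 1)   # parameter with shape (10, 1), like my first layer's weights
Δ = zeros(Float64, 10)      # gradient with shape (10,)
Δ .= x                      # ERROR: DimensionMismatch("cannot broadcast array to have fewer dimensions")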
How come I seem to be able to index the gradients for each parameter in the loop, but they don't work with Flux.update!()?
I have also tried applying the gradients inside the for loop with Flux.update!(opt, ps, ∇_UA), but I get another error (my guess at what is going on is sketched after the stack trace below):
ERROR: MethodError: no method matching +(::Float64, ::Vector{Float64})
For element-wise addition, use broadcasting with dot syntax: scalar .+ array
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
+(::Union{Float16, Float32, Float64}, ::BigFloat) at mpfr.jl:392
+(::FillArrays.Zeros{T, N, Axes} where Axes, ::AbstractArray{V, N}) where {T, V, N} at /Users/Bolib001/.julia/packages/FillArrays/cVkp8/src/fillalgebra.jl:180
...
Stacktrace:
[1] _broadcast_getindex_evalf
@ ./broadcast.jl:648 [inlined]
[2] _broadcast_getindex
@ ./broadcast.jl:621 [inlined]
[3] getindex
@ ./broadcast.jl:575 [inlined]
[4] macro expansion
@ ./broadcast.jl:984 [inlined]
[5] macro expansion
@ ./simdloop.jl:77 [inlined]
[6] copyto!
@ ./broadcast.jl:983 [inlined]
[7] copyto!
@ ./broadcast.jl:936 [inlined]
[8] materialize!
@ ./broadcast.jl:894 [inlined]
[9] materialize!
@ ./broadcast.jl:891 [inlined]
[10] apply!(o::ADAM, x::Matrix{Float32}, Δ::Zygote.Grads)
@ Flux.Optimise ~/.julia/packages/Flux/qp1gc/src/optimise/optimisers.jl:179
[11] update!(opt::ADAM, x::Matrix{Float32}, x̄::Zygote.Grads)
@ Flux.Optimise ~/.julia/packages/Flux/qp1gc/src/optimise/train.jl:23
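My guess (possibly wrong) is that in this second attempt apply! receives the whole Zygote.Grads object instead of the gradient array for that parameter, so inside the loop I probably need to index it, something like:

# Possibly what I should have written inside the loop (indexing the Grads per parameter):
for ps in ps_UA
    Flux.update!(opt, ps, ∇_UA[ps])
end

Although I suppose that would still hit the same shape mismatch for the first (10, 1) parameter.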
Thanks again for your help!