Hello everyone,
I am fairly new to Julia and even newer to this forum, so I hope this is the right subforum for my problem. I am currently working on a project that involves a convolutional VAE, which I implemented using Flux.
Everything ran smoothly until I upgraded from Julia 1.2.0 (Flux 0.9.0) to 1.4.1 (Flux 0.10.4).
I tried to extract a minimal example of the error from my current model, which you can see below (it's still quite a lot of code, though, so please bear with me):
using Flux
using Flux: @epochs, binarycrossentropy
using Distributions
# dummy data
d = ones(Float64, 796, 512, 1, 10)
batches = [reshape(d[:,:,:, i:i+4], (796, 512, 1, 5)) for i in 1:5]
# convolutional encoder
conv1 = Conv((14, 10), 1 => 4, relu, stride = (10, 10), pad = 4)
pool1 = MaxPool((8, 8), stride = (4, 4), pad = 2)
conv2 = Conv((4, 3), 4 => 4, stride = (2, 2), pad = 1)
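# conv1 → pool1 → conv2 turns a 796×512×1 input into 10×7×4 = 280 features per sample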
enc1(X) = reshape(conv2(pool1(conv1(X))), (280, :))
# mean and log-variance of vae1's z-variable/latent space
μ1 = Dense(280, 4)
logσ1 = Dense(280, 4)
# sample from z-distribution
z(μ, logσ) = μ + exp(logσ) * randn(Float64)
z(μ, logσ, eps) = μ + exp(logσ) * eps
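# (reparameterization trick: sampling ϵ outside keeps z differentiable w.r.t. μ and logσ)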
# decoder, I am using the one with transposed convolutions
dense_decoder = false # change accordingly
if dense_decoder
    dec = Dense(4, 796*512, sigmoid)
    dec1(X) = reshape(dec(X), (796, 512, 1, :))
else
    interaction1 = Dense(4, 280) # specific to my setup
    int1(X) = reshape(interaction1(X), (10, 7, 4, :))
    tc1 = ConvTranspose((4, 3), 4 => 4, relu, stride = (2, 2), pad = 1)
    tc2 = ConvTranspose((8, 8), 4 => 4, relu, stride = (4, 4), pad = 2)
    tc3 = ConvTranspose((14, 10), 4 => 1, sigmoid, stride = (10, 10), pad = 4)
    dec = Chain(interaction1, tc1, tc2, tc3) # for params
    dec1 = Chain(int1, tc1, tc2, tc3)
end
# log(p(x|z)), log(p(z)), log(q(z|x))
logp_x_z1(X, z) = -sum(binarycrossentropy.(dec1(z), X))
logp_z(z) = sum(Float64.(logpdf.(Normal(0, 1), z)))
log_q_z_x(ϵ, log_sigma) = Float64(logpdf(Normal(0, 1), ϵ) - log_sigma)
# vae loss estimator
function L1(X)
    output_enc = enc1(X)
    mu, l = μ1(output_enc), logσ1(output_enc)
    e = randn(Float64, size(l)) # latentdim1
    z_ = z.(mu, l, e)
    return -(logp_x_z1(X, z_) + logp_z(z_) - sum(log_q_z_x.(e, l))) * 1//5
end
# train vae1
ps1 = Flux.params(conv1, conv2, μ1, logσ1, dec) # enc1 is a plain function with no params of its own, so its layers are listed explicitly
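# zip(batches) wraps each batch in a 1-tuple so train! can splat it into the one-argument loss L1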
@epochs 3 Flux.train!(L1, ps1, zip(batches), ADAM())
In Julia 1.2.0 this runs perfectly; after upgrading to 1.4.1, the code yields the following error:
UndefRefError: access to undefined reference
in top-level scope at Juno/f8hj2/src/progress.jl:119
in macro expansion at Flux/Fj3bt/src/optimise/train.jl:122
in train! at Flux/Fj3bt/src/optimise/train.jl:79
in #train!#12 at Flux/Fj3bt/src/optimise/train.jl:81
in macro expansion at Juno/f8hj2/src/progress.jl:119
in macro expansion at Flux/Fj3bt/src/optimise/train.jl:88
in gradient at Zygote/YeCEW/src/compiler/interface.jl:55
in at Zygote/YeCEW/src/compiler/interface.jl:179
in at Zygote/YeCEW/src/compiler/interface2.jl
in #15 at Flux/Fj3bt/src/optimise/train.jl:89
in #347#back at ZygoteRules/6nssF/src/adjoint.jl:49
in #174 at Zygote/YeCEW/src/lib/lib.jl:182
in at Zygote/YeCEW/src/compiler/interface2.jl
in L1 at hello.jl:48
in at Zygote/YeCEW/src/compiler/interface2.jl
in logp_x_z1 at hello.jl:38
in at Zygote/YeCEW/src/compiler/interface2.jl
in Chain at Flux/Fj3bt/src/layers/basic.jl:38
in at Zygote/YeCEW/src/compiler/interface2.jl
in applychain at Flux/Fj3bt/src/layers/basic.jl:36
in at Zygote/YeCEW/src/compiler/interface2.jl
in applychain at Flux/Fj3bt/src/layers/basic.jl:36
in at Zygote/YeCEW/src/compiler/interface2.jl
in applychain at Flux/Fj3bt/src/layers/basic.jl:36
in at Zygote/YeCEW/src/compiler/interface2.jl
in ConvTranspose at Flux/Fj3bt/src/layers/conv.jl:148
in at ZygoteRules/6nssF/src/adjoint.jl:49
in #1837 at Zygote/YeCEW/src/lib/nnlib.jl:41
in conv at NNlib/FAI3o/src/conv.jl:114
in #conv#89 at NNlib/FAI3o/src/conv.jl:116
in conv! at NNlib/FAI3o/src/conv.jl:70
in #conv!#48 at NNlib/FAI3o/src/conv.jl:70
in conv! at NNlib/FAI3o/src/conv.jl:97
in #conv!#83 at NNlib/FAI3o/src/conv.jl:99
in conv_direct! at NNlib/FAI3o/src/impl/conv_direct.jl:51
in #conv_direct!#149 at NNlib/FAI3o/src/impl/conv_direct.jl:98
in getindex at base/array.jl:789
So far, I've narrowed the problem down to the decoder part of the network. The code above includes a flag (dense_decoder) that swaps the convolutional decoder for a simple dense layer, and with the dense decoder the training works as expected. I have also tried simpler networks with only one layer of transposed convolution in the decoder, on lower-dimensional data, and those ran smoothly as well; a sketch of that kind of test follows below. Being new to Julia, I haven't managed to trace the cause of the error in the package code.
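To be concrete, here is roughly the kind of stripped-down check that still trains fine for me (the sizes here are invented for illustration, not the ones from my model):
using Flux
tc = ConvTranspose((4, 4), 2 => 1, sigmoid, stride = (2, 2), pad = 1)
x = rand(Float64, 16, 16, 2, 5)     # small dummy input
y = rand(Float64, 32, 32, 1, 5)     # target with the matching output size
loss(x, y) = sum((tc(x) .- y).^2)   # plain squared-error loss
Flux.train!(loss, Flux.params(tc), [(x, y)], ADAM())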
Having spent quite a few days on this, I can't figure out what the problem might be. Any help or feedback would be highly appreciated!
Best,
Flo