Flux.train! failing with a CUDA GPU

#1

I’m testing some Julia deep learning code, and when I call Flux.train! I get the following error:

ReadOnlyMemoryError()

Specifically, I’m calling the train function like this:

@time for i in 1:100
    Flux.train!(loss, params, data, opt)
end

and I get the following stack trace:

Stacktrace:
[1] gemv!(::Char, ::Float32, ::CuArray{Float32,2}, ::Array{Float32,1}, ::Float32, ::Array{Float32,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/blas.jl:577
[2] gemv!(::Array{Float32,1}, ::Char, ::CuArray{Float32,2}, ::Array{Float32,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/matmul.jl:360
[3] * at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/matmul.jl:64 [inlined]
[4] _forward at /home/bwj/.julia/packages/Flux/8XpDt/src/tracker/lib/array.jl:361 [inlined]
[5] #track#1 at /home/bwj/.julia/packages/Flux/8XpDt/src/tracker/Tracker.jl:51 [inlined]
[6] track at /home/bwj/.julia/packages/Flux/8XpDt/src/tracker/Tracker.jl:51 [inlined]
[7] * at /home/bwj/.julia/packages/Flux/8XpDt/src/tracker/lib/array.jl:353 [inlined]
[8] Dense at /home/bwj/.julia/packages/Flux/8XpDt/src/layers/basic.jl:82 [inlined]
[9] Dense at /home/bwj/.julia/packages/Flux/8XpDt/src/layers/basic.jl:122 [inlined]
[10] (::Dense{typeof(σ),TrackedArray{…,CuArray{Float32,2}},TrackedArray{…,CuArray{Float32,1}}})(::Array{Float64,1}) at /home/bwj/.julia/packages/Flux/8XpDt/src/layers/basic.jl:125
[11] applychain(::Tuple{Dense{typeof(σ),TrackedArray{…,CuArray{Float32,2}},TrackedArray{…,CuArray{Float32,1}}},Dense{typeof(σ),TrackedArray{…,CuArray{Float32,2}},TrackedArray{…,CuArray{Float32,1}}}}, ::Array{Float64,1}) at /home/bwj/.julia/packages/Flux/8XpDt/src/layers/basic.jl:31
[12] Chain at /home/bwj/.julia/packages/Flux/8XpDt/src/layers/basic.jl:33 [inlined]
[13] loss(::Array{Float64,1}, ::Flux.OneHotVector) at ./In[13]:1
[14] macro expansion at /home/bwj/.julia/packages/Flux/8XpDt/src/optimise/train.jl:74 [inlined]
[15] macro expansion at /home/bwj/.julia/packages/Juno/B1s6e/src/progress.jl:133 [inlined]
[16] #train!#12(::getfield(Flux.Optimise, Symbol("##14#18")), ::Function, ::Function, ::Function, ::Base.Iterators.Zip{Tuple{Array{Array{Float64,1},1},Array{Flux.OneHotVector,1}}}, ::Function) at /home/bwj/.julia/packages/Flux/8XpDt/src/optimise/train.jl:72
[17] train!(::Function, ::Function, ::Base.Iterators.Zip{Tuple{Array{Array{Float64,1},1},Array{Flux.OneHotVector,1}}}, ::Function) at /home/bwj/.julia/packages/Flux/8XpDt/src/optimise/train.jl:70
[18] macro expansion at ./In[17]:2 [inlined]
[19] macro expansion at ./util.jl:156 [inlined]
[20] top-level scope at ./In[17]:1 [inlined]
[21] top-level scope at ./none:0
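Frame [10] above shows the first Dense layer of the Chain, whose weights are CuArray{Float32,2}, being called with an Array{Float64,1} input, and frames [1] and [2] show the multiplication landing in the CPU BLAS gemv! from LinearAlgebra with a CuArray argument. Keeping the model and every sample on the same device avoids that mix. The following is a minimal sketch of that pattern, not the poster's code: it assumes a current Flux release (0.14 or later), where CUDA.jl has replaced CuArrays and train! takes the model plus an optimiser state from Flux.setup instead of the older train!(loss, params, data, opt) form used in the post. The two σ Dense layers mirror the Chain type in frame [11]; the layer sizes, the mse loss, the Adam optimiser and the random data are hypothetical stand-ins.

```julia
using Flux   # also load CUDA.jl (`using CUDA`) so that `gpu` can move arrays to an NVIDIA GPU

# Hypothetical stand-in data: 100 samples with 784 Float32 features each and
# one-hot targets over 10 classes. Float32 matches Flux's default weight type.
xs = [rand(Float32, 784) for _ in 1:100]
ys = [Float32.(Flux.onehot(rand(0:9), 0:9)) for _ in 1:100]

# Move the model to the GPU; without a working GPU backend, `gpu` returns
# the model unchanged (possibly printing an info or warning message).
model = Chain(Dense(784 => 32, σ), Dense(32 => 10, σ)) |> gpu

# Move every (input, target) pair to the same device as the model, so each
# Dense layer multiplies device weights by device inputs.
data = [(gpu(x), gpu(y)) for (x, y) in zip(xs, ys)]

# Explicit-style loss: train! calls loss(model, x, y) for each (x, y) in data.
loss(m, x, y) = Flux.mse(m(x), y)
opt_state = Flux.setup(Adam(), model)

@time for epoch in 1:100
    Flux.train!(loss, model, data, opt_state)
end
```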


  • I am running on Ubuntu 18.04 LTS
  • nvidia-settings reports NVIDIA Driver version: 418.43
  • running the cuDNN MNIST sample outputs:

cudnnGetVersion() : 7500 , CUDNN_VERSION from cudnn.h : 7500 (7.5.0)
Host compiler version : GCC 7.3.0

When I run Pkg.test("Flux"), I get the following:

Test Summary:          | Pass  Fail  Error  Total
Flux                   |  448     4      3    455
  Throttle             |   11                  11
  Jacobian             |    1                   1
  Initialization       |   12                  12
  Params               |    2                   2
  Precision            |    6                   6
  Stacking             |    3                   3
  onecold              |    4                   4
  Optimise             |   11                  11
  Optimiser            |    3                   3
  Training Loop        |    2                   2
  basic                |   17                  17
  Dropout              |    8                   8
  BatchNorm            |   13                  13
  losses               |   30                  30
  Pooling              |    2                   2
  CNN                  |    1                   1
  Depthwise Conv       |    2                   2
  Tracker              |  261                 261
  CuArrays             |    8                   8
  onecold gpu          |    1                   1
  CUDNN BatchNorm      |   10                  10
  RNN                  |   33     4      2     39
    R = Flux.RNN       |   16                  16
    R = Flux.GRU       |    6     4      1     11
      batch_size = 1   |    2            1      3
      batch_size = 5   |    4     4             8
    R = Flux.LSTM      |   11            1     12
      batch_size = 1   |    9                   9
      batch_size = 5   |    2            1      3
ERROR: LoadError: Some tests did not pass: 448 passed, 4 failed, 3 errored, 0 broken.


#2

Does this report any errors?

(v1.1) pkg> build

#3

I assume you mean Pkg.build("Flux")?


#4

Here is the output from running Pkg.build("Flux"):

julia> Pkg.build("Flux")
Building SpecialFunctions → ~/.julia/packages/SpecialFunctions/fvheQ/deps/build.log
Building ZipFile ─────────→ ~/.julia/packages/ZipFile/p60bh/deps/build.log
Building CodecZlib ───────→ ~/.julia/packages/CodecZlib/9jDi1/deps/build.log

The build reports no errors, but re-running Pkg.test("Flux") still gives me errors, starting with the GPU tests.


#5

Actually, I meant the plain build command, since that builds everything. I ask because I ran into issues where Arpack doesn’t build, which only affects Flux on the GPU.


#6

[SOLVED] OK. The key step was to enter what I can only assume is called “command mode” (the Pkg REPL mode) by pressing the ] key, which gave me the “(v1.1) pkg>” prompt, and then to type “update”. This updated my CUDA-related packages and built them for Julia 1.1.0. After that, running “test Flux” gave me a clean, error-free set of test results. Yay!
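The same fix can also be scripted with the Pkg API instead of being typed at the pkg> prompt. This is a minimal sketch assuming the environment to repair is the currently active one; Pkg.update, Pkg.build and Pkg.test are the API counterparts of the update, build and test commands.

```julia
using Pkg

# Same as typing `update` at the pkg> prompt: upgrade the packages in the
# active environment to the newest versions allowed by their compat bounds.
Pkg.update()

# Same as `build` with no arguments: re-run the build steps for the active
# project and its dependencies (CUDA-related packages included), not only Flux.
Pkg.build()

# Same as `test Flux`: run Flux's own test suite.
Pkg.test("Flux")
```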
