Zygote: CuArray only supports bits types

Hi all, I encountered the

ERROR: CuArray only supports bits types

error while computing gradients.

The computation is a simple RNN that, in a batched training setting, picks each sequence's output element based on that sequence's length.

I ran the code on the CPU and it worked smoothly. The issue only appears when the model and data are moved to the GPU.

Here is the code to reproduce the error using an Nvidia GTX 1050 GPU.

#================ GPU select sequence length ===========================#
using Flux
using Flux.Optimise: update!, Momentum
using Plots
using Zygote

function select_last_entry_based_on_length(rnn_output, sequence_length_array)
    # based on the sequence length, select the relevant entry. 
    output_array = Zygote.Buffer([0.0], 1, size(sequence_length_array)[1])
    for i in 1:size(sequence_length_array)[1]
        _len = sequence_length_array[i]
        output_array[i] = rnn_output[_len][i]
    end
    return copy(output_array)
end

rnn_model = Chain(GRU(3, 8), Dense(8, 1, relu, initW=Flux.glorot_normal)) |> gpu
rnn_params = params(rnn_model)

function eval_model(x)
    out = rnn_model.(x)
    Flux.reset!(rnn_model[1])
    return out
end

function model_output(input, sequence_length_array)
    output = eval_model(input)
    selected_output = select_last_entry_based_on_length(output, sequence_length_array) |> gpu
    return selected_output
end

# The input sequence would have different length. 
batch_size = 12
sequence_length = 20
input = [rand(3,batch_size) for i in 1:sequence_length] |> gpu
target = ones(1, batch_size) |> gpu
sequence_length_array = [rand(1:sequence_length) for i in 1:batch_size] |> gpu

loss(x, sequence_length_array, y) = Flux.Losses.mae(model_output(x, sequence_length_array), y)

initial_loss = loss(input, sequence_length_array, target)
println("initial_loss: $initial_loss")

opt = Momentum(0.005, 0.8)

loss_trajectory = []

for i in 1:300
    grads = gradient(() -> loss(input, sequence_length_array, target), rnn_params)

    for p in rnn_params
        if !isnothing(grads[p])
            Flux.Optimise.update!(opt, p, grads[p]) # the nothing gradient is fixed by reset!
        end
    end

    loss_after = loss(input, sequence_length_array, target)
    println("Iteration: $i loss_after: $loss_after")
    push!(loss_trajectory, [i, loss_after])
end

plot([i[1] for i in loss_trajectory], [i[2] for i in loss_trajectory])

CUDA Version Info

CUDA driver 11.0.0

Libraries: 
- CUBLAS: 11.2.0
- CURAND: 10.2.1
- CUFFT: 10.2.1
- CUSOLVER: 10.6.0
- CUSPARSE: 11.1.1
- CUPTI: 13.0.0
- NVML: missing
- CUDNN: 8.0.2 (for CUDA 11.0.0)   
- CUTENSOR: 1.2.0 (for CUDA 11.0.0)

Toolchain:
- Julia: 1.5.0
- LLVM: 9.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75

1 device:
  0: GeForce GTX 1050 (sm_61, 3.146 GiB / 4.000 GiB available)

I can’t reproduce that error with Julia 1.5.2. As written, I get a different error that I don’t understand. After training for 300 iterations and printing a decreasing loss each time, the following error is printed:

ERROR: MethodError: no method matching _pullback(::Zygote.Context, ::typeof(Base.Broadcast.combine_styles), ::Array{Array{Float64,2},1})
The applicable method may be too new: running in world age 27827, while current world is 28153.
Closest candidates are:
  _pullback(::ZygoteRules.AContext, ::Any, ::Any, ::typeof(sum), ::Any, ::AbstractArray) at /home/russel/.julia/packages/Zygote/c0awc/src/lib/array.jl:240 (method too new to be called from this world context.)
  _pullback(::ZygoteRules.AContext, ::Any, ::Any...) at /home/russel/.julia/packages/Zygote/c0awc/src/compiler/interface2.jl:12 (method too new to be called from this world context.)
  _pullback(::Any, ::Any...) at /home/russel/.julia/packages/Zygote/c0awc/src/compiler/interface.jl:38 (method too new to be called from this world context.)
  ...
adjoint(::Zygote.Context, ::typeof(Core._apply_iterate), ::typeof(iterate), ::typeof(Base.Broadcast.combine_styles), ::Tuple{Tuple{Array{Array{Float64,2},1}},Tuple{}}) at /home/russel/.julia/packages/Zygote/c0awc/src/lib/lib.jl:188
_pullback(::Zygote.Context, ::typeof(Core._apply_iterate), ::typeof(iterate), ::typeof(Base.Broadcast.combine_styles), ::Tuple{Tuple{Array{Array{Float64,2},1}},Tuple{}}) at /home/russel/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47
_pullback(::Zygote.Context, ::typeof(Base.Broadcast.broadcasted), ::Tuple{Chain{Tuple{Flux.Recur{Flux.GRUCell{Array{Float32,2},Array{Float32,1}}},Dense{typeof(relu),Array{Float32,2},Array{Float32,1}}}},Array{Array{Float64,2},1}}) at broadcast.jl:1257
_pullback(::Zygote.Context, ::typeof(eval_model), ::Tuple{Array{Array{Float64,2},1}}) at /tmp/rnn.jl:23
_pullback(::Zygote.Context, ::typeof(model_output), ::Tuple{Array{Array{Float64,2},1},Array{Int64,1}}) at /tmp/rnn.jl:29
_pullback(::Zygote.Context, ::typeof(loss), ::Tuple{Array{Array{Float64,2},1},Array{Int64,1},Array{Float64,2}}) at /tmp/rnn.jl:41
_pullback(::Zygote.Context, ::var"#21#22", ::Tuple{}) at /tmp/rnn.jl:52
pullback(::var"#21#22", ::Params) at /home/russel/.julia/packages/Zygote/c0awc/src/compiler/interface.jl:172
gradient(::var"#21#22", ::Tuple{Params}) at /home/russel/.julia/packages/Zygote/c0awc/src/compiler/interface.jl:53
top-level scope at /tmp/rnn.jl:52
in expression starting at /tmp/rnn.jl:52

I haven’t seen an error like that before, so I attempted to debug it by moving the training loop into a function and getting rid of all global variables; this resulted in successful training and a plot.

using Pkg                                                                        
pkg"activate --temp"                                                             
pkg"add Flux Plots Zygote"                                                       
using Flux                                                                       
using Flux.Optimise: update!, Momentum                                           
using Plots                                                                      
using Zygote                                                                     
                                                                                 
function select_last_entry_based_on_length(rnn_output, sequence_length_array)    
    # based on the sequence length, select the relevant entry.                   
    output_array = Zygote.Buffer([0.0], 1, size(sequence_length_array)[1])       
    for i = 1:size(sequence_length_array)[1]                                     
        _len = sequence_length_array[i]                                          
        output_array[i] = rnn_output[_len][i]                                    
    end                                                                          
    return copy(output_array)                                                    
end                                                                              
                                                                                 
function eval_model(x)                                                           
    out = rnn_model.(x)                                                          
    Flux.reset!(rnn_model[1])                                                    
    return out                                                                   
end                                                                              
                                                                                 
function model_output(input, sequence_length_array)                              
    output = eval_model(input)                                                   
    selected_output =                                                            
        select_last_entry_based_on_length(output, sequence_length_array) |> gpu  
    return selected_output                                                       
end                                                                              
                                                                                 
function train(rnn_model, batch_size, sequence_length)                           
    input = [rand(3, batch_size) for i = 1:sequence_length] |> gpu               
    target = ones(1, batch_size) |> gpu
    sequence_length_array = [rand(1:sequence_length) for i = 1:batch_size] |> gpu
                                                                                 
    loss(x, sequence_length_array, y) =                                          
        Flux.Losses.mae(model_output(x, sequence_length_array), y)               
                                                                                 
    initial_loss = loss(input, sequence_length_array, target)                    
    println("initial_loss: $initial_loss")                                       
                                                                                 
    opt = Momentum(0.005, 0.8)                                                   
                                                                                 
    loss_trajectory = []                                                         
                                                                                 
    rnn_params = params(rnn_model)                                               
    for i = 1:300                                                                
        grads = gradient(() -> loss(input, sequence_length_array, target), rnn_params)
                                                                                 
        for p in rnn_params
            if !isnothing(grads[p])                                              
                Flux.Optimise.update!(opt, p, grads[p]) # the nothing gradient is fixed by reset!
            end                                                                  
        end                                                                      
                                                                                 
        loss_after = loss(input, sequence_length_array, target)                  
        println("Iteration: $i loss_after: $loss_after")                         
        push!(loss_trajectory, [i, loss_after])                                  
    end                                                                          
    return loss_trajectory                                                       
end                                                                              
                                                                                 
rnn_model = Chain(GRU(3, 8), Dense(8, 1, relu, initW = Flux.glorot_normal)) |> gpu
                                                                                 
# The input sequence would have different length.                                
# batch_size = 12                                                                
# sequence_length = 20                                                           
loss_trajectory = train(rnn_model, 12, 20)                                       
                                                                                 
plot([i[1] for i in loss_trajectory], [i[2] for i in loss_trajectory])

Creating a new temporary environment and installing just the required packages resulted in these versions:

(jl_QI87C4) pkg> st
Status `/tmp/jl_QI87C4/Project.toml`
  [587475ba] Flux v0.11.1
  [91a5bcdd] Plots v1.7.3
  [e88e6eb3] Zygote v0.5.9

I’m not sure what to say about the world age errors, other than that Julia programs work much better with more functions and fewer global variables. It seems to me you likely just have some outdated packages; I would try ]up before upgrading to 1.5.2.


Appreciate the time and effort, along with the beautifully refactored code. Let me check if upgrading would work.


I upgraded the packages and also upgraded Julia to 1.5.2.
The old error still occurs sometimes, and a new error pops up.

ERROR: LoadError: CUDNNError: CUDNN_STATUS_NOT_SUPPORTED (code 9)
Stacktrace:
 [1] throw_api_error(::CUDA.CUDNN.cudnnStatus_t) at C:\Users\jack\.julia\packages\CUDA\dZvbp\lib\cudnn\error.jl:19
 [2] macro expansion at C:\Users\jack\.julia\packages\CUDA\dZvbp\lib\cudnn\error.jl:30 [inlined]
 [3] cudnnGetRNNWorkspaceSize(::Ptr{Nothing}, ::CUDA.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CUDA.CUDNN.TensorDesc,1}, ::Base.RefValue{UInt64}) at C:\Users\jack\.julia\packages\CUDA\dZvbp\lib\utils\call.jl:93
 [4] macro expansion at C:\Users\jack\.julia\packages\CUDA\dZvbp\lib\utils\call.jl:121 [inlined]
 [5] cudnnRNNBackwardData(::CUDA.CUDNN.RNNDesc{Float32}, ::Int64, ::Array{CUDA.CUDNN.TensorDesc,1}, ::CUDA.CuArray{Float32,2}, ::Array{CUDA.CUDNN.TensorDesc,1}, ::CUDA.CuArray{Float64,2}, ::Ptr{Nothing}, ::CUDA.CuPtr{Nothing}, ::Ptr{Nothing}, ::CUDA.CuPtr{Nothing}, ::CUDA.CUDNN.FilterDesc, ::CUDA.CuArray{Float32,1}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,2}, ::Ptr{Nothing}, ::CUDA.CuPtr{Nothing}, ::Array{CUDA.CUDNN.TensorDesc,1}, ::CUDA.CuArray{Float64,2}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,2}, ::Ptr{Nothing}, ::CUDA.CuPtr{Nothing}, ::CUDA.CuArray{UInt8,1}) at C:\Users\jack\.julia\packages\CUDA\dZvbp\lib\cudnn\rnn.jl:139
 [6] backwardData at C:\Users\jack\.julia\packages\CUDA\dZvbp\lib\cudnn\rnn.jl:157 [inlined]
 [7] backwardData at C:\Users\jack\.julia\packages\CUDA\dZvbp\lib\cudnn\rnn.jl:163 [inlined]
 [8] (::CUDA.CUDNN.var"#490#491"{CUDA.CUDNN.RNNDesc{Float32},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1},CUDA.CuArray{UInt8,1},CUDA.CuArray{Float32,2}})(::CUDA.CuArray{Float64,2}, ::Nothing) at C:\Users\jack\.julia\packages\CUDA\dZvbp\lib\cudnn\rnn.jl:189
 [9] #9 at C:\Users\jack\.julia\packages\Flux\05b38\src\cuda\curnn.jl:73 [inlined]
 [10] #407#back at C:\Users\jack\.julia\packages\ZygoteRules\6nssF\src\adjoint.jl:49 [inlined]
 [11] #150 at C:\Users\jack\.julia\packages\Zygote\c0awc\src\lib\lib.jl:191 [inlined]
 [12] (::Zygote.var"#1693#back#152"{Zygote.var"#150#151"{Flux.CUDAint.var"#407#back#11"{Flux.CUDAint.var"#9#10"{Zygote.Context,Flux.GRUCell{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1}},CUDA.CuArray{Float32,1},CUDA.CUDNN.var"#490#491"{CUDA.CUDNN.RNNDesc{Float32},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1},CUDA.CuArray{UInt8,1},CUDA.CuArray{Float32,2}}}},Tuple{Tuple{Nothing},Tuple{Nothing}}}})(::Tuple{Nothing,CUDA.CuArray{Float64,2}}) at C:\Users\jack\.julia\packages\ZygoteRules\6nssF\src\adjoint.jl:49
 [13] Recur at C:\Users\jack\.julia\packages\Flux\05b38\src\layers\recurrent.jl:36 [inlined]
 [14] (::typeof(∂(λ)))(::CUDA.CuArray{Float64,2}) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\compiler\interface2.jl:0
 [15] applychain at C:\Users\jack\.julia\packages\Flux\05b38\src\layers\basic.jl:36 [inlined]
 [16] (::typeof(∂(applychain)))(::CUDA.CuArray{Float64,2}) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\compiler\interface2.jl:0
 [17] Chain at C:\Users\jack\.julia\packages\Flux\05b38\src\layers\basic.jl:38 [inlined]
 [18] (::typeof(∂(λ)))(::CUDA.CuArray{Float64,2}) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\compiler\interface2.jl:0
 [19] #1071 at C:\Users\jack\.julia\packages\Zygote\c0awc\src\lib\broadcast.jl:140 [inlined]
 [20] (::Base.var"#3#4"{Zygote.var"#1071#1078"})(::Tuple{typeof(∂(λ)),CUDA.CuArray{Float64,2}}) at .\generator.jl:36
 [21] iterate at .\generator.jl:47 [inlined]
 [22] collect(::Base.Generator{Base.Iterators.Zip{Tuple{Array{typeof(∂(λ)),1},Array{Union{Nothing, CUDA.CuArray{Float64,2}},1}}},Base.var"#3#4"{Zygote.var"#1071#1078"}}) at .\array.jl:686
 [23] map at .\abstractarray.jl:2248 [inlined]
 [24] (::Zygote.var"#1070#1077"{Tuple{Array{CUDA.CuArray{Float32,2},1}},Val{2},Array{typeof(∂(λ)),1}})(::Array{Union{Nothing, CUDA.CuArray{Float64,2}},1}) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\lib\broadcast.jl:140
 [25] #3862#back at C:\Users\jack\.julia\packages\ZygoteRules\6nssF\src\adjoint.jl:49 [inlined]
 [26] (::Zygote.var"#150#151"{Zygote.var"#3862#back#1081"{Zygote.var"#1070#1077"{Tuple{Array{CUDA.CuArray{Float32,2},1}},Val{2},Array{typeof(∂(λ)),1}}},Tuple{Tuple{Nothing,Nothing,Nothing},Tuple{}}})(::Array{Union{Nothing, CUDA.CuArray{Float64,2}},1}) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\lib\lib.jl:191   
 [27] #1693#back at C:\Users\jack\.julia\packages\ZygoteRules\6nssF\src\adjoint.jl:49 [inlined]
 [28] broadcasted at .\broadcast.jl:1257 [inlined]
 [29] eval_model at c:\Users\jack\TerminateGoogleDrive\TretarData\forex\julia_temp\rnn_discourse.jl:20 [inlined]
 [30] (::typeof(∂(eval_model)))(::Array{Union{Nothing, CUDA.CuArray{Float64,2}},1}) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\compiler\interface2.jl:0 
 [31] model_output at c:\Users\jack\TerminateGoogleDrive\TretarData\forex\julia_temp\rnn_discourse.jl:26 [inlined]
 [32] (::typeof(∂(model_output)))(::CUDA.CuArray{Float32,2}) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\compiler\interface2.jl:0
 [33] loss at c:\Users\jack\TerminateGoogleDrive\TretarData\forex\julia_temp\rnn_discourse.jl:37 [inlined]
 [34] (::typeof(∂(loss)))(::Float32) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\compiler\interface2.jl:0
 [35] #14 at c:\Users\jack\TerminateGoogleDrive\TretarData\forex\julia_temp\rnn_discourse.jl:49 [inlined]
 [36] (::typeof(∂(λ)))(::Float32) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\compiler\interface2.jl:0
 [37] (::Zygote.var"#54#55"{Params,Zygote.Context,typeof(∂(λ))})(::Float32) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\compiler\interface.jl:177        
 [38] gradient(::Function, ::Params) at C:\Users\jack\.julia\packages\Zygote\c0awc\src\compiler\interface.jl:54
 [39] train(::Chain{Tuple{Flux.Recur{Flux.GRUCell{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1}}},Dense{typeof(relu),CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,1}}}}, ::Int64, ::Int64) at c:\Users\jack\TerminateGoogleDrive\TretarData\forex\julia_temp\rnn_discourse.jl:49
 [40] top-level scope at c:\Users\jack\TerminateGoogleDrive\TretarData\forex\julia_temp\rnn_discourse.jl:69
 [41] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091
 [42] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at .\essentials.jl:710
 [43] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N) at .\essentials.jl:709
 [44] inlineeval(::Module, ::String, ::Int64, ::Int64, ::String; softscope::Bool) at c:\Users\jack\.vscode\extensions\julialang.language-julia-1.0.8\scripts\packages\VSCodeServer\src\eval.jl:83
 [45] (::VSCodeServer.var"#43#45"{VSCodeServer.ReplRunCodeRequestParams,String,Int64,Int64,String,Module,Bool})() at c:\Users\jack\.vscode\extensions\julialang.language-julia-1.0.8\scripts\packages\VSCodeServer\src\eval.jl:45
 [46] withpath(::VSCodeServer.var"#43#45"{VSCodeServer.ReplRunCodeRequestParams,String,Int64,Int64,String,Module,Bool}, ::String) at c:\Users\jack\.vscode\extensions\julialang.language-julia-1.0.8\scripts\packages\VSCodeServer\src\repl.jl:118
 [47] (::VSCodeServer.var"#42#44"{VSCodeServer.ReplRunCodeRequestParams,String,Int64,Int64,String,Module,Bool,Bool})() at c:\Users\jack\.vscode\extensions\julialang.language-julia-1.0.8\scripts\packages\VSCodeServer\src\eval.jl:43
 [48] hideprompt(::VSCodeServer.var"#42#44"{VSCodeServer.ReplRunCodeRequestParams,String,Int64,Int64,String,Module,Bool,Bool}) at c:\Users\jack\.vscode\extensions\julialang.language-julia-1.0.8\scripts\packages\VSCodeServer\src\repl.jl:36 
 [49] repl_runcode_request(::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint,Base.PipeEndpoint}, ::VSCodeServer.ReplRunCodeRequestParams) at c:\Users\jack\.vscode\extensions\julialang.language-julia-1.0.8\scripts\packages\VSCodeServer\src\eval.jl:23
 [50] dispatch_msg(::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint,Base.PipeEndpoint}, ::VSCodeServer.JSONRPC.MsgDispatcher, ::Dict{String,Any}) at c:\Users\jack\.vscode\extensions\julialang.language-julia-1.0.8\scripts\packages\JSONRPC\src\typed.jl:66
 [51] macro expansion at c:\Users\jack\.vscode\extensions\julialang.language-julia-1.0.8\scripts\packages\VSCodeServer\src\VSCodeServer.jl:95 [inlined]        
in expression starting at c:\Users\jack\TerminateGoogleDrive\TretarData\forex\julia_temp\rnn_discourse.jl:69

The new error seems to be related to CUDA not handling nothing well.

Well, that was certainly an adventure. It turns out that my GPU drivers were not correctly installed when I tried this the first time, and somehow everything failed silently. After I discovered and fixed this, I was able to reproduce the new error, as well as another warning:

Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`

I believe this warning comes from select_last_entry_based_on_length as it has to do scalar indexing to build the final result.
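
As an aside, the warning’s own suggestion is a handy way to track this down: with scalar indexing disallowed, the offending line raises an error with a stacktrace instead of silently running slowly. A minimal sketch:

using CUDA

# Turn scalar indexing of GPU arrays into an error instead of a warning,
# so the offending call shows up in a stacktrace.
CUDA.allowscalar(false)

x = cu(rand(4))
x[1]  # now throws instead of emitting the warning above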

In order to remove the scalar operations, I tried using one big gather operation to collect the values; this seems to fix both the warning and the CUDNN_STATUS_NOT_SUPPORTED error. This version usually works, but has two interesting failure modes. Both can be triggered by recreating the model and re-running train several times.

The more common one is a CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3), which seems to be mentioned in several issues as a hard-to-trigger intermittent error. I added some code to print the inputs when this happened, hoping to catch an obvious NaN or something, but so far no such luck.
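
For reference, a check along these lines can scan the inputs for NaNs before each gradient call; has_nan is a hypothetical helper to illustrate the idea, not the exact code I used:

# Hypothetical helper: true if any array in the collection contains a NaN.
# any(isnan, x) lowers to a reduction, so it works on CuArrays without
# scalar indexing.
has_nan(arrays) = any(x -> any(isnan, x), arrays)

has_nan(input) && @warn "NaN detected in input"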

The less common but more interesting one is a segfault freeing memory inside libcudnn_adv_train.so.

Both seem worth opening issues for; I’ll do that tomorrow, but I’ve had enough fun for now!

using Pkg
pkg"activate --temp"
pkg"add Flux Plots Zygote"
using Flux
using Flux.Optimise: update!, Momentum
using Plots
using Zygote

function eval_model(rnn_model, x)
    out = rnn_model.(x)
    Flux.reset!(rnn_model)
    return out
end

function model_output(rnn_model, input, sequence_length_idx)
    vcat(eval_model(rnn_model, input)...)[sequence_length_idx]
end

function train(rnn_model, batch_size, sequence_length)
    input = [rand(Float32, 3, batch_size) for i = 1:sequence_length] |> gpu
    target = ones(Float32, 1, batch_size) |> gpu
    sequence_length_idx = [CartesianIndex(j, i) for (i, j) in enumerate(rand(Int64, 1:sequence_length, batch_size))]

    loss(x, idx, y) = Flux.Losses.mae(model_output(rnn_model, x, idx), y)

    initial_loss = loss(input, sequence_length_idx, target)
    println("initial_loss: $initial_loss")

    opt = Momentum(0.005, 0.8)

    loss_trajectory = []

    rnn_params = params(rnn_model)
    for i = 1:300
        local grads, current_loss
        try
            grads = gradient(rnn_params) do
                current_loss = loss(input, sequence_length_idx, target)
            end
        catch e
            @show input
            @show target
            @show rnn_params
            rethrow(e)
        end


        update!(opt, rnn_params, grads)

        println("Iteration: $i current_loss: $current_loss")
        push!(loss_trajectory, [i, current_loss])
    end
    return loss_trajectory
end

model = Chain(GRU(3, 8), Dense(8, 1, relu, initW = Flux.glorot_normal)) |> gpu

loss_trajectory = train(model, 12, 20);

plot([i[1] for i in loss_trajectory], [i[2] for i in loss_trajectory])

Thanks for the follow-up. Your latest code worked on my machine! I also did multiple runs and no errors popped up.
I did change one thing, though, which is the sequence_length_idx line: I excluded the Int64 when defining the rand.

sequence_length_idx = [CartesianIndex(j, i) for (i, j) in enumerate(rand(1:sequence_length, batch_size))]

When I ran rand(Int64, 1:sequence_length, batch_size) I encountered the following error:

ERROR: MethodError: no method matching rand(::Type{Int64}, ::UnitRange{Int64}, ::Int64)
Closest candidates are:
  rand(!Matched::AbstractRNG, ::Any, ::Integer, !Matched::Integer...) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Random\src\Random.jl:282
  rand(::Type{X}, !Matched::Integer, ::Integer...) where X at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Random\src\Random.jl:292
  rand(::Any, !Matched::Integer, ::Integer...) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Random\src\Random.jl:283
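
That MethodError makes sense: rand takes either an element type or a collection to sample from as its first argument, but not both. A quick sketch of the distinction:

rand(1:20, 12)        # 12 values drawn from 1:20; already a Vector{Int64}
rand(Int64, 12)       # 12 arbitrary Int64 values from the full integer range
rand(Int64, 1:20, 12) # MethodError: no method takes both a type and a range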

One last thing, which I think is more of a math problem: when the network outputs all-zero arrays, no gradient descent happens and the loss stays constant at 1. That makes sense, because all the weights and biases have zero gradients. Is there a way, in this problem setting, to initialize the GRU so that this does not occur? Thank you.

Hmm, if it is reliable on your machine, I must still have some configuration problem. I’ll see if I can figure anything out.

I think zero output is something the relu activation can do. If you get unlucky and have lots of negative weights, w*x + b stays negative for most inputs, so the relu output stays zero. You might try a different activation, or use a wider relu layer to reduce the chance of all units being zero, and then stack a linear layer on top of that to get your final output.
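
For example, here is a minimal sketch of that last suggestion; the widths are arbitrary, only the shape of the idea matters:

# Wider relu hidden layer followed by a linear (identity-activation) output
# layer, so a single dead relu unit can no longer zero out the whole output.
rnn_model = Chain(
    GRU(3, 8),
    Dense(8, 16, relu),  # wider relu layer
    Dense(16, 1),        # linear output layer
) |> gpu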