Training Halts When Using CuArrays

(open issue here)

I’m fairly new to CuArrays and Flux, and I’ve run into a problem where training halts after some number of epochs. There is no CUDA out-of-memory error, but GPU usage is extremely high for such a simple linear model (99.97% on a 1080 Ti). The code sometimes finishes all 500 epochs without problems, but other times it halts around epoch 150. Here’s an example code snippet:

```julia
using LinearAlgebra
using Flux
using CuArrays, CUDAnative
using Flux.Optimise: update!
using Flux: crossentropy

# sum-of-squares prediction loss
pred_loss(x, y) = sum((x .- y) .^ 2)

# dimensions
B = 250                          # batch size
linear = Dense(400, 144) |> gpu
# normalize each column of W to unit L2 norm
linear.W .= linear.W ./ sqrt.(sum(linear.W .^ 2, dims=1));
# training
E = 500                          # number of epochs
opt_U = Descent(0.01)
for e = 1:E
    running_l = 0
    c = 0
    for b = 1:100
        y = rand(144, B) |> gpu
        R = zeros(400, size(y, 2)) |> gpu
        l = 0
        grads = gradient(params(linear.W)) do
            l = pred_loss(y, linear(R))
            running_l += l
            return l
        end
        update!(opt_U, linear.W, grads[linear.W])
        # re-normalize the columns of W after each update
        linear.W .= linear.W ./ sqrt.(sum(linear.W .^ 2, dims=1))
        c += 1
    end
    println("Epoch: $e, Running loss: $(running_l / c)")
end
```
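The normalization line in the loop rescales each column of `W` to unit Euclidean norm: `sum(W .^ 2, dims=1)` produces a 1×n row of squared column norms, and broadcasting `W ./ sqrt.(...)` divides every column by its own norm. A minimal CPU-only sketch, assuming a small hypothetical matrix in place of `linear.W` and plain `Array`s instead of `CuArray`s:

```julia
using LinearAlgebra  # for norm

# Hypothetical 2×3 matrix standing in for linear.W (CPU only, no Flux/CuArrays)
W = [3.0 0.0 1.0;
     4.0 2.0 1.0]

# Same step as in the training loop: divide each column by its L2 norm.
# sum(W .^ 2, dims=1) is a 1×3 row; broadcasting scales each column separately.
Wn = W ./ sqrt.(sum(W .^ 2, dims=1))

# Print the norm of every column after normalization
for j in 1:size(Wn, 2)
    println("column $j norm = ", norm(Wn[:, j]))
end
```

Because the division is along `dims=1`, each column is normalized independently, which is why the step has to be repeated after every `update!` call.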

I’m seeing this on Ubuntu 18.04 with CuArrays v2.1.0. I’d appreciate any pointers.