(open issue here)

I’m fairly new to `CuArrays` and `Flux`, and I’ve run into a problem where **training halts after some epochs**. There is no CUDA out-of-memory error, but GPU usage is extremely high for such a simple linear model (99.97% on a 1080 Ti). The code sometimes finishes all 500 epochs without problems, but other times it halts around epoch 150. Here’s an example code snippet:

```
using LinearAlgebra
using Flux
using CuArrays, CUDAnative
using Flux.Optimise: update!
using Flux: crossentropy

device!(1)  # select GPU device 1
CuArrays.allowscalar(false)  # raise an error on scalar indexing of GPU arrays

pred_loss(x, y) = sum((x .- y) .^ 2)

# dimensions
B = 250  # batch size
linear = Dense(400, 144) |> gpu

# normalize each column of W to unit L2 norm
linear.W .= linear.W ./ sqrt.(sum(linear.W .^ 2, dims=1))

# training
E = 500  # number of epochs
opt_U = Descent(0.01)
for e = 1:E
    running_l = 0
    c = 0
    for b = 1:100
        y = rand(144, B) |> gpu
        R = zeros(400, size(y, 2)) |> gpu
        l = 0
        grads = gradient(params(linear.W)) do
            l = pred_loss(y, linear(R))
            running_l += l
            return l
        end
        update!(opt_U, linear.W, grads[linear.W])
        # re-normalize the columns after each update
        linear.W .= linear.W ./ sqrt.(sum(linear.W .^ 2, dims=1))
        c += 1
    end
    println("Epoch: $e, Running loss: $(running_l / c)")
end
```

I’m seeing this on Ubuntu 18.04 with `CuArrays` v2.1.0. I would appreciate any pointers on this.