GPU version of my net got stuck

I am trying to train a convolutional network on the CIFAR-10 data. Training on the CPU works, but when I tried to use the GPU on my laptop, it first gave me this warning:
Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with allowscalar(false)
└ @ GPUArrays C:\Users\tume.julia\packages\GPUArrays\gjXOn\src\host\indexing.jl:58

I tried adding "CUDA.allowscalar(false)", but then it failed immediately with "scalar getindex is disallowed" and a long stack trace that mentions conv.

So I set it back to true. With scalar operations allowed, the run just hangs. Below is the code I used. Could someone please help me resolve this? Thank you in advance!


using MLDatasets, JLD2, FileIO, ImageFiltering, Images, Interact, Plots
using Flux, Zygote, Statistics  # Flux.Data.MNIST dropped: unused here, the data comes from MLDatasets
using Flux: onehotbatch, onecold, crossentropy, throttle, mse, flatten
using Base.Iterators: repeated, partition
using Random: randperm
using CUDA

train_x, train_y = CIFAR10.traindata()
test_x, test_y = CIFAR10.testdata()

train_x_tensor = train_x  # already width x height x channels x batch; permuting with [1, 2, 3, 4] only made a copy
train_y_onehot = onehotbatch(train_y, 0:9)

test_x_tensor = test_x  # same layout as the training data, no permutation needed
test_y_onehot = onehotbatch(test_y, 0:9)


cu_train_x_tensor = cu(train_x_tensor)
cu_train_y_onehot = cu(train_y_onehot)

cu_test_x_tensor = cu(test_x_tensor)
cu_test_y_onehot = cu(test_y_onehot)


gpu_deep_conv2_only_model = Chain(
    Conv((3, 3), 3 => 3, relu),
    MaxPool((2,2)) ,
    Conv((11,11), 3 => 16,  relu),
    MaxPool((3,3))
) |> gpu

gpu_deep_mlp2_model = Chain(x -> reshape(x, :, size(x, 4)),
    Dense(16, 16, relu),
    Dense(16, 10),
    softmax,
) |> gpu

gpu_deep_conv2_mlp_model = Chain(gpu_deep_conv2_only_model, gpu_deep_mlp2_model) |> gpu

# The loss and accuracy are scalars; piping them through gpu does nothing, so it is dropped.
gpu_deep_loss2(x, y) = crossentropy(gpu_deep_conv2_mlp_model(x), y)

gpu_accuracy(yout, yonehot) = mean(onecold(yout) .== onecold(yonehot))


batch_size = 10
opt = ADAM(1e-4)  # the optimiser does not need to be moved to the GPU

for iters =  1 : 250
    batch_idxs = randperm(size(cu_train_x_tensor,4))[1:batch_size]
    cu_train_x_batch_tensor = cu_train_x_tensor[:,:,:,batch_idxs]
    cu_train_set = (cu_train_x_batch_tensor, cu_train_y_onehot[:,batch_idxs])
    Flux.train!(gpu_deep_loss2, params(gpu_deep_conv2_mlp_model), [cu_train_set], opt)
    if iters % 50 == 0
        cu_train_loss = gpu_deep_loss2(cu_train_set[1], cu_train_set[2])
        batch_idxs = randperm(size(cu_test_x_tensor, 4))[1:1000]
        cu_test_loss = gpu_deep_loss2(cu_test_x_tensor[:,:,:,batch_idxs], cu_test_y_onehot[:,batch_idxs])
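        # Use the GPU model and GPU arrays defined above, and the name that the println below expects.
        cu_test_accuracy = gpu_accuracy(gpu_deep_conv2_mlp_model(cu_test_x_tensor), cu_test_y_onehot)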
        println("Batch training loss is $(cu_train_loss), Test loss is $(cu_test_loss), Test accuracy is $(cu_test_accuracy)")
    end
end
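
To rule out a shape mismatch between the conv part and Dense(16, 16), the spatial sizes can be traced by hand. The sketch below is a CPU-only sanity check that needs no packages. It assumes the layer defaults used above (no padding, stride 1 for Conv, pooling stride equal to the window), and the helper names conv_out, pool_out and trace_side are made up for illustration, not part of Flux.

```julia
# Output side length of a convolution with a k x k kernel (no padding, stride 1 by default).
conv_out(n, k; pad = 0, stride = 1) = fld(n + 2pad - k, stride) + 1

# Output side length of max pooling with a k x k window and stride k.
pool_out(n, k) = fld(n - k, k) + 1

# Follow one spatial dimension through the four layers of the conv chain.
function trace_side(n)
    n = conv_out(n, 3)    # Conv((3, 3), 3 => 3):    32 -> 30
    n = pool_out(n, 2)    # MaxPool((2, 2)):         30 -> 15
    n = conv_out(n, 11)   # Conv((11, 11), 3 => 16): 15 -> 5
    n = pool_out(n, 3)    # MaxPool((3, 3)):         5 -> 1
    return n
end

side = trace_side(32)        # CIFAR-10 images are 32 x 32
features = side * side * 16  # 16 output channels from the last Conv
println("Flattened feature length per image: ", features)
```

This gives 16 features per image, which matches the input size of Dense(16, 16), so the layer shapes themselves are consistent and do not explain the hang.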