I was trying to train a convolutional network on the CIFAR-10 data. It trains fine on the CPU, but when I tried to use the GPU on my laptop, it first gave me this warning:

Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`

└ @ GPUArrays C:\Users\tume.julia\packages\GPUArrays\gjXOn\src\host\indexing.jl:58

I tried adding `CUDA.allowscalar(false)`, but then it failed instantly with "scalar getindex is disallowed" and a long stack trace pointing at the conv code.

So I set it back to `true`. When I ran it again, it just got stuck. Below is the code I used. Could someone please help me resolve this? Thank you in advance!

```
using MLDatasets, JLD2, FileIO, ImageFiltering, Images, Interact, Plots
using Flux, Zygote, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle, mse, flatten
using Base.Iterators: repeated, partition
using Random:randperm
using CUDA
train_x, train_y = CIFAR10.traindata()
test_x, test_y = CIFAR10.testdata()
train_x_tensor = permutedims(train_x, [1, 2, 3, 4])
train_y_onehot = onehotbatch(train_y, 0:9)
test_x_tensor = permutedims(test_x, [1, 2, 3, 4])
test_y_onehot = onehotbatch(test_y, 0:9)
cu_train_x_tensor = cu(train_x_tensor)
cu_train_y_onehot = cu(train_y_onehot)
cu_test_x_tensor = cu(test_x_tensor)
cu_test_y_onehot = cu(test_y_onehot)
# Convolutional feature extractor: 32×32×3 input -> 1×1×16 output
gpu_deep_conv2_only_model = Chain(
    Conv((3, 3), 3 => 3, relu),
    MaxPool((2, 2)),
    Conv((11, 11), 3 => 16, relu),
    MaxPool((3, 3))
) |> gpu

# Classifier head: flatten to 16 features per image, then two dense layers
gpu_deep_mlp2_model = Chain(
    x -> reshape(x, :, size(x, 4)),
    Dense(16, 16, relu),
    Dense(16, 10),
    softmax,
) |> gpu

gpu_deep_conv2_mlp_model = Chain(gpu_deep_conv2_only_model, gpu_deep_mlp2_model) |> gpu

gpu_deep_loss2(x, y) = crossentropy(gpu_deep_conv2_mlp_model(x), y) |> gpu
gpu_accuracy(yout, yonehot) = mean(onecold(yout) .== onecold(yonehot)) |> gpu
batch_size = 10
opt = ADAM(1e-4) |> gpu
for iters = 1:250
    # Draw a random mini-batch from the training set
    batch_idxs = randperm(size(cu_train_x_tensor, 4))[1:batch_size]
    cu_train_x_batch_tensor = cu_train_x_tensor[:, :, :, batch_idxs]
    cu_train_set = (cu_train_x_batch_tensor, cu_train_y_onehot[:, batch_idxs])
    Flux.train!(gpu_deep_loss2, params(gpu_deep_conv2_mlp_model), [cu_train_set], opt) |> gpu
    if iters % 50 == 0
        cu_train_loss = gpu_deep_loss2(cu_train_set[1], cu_train_set[2])
        # Evaluate on a random sample of 1000 test images
        batch_idxs = randperm(size(cu_test_x_tensor, 4))[1:1000]
        cu_test_x_batch = cu_test_x_tensor[:, :, :, batch_idxs]
        cu_test_y_batch = cu_test_y_onehot[:, batch_idxs]
        cu_test_loss = gpu_deep_loss2(cu_test_x_batch, cu_test_y_batch)
        # Use the GPU model and the same test sample for the accuracy
        test_accuracy = gpu_accuracy(gpu_deep_conv2_mlp_model(cu_test_x_batch), cu_test_y_batch)
        println("Batch training loss is $(cu_train_loss), Test loss is $(cu_test_loss), Test accuracy is $(test_accuracy)")
    end
end
```
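
As a sanity check that the layer sizes are not the problem, here is a rough sketch of the shape arithmetic for the conv stack above. It assumes Flux's defaults as far as I understand them (no padding, stride 1 for `Conv`, and a `MaxPool` stride equal to its window). The helper names `out_size` and `conv_stack_shape` are made up for this illustration and are not part of Flux.

```julia
# Output length of a sliding window of size k over n elements, without padding.
out_size(n, k, stride) = (n - k) ÷ stride + 1

# Follow a height × width input through the four layers of the conv stack.
function conv_stack_shape(h, w)
    h, w = out_size(h, 3, 1), out_size(w, 3, 1)     # Conv((3, 3), 3 => 3)
    h, w = out_size(h, 2, 2), out_size(w, 2, 2)     # MaxPool((2, 2))
    h, w = out_size(h, 11, 1), out_size(w, 11, 1)   # Conv((11, 11), 3 => 16)
    h, w = out_size(h, 3, 3), out_size(w, 3, 3)     # MaxPool((3, 3))
    return (h, w, 16)                               # 16 output channels
end

shape = conv_stack_shape(32, 32)
println(shape, " -> ", prod(shape), " features per image")
```

Under those assumptions a 32×32 image ends up as 1×1×16, so the reshape should hand 16 features per image to `Dense(16, 16, relu)`, which matches the model definition.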