Overcoming Slow Scalar Operations on GPU Arrays

Hi all,

I’ve been trying to exploit GPU capabilities for solving a NeuralPDE with GalacticOptim.solve, but I’ve been stumbling into a problem related to scalar operations on GPU arrays.

In line 123 I use the `|> gpu` operator, and in line 179, if I set `CUDA.allowscalar(true)`, I get:

Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with allowscalar(false)

Indeed, if I set CUDA.allowscalar(false) I get an error:
`scalar getindex is disallowed`

Is there a way to exploit GPU’s speedup with these settings?

Here you can find my code, thanks :slightly_smiling_face:

Copying my reply from Slack for archival purposes:

this is too large of an example to have a quick look at

see https://juliagpu.gitlab.io/CUDA.jl/usage/workflow/#UsageWorkflowScalar, scalar indexing is something you want to avoid because it kills performance

there’s no ‘setting’ to automatically speed these operations up. you need to avoid the problematic calls and replace them with something vectorized that the array infrastructure can work with
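
To illustrate what "replace them by something vectorized" can look like, here is a minimal sketch. It is not taken from the code in the question: the function names and the sum-of-squares computation are hypothetical and only stand in for whatever the element-by-element loop actually computes. It runs on a plain CPU array. The assumption is that the same vectorized functions would also accept a `CuArray`, since they only use broadcasting, range slicing, and a reduction.

```julia
# Hypothetical example (not from the original code): the same quantities
# computed with scalar indexing and with whole-array operations.

# Scalar version: reads one element at a time. On a GPU array every x[i]
# is a scalar getindex, which is slow, or an error when scalar indexing
# is disallowed.
function sumsq_scalar(x)
    acc = zero(eltype(x))
    for i in eachindex(x)
        acc += x[i]^2
    end
    return acc
end

# Vectorized version: a single reduction over the whole array.
sumsq_vectorized(x) = sum(abs2, x)

# Scalar forward difference, again indexing element by element.
function diff_scalar(x)
    d = similar(x, length(x) - 1)
    for i in 1:length(x)-1
        d[i] = x[i+1] - x[i]
    end
    return d
end

# Vectorized forward difference: range slices plus broadcasting.
diff_vectorized(x) = x[2:end] .- x[1:end-1]

x = Float32[1, 2, 4, 7]
@assert sumsq_scalar(x) == sumsq_vectorized(x)
@assert diff_scalar(x) == diff_vectorized(x)
```

Keeping `CUDA.allowscalar(false)` set while porting is useful here, because the `scalar getindex is disallowed` error then points at each call that still needs to be rewritten in this style.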