Perplexing behavior when computing the matmul smoke test on GPU in Julia


@avikpal, @ChrisRackauckas Why is it that when I compute the matmul smoke test on the GPU for the first time and time it, it takes longer than the CPU computation and the allocations are huge, yet when I run the same operation again it is fast? What is the reason behind this? I have attached a screenshot of what I’m doing for you to have a look at. Please explain it to me.

Thanks in advance

That’s just JIT compilation?

It’s described at the beginning of the abstract here: Reducing Compilation Latency in the Julia Programming Language
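As a rough illustration (a minimal sketch with made-up sizes, not your exact smoke test), timing the same GPU matmul twice shows the effect: the first call pays the one-time JIT compilation cost, the second call reuses the already compiled code.

```julia
using CUDA

A = CUDA.rand(Float32, 2048, 2048)
B = CUDA.rand(Float32, 2048, 2048)

# First call: the time and allocations are dominated by JIT compilation of the
# GPU matmul path for these argument types.
@time CUDA.@sync A * B

# Second call: the compiled code is cached and reused, so this measures the
# actual kernel launch and execution.
@time CUDA.@sync A * B
```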

Thanks for the pointer and for helping me out. I have one more question: when I do a GPU-based computation, I get this error:
ERROR: LoadError: GPU compilation of MethodInstance for (::GPUArrays.var"#broadcast_kernel#26")(::CUDA.CuKernelContext, ::CuDeviceMatrix{Float32, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float32, CuDeviceVector{Float32, 1}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, ::Int64) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(-), Tuple{Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float32, CuDeviceVector{Float32, 1}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, which is not isbits:
.args is of type Tuple{Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{LinearAlgebra.Adjoint{Float32, CuDeviceVector{Float32, 1}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}} which is not isbits.
.1 is of type Base.Broadcast.Extruded{Matrix{Float32}, Tuple{Bool, Bool}, Tuple{Int64, Int64}} which is not isbits.
.x is of type Matrix{Float32} which is not isbits.
Can someone help me out with this error? I know that to trace the error you need the actual code, but can someone explain the reason for this error and what type of error this is in general?

What code is that error from? Your code above does not broadcast.
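In general, that KernelError means a host (CPU) array ended up inside a broadcast that CUDA.jl tries to compile into a GPU kernel: the error shows a plain Matrix{Float32} being broadcast (with -) against the adjoint of a GPU vector, and a Matrix is not an isbits type, so it cannot be passed to the kernel. A minimal sketch of that situation (hypothetical names and sizes, not your code) would look something like this:

```julia
using CUDA

x = rand(Float32, 4, 3)      # plain CPU matrix -- Matrix{Float32} is not isbits
w = CUDA.rand(Float32, 3)    # GPU vector

# The CuArray side wins the broadcast, so Julia tries to run this on the GPU,
# but the CPU matrix gets captured in the kernel arguments and triggers the
# "non-bitstype argument" KernelError:
# y = x .- w'

# Moving the CPU data to the device first makes every broadcast argument a
# GPU array, so the kernel can be compiled:
y = cu(x) .- w'
```

If that is what is happening, the usual fix is to make sure both the data and the parameters live on the GPU before the broadcast happens.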

The error occurs when I’m trying to use the Lux framework in a NeuralPDE PINN setup on the GPU.