The allocations reported by BenchmarkTools are CPU allocations, and there are always some when launching kernels (we need to allocate kernel parameter buffers to pass to CUDA). To see GPU allocations, you can use CUDA.@time.
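A minimal sketch of the difference, assuming a CUDA-capable device and the CUDA.jl and BenchmarkTools packages are installed:

```julia
using CUDA, BenchmarkTools

a = CUDA.rand(1024)  # hypothetical example array on the GPU

# BenchmarkTools' @btime only reports CPU allocations,
# including the small kernel-parameter buffers mentioned above:
@btime sum($a)

# CUDA.@time additionally reports GPU allocations alongside
# the CPU ones, e.g. "(N CPU allocations: ...) (M GPU allocations: ...)":
CUDA.@time sum(a)
```

Note that the first call includes compilation overhead, so run the timed expression once before trusting the numbers from CUDA.@time.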