I’m having trouble debugging a performance issue with CuArrays. The findmax function performs very differently when I call it from the REPL than when I call it in my application. From the REPL I get the desired “good” performance, which I can reproduce in Example 1 below. In my application I call the function the same way as in Example 1, but I always see the “bad” performance of Example 2.
Rather than posting my more complicated code, I’d like to know the best way to figure out what is going on under the hood in each case, and how to fix it.
a = CuArrays.rand(64)
@btime findmax(a)
196.191 μs (265 allocations: 8.42 KiB)
108.159 ms (41192 allocations: 2.99 MiB)
FYI, there is almost no overhead in creating the array, so that can’t account for the difference.
9.079 μs (6 allocations: 144 bytes)