CUDA arrays not working well with broadcast!(), and other in-place operations inside a loop

I’m not sure what problem you’re actually running into here?

Same explanation: you’re measuring time incorrectly. GPU operations are launched asynchronously, so @time only gives a meaningful result if you synchronize the GPU before the timer stops, which you aren’t doing.
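To make the timing point concrete, here is a minimal sketch assuming CUDA.jl and a working GPU. The array sizes, the broadcast expression, and the `step!` helper are made up for illustration and are not taken from your code:

```julia
using CUDA

# Hypothetical workload: sizes and the broadcast expression are placeholders.
a = CUDA.rand(Float32, 4096, 4096)
b = CUDA.rand(Float32, 4096, 4096)
c = similar(a)

# In-place fused broadcast; the call returns as soon as the kernel is queued.
step!(c, a, b) = (c .= a .* b .+ 1f0; nothing)

step!(c, a, b)        # warm-up run so compilation is not included in the timings
CUDA.synchronize()

# Misleading: this mostly measures how long it takes to launch the kernel.
@time step!(c, a, b)
CUDA.synchronize()    # drain the queue so the next measurement starts clean

# Better: block until the GPU has finished before the timer stops.
@time CUDA.@sync step!(c, a, b)

# CUDA.@time synchronizes for you and also reports GPU-side allocations.
CUDA.@time step!(c, a, b)
```

The first `@time` will typically report a much smaller number than the other two, simply because it returns before the GPU has done the work.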

Why do you assume that? The GPU can only queue a limited number of asynchronous operations, so it’s likely that the first sequence of operations without a synchronize() simply gets queued and returns immediately, while subsequent iterations have to wait until there’s space in the queue.
