I don’t suppose rocblas_get_stream(handle()) is
the correct one to synchronize? I.e., would hipStreamSynchronize(rocblas_get_stream(handle()))
be correct?
I’m also new to GPUs and don’t actually know what a stream is.
So for now, I used
gmul!(C,A,B) = (mul!(C,A,B); AMDGPU.HIP.hipDeviceSynchronize())
New results:
>10 TFLOPS is pretty good.
I’ll test your PR with build system updates.