I have allocated memory on the device and have passed it to a kernel function. Is it possible to read that memory from the host before the kernel finishes its execution?
If it is possible, could you please put me on the right track (bearing in mind I'm a newbie to CUDA.jl)?
Thanks for your speedy reply. No - the previous question was how to write to device memory from the host whilst a kernel is running (which I have achieved by launching the kernel from a separate task).
My new question is how to read device memory from the host side while the kernel is executing. I've tried every which way, but the memory seems to be unreadable until the kernel finishes.
Reading or writing doesn't make any difference. If you allocate a device-mapped host array, you'll be able to perform ordinary reads and writes from the host while the GPU reads and writes that same memory. But again, since that's host memory, the GPU will access it over the PCIe bus, so those memory operations will be slow. If you want to read/write device memory without having to wait for the kernel to finish, use a separate stream to break the ordering. In that case there are no guarantees about the validity of the memory contents.
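Roughly something like this for the device-mapped host array (an untested sketch: the `Mem.alloc` / `HOSTALLOC_DEVICEMAP` names follow the older memory-management docs and may have been renamed in recent CUDA.jl versions, and the kernel is just a toy for illustration):

```julia
using CUDA

const N = 1024

# Page-locked host memory, mapped into the device address space ("zero-copy").
buf = CUDA.Mem.alloc(CUDA.Mem.Host, N * sizeof(Float32),
                     CUDA.Mem.HOSTALLOC_DEVICEMAP)

# The same buffer viewed as a host Array and as a device CuArray.
cpu_view = unsafe_wrap(Array, convert(Ptr{Float32}, buf), N)
gpu_view = unsafe_wrap(CuArray, convert(CuPtr{Float32}, buf), N)
cpu_view .= 0f0

# Toy kernel that repeatedly stores its progress into the mapped buffer.
function progress_kernel(a)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(a)
        for iter in 1:1000
            a[i] = Float32(iter)
            CUDA.threadfence_system()   # make the store visible to the host
        end
    end
    return
end

# The launch returns immediately; the kernel runs asynchronously.
@cuda threads=256 blocks=cld(N, 256) progress_kernel(gpu_view)

# Meanwhile the host can read (or write) the very same memory.
for _ in 1:10
    @show cpu_view[1]   # intermediate values; no ordering guarantees
    sleep(0.01)
end

synchronize()            # wait for the kernel before freeing the buffer
CUDA.Mem.free(buf)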
These questions are not specific to CUDA.jl and equally apply to CUDA C, so you can also Google for them. For example: "Accessing cuda device memory when the cuda kernel is running" on Stack Overflow.
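And a rough, untested sketch of the second option, reading plain device memory while the kernel runs by issuing the copy from another Julia task (CUDA.jl gives every task its own stream, so the copy is not ordered after the kernel; `busy_kernel` is again just a toy):

```julia
using CUDA

const N = 1024
d_a = CUDA.zeros(Float32, N)   # ordinary device memory

function busy_kernel(a)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(a)
        for iter in 1:10_000
            a[i] = Float32(iter)
        end
    end
    return
end

# Launched on the current task's stream; the call returns immediately.
@cuda threads=256 blocks=cld(N, 256) busy_kernel(d_a)

# A copy issued from a different task runs on a different stream and does
# not wait for the kernel above, so it may return a partial snapshot.
snapshot = fetch(Threads.@spawn Array(d_a))
@show snapshot[1]

synchronize()   # wait for the kernel on this task's stream
```

Whatever snapshot you get back may be only partially updated; that's the "no guarantees" caveat from above.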
You are right, I made a booboo in my code. Now it's working correctly. Thank you!