Monitoring GPU activity asynchronously (read data from the GPU)

Is there a way to continuously read a data array from the GPU (for example, to print it) while a kernel keeps updating that data? In effect, monitoring the kernel's activity?
For instance, I want a long-running kernel to do its calculations and put the best results so far in an array that can be displayed during the run.
Beyond that, I might even want to calibrate some parameters in real time (write them back to the GPU) to steer the calculations.

Memory copies are stream-ordered operations, which means a copy queued on the same stream as the kernel will only execute after the kernel finishes. If you don't want that, you can perform the copy on a different stream. The easiest way to do so in CUDA.jl is to perform the copy from another task, since each task gets its own stream. See "CUDA.jl 3.0 | Tim Besard | JuliaCon2021" on YouTube.
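CUDA.jl streams are ordinary CUDA streams underneath, so the mechanism can be illustrated at the CUDA runtime level. The following is a minimal CUDA C++ sketch (not Julia, and not the CUDA.jl API) under these assumptions: the kernel `long_running`, the `State` struct, `run_monitor` and the `CHECK` macro are hypothetical names invented for illustration; the kernel runs on one non-blocking stream while the host samples its progress and pushes a new parameter through a second non-blocking stream using pinned host memory. Visibility relies on `volatile` and `__threadfence_system()`, which works in practice but is not a formal guarantee: copying memory a kernel is concurrently writing is a data race, so a snapshot may mix fields from different iterations. It also assumes the device can overlap copies with kernel execution (`asyncEngineCount > 0`).

```cuda
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical error-checking helper: abort on any CUDA runtime error.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Shared state (hypothetical): the kernel writes iters/result, the host writes param/stop.
struct State {
    unsigned long long iters;  // progress counter written by the kernel
    float result;              // current "best" value written by the kernel
    float param;               // tunable parameter written by the host
    int stop;                  // host sets this to 1 to end the kernel
};

// Hypothetical long-running kernel (one thread, for clarity). volatile makes every
// access compile to a real memory read/write, so changes to param/stop are picked up.
__global__ void long_running(volatile State *s) {
    float acc = 0.0f;
    unsigned long long i = 0;
    while (s->stop == 0) {
        acc = 0.999f * acc + s->param;  // approaches roughly 1000 * param
        s->result = acc;
        s->iters = ++i;
        __threadfence_system();         // order these writes as seen by the device and host
    }
}

// Runs the kernel on one stream and samples/tunes it from a second stream.
State run_monitor(int ticks, float new_param) {
    State *d_state = nullptr, *h_state = nullptr;
    CHECK(cudaMalloc(&d_state, sizeof(State)));
    CHECK(cudaMallocHost(&h_state, sizeof(State)));  // pinned, so async copies are truly async
    *h_state = State{0, 0.0f, 1.0f, 0};
    CHECK(cudaMemcpy(d_state, h_state, sizeof(State), cudaMemcpyHostToDevice));

    // Non-blocking streams do not implicitly synchronize with the legacy default stream.
    cudaStream_t compute, monitor;
    CHECK(cudaStreamCreateWithFlags(&compute, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&monitor, cudaStreamNonBlocking));

    long_running<<<1, 1, 0, compute>>>(d_state);
    CHECK(cudaGetLastError());

    for (int tick = 0; tick < ticks; ++tick) {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        if (tick == ticks / 2) {  // "calibrate": push a new parameter mid-run
            h_state->param = new_param;
            CHECK(cudaMemcpyAsync(&d_state->param, &h_state->param, sizeof(float),
                                  cudaMemcpyHostToDevice, monitor));
        }
        // Snapshot the state; this waits only for the monitor stream, not the kernel.
        CHECK(cudaMemcpyAsync(h_state, d_state, sizeof(State),
                              cudaMemcpyDeviceToHost, monitor));
        CHECK(cudaStreamSynchronize(monitor));
        std::printf("tick %d: iters=%llu result=%.1f param=%.1f\n", tick,
                    h_state->iters, h_state->result, h_state->param);
    }

    h_state->stop = 1;  // ask the kernel to finish
    CHECK(cudaMemcpyAsync(&d_state->stop, &h_state->stop, sizeof(int),
                          cudaMemcpyHostToDevice, monitor));
    CHECK(cudaDeviceSynchronize());  // returns once the kernel has seen the flag and exited
    CHECK(cudaMemcpy(h_state, d_state, sizeof(State), cudaMemcpyDeviceToHost));
    State final_state = *h_state;

    CHECK(cudaStreamDestroy(compute));
    CHECK(cudaStreamDestroy(monitor));
    CHECK(cudaFreeHost(h_state));
    CHECK(cudaFree(d_state));
    return final_state;
}

int main() {
    State s = run_monitor(10, 2.0f);
    assert(s.stop == 1);  // the kernel can only have exited after seeing the flag
    std::printf("final: iters=%llu result=%.1f\n", s.iters, s.result);
    return 0;
}
```

Built with something like `nvcc monitor.cu -o monitor`, the ticks should show `iters` growing while the kernel is still running, and `result` drifting towards the new target after the parameter change. If the same copy were queued on `compute` instead of `monitor`, it would wait for the kernel to finish, which is exactly the stream-ordering behaviour described above. On a GPU that also drives a display, an OS watchdog may kill kernels that run for more than a few seconds (see the `kernelExecTimeoutEnabled` device property).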