I have a function that calls a kernel that uses unified memory. The function allocates the unified memory before a series of kernel invocations and frees it afterwards. I call synchronize() after each kernel call; without synchronization, I get a Bus Error, specifically in __memcpy_avx_unaligned_erms. Based on my understanding of the documentation, I am assuming that each kernel invocation uses its own default stream. The function works fine in a single thread, but when I run it in two simultaneous threads, I get the same Bus Error, so the two problems appear to be related. It looks as if the two unified memory buffers are being confused, so that one thread writes to the wrong buffer. Is this problem related to streams? Do I need to create the streams explicitly, or can I assume that the default streams are sufficient? Any suggestions on how to fix this would be appreciated.
Solved. I had to add “own=true” to unsafe_wrap for the CPU pointer, so it wouldn’t disappear.
I take it you mean own=false? The own flag determines whether a finalizer is added to the object that tries to free the memory when it goes out of scope. It does assume that memory is regular device memory, so with unified memory that free would fail.
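To illustrate the ownership point on the CPU side only, here is a minimal sketch using Base Julia's unsafe_wrap. It is not the original poster's code and does not touch CUDA at all: Libc.malloc stands in for the unified memory allocation, and sum_squares is a hypothetical helper invented for this example. With own=false, Julia attaches no finalizer, so the code that allocated the buffer has to keep it alive while the wrapper is in use and free it exactly once afterwards. Each task allocates and frees its own buffer, so nothing is shared between threads.

```julia
# CPU-only sketch (assumption: Libc.malloc stands in for the unified buffer).
# `sum_squares` is a hypothetical helper, not part of any library.
function sum_squares(n::Int)
    ptr = convert(Ptr{Float64}, Libc.malloc(n * sizeof(Float64)))
    ptr == C_NULL && throw(OutOfMemoryError())
    try
        # own=false: Julia will NOT free `ptr` when `buf` is garbage collected,
        # so the buffer stays valid until we free it ourselves below.
        buf = unsafe_wrap(Array, ptr, n; own=false)
        for i in 1:n
            buf[i] = i
        end
        return sum(abs2, buf)   # computed before the buffer is released
    finally
        Libc.free(ptr)          # freed exactly once, by the code that allocated it
    end
end

# Each task works on its own buffer; no pointer is shared across threads.
tasks = map([3, 4]) do n
    Threads.@spawn sum_squares(n)
end
results = fetch.(tasks)
```

Whether the same pattern resolves the Bus Error with unified memory is untested here; the sketch only shows who is responsible for freeing the memory when own=false.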
No, I meant own=true, but that does not appear to work either. It just hangs the two threads. With own=false, the CPU pointer quickly disappears when using two (and I assume more) threads. In the meantime, I’ve reverted to serial processing again.
Without an MWE I can’t help you. You can try CUDA#master, which has an additional fix related to unsafe_wrap and unified memory, but it’s unlikely to help.
If I get the time, I’ll try creating an MWE, or maybe restructuring my code will solve the problem.