Yeah, but I can add an extra function barrier by calling an explicit kernel function. Then, even if some types get messed up by the closure bug, I only pay for the dynamic dispatch nthreads() times.
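Roughly what I mean, as a minimal sketch. I'm assuming the "closure bug" here is the usual boxing of captured variables inside the closure that `@threads` builds. The names `kernel!` and `sumsq_threaded`, the sum-of-squares workload, and the one-chunk-per-thread split are all made up for illustration.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical kernel: sums x[i]^2 over one chunk of indices.
# As a separate function it is compiled for the concrete argument types
# it receives, regardless of what the caller could infer.
function kernel!(partial, x, chunk, slot)
    acc = zero(eltype(x))
    for i in chunk
        acc += x[i]^2
    end
    partial[slot] = acc
    return nothing
end

function sumsq_threaded(x)
    n = nthreads()
    partial = zeros(eltype(x), n)
    # One contiguous chunk per thread (this chunking scheme is an assumption).
    len = cld(length(x), n)
    @threads for slot in 1:n
        lo = (slot - 1) * len + 1
        hi = min(slot * len, length(x))
        # Function barrier: if `x` or `partial` ended up poorly typed inside
        # the closure, the dynamic dispatch happens here, once per chunk,
        # instead of once per element in the hot loop.
        kernel!(partial, x, lo:hi, slot)
    end
    return sum(partial)
end
```

So the loop over elements always runs in type-stable code, and the worst case is one dynamic call to `kernel!` per chunk, i.e. nthreads() calls in total.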