I don’t necessarily mean images, but any generic looping operation, like a matrix multiply or a sum. Or do you mean “pixel” more generically?
Like here: Writing fast stencil computation kernels that work on both CPUs and GPUs - #3 by maleadt