Do we currently have an established way to write stencils for single-/multi-array operations, or kernels in general, in a way that’s compatible with both CPU and GPU arrays?
I guess stencils could be seen as a generalized form of broadcasting (or, conversely, broadcasting as applying a multi-array stencil of size one). Broadcasting-based code we can already write in a CPU/GPU-independent fashion, of course.
I have some vague ideas, but maybe someone is already working on this kind of thing?
Oh, sure, but that’s for 2D stencils with a linear combination of fixed coefficients on a single array, right?
Maybe I chose the wrong term. When I wrote stencils, I meant kernels that have a fixed access pattern (but may read from multiple arrays), run a fixed but arbitrarily complex operation on the input values, and write the result to a single entry in a single target array. Maybe I should have just called it a kernel in general, but then a kernel in the more generic sense of the term (e.g. in CUDA) isn’t always restricted to a fixed access pattern or a single output array.
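To make that a bit more concrete, here is a rough CPU-only sketch in plain Python of the kind of kernel I mean. It is only an illustration of the access pattern, not a proposal for the actual API: the names `apply_kernel`, `op` and `offsets` are made up for this example, it is 1D only, and it simply skips the boundary entries.

```python
def apply_kernel(op, offsets, out, *inputs):
    """Hypothetical helper: fixed access pattern, arbitrary operation, single output.

    For every index i where all offsets stay in bounds, gather
    inputs[k][i + d] for each input array k and each offset d,
    and write op(window_0, window_1, ...) to out[i].
    Boundary entries of `out` are left untouched.
    """
    if any(len(a) != len(out) for a in inputs):
        raise ValueError("all input arrays must have the same length as out")
    lo = max(0, -min(offsets))            # first index with a full window
    hi = len(out) - max(0, max(offsets))  # one past the last such index
    for i in range(lo, hi):
        windows = [[a[i + d] for d in offsets] for a in inputs]
        out[i] = op(*windows)             # exactly one write per index
    return out


a = [1.0, 2.0, 4.0, 8.0, 16.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]

# Multi-array kernel with a non-linear operation on a 3-point window:
# the maximum of the window of `a`, times the centre value of `b`.
stencil_out = apply_kernel(lambda wa, wb: max(wa) * wb[1],
                           (-1, 0, 1), [0.0] * 5, a, b)

# Broadcasting as the special case of a window of size one.
bcast_out = apply_kernel(lambda wa, wb: wa[0] + wb[0],
                         (0,), [0.0] * 5, a, b)
```

With these inputs, `stencil_out` is `[0.0, 80.0, 240.0, 640.0, 0.0]` and `bcast_out` is `[11.0, 22.0, 34.0, 48.0, 66.0]`. The point is that the user only supplies `op` and `offsets`; the loop over indices is the part that would need a CPU and a GPU implementation.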