Generic way to write array stencils/kernels for CPUs and GPUs?

oschulz · September 28, 2018, 8:28am

Do we currently have an established way to write stencils for single-/multi-array operations, or kernels in general, in a way that’s compatible with both CPU and GPU arrays?

Regarding kernels/stencils in general, I found an old experiment of @timholy, KernelTools.jl (GitHub - timholy/KernelTools.jl: Fast kernel/stencil operations in Julia), and there was some support for stencils in ParallelAccelerator.jl, but I’m not sure about the current situation.

I guess stencils could be seen as a generalized form of broadcasting, or conversely, broadcasting could be seen as applying a multi-array stencil of size one. Broadcasting-based code can, of course, already be written in a CPU/GPU-independent fashion.
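To make the broadcasting connection concrete, here is a minimal sketch that writes a 1D three-point stencil purely in terms of shifted views and a fused broadcast. It uses only Base Julia; `stencil3!` is a hypothetical helper name, not an existing API. It is checked here only with a plain `Array`. The assumption is that the same code would also run on GPU array types that support `view` and broadcasting, but that is not verified here.

```julia
# Hypothetical helper (not from any package): apply a 3-point stencil function `f`
# to the interior of `src` and write the results into the interior of `dst`.
# Only views and broadcasting are used, no scalar indexing.
# `dst` and `src` must not alias each other.
function stencil3!(f, dst::AbstractVector, src::AbstractVector)
    n = length(src)
    length(dst) == n || throw(DimensionMismatch("dst and src must have equal length"))
    n >= 3 || return dst              # no interior points to update
    left   = view(src, 1:n-2)         # src[i-1] for each interior i
    center = view(src, 2:n-1)         # src[i]
    right  = view(src, 3:n)           # src[i+1]
    view(dst, 2:n-1) .= f.(left, center, right)
    return dst
end

src = Float64[1, 4, 9, 16, 25]        # squares of 1:5
dst = zeros(5)
# Discrete second difference; the boundary entries of dst are left untouched.
stencil3!((l, c, r) -> l - 2c + r, dst, src)
```

With a "stencil" of size one (just `center`), this reduces to an ordinary broadcast, which is the correspondence described above.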

I have some vague ideas, but maybe someone is already working on this kind of thing?

jw3126 · September 28, 2018, 8:34am

https://github.com/JuliaImages/ImageFiltering.jl has some stuff in this direction.

oschulz · September 28, 2018, 9:03am

GitHub - JuliaImages/ImageFiltering.jl: Julia implementations of multidimensional array convolution and nonlinear stencil operations

Oh, sure, but those are 2D stencils that apply a linear combination with fixed coefficients to a single array, right?

Maybe I chose the wrong term. When I wrote stencils, I meant kernels that have a fixed access pattern (but may read from multiple arrays), run a fixed but arbitrarily complex operation on the input values, and write the result to a single entry in a single target array. Maybe I should have just called it a kernel, but a kernel in the more generic sense of the term (e.g. CUDA) isn’t always restricted to a fixed access pattern or a single output array.
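As a rough illustration of that definition, the following sketch spells out the semantics as a plain CPU loop: a fixed tuple of offsets defines the access pattern into `a`, a second array `b` is read at the center, an arbitrary function `f` combines the values, and the result goes to one entry of `out`. The name `apply_kernel!` and its argument layout are hypothetical choices for this example. It is a reference for the intended behavior only, since an explicit scalar-indexing loop like this is not GPU-compatible.

```julia
# Hypothetical CPU reference (not an existing API): for every interior index I,
# gather a[I + o] for each fixed offset o, pass those values together with b[I]
# to `f`, and store the single result in out[I].
function apply_kernel!(f, out, a, b, offsets)
    axes(out) == axes(a) == axes(b) || throw(DimensionMismatch("arrays must share axes"))
    R  = CartesianIndices(a)
    lo = first(R) - reduce(min, offsets)   # smallest index with all neighbors in bounds
    hi = last(R)  - reduce(max, offsets)   # largest index with all neighbors in bounds
    interior = CartesianIndices(map((l, h) -> l:h, Tuple(lo), Tuple(hi)))
    for I in interior
        neighbors = map(o -> a[I + o], offsets)   # fixed access pattern
        out[I] = f(neighbors, b[I])               # arbitrary operation, single output
    end
    return out
end

# 4-neighbor pattern in 2D
offsets = (CartesianIndex(-1, 0), CartesianIndex(1, 0),
           CartesianIndex(0, -1), CartesianIndex(0, 1))

a   = collect(reshape(1.0:16.0, 4, 4))
b   = fill(0.25, 4, 4)
out = zeros(4, 4)
# Neighbor sum of `a`, weighted by the corresponding entry of `b`.
apply_kernel!((nb, w) -> sum(nb) * w, out, a, b, offsets)
```

Because `a` is linear in its indices here, the weighted neighbor sum reproduces `a` on the interior, while the boundary of `out` stays at zero.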

jw3126 · September 28, 2018, 9:08am

Yeah, it’s single-array, but not restricted to 2D, and (on CPU) it allows arbitrary operations via mapwindow.

oschulz · September 28, 2018, 9:27am

Oh, neat, good to know. Does mapwindow use views?

jw3126 · September 28, 2018, 9:36am

No, it copies to a buffer, so mapwindow(f!, arr) will not modify arr.
