StaticKernels.jl is a new package for easy and fast execution of kernel operations on arrays, such as finite differences, convolutions, image filters and morphological operations. It offers:
- custom kernel functions in arbitrary dimensions
- custom boundary handling
- allocation-free execution
- small code base (currently ~300 LOC) with no dependencies
Introduction
You can think of a Kernel as a function that takes a neighbourhood Window view on the underlying data as an argument. The kernel function aggregates the values within the neighbourhood and outputs a new value. The well-known function map(k, a) applied to the kernel k and a data array a then applies the kernel as a filter.
using StaticKernels
a = rand(1000, 1000)
# Laplace
k = Kernel{(-1:1,-1:1)}(w -> w[0,-1] + w[-1,0] - 4*w[0,0] + w[1,0] + w[0,1])
map(k, a, inner=true)
# Erosion
k = Kernel{(-1:1,-1:1)}(w -> minimum(Tuple(w)))
map(k, a)
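To make the semantics concrete, here is a minimal, dependency-free sketch of what applying the Laplace kernel to the interior of an array amounts to. It is plain Julia and not the package's implementation: the name laplace_inner is hypothetical, and it assumes that inner=true restricts the output to positions where the whole window fits inside the array.

```julia
# Hand-written equivalent of a 3x3 Laplace kernel on the interior of `a`.
# `laplace_inner` is a hypothetical helper, not part of StaticKernels.jl.
function laplace_inner(a::AbstractMatrix)
    m, n = size(a)
    out = similar(a, m - 2, n - 2)   # one output per fully covered window
    for j in 2:n-1, i in 2:m-1
        # same stencil as the kernel function, written with absolute indices
        out[i-1, j-1] = a[i, j-1] + a[i-1, j] - 4 * a[i, j] + a[i+1, j] + a[i, j+1]
    end
    return out
end

# Quadratic test input: the discrete Laplacian of i^2 + j^2 is constant.
a = [Float64(i^2 + j^2) for i in 1:5, j in 1:5]
b = laplace_inner(a)
```

The kernel formulation expresses the same stencil with relative indices (w[0,-1], w[1,0], ...) and leaves the loop to map.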
The window can be indexed using relative coordinates and implements a standard array interface. When accessed out of bounds (e.g. when the window is positioned at the boundary of the array), it evaluates to nothing, so the kernel function itself decides how the boundary is handled.
# Laplace, zero boundary condition
k = Kernel{(-1:1,-1:1)}(w -> something(w[0,-1], 0.) + something(w[-1,0], 0.) - 4*w[0,0] + something(w[1,0], 0.) + something(w[0,1], 0.))
map(k, a)
# Forward gradient (non-scalar kernel), Neumann boundary condition
k = Kernel{(0:1, 0:1)}(w -> (something(w[1,0], w[0,0]) - w[0,0], something(w[0,1], w[0,0]) - w[0,0]))
map(k, a)
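The boundary handling above relies on Base's something, which returns its first argument that is not nothing. The following sketch shows that mechanism in isolation, using plain Julia only. The helper window_get is hypothetical and merely imitates a window access that yields nothing out of bounds; it is not the package's API.

```julia
# `window_get` is a hypothetical stand-in for out-of-bounds window access:
# it returns `nothing` when (i, j) lies outside of `a`.
window_get(a, i, j) = checkbounds(Bool, a, i, j) ? a[i, j] : nothing

a = [1.0 2.0; 3.0 4.0]

# `something(x, default)` returns the first argument that is not `nothing`.
inside  = something(window_get(a, 1, 2), 0.0)   # in bounds: the stored value
outside = something(window_get(a, 0, 1), 0.0)   # out of bounds: zero boundary value

# Neumann-style forward difference at the last row: the missing neighbour is
# replaced by the centre value, so the difference becomes zero.
centre = a[2, 1]
dx = something(window_get(a, 3, 1), centre) - centre
```

Choosing the fallback value per access is what allows one kernel to mix different boundary conditions, as in the zero-boundary Laplace and the Neumann forward gradient.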
Why a new package?
Each existing solution I looked at failed to meet at least one of these requirements:
- non-allocating, fast operations
- custom kernels that are easy to write
- custom and efficient handling of boundary conditions
- arbitrary types and dimensions
- simple implementation that is easy to extend
Benchmarks on Linear Filters
Here is a quick comparison for linear filtering (see test/benchmark.jl). LocalFilters might be worse off than it should be, since it allocates within the loop.
Array: (10, 10), Kernel: (3, 3)
StaticKernels      206.048 ns (0 allocations: 0 bytes)
LoopVectorization  131.591 ns (0 allocations: 0 bytes)
NNlib              13.238 μs (35 allocations: 6.89 KiB)
ImageFiltering     8.997 μs (16 allocations: 3.73 KiB)
LocalFilters       45.291 μs (2353 allocations: 36.77 KiB)
Array: (10, 10), Kernel: (5, 5)
StaticKernels      567.505 ns (0 allocations: 0 bytes)
LoopVectorization  273.347 ns (0 allocations: 0 bytes)
NNlib              15.427 μs (35 allocations: 9.45 KiB)
ImageFiltering     15.772 μs (16 allocations: 5.41 KiB)
LocalFilters       93.978 μs (5809 allocations: 90.77 KiB)
Array: (10, 10), Kernel: (7, 7)
StaticKernels      625.294 ns (0 allocations: 0 bytes)
LoopVectorization  115.621 ns (0 allocations: 0 bytes)
NNlib              14.544 μs (35 allocations: 8.52 KiB)
ImageFiltering     119.226 μs (222 allocations: 27.36 KiB)
LocalFilters       152.574 μs (10093 allocations: 157.70 KiB)
Array: (100, 100), Kernel: (3, 3)
StaticKernels      26.029 μs (0 allocations: 0 bytes)
LoopVectorization  17.417 μs (0 allocations: 0 bytes)
NNlib              248.317 μs (36 allocations: 677.66 KiB)
ImageFiltering     169.551 μs (17 allocations: 81.06 KiB)
LocalFilters       5.208 ms (266413 allocations: 4.07 MiB)
Array: (100, 100), Kernel: (5, 5)
StaticKernels      141.370 μs (0 allocations: 0 bytes)
LoopVectorization  30.132 μs (0 allocations: 0 bytes)
NNlib              780.498 μs (36 allocations: 1.76 MiB)
ImageFiltering     312.253 μs (17 allocations: 82.73 KiB)
LocalFilters       12.308 ms (732109 allocations: 11.17 MiB)
Array: (100, 100), Kernel: (7, 7)
StaticKernels      323.081 μs (0 allocations: 0 bytes)
LoopVectorization  53.485 μs (0 allocations: 0 bytes)
NNlib              1.400 ms (36 allocations: 3.31 MiB)
ImageFiltering     839.447 μs (232 allocations: 731.58 KiB)
LocalFilters       22.810 ms (1420033 allocations: 21.67 MiB)
Array: (1000, 1000), Kernel: (3, 3)
StaticKernels      3.356 ms (0 allocations: 0 bytes)
LoopVectorization  2.709 ms (0 allocations: 0 bytes)
NNlib              60.562 ms (40 allocations: 68.39 MiB)
ImageFiltering     22.945 ms (17 allocations: 7.63 MiB)
LocalFilters       703.648 ms (26964013 allocations: 411.44 MiB)
Array: (1000, 1000), Kernel: (5, 5)
StaticKernels      15.446 ms (0 allocations: 0 bytes)
LoopVectorization  6.850 ms (0 allocations: 0 bytes)
NNlib              161.323 ms (40 allocations: 189.21 MiB)
ImageFiltering     39.113 ms (17 allocations: 7.63 MiB)
LocalFilters       1.866 s (74820109 allocations: 1.11 GiB)
Array: (1000, 1000), Kernel: (7, 7)
StaticKernels      39.406 ms (0 allocations: 0 bytes)
LoopVectorization  9.057 ms (0 allocations: 0 bytes)
NNlib              323.726 ms (40 allocations: 369.37 MiB)
ImageFiltering     88.782 ms (241 allocations: 68.77 MiB)
LocalFilters       3.652 s (146496433 allocations: 2.18 GiB)