What's the status of image convolutions on CPU & GPU?

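FastConv is nice if you want a "full" convolution, where the output is larger than the input by roughly the kernel size (like conv2, something I’ve never gotten around to implementing in ImageFiltering). Otherwise, though, imfilter is both faster and more general. Here it is faster on a 1000×1000 array, and it accepts a view where convn throws a MethodError: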

julia> using BenchmarkTools, FastConv, ImageFiltering

julia> A = rand(1000,1000);

julia> k = rand(3,3);

julia> @btime convn(A, k);
  15.451 ms (20 allocations: 7.66 MiB)

julia> @btime imfilter(A, (k,));
  12.473 ms (44 allocations: 15.32 MiB)

julia> B = view(A, 1:900, 1:900);

julia> @btime imfilter(B, (k,));
  10.139 ms (44 allocations: 12.42 MiB)

julia> @btime convn(B, (k,));
ERROR: MethodError: no method matching convn(::SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}, ::Tuple{Array{Float64,2}})
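
To make the "grows the array by the kernel size" point concrete, here is a minimal sketch of a naive full convolution in plain Julia. The helper name `fullconv2` is hypothetical and written only for illustration; it is not how FastConv or ImageFiltering implement anything, and it is far slower than either. It only shows how the output size relates to the input and kernel sizes, and that a method written against `AbstractMatrix` accepts a view as well as an `Array`.

```julia
# Hypothetical helper for illustration: naive "full" 2-D convolution.
# The output has size(A) .+ size(k) .- 1, i.e. it grows with the kernel.
function fullconv2(A::AbstractMatrix, k::AbstractMatrix)
    m, n = size(A)
    p, q = size(k)
    T = promote_type(eltype(A), eltype(k))
    out = zeros(T, m + p - 1, n + q - 1)
    # Scatter each input element, weighted by every kernel element.
    for j in 1:n, i in 1:m, b in 1:q, a in 1:p
        out[i + a - 1, j + b - 1] += A[i, j] * k[a, b]
    end
    return out
end

A = rand(10, 10)
k = rand(3, 3)

C = fullconv2(A, k)
size(C)                      # (12, 12): grown by size(k) .- 1

# Cropping the border recovers an array the same size as the input
# (equivalent to zero padding; imfilter's border handling may differ).
inner = C[2:end-1, 2:end-1]
size(inner)                  # (10, 10)

# Because the signature uses AbstractMatrix, a view works too.
B = view(A, 1:9, 1:9)
size(fullconv2(B, k))        # (11, 11)
```

Whether you want the grown array or the cropped one depends on the application; for most image work the same-size result is the convenient one.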