This looks like a problem in NNlib's implementation of `depthwiseconv`. I get the same error with:
```julia
using Flux, CUDA
CUDA.allowscalar(false)
gpu(rand(5,5,2,1)) |> gpu(DepthwiseConv((3,3), 2=>2))
```
This issue suggests that a GPU version of `depthwiseconv` has not been implemented in NNlib yet.