Hi, I can't help directly with your problem, but maybe we can find another solution by separating the CPU and GPU code. I have translated PyTorch's implementation of bilinear upsampling to Julia, and it seems to work more or less, even with fractional upsampling. I just don't have experience with Zygote and adjoints. You can have a look at the implementation here and try it out with the following code. The forward pass looks good, but I haven't checked the backward pass.
using FileIO
using ImageView # ZZZzzzz
using Colors
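using BenchmarkTools
using CUDA # provides CuArray (on older setups this was CuArrays.jl)
# upsample_bilinear comes from the implementation linked above and must be loaded first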
f = download("https://upload.wikimedia.org/wikipedia/en/e/ed/Nyan_cat_250px_frame.PNG")
nyan = load(f)
imshow(nyan)
nyan_nchw = reshape(reinterpret(UInt8, nyan),1,3,250,250)
nyan_whcn = permutedims(nyan_nchw, [3,4,2,1]) .|> Float32
nyan_gpu = CuArray(nyan_whcn)
nyan_large = upsample_bilinear(nyan_gpu, pi, pi)
nyan_large_cpu = Array(nyan_large)[:,:,:,1]
nyan_colored = view(reinterpret(RGB{Float32}, permutedims(nyan_large_cpu, [3,1,2])),1,:,:) # what a mess
imshow(nyan_colored/255)
# or plain arrays:
n = 64
s = 4
checkerboard = zeros(Float32, n, n)
for x in 1:2s:n-s
    for y in 1:2s:n-s
        checkerboard[y:y+s-1, x:x+s-1] .= 1
    end
end
imshow(checkerboard)
checkerboard_gpu = CuArray(reshape(checkerboard,n,n,1,1))
res = Array(upsample_bilinear(checkerboard_gpu, 2, 2))
imshow(res[:,:,1,1])
imshow(round.(res[:,:,1,1]))
Maybe you can compare this to your results and have a look at the backward pass.
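For the comparison, something like the following plain-CPU sketch might help. It is not the linked implementation: `upsample_bilinear_ref` and `fd_grad` are hypothetical helper names, the `(x, scale_w, scale_h)` argument order only mirrors the calls above, and I am assuming the `align_corners=true` convention and a floored output size, which may not match what the GPU kernel does. Since bilinear interpolation is linear in the input, a central finite difference should give the gradient of `sum(output)` up to rounding, so it can serve as a rough check on whatever the adjoint returns.

```julia
# Hypothetical CPU reference for bilinear upsampling of a WHCN array.
# Assumption: align_corners=true (corner pixels map onto corner pixels)
# and output size floor(size * scale); the GPU kernel may differ.
function upsample_bilinear_ref(x::AbstractArray{T,4}, sw::Real, sh::Real) where {T<:AbstractFloat}
    w, h, c, n = size(x)
    W, H = floor(Int, w * sw), floor(Int, h * sh)
    y = similar(x, W, H, c, n)
    # Ratios that map a 0-based output index to a 0-based input coordinate.
    rw = W > 1 ? T(w - 1) / T(W - 1) : zero(T)
    rh = H > 1 ? T(h - 1) / T(H - 1) : zero(T)
    for b in 1:n, k in 1:c, j in 1:H, i in 1:W
        fx = rw * (i - 1)
        fy = rh * (j - 1)
        # Lower neighbour, clamped so rounding can never push it out of range.
        i0 = min(floor(Int, fx), w - 1)
        j0 = min(floor(Int, fy), h - 1)
        # Upper neighbour, clamped at the border.
        i1 = min(i0 + 1, w - 1)
        j1 = min(j0 + 1, h - 1)
        # Interpolation weights along each axis.
        ax = fx - i0
        ay = fy - j0
        y[i, j, k, b] = (1 - ax) * (1 - ay) * x[i0 + 1, j0 + 1, k, b] +
                        ax * (1 - ay) * x[i1 + 1, j0 + 1, k, b] +
                        (1 - ax) * ay * x[i0 + 1, j1 + 1, k, b] +
                        ax * ay * x[i1 + 1, j1 + 1, k, b]
    end
    return y
end

# Central finite difference of sum(f(x)) with respect to the entry x[idx].
function fd_grad(f, x, idx; step = 1f-1)
    xp = copy(x); xp[idx] += step
    xm = copy(x); xm[idx] -= step
    return (sum(f(xp)) - sum(f(xm))) / (2 * step)
end

# Small example: a 2x2 input upsampled by a factor of 2 in both directions.
x_small = reshape(Float32[1, 2, 3, 4], 2, 2, 1, 1)
y_small = upsample_bilinear_ref(x_small, 2, 2)
g_corner = fd_grad(x -> upsample_bilinear_ref(x, 2, 2), x_small, 1)
```

To compare, run the same input through both functions, e.g. `Array(upsample_bilinear(CuArray(x_small), 2, 2)) ≈ y_small`, and check `fd_grad` against one entry of the gradient you get from Zygote for `x -> sum(upsample_bilinear(x, 2, 2))`. If the forward results only disagree near the borders, the alignment convention is probably the difference rather than a bug.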
EDIT: I just saw this, which also looks interesting!