Zygote/Flux not computing gradient with view/slice of tensor properly for MLP

Well, there is still more fiddling to be done to improve its performance. For example, I wonder whether computing the gradients manually would be faster, since Zygote struggles with the matrix subsampling. I am also mainly interested in implementing SLIDE for computer vision applications, so that I can do both training and inference on a CPU. I was planning to implement one of the new MLP-based vision networks, such as MLP-Mixer ( https://papers.nips.cc/paper/2021/file/cba0a4ee5ccd02fda0fe3f9a3e7b89fe-Paper.pdf ), using SLIDE to see whether that would work.
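As a rough illustration of what computing the gradients manually could look like, here is a minimal Julia sketch for a single SLIDE-style linear layer in which only a sampled subset of neurons (`active`) is evaluated. It assumes a plain linear layer with no activation and that the upstream gradient `dy` for the active outputs is already known. `sparse_forward` and `sparse_backward!` are hypothetical names, not part of Flux, Zygote, or the linked repo. The backward pass writes gradients only into the sampled rows, so no automatic differentiation through the indexing is needed.

```julia
using LinearAlgebra

# Hypothetical SLIDE-style forward pass: only the rows (neurons) listed in
# `active` are evaluated, giving one output per active neuron.
function sparse_forward(W::AbstractMatrix, b::AbstractVector,
                        x::AbstractVector, active::AbstractVector{<:Integer})
    Wa = @view W[active, :]          # no copy of the sampled rows
    return Wa * x .+ b[active]
end

# Hypothetical manual backward pass. Given dL/dy for the active outputs,
# accumulate gradients only into the sampled rows of dW and db, and
# return dL/dx for the previous layer.
function sparse_backward!(dW::AbstractMatrix, db::AbstractVector,
                          W::AbstractMatrix, x::AbstractVector,
                          active::AbstractVector{<:Integer}, dy::AbstractVector)
    Wa = @view W[active, :]
    dW[active, :] .+= dy * x'        # outer product, sampled rows only
    db[active] .+= dy
    return Wa' * dy
end
```

This sketch still allocates a dense `dW` for simplicity; rows of neurons that were not sampled simply stay zero.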

I uploaded my experiments to this GitHub repo: outlace/SLIDE-Pose-Estimation

So I was able to get Zygote to compute the gradients, but Flux’s built-in optimizers don’t seem to work with my SLIDE implementation, so I had to write an ADAM optimizer from scratch; it’s in the optim.jl file.
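For reference, the standard Adam update (Kingma & Ba) fits in a few lines of Julia. This is a minimal sketch, not the contents of optim.jl; `AdamState` and `adam_step!` are hypothetical names, and the keyword defaults are the commonly used values from the Adam paper.

```julia
# Hypothetical per-parameter Adam state.
mutable struct AdamState{A<:AbstractArray}
    m::A      # first-moment (mean) estimate of the gradient
    v::A      # second-moment (uncentered variance) estimate
    t::Int    # number of update steps taken so far
end

# Start both moment estimates at zero, with the same shape as the parameter.
AdamState(p::AbstractArray) = AdamState(zero(p), zero(p), 0)

# One in-place Adam step on parameter array `p` with gradient `g`.
function adam_step!(p::AbstractArray, g::AbstractArray, s::AdamState;
                    η=1e-3, β1=0.9, β2=0.999, ϵ=1e-8)
    s.t += 1
    @. s.m = β1 * s.m + (1 - β1) * g
    @. s.v = β2 * s.v + (1 - β2) * g^2
    # Bias correction for the zero-initialized moment estimates.
    c1 = 1 - β1^s.t
    c2 = 1 - β2^s.t
    @. p -= η * (s.m / c1) / (sqrt(s.v / c2) + ϵ)
    return p
end
```

Each parameter array (for example each layer's weights and biases) keeps its own `AdamState`, and the gradient passed in can come from Zygote or from a manual backward pass. This sketch updates the whole array on every step.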