Coincidentally, a PR for grid sampling was just opened against NNlib, along with a CUDA version. That leaves the affine grid generation. The code at https://github.com/thebhatman/Spatial-Transformer-Network/blob/master/src/stn.jl#L43-L58 might be workable for that with some minor changes, since it's already fully vectorized. Also have a look at the NNlib page of the Flux docs to see if you can make use of anything there.
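To give a rough idea of what the affine grid generation step involves, here is a minimal sketch using only Base Julia. It is not the code from the linked repository, and `affine_grid` is a hypothetical helper name, not an NNlib or Flux function. I'm assuming `theta` holds one 2×3 affine matrix per batch element, with size `(2, 3, N)`, and that the sampler wants normalized coordinates in `[-1, 1]` laid out as `(2, W, H, N)`. Check the grid sampling PR for the layout it actually expects.

```julia
# Hypothetical helper (not part of NNlib or Flux): build a sampling grid
# from a batch of 2x3 affine matrices.
# Assumed layouts: theta is (2, 3, N); the result is (2, width, height, N)
# with x in row 1 and y in row 2, both normalized to [-1, 1].
function affine_grid(theta::AbstractArray{T,3}, width::Integer, height::Integer) where {T<:AbstractFloat}
    size(theta, 1) == 2 && size(theta, 2) == 3 ||
        throw(ArgumentError("theta must have size (2, 3, N)"))
    n = size(theta, 3)

    # Normalized target coordinates along each axis.
    xs = range(-one(T), one(T); length = width)
    ys = range(-one(T), one(T); length = height)

    # Homogeneous coordinates, one column per output pixel: (3, width * height).
    # x varies fastest, so the columns follow column-major (W, H) order.
    coords = vcat(repeat(xs, outer = height)',
                  repeat(ys, inner = width)',
                  ones(T, 1, width * height))

    # Apply each batch element's affine matrix to all pixel coordinates.
    grid = similar(theta, 2, width * height, n)
    for k in 1:n
        grid[:, :, k] = theta[:, :, k] * coords
    end
    return reshape(grid, 2, width, height, n)
end

# Usage: the identity transform reproduces the regular grid.
theta = reshape(Float32[1 0 0; 0 1 0], 2, 3, 1)
grid = affine_grid(theta, 4, 3)
size(grid)  # (2, 4, 3, 1)
```

The per-batch loop keeps the sketch easy to read. On the GPU you would probably want to replace it with a single batched matrix multiply, for example with `NNlib.batched_mul`.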