In-place versions would require a massive rewrite and wouldn't really be that advantageous without other optimizations. Instead, check out Reactant + Lux (Compiling Lux Models using Reactant.jl | Lux.jl Docs). That will be significantly faster and, once compiled, will pre-allocate the required VRAM.
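For reference, a minimal sketch of what that looks like, assuming a recent Lux and Reactant are installed. The model and array sizes here are made up for illustration and are not taken from your code:

```julia
using Lux, Reactant, Random

# Hypothetical small MLP; the layer sizes are illustrative only.
model = Chain(Dense(2 => 32, tanh), Dense(32 => 2))
ps, st = Lux.setup(Random.default_rng(), model)

# Move the input, parameters, and states to Reactant's device.
dev = reactant_device()
x_ra = dev(randn(Float32, 2, 16))
ps_ra = dev(ps)
st_ra = dev(st)

# Trace and compile the forward pass once, then reuse the compiled function.
forward = @compile model(x_ra, ps_ra, Lux.testmode(st_ra))
y, _ = forward(x_ra, ps_ra, Lux.testmode(st_ra))
```

The compiled `forward` is specialized to the argument shapes and types it was traced with, so the idea is to compile once and call it repeatedly with inputs of the same shape.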