This is one big weak point of the current Flux optimizer design, and there's no easy way to fix it. That's why Flux is moving to Optimisers.jl (FluxML/Optimisers.jl on GitHub), which does let you call cpu(opt_state) and save the result. If you can, I'd recommend switching to it today so that you're also future-proofed when Flux switches over.
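For reference, here is a rough sketch of what that could look like with Optimisers.jl. The model, the data, the file name, and the use of the Serialization standard library are all my own assumptions for illustration; BSON or JLD2 should work along the same lines, and the exact calls may differ between Flux and Optimisers.jl versions.

```julia
using Flux, Optimisers
using Serialization  # stdlib; chosen here only to keep the sketch dependency-free

# Hypothetical toy model and data; gpu() is a no-op if no GPU is available.
model = Chain(Dense(4 => 8, relu), Dense(8 => 1)) |> gpu
x = gpu(rand(Float32, 4, 16))
y = gpu(rand(Float32, 1, 16))

# Explicit optimiser state: a tree that mirrors the model's structure.
opt_state = Optimisers.setup(Optimisers.Adam(1f-3), model)

# One training step, so the state holds non-trivial momentum buffers.
grads = Flux.gradient(m -> Flux.mse(m(x), y), model)[1]
opt_state, model = Optimisers.update(opt_state, model, grads)

# Move both the model and the optimiser state to the CPU before saving.
path = joinpath(mktempdir(), "checkpoint.jls")  # hypothetical location
serialize(path, (model = cpu(model), opt_state = cpu(opt_state)))

# Later: load the checkpoint and move everything back to the GPU.
ckpt = deserialize(path)
model = gpu(ckpt.model)
opt_state = gpu(ckpt.opt_state)
```

Because the state is an ordinary nested structure rather than something keyed on the parameter arrays themselves, it survives the round trip to disk and can be handed straight back to Optimisers.update when training resumes.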