ERROR: MethodError: Cannot `convert` an object of type CuContext to an object of type CuPtr{Nothing}
Closest candidates are:
convert(::Type{CuPtr{T}}, ::CUDA.Mem.UnifiedBuffer) where T at /scratch/drozda/.julia/packages/CUDA/9T5Sq/lib/cudadrv/memory.jl:234
convert(::Type{CuPtr{T}}, ::CUDA.Mem.HostBuffer) where T at /scratch/drozda/.julia/packages/CUDA/9T5Sq/lib/cudadrv/memory.jl:129
convert(::Type{CuPtr{T}}, ::CUDA.Mem.DeviceBuffer) where T at /scratch/drozda/.julia/packages/CUDA/9T5Sq/lib/cudadrv/memory.jl:59
...
From what I’ve seen in the Flux model zoo, you don’t need to offload the optimizer to any device. Instantiating opt = ADAM(), for instance, works on both CPU and GPU.
What’s strange to me is that the reported issue appears just for models saved some time ago.
If I train a new model, save it (with the optimizer) and load it for inference, the issue isn’t raised.
Yes, but if you offload the model itself and load it back in, the references in the optimizer will no longer point to the same thing. So no error is raised, but all the optimizer state has been silently invalidated. This is a big footgun and something we haven’t been able to address until very recently, since it’s a fundamental issue with the design of the current optimizer interface.
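To make the "references no longer point to the same thing" part concrete, here is a minimal sketch in plain Julia. It assumes the optimizer keeps its per-parameter state in an `IdDict` keyed by the parameter arrays themselves; `ToyOptimizer` and `momentum!` are hypothetical names made up for illustration, not Flux API, and `copy` merely stands in for a device round trip that produces new array objects.

```julia
# Hypothetical stand-in for an optimizer that stores per-parameter state
# in an IdDict, i.e. keyed by object identity rather than by value.
struct ToyOptimizer
    state::IdDict{Any,Any}
end
ToyOptimizer() = ToyOptimizer(IdDict{Any,Any}())

# Return the state buffer for parameter `p`, creating a zero buffer if none exists.
momentum!(opt::ToyOptimizer, p) = get!(opt.state, p, zero(p))

opt = ToyOptimizer()
w = [1.0, 2.0, 3.0]

# "Training": accumulate some state for w.
m = momentum!(opt, w)
m .+= 0.5

# Stand-in for moving the model to another device and back:
# the values are identical, but it is a different array object.
w_roundtrip = copy(w)

has_state_original  = haskey(opt.state, w)            # looked up by identity
has_state_roundtrip = haskey(opt.state, w_roundtrip)  # equal values, different identity

# No error here: the optimizer just starts over with fresh state.
fresh = momentum!(opt, w_roundtrip)
```

The lookup for `w_roundtrip` misses even though its contents equal those of `w`, so the accumulated state is ignored without any warning. That is the silent invalidation described above.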
That’s interesting, I assume you didn’t offload those older models before saving? If so, I wonder if internal changes in CUDA.jl might be causing the errors then.
I do offload these older models back to the CPU before saving.
Could modules changing over time lead to such an issue?
I’m passing @__MODULE__ when loading, but if the current version of a module differs from the one used when the file was saved, issues could appear, I suppose.
Did you save anything other than the model? Optimizer state could be a problem as mentioned above.
If you’re willing/able to provide one of these troublesome BSON files, I could take a look. Another idea would be reading them back in with https://github.com/ancapdev/LightBSON.jl and trying to recover the data manually. Even printing out a tree of all the type tags in the file could help to identify what is holding onto CUDA stuff.
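As a rough sketch of the "print the type tags" idea: the walker below recurses through a nested `Dict`/`Vector` structure and collects every entry tagged as a datatype. It assumes the raw document uses `Symbol` keys with `:tag => "datatype"` and a `:name` path, which is how I understand BSON.jl lays out its files; treat that layout as an assumption and check it against your own data. The `raw` value here is hand-built purely for illustration (in practice it would come from something like `BSON.parse(path)`, which as far as I know returns the document without reconstructing Julia types), and `collect_type_names!` is a hypothetical helper name.

```julia
# Collect the dotted names of all serialized datatypes found in a raw,
# BSON-style nested structure of Dicts and Vectors.
function collect_type_names!(names::Set{String}, node)
    if node isa AbstractDict
        if get(node, :tag, nothing) == "datatype" && haskey(node, :name)
            push!(names, join(string.(node[:name]), "."))
        end
        for v in values(node)
            collect_type_names!(names, v)
        end
    elseif node isa AbstractVector && !(node isa AbstractVector{<:Number})
        # Skip raw numeric payloads (e.g. byte buffers); recurse into everything else.
        for v in node
            collect_type_names!(names, v)
        end
    end
    return names
end

# Hand-built stand-in for a parsed file (assumed layout, illustrative names only).
raw = Dict{Symbol,Any}(
    :model => Dict{Symbol,Any}(
        :tag  => "struct",
        :type => Dict{Symbol,Any}(:tag => "datatype", :name => Any["Flux", "Dense"], :params => Any[]),
        :data => Any[UInt8[0x00, 0x01]],
    ),
    :extra => Dict{Symbol,Any}(
        :tag  => "struct",
        :type => Dict{Symbol,Any}(:tag => "datatype", :name => Any["CUDA", "CuContext"], :params => Any[]),
        :data => Any[],
    ),
)

names = collect_type_names!(Set{String}(), raw)

# Anything living under CUDA is a candidate for what triggers the convert error.
suspects = sort([n for n in names if startswith(n, "CUDA")])
println(suspects)
```

Whichever top-level key contains the suspect types tells you which saved object (model, optimizer, or something else) still holds GPU references.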