BSON error when loading Flux model

The command

m = BSON.load("model.bson", @__MODULE__)[:m]

is throwing

ERROR: MethodError: Cannot `convert` an object of type CuContext to an object of type CuPtr{Nothing}
Closest candidates are:
  convert(::Type{CuPtr{T}}, ::CUDA.Mem.UnifiedBuffer) where T at /scratch/drozda/.julia/packages/CUDA/9T5Sq/lib/cudadrv/memory.jl:234
  convert(::Type{CuPtr{T}}, ::CUDA.Mem.HostBuffer) where T at /scratch/drozda/.julia/packages/CUDA/9T5Sq/lib/cudadrv/memory.jl:129
  convert(::Type{CuPtr{T}}, ::CUDA.Mem.DeviceBuffer) where T at /scratch/drozda/.julia/packages/CUDA/9T5Sq/lib/cudadrv/memory.jl:59
  ...

Does anyone know what it could be?

Does the machine this is running on have a CUDA-supported GPU? Otherwise the note in Saving & Loading · Flux may apply.


Thanks for the reply @ToucheSir.
I do offload the model back to the CPU before saving

m = model |> cpu
@save "model.bson" m opt

Ah, but opt is not offloaded. Can you get away with not saving the optimizer state as well?

From what I’ve seen in the Flux model zoo, you don’t need to offload the optimizer to any device. Instantiating opt = ADAM(), for instance, will work on both CPU and GPU.
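Following that suggestion, a minimal sketch of saving only the model and re-creating the optimizer fresh on load (rather than serializing opt) could look like this — the filename and variable names are just the ones from the thread:

```julia
using Flux, BSON

model = Dense(2, 2) |> gpu  # stand-in for the trained model

# Move parameters back to the CPU so no CUDA objects end up in the file.
m = model |> cpu
BSON.@save "model.bson" m   # save only the model, not the optimizer

# Later, when resuming or doing inference:
m = BSON.load("model.bson", @__MODULE__)[:m]
opt = ADAM()                # fresh optimizer; state is rebuilt during training
```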

What’s strange to me is that the reported issue appears only for models saved some time ago.

If I train a new model, save it (with the optimizer) and load it for inference, the issue isn’t raised.

Yes, but if you offload the model itself and load it back in, the references in the optimizer will no longer point to the same thing. So no error is raised, but all the optimizer state has been silently invalidated. This is a big footgun and something we haven’t been able to address until very recently, since it’s a fundamental issue with the design of the current optimizer interface.
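To illustrate the invalidation: Flux’s (implicit-style) optimizers keep their state in an IdDict keyed by object identity of the parameter arrays, so any round-trip that produces new arrays orphans the state. This is a hedged sketch — the state field and apply! call match recent Flux versions but may differ in older ones, and deepcopy stands in for a save/load round-trip:

```julia
using Flux

m = Dense(2, 2)
opt = ADAM()

# Populate optimizer state for this particular weight array.
x = m.weight
Flux.Optimise.apply!(opt, x, zero(x))
haskey(opt.state, x)          # state is keyed by this exact array

m2 = deepcopy(m)              # stands in for saving and loading the model
haskey(opt.state, m2.weight)  # false: the new arrays are not in the IdDict
```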

That’s interesting, I assume you didn’t offload those older models before saving? If so, I wonder if internal changes in CUDA.jl might be causing the errors then.

I do offload these older models back to the CPU before saving.

Could modules changing over time lead to such an issue?
I’m using the @__MODULE__ option when loading,
but if the current version of a module differs from the one used when saving,
issues could appear, I suppose.

Did you save anything other than the model? Optimizer state could be a problem as mentioned above.

If you’re willing/able to provide one of these troublesome BSON files, I could take a look. Another idea would be reading them back in with https://github.com/ancapdev/LightBSON.jl and trying to recover the data manually. Even printing out a tree of all the type tags in the file could help identify what is holding onto CUDA state.