Saving a model built with Transformers.jl

gemuesi · March 15, 2023, 12:55pm

Hi there,
I am pretty new to anything related to this topic in general, but I found Flux to be quite accessible and wanted to have a look at the Transformers.jl package as well. I’ve been playing around with the copy task (the toy example found here: Transformers.jl/example/AttentionIsAllYouNeed at master · chengchingwen/Transformers.jl · GitHub) and was wondering how one could save the model to use in a future session. I know that there is the @save macro from the BSON.jl package but something like

@save "/path/to/folder/" trf_model

or saving every single layer did not do the trick for me.
I’d appreciate any hint on how to save and load the model in the above-mentioned example (@chengchingwen).

Cheers

chengchingwen · March 15, 2023, 12:59pm

What error did you get?

gemuesi · March 15, 2023, 1:04pm

Hi,
thanks for the quick response!
When I did not move the model to the CPU before saving, I got a ReadOnlyMemoryError(). So I moved it to the CPU and back to the GPU after loading, but then the decode_loss() function returned NaNs while training.
But from your response it seems that saving trf_model alone (and not all single layers on top) should suffice?

Cheers

chengchingwen · March 15, 2023, 1:07pm

That should suffice. Did you move the model back to the CPU before saving, and move it to the GPU after loading? BSON does not support saving arrays directly from the GPU, so an extra copy to the CPU is needed.
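
A minimal sketch of that CPU round trip, assuming Flux and BSON are installed. The small Chain is only a stand-in for trf_model, and the file name is hypothetical. Note that the path given to @save names a file, not a folder.

```julia
using Flux, BSON

# Stand-in model (assumption); the thread's trf_model would take its place.
# gpu falls back to a no-op when no working GPU is available.
model = Chain(Dense(4 => 8, relu), Dense(8 => 2)) |> gpu

x = rand(Float32, 4, 3)
y_before = cpu(model(gpu(x)))

# Save: copy to the CPU first, and point @save at a file.
path = joinpath(mktempdir(), "trf_model.bson")  # hypothetical file name
cpu_model = cpu(model)
BSON.@save path cpu_model

# Load: in a later session, run `using Flux, BSON` first so the layer types exist.
BSON.@load path cpu_model
restored = gpu(cpu_model)

# The restored model should reproduce the original outputs.
y_after = cpu(restored(gpu(x)))
```

If y_before and y_after agree, the parameters survived the round trip. NaNs that only appear later would then point at something other than the save/load step.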

gemuesi · March 15, 2023, 1:27pm

Yes, I did that. I think I found the mistake: I just did not set the model as constant again when loading it. Now it seems to have loaded the model, but this brings me to a follow-up question: in the example models you used translate(), which uses embed, encoder etc. - which are not saved as such when only saving trf_model. How can I access their trained parameters in a subsequent session?

chengchingwen · March 15, 2023, 1:30pm

They are in trf_model. trf_model is just a wrapper around embed, encoder, etc. You should be able to access those models through trf_model’s fields (or you can just save embed/encoder/… instead).

gemuesi · March 15, 2023, 1:48pm

Right! I was struggling with this because fieldnames() didn’t work but propertynames() did the trick and after a little bit of maneuvering I found all I needed to make translate() work with the trained parameters. Many thanks!

Cheers
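
For readers who hit the same problem: one common reason fieldnames() fails is that it expects a type, while propertynames() accepts an instance. The thread does not say whether that was the cause here, so the sketch below only illustrates the difference. ToyWrapper and its contents are hypothetical and are not the Transformers.jl types.

```julia
# Hypothetical stand-in for a wrapper that holds sub-models.
struct ToyWrapper
    embed
    encoder
    decoder
end

toy = ToyWrapper("embed layer", "encoder layer", "decoder layer")

# fieldnames wants a type; fieldnames(toy) would throw a MethodError.
names_from_type = fieldnames(typeof(toy))

# propertynames accepts the instance directly.
names_from_instance = propertynames(toy)

# Pull a sub-model back out by name (same as toy.encoder).
encoder = getproperty(toy, :encoder)
```

For a plain struct the two calls return the same names. They can differ for types that overload getproperty.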
