Hello All,
I am having a problem while trying to load a model that was trained from scratch with Transformers.jl. The model itself is a seq2seq model very similar to the one in the documentation; the only difference is that I use multiple layers.
During training, I saved the model using Flux.state and BSON's @save macro.
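Roughly, the saving code looked like this (a sketch; the file name and variable names are just examples):

```julia
using Flux, BSON
using BSON: @save

# Flux.state extracts a plain, struct-free copy of the model's parameters
# and state as nested NamedTuples / arrays, which is what gets written out.
model_state = Flux.state(model)
@save "seq2seq_checkpoint.bson" model_state
```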
Now I try to load the trained model with Flux.loadmodel!, and what I get back is a NamedTuple{(:layers,)}. When I call propertynames(model[:layers]) it returns an ntuple of layer numbers. Then propertynames(model[:layers][1]) returns (:embeddings,), and propertynames(model[:layers][2]) returns (:blocks, :f), which all seems fine to me.
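For reference, the loading and inspection currently looks roughly like this (again a sketch; file and variable names are examples, and the comments show what I see in the REPL):

```julia
using Flux, BSON
using BSON: @load

@load "seq2seq_checkpoint.bson" model_state   # read the saved Flux.state back in
model = model_state                           # a nested NamedTuple, not a Flux/Transformers model

propertynames(model)              # (:layers,)
propertynames(model[:layers])     # layer numbers, e.g. (1, 2, ...)
propertynames(model[:layers][1])  # (:embeddings,)
propertynames(model[:layers][2])  # (:blocks, :f)
```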
The problem arises when I try to load the TransformerBlock(s). When I call TransformerBlock(model[:layers][2][:blocks], nothing), it throws the following error (a minimal reproduction is included below, after the stack trace):
TransformerBlock(
Tuple(
NamedTuple(
NamedTuple(
NamedTuple(
Error showing value of type TransformerBlock{Tuple{NamedTuple{(:attention, :feedforward), Tuple{NamedTuple{(:layer, :norm), Tuple{NamedTuple{(:layer,), Tuple{NamedTuple{(:attention_op, :qkv_proj, :o_proj), Tuple{Tuple{}, NamedTuple{(:layer,), Tuple{NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}}}, NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}}}}}, NamedTuple{(:α, :β), Tuple{Vector{Float32}, Vector{Float32}}}}}, NamedTuple{(:layer, :norm), Tuple{NamedTuple{(:layer,), Tuple{NamedTuple{(:layers,), Tuple{Tuple{NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}, NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}}}}}}, NamedTuple{(:α, :β), Tuple{Vector{Float32}, Vector{Float32}}}}}}}, NamedTuple{(:attention, :feedforward), Tuple{NamedTuple{(:layer, :norm), Tuple{NamedTuple{(:layer,), Tuple{NamedTuple{(:attention_op, :qkv_proj, :o_proj), Tuple{Tuple{}, NamedTuple{(:layer,), Tuple{NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}}}, NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}}}}}, NamedTuple{(:α, :β), Tuple{Vector{Float32}, Vector{Float32}}}}}, NamedTuple{(:layer, :norm), Tuple{NamedTuple{(:layer,), Tuple{NamedTuple{(:layers,), Tuple{Tuple{NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}, NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}}}}}}, NamedTuple{(:α, :β), Tuple{Vector{Float32}, Vector{Float32}}}}}}}}, Nothing}:
ERROR: MethodError: _show_leaflike(::Tuple{}) is ambiguous. Candidates:
_show_leaflike(::Tuple{Vararg{Number}}) in Flux at /home/phd/.julia/packages/Flux/n3cOc/src/layers/show.jl:50
_show_leaflike(::Tuple{Vararg{AbstractArray}}) in Flux at /home/phd/.julia/packages/Flux/n3cOc/src/layers/show.jl:51
Possible fix, define
_show_leaflike(::Tuple{})
Stacktrace:
[1] _all(f::typeof(Flux._show_leaflike), itr::NamedTuple{(:attention_op, :qkv_proj, :o_proj), Tuple{Tuple{}, NamedTuple{(:layer,), Tuple{NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}}}, NamedTuple{(:W, :b), Tuple{Matrix{Float32}, Vector{Float32}}}}}, #unused#::Colon)
@ Base ./reduce.jl:1251
[2] all(f::Function, itr::NamedTuple{(:attention_op, qkv_proj .....
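In case it helps, this is the minimal way I can reproduce it (the Transformers.Layers import path is just how I reference TransformerBlock in my setup and may differ):

```julia
using Flux, BSON
using BSON: @load
using Transformers

@load "seq2seq_checkpoint.bson" model_state
model = model_state

# The constructor itself seems to run, wrapping the raw NamedTuples; the
# error above is thrown when the REPL tries to display the result.
block = Transformers.Layers.TransformerBlock(model[:layers][2][:blocks], nothing)
```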
If I remember correctly, Flux saves model state in a tree-like structure, and somehow this pre-trained state cannot be read back into a TransformerBlock. What should I do? I do not know the internals here; maybe I am just using the wrong constructor to load the transformer block?
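For completeness, this is the workflow I thought Flux.loadmodel! was meant for (a sketch; build_seq2seq() is a placeholder for my own code that rebuilds the untrained architecture). Is this the intended way, rather than calling the TransformerBlock constructor on the saved NamedTuples directly?

```julia
using Flux, BSON
using BSON: @load

model = build_seq2seq()                 # placeholder: reconstruct the same architecture as at training time
@load "seq2seq_checkpoint.bson" model_state
Flux.loadmodel!(model, model_state)     # copy the saved arrays into the freshly built model, in place
```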
Since training this model took nearly two days, I really need to be able to read this model file back. Could someone guide me through it?
B.R.