[ANN] Transformers.jl

Hey guys, I want to announce this package for building Transformer-related models with Flux.jl.
GitHub: https://github.com/chengchingwen/Transformers.jl

With the package you can build the Transformer (encoder-decoder) model with the following syntax:

using Flux # provides Dense, Dropout, logsoftmax
using Transformers
using Transformers.Basic

N = 6 # number of Transformer blocks (6 in the original Transformer paper)

encoder = Stack(
    @nntopo(e → pe:(e, pe) → x → x → $N),
    PositionEmbedding(512),                           # add a positional embedding to the input
    (e, pe) -> e .+ pe,
    Dropout(0.1),
    [Transformer(512, 8, 64, 2048) for i = 1:N]...    # model size 512, 8 heads, head size 64, feed-forward size 2048
)

decoder = Stack(
    @nntopo((e, m, mask):e → pe:(e, pe) → t → (t:(t, m, mask) → t:(t, m, mask)) → $N:t → c),
    PositionEmbedding(512),
    (e, pe) -> e .+ pe,
    Dropout(0.1),
    [TransformerDecoder(512, 8, 64, 2048) for i = 1:N]...,
    Positionwise(Dense(512, length(labels)), logsoftmax) # labels: the target vocabulary for your task
)

These are all compatible with the current Flux API. You can find more information in the README.
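Roughly, the two Stacks are used like this (a minimal, untested sketch: the labels vocabulary, the random stand-in "embeddings", the sequence lengths, and passing nothing as the decoder mask are just placeholders):

using Flux
using Transformers
using Transformers.Basic

labels = 1:1000                    # hypothetical target vocabulary (needed by the decoder definition above)
# define N and build encoder / decoder as above, then:

src = randn(Float32, 512, 10)      # stand-in for 512-dim embeddings of a 10-token source sequence
trg = randn(Float32, 512, 15)      # stand-in for 512-dim embeddings of a 15-token target sequence

mem = encoder(src)                 # encoder output ("memory"), a 512 × 10 array
out = decoder(trg, mem, nothing)   # assuming nothing means "no mask"; gives length(labels) × 15 log-probabilities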

I’ll also be working on the BERT model as one of the JSoC 2019 projects. If you are interested, please take a look and give it a try.


This sounds like a very useful project.
But the CI looks like it is failing.
What is its current status?

If you use Flux with Tracker (the old version), it works quite fine (with Transformers.jl@0.1.1). But if you need Flux with Zygote, it is not ready yet: the gradient calculation is not correct with Zygote, though the forward pass is fine.
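For example, to pin that combination (Flux 0.9 should be the last Tracker-based release; 0.10 switches to Zygote):

using Pkg
Pkg.add(PackageSpec(name = "Flux", version = "0.9"))           # Tracker-based Flux
Pkg.add(PackageSpec(name = "Transformers", version = "0.1.1")) # the release mentioned above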

Thanks for the tip!
I tried to run example/BERT/pretrain.jl, but I keep hitting an InexactError (see below). Any ideas?

(Transformers) pkg> st
Project Transformers v0.1.1
    Status `~/.julia/dev/Transformers/Project.toml`
  [79e6a3ab] Adapt v1.0.0
  [c7e460c6] ArgParse v0.6.2
  [fbb218c0] BSON v0.2.3
  [a4280ba5] BytePairEncoding v0.1.1
  [3a865a2d] CuArrays v1.3.0
  [124859b0] DataDeps v0.6.4
  [587475ba] Flux v0.9.0
  [cd3eb016] HTTP v0.8.5
  [7d512f48] InternedStrings v0.7.0
  [682c06a0] JSON v0.21.0
  [9c8b4983] LightXML v0.8.0
  [1914dd2f] MacroTools v0.5.1
  [ae029012] Requires v0.5.2
  [796a5d58] WordTokenizers v0.5.3
  [a5390f91] ZipFile v0.8.3
  [ade2ca70] Dates 
  [8bb1440f] DelimitedFiles 
  [37e2e46d] LinearAlgebra 
  [d6f4376e] Markdown 
  [44cfe95a] Pkg 
  [9a3f8284] Random 
  [4ec0a83e] Unicode 

julia> cd("example/BERT/")

julia> include("pretrain.jl")
[ Info: Precompiling Transformers [21ca0261-441d-5938-ace7-c90938fde4d4]
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson wordpiece
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson tokenizer
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson bert_model
[ Info: start training
[ Info: epoch: 1
ERROR: LoadError: InexactError: Bool(4)
Stacktrace:
 [1] Bool at ./float.jl:73 [inlined]
 [2] convert at ./number.jl:7 [inlined]
 [3] logcrossentropy(::TrackedArray{…,CuArray{Float32,2}}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/amanela/.julia/dev/Transformers/src/basic/loss.jl:40
 [4] loss(::NamedTuple{(:tok, :segment),Tuple{CuArray{Int64,2},CuArray{Int64,2}}}, ::CuArray{Tuple{Int64,Int64},1}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}, ::CuArray{Float32,3}) at /home/amanela/.julia/dev/Transformers/example/BERT/pretrain.jl:63
 [5] train!() at /home/amanela/.julia/dev/Transformers/example/BERT/pretrain.jl:96
 [6] top-level scope at /home/amanela/.julia/dev/Transformers/example/BERT/pretrain.jl:104
 [7] include at ./boot.jl:328 [inlined]
 [8] include_relative(::Module, ::String) at ./loading.jl:1105
 [9] include(::Module, ::String) at ./Base.jl:31
 [10] include(::String) at ./client.jl:424
 [11] top-level scope at REPL[6]:1
in expression starting at /home/amanela/.julia/dev/Transformers/example/BERT/pretrain.jl:104

Try Bool(1). It seems Bool only converts 0 or 1 to false/true; anything else throws an InexactError.
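For example:

Bool(0)  # false
Bool(1)  # true
Bool(4)  # throws InexactError: Bool(4), as in the stacktrace above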

Ah, I put the arguments in the wrong order. Please make nextlabel the first argument of logcrossentropy in the loss function. Don’t know why I didn’t notice this before.
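So in the loss function of example/BERT/pretrain.jl the call should end up roughly like this (the name of the prediction variable is a guess here, not the actual code):

# wrong: the prediction was passed where the one-hot label is expected
# next_loss = logcrossentropy(nextpred, nextlabel)

# fixed: nextlabel (the one-hot matrix) goes first
next_loss = logcrossentropy(nextlabel, nextpred)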

Switching the argument order works. Thanks!