[ANN] Transformers.jl

Hey guys, I want to announce this package for building Transformer-related models with Flux.jl.
GitHub: https://github.com/chengchingwen/Transformers.jl

With the package you can build the Transformer (encoder-decoder) model with the following syntax:

using Flux # provides Dense, Dropout, logsoftmax
using Transformers
using Transformers.Basic

N = 6 # number of Transformer blocks (6 in the original Transformer paper)

encoder = Stack(
    @nntopo(e → pe:(e, pe) → x → x → $N),
    PositionEmbedding(512),                           # add a positional embedding to the input
    (e, pe) -> e .+ pe,
    Dropout(0.1),
    [Transformer(512, 8, 64, 2048) for i = 1:N]...    # model size 512, 8 heads, head size 64, feed-forward size 2048
)

decoder = Stack(
    @nntopo((e, m, mask):e → pe:(e, pe) → t → (t:(t, m, mask) → t:(t, m, mask)) → $N:t → c),
    PositionEmbedding(512),
    (e, pe) -> e .+ pe,
    Dropout(0.1),
    [TransformerDecoder(512, 8, 64, 2048) for i = 1:N]...,
    Positionwise(Dense(512, length(labels)), logsoftmax) # labels: the target vocabulary for your task
)

These are all compatible with the current Flux API. You can find more information in the README.
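Roughly, the two Stacks are used like this (a minimal, untested sketch: the labels vocabulary, the random stand-in "embeddings", the sequence lengths, and passing nothing as the decoder mask are just placeholders):

using Flux
using Transformers
using Transformers.Basic

labels = 1:1000                    # hypothetical target vocabulary (needed by the decoder definition above)
# define N and build encoder / decoder as above, then:

src = randn(Float32, 512, 10)      # stand-in for 512-dim embeddings of a 10-token source sequence
trg = randn(Float32, 512, 15)      # stand-in for 512-dim embeddings of a 15-token target sequence

mem = encoder(src)                 # encoder output ("memory"), a 512 × 10 array
out = decoder(trg, mem, nothing)   # assuming nothing means "no mask"; gives length(labels) × 15 log-probabilities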

I’ll also be working on the BERT model as one of the JSoC 2019 projects. If you are interested, please take a look and give it a try.


This sounds like a very useful project.
But the CI looks like it is failing.
What is its current status?

If you use Flux with Tracker (the old version), it works quite fine (with Transformers.jl@0.1.1). But if you need Flux with Zygote, it is not ready yet: the gradient calculation is not correct with Zygote, though the forward pass is fine.
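For example, to pin that combination (Flux 0.9 should be the last Tracker-based release; 0.10 switches to Zygote):

using Pkg
Pkg.add(PackageSpec(name = "Flux", version = "0.9"))           # Tracker-based Flux
Pkg.add(PackageSpec(name = "Transformers", version = "0.1.1")) # the release mentioned above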

Thanks for the tip!
I tried to run example/BERT/pretrain.jl, but I keep hitting an InexactError (see below). Any ideas?

(Transformers) pkg> st
Project Transformers v0.1.1
    Status `~/.julia/dev/Transformers/Project.toml`
  [79e6a3ab] Adapt v1.0.0
  [c7e460c6] ArgParse v0.6.2
  [fbb218c0] BSON v0.2.3
  [a4280ba5] BytePairEncoding v0.1.1
  [3a865a2d] CuArrays v1.3.0
  [124859b0] DataDeps v0.6.4
  [587475ba] Flux v0.9.0
  [cd3eb016] HTTP v0.8.5
  [7d512f48] InternedStrings v0.7.0
  [682c06a0] JSON v0.21.0
  [9c8b4983] LightXML v0.8.0
  [1914dd2f] MacroTools v0.5.1
  [ae029012] Requires v0.5.2
  [796a5d58] WordTokenizers v0.5.3
  [a5390f91] ZipFile v0.8.3
  [ade2ca70] Dates 
  [8bb1440f] DelimitedFiles 
  [37e2e46d] LinearAlgebra 
  [d6f4376e] Markdown 
  [44cfe95a] Pkg 
  [9a3f8284] Random 
  [4ec0a83e] Unicode 

julia> cd("example/BERT/")

julia> include("pretrain.jl")
[ Info: Precompiling Transformers [21ca0261-441d-5938-ace7-c90938fde4d4]
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson wordpiece
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson tokenizer
[ Info: loading pretrain bert model: uncased_L-12_H-768_A-12.tfbson bert_model
[ Info: start training
[ Info: epoch: 1
ERROR: LoadError: InexactError: Bool(4)
Stacktrace:
 [1] Bool at ./float.jl:73 [inlined]
 [2] convert at ./number.jl:7 [inlined]
 [3] logcrossentropy(::TrackedArray{…,CuArray{Float32,2}}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/amanela/.julia/dev/Transformers/src/basic/loss.jl:40
 [4] loss(::NamedTuple{(:tok, :segment),Tuple{CuArray{Int64,2},CuArray{Int64,2}}}, ::CuArray{Tuple{Int64,Int64},1}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}, ::CuArray{Float32,3}) at /home/amanela/.julia/dev/Transformers/example/BERT/pretrain.jl:63
 [5] train!() at /home/amanela/.julia/dev/Transformers/example/BERT/pretrain.jl:96
 [6] top-level scope at /home/amanela/.julia/dev/Transformers/example/BERT/pretrain.jl:104
 [7] include at ./boot.jl:328 [inlined]
 [8] include_relative(::Module, ::String) at ./loading.jl:1105
 [9] include(::Module, ::String) at ./Base.jl:31
 [10] include(::String) at ./client.jl:424
 [11] top-level scope at REPL[6]:1
in expression starting at /home/amanela/.julia/dev/Transformers/example/BERT/pretrain.jl:104

Try Bool(1). It seems Bool only converts 0 or 1 to false/true; anything else throws an InexactError.
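For example:

Bool(0)  # false
Bool(1)  # true
Bool(4)  # throws InexactError: Bool(4), as in the stacktrace above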

Ah, I put the arguments in the wrong order. Please make nextlabel the first argument of logcrossentropy in the loss function. Don’t know why I didn’t notice this before.
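So in the loss function of example/BERT/pretrain.jl the call should end up roughly like this (the name of the prediction variable is a guess here, not the actual code):

# wrong: the prediction was passed where the one-hot label is expected
# next_loss = logcrossentropy(nextpred, nextlabel)

# fixed: nextlabel (the one-hot matrix) goes first
next_loss = logcrossentropy(nextlabel, nextpred)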

Switching the argument order works. Thanks!