I do use a 3D array for the NER labels. Just wrap them with Basic.Vocabulary and one-hot encode the batch.
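
For a quick picture of what that means, here is a minimal sketch (the label list is abbreviated; it reuses the same Vocabulary and Flux.onehot calls as the script below):

using Flux
using Transformers.Basic

demo_vocab = Vocabulary(["O", "B-PER", "I-PER", "B-LOC", "I-LOC"], "O")
# one-hot encoding a batch (a vector of label sequences) yields a
# 3D array: (number of labels) x (max sequence length) x (batch size)
y = Flux.onehot(demo_vocab, [["B-PER", "I-PER", "O"], ["B-LOC", "O", "O"]])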
Here is the script I used to process the CoNLL-2003 dataset:
using JSON3
using Arrow
using Flux
using Transformers
using Transformers.Basic

# the label names for each tag set come from the dataset's dataset_info.json
const datainfo = open(JSON3.read, "./datasets/conll2003/dataset_info.json")
const pos_labels = collect(datainfo.features.pos_tags.feature.names)
const chunk_labels = collect(datainfo.features.chunk_tags.feature.names)
const ner_labels = collect(datainfo.features.ner_tags.feature.names)

# wrap the label lists in vocabularies (the second argument is the unknown token)
const pos_vocab = Vocabulary(pos_labels, ".")
const chunk_vocab = Vocabulary(chunk_labels, chunk_labels[1])
const ner_vocab = Vocabulary(ner_labels, ner_labels[1])

# the dataset splits are stored as Arrow tables
const trainset = Arrow.Table("./datasets/conll2003/conll2003-train.arrow")
const devset = Arrow.Table("./datasets/conll2003/conll2003-validation.arrow")
const testset = Arrow.Table("./datasets/conll2003/conll2003-test.arrow")
const train_num = length(trainset.id)
const dev_num = length(devset.id)
const test_num = length(testset.id)
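
The script assumes wordpiece, tokenizer, and the vocab used in process below are already in scope; with Transformers.jl they would typically come from a pretrained BERT, something like this (which checkpoint string you use is up to you):

using Transformers.Pretrain

# load a pretrained BERT; pretrain"..." returns the model together with
# its WordPiece and tokenizer
const bert_model, wordpiece, tokenizer = pretrain"bert-uncased_L-12_H-768_A-12"
# the wordpiece vocabulary maps tokens to ids in process below
const vocab = Vocabulary(wordpiece)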
# re-tokenize the whitespace-split words with the wordpiece tokenizer and
# record, for each produced subword, the index of the original word
function retoken(wp, tk, tokens)
    retokens = Vector{String}(undef, 0)
    wordbounds = Vector{Int}(undef, 0)
    _len = length(tokens)
    sizehint!(retokens, _len)
    sizehint!(wordbounds, _len)
    for (i, token) in enumerate(tokens)
        ntokens = wp(tk(token))
        append!(retokens, ntokens)
        # every subword of `token` points back to word index i
        foreach(_ -> push!(wordbounds, i), 1:length(ntokens))
    end
    # release the extra hinted capacity now that the final sizes are known
    sizehint!(retokens, length(retokens))
    sizehint!(wordbounds, length(wordbounds))
    # @assert wp(tk(join(tokens, ' '))) == retokens
    return retokens, wordbounds
end
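
For example (the subword split here is made up for illustration):

tokens, bounds = retoken(wordpiece, tokenizer, ["Washington", "rallied"])
# if wordpiece splits "rallied" into "rall" and "##ied", this gives
# tokens == ["washington", "rall", "##ied"] and bounds == [1, 2, 2]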
# gather the columns for a set of sentence ids into a named tuple
function getbatch(dataset, ids)
    tks = dataset.tokens[ids]
    chks = dataset.chunk_tags[ids]
    poss = dataset.pos_tags[ids]
    ners = dataset.ner_tags[ids]
    return (token = tks, chunk = chks, pos = poss, ner = ners)
end
# expand the word-level integer tags to subword level: the first subword of a
# word keeps its tag, and the remaining subwords have the "B" prefix rewritten
# to "I" so the BIO scheme stays consistent
function relabel(wb, label, labels)
    relabels = Vector{String}(undef, 0)
    sizehint!(relabels, length(labels))
    base = 1
    @assert first(wb) == base
    for i in wb
        l = labels[i] + 1  # dataset tags are 0-based; Julia arrays are 1-based
        if base == i
            # first subword of word i: keep the original tag
            push!(relabels, label[l])
            base += 1
        else
            # continuation subword: B-XXX becomes I-XXX
            push!(relabels, replace(label[l], r"^B" => 'I'))
        end
    end
    return relabels
end
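
So a word tagged B-LOC that the wordpiece stage split in two comes out as B-LOC, I-LOC (assuming tag id 5 maps to "B-LOC" in this dataset's ner_tags):

relabel([1, 1], ner_labels, [5])  # == ["B-LOC", "I-LOC"]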
# preprocess a single sentence: wordpiece re-tokenization plus tag expansion
function preprocess(wordpiece, tokenizer, sample)
    token, wb = retoken(wordpiece, tokenizer, sample.token)
    chunk = relabel(wb, chunk_labels, sample.chunk)
    pos = relabel(wb, pos_labels, sample.pos)
    ner = relabel(wb, ner_labels, sample.ner)
    return (token = token, chunk = chunk, pos = pos, ner = ner, bounds = wb)
end
# preprocess a batch of sentences, one vector entry per sentence
function preprocess_batch(wordpiece, tokenizer, sample)
    batch = length(sample.token)
    token = Vector{Vector{String}}(undef, batch)
    wb = Vector{Vector{Int}}(undef, batch)
    chunk = similar(token)
    pos = similar(token)
    ner = similar(token)
    for i = 1:batch
        token[i], wb[i] = retoken(wordpiece, tokenizer, sample.token[i])
        chunk[i] = relabel(wb[i], chunk_labels, sample.chunk[i])
        pos[i] = relabel(wb[i], pos_labels, sample.pos[i])
        ner[i] = relabel(wb[i], ner_labels, sample.ner[i])
    end
    return (token = token, chunk = chunk, pos = pos, ner = ner, bounds = wb)
end
# prepend [CLS] and append [SEP] so the sequence matches the BERT input format
addsstok(x, start_token = "[CLS]", sep_token = "[SEP]") = [start_token; x; sep_token]
# turn a raw batch into model-ready arrays: padded token ids, segment ids,
# masks, and 3D one-hot tag arrays (labels x sequence length x batch)
function process(wordpiece, tokenizer, sample)
    batch = preprocess_batch(wordpiece, tokenizer, sample)
    token = batch.token
    tok = map(addsstok, token)
    mask = Basic.getmask(batch.token)  # mask over label positions, no [CLS]/[SEP]
    atten_mask = Basic.getmask(tok)    # attention mask including [CLS]/[SEP]
    tok_id = vocab(tok)                # wordpiece token ids as a padded matrix
    segment = ones(Int, size(tok_id))  # single-sentence input: all segment 1
    pos = Flux.onehot(pos_vocab, batch.pos)
    chunk = Flux.onehot(chunk_vocab, batch.chunk)
    ner = Flux.onehot(ner_vocab, batch.ner)
    bounds = Tuple(batch.bounds)
    return (input = (tok = tok_id, segment = segment), mask = mask, atten_mask = atten_mask,
            pos = pos, chunk = chunk, ner = ner, bounds = bounds)
end
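
With all of that in place, building one training batch looks like this (batch size 4 is arbitrary):

sample = getbatch(trainset, 1:4)
data = process(wordpiece, tokenizer, sample)
# data.input.tok is a padded id matrix; data.ner is the 3D one-hot array
# for the NER labels: (number of ner tags) x (max sequence length) x 4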