Writing a fast nlp tokenizer in Julia

merckxiaan · January 30, 2021, 5:21pm

Thanks oxinabox,

I’ve seen WordTokenizers and it looks really interesting. I read the GSoC blog post on SentencePiece and understood that WordTokenizers doesn’t train tokenizers, is that right?
It’s really fascinating that this readable Julia code ends up faster than spaCy, though. I’ll be sure to take a closer look at the implementation.
