Tokenising using TextAnalysis 0.8

Thanks to the maintainers of TextAnalysis.

This works in TextAnalysis 0.7.5, but not in 0.8.1:

julia> docs = ["Hi my name is Sam.", "How are you today?"]
2-element Vector{String}:
 "Hi my name is Sam."
 "How are you today?"

julia> tokenized_docs = TextAnalysis.tokenize.(docs)
ERROR: MethodError: no method matching tokenize(::String)

Closest candidates are:
  tokenize(::Type{S}, ::T) where {S<:Languages.Language, T<:AbstractString}
   @ TextAnalysis ~/.julia/packages/TextAnalysis/bdhyq/src/deprecations.jl:4
  tokenize(::S, ::T) where {S<:Languages.Language, T<:AbstractString}
   @ TextAnalysis ~/.julia/packages/TextAnalysis/bdhyq/src/tokenizer.jl:19

What’s the new way to do this under 0.8.1?

I’m trying to update the examples at MLJText.jl, which have stopped working.


Looks like tokenize now takes a language as its first argument (see the closest candidates in the error), so you need to add the Languages package:

julia> using TextAnalysis; using Languages;
julia> TextAnalysis.tokenize(Languages.English(), "the cat")
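To get back the original broadcast over docs, you can pass the same language to every call. The sketch below is a minimal one that assumes only the 0.8.1 method listed in the error above, tokenize(::S, ::T) where {S<:Languages.Language, T<:AbstractString}. Wrapping the language in Ref is a general Julia broadcasting idiom, not something specific to TextAnalysis. It keeps the language object as a single scalar argument rather than letting broadcasting try to iterate over it.

```julia
using TextAnalysis, Languages

docs = ["Hi my name is Sam.", "How are you today?"]

# Ref(...) treats the language instance as a scalar during broadcasting,
# so TextAnalysis.tokenize is called once per string in docs,
# always with Languages.English() as the first argument.
tokenized_docs = TextAnalysis.tokenize.(Ref(Languages.English()), docs)
```

A comprehension such as [TextAnalysis.tokenize(Languages.English(), d) for d in docs] does the same thing and may read more clearly if you would rather avoid Ref.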