Tokenising using TextAnalysis 0.8

ablaom · September 18, 2024, 8:44pm

Thanks to the maintainers of TextAnalysis.

This works in TextAnalysis 0.7.5, but not here in 0.8.1:

julia> docs = ["Hi my name is Sam.", "How are you today?"]
2-element Vector{String}:
 "Hi my name is Sam."
 "How are you today?"

julia> tokenized_docs = TextAnalysis.tokenize.(docs)
ERROR: MethodError: no method matching tokenize(::String)

Closest candidates are:
  tokenize(::Type{S}, ::T) where {S<:Languages.Language, T<:AbstractString}
   @ TextAnalysis ~/.julia/packages/TextAnalysis/bdhyq/src/deprecations.jl:4
  tokenize(::S, ::T) where {S<:Languages.Language, T<:AbstractString}
   @ TextAnalysis ~/.julia/packages/TextAnalysis/bdhyq/src/tokenizer.jl:19

What’s the new way to do this under 0.8.1?

I’m trying to update the examples at MLJText.jl, which have stopped working.

jlapeyre · September 19, 2024, 1:21am

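Looks like you need to add the Languages package and pass a language instance as the first argument, as the candidate methods in your error message suggest.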

julia> using TextAnalysis; using Languages;
julia> TextAnalysis.tokenize(Languages.English(), "the cat")
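
Applied to the vector of documents from the original question, the same two-argument call can be sketched with a comprehension. This is a minimal sketch, assuming the `tokenize(::Language, ::AbstractString)` method listed in the error message is the intended one in 0.8.1; the variable names `lang` and `tokenized_docs` are only for illustration.

```julia
using TextAnalysis
using Languages

docs = ["Hi my name is Sam.", "How are you today?"]

# Assumption: a language instance is the required first argument in 0.8.x,
# as in the two-argument method shown in the error message above.
lang = Languages.English()

# Tokenize each document separately; the result holds one token vector per document.
tokenized_docs = [TextAnalysis.tokenize(lang, doc) for doc in docs]
```

If you prefer the dot syntax, wrapping the language in `Ref` so that broadcasting treats it as a scalar, as in `TextAnalysis.tokenize.(Ref(lang), docs)`, should be equivalent.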
