Generating an ngram document term matrix with TextAnalysis.jl

joaomacalos · June 19, 2021, 6:28am

I recently wanted to do the same thing. I solved it by using NGramDocument() instead of TokenDocument():

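using TextAnalysis

test = "Hello, I like apples and oranges. I like ice-cream."
# Build the document from n-grams (here n = 2) rather than single tokens
test_doc = NGramDocument(test, 2)

crps = Corpus([test_doc])
# Populate the corpus lexicon; the matrix columns are built from it
update_lexicon!(crps)
m = DocumentTermMatrix(crps)
# Dense matrix: one row per document, one column per term
dtm(m, :dense)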
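
If you want to check which column belongs to which n-gram, something like the sketch below should work. It assumes that the `terms` field of the `DocumentTermMatrix` holds the column labels in column order and that `ngrams()` returns the n-gram counts stored in the document. I have not checked either across TextAnalysis versions, so treat it as a starting point rather than the definitive way to do it.

```julia
using TextAnalysis

test = "Hello, I like apples and oranges. I like ice-cream."
test_doc = NGramDocument(test, 2)

crps = Corpus([test_doc])
update_lexicon!(crps)
m = DocumentTermMatrix(crps)

# Print the n-gram => count pairs stored in the document itself
for (gram, n) in ngrams(test_doc)
    println(gram, " => ", n)
end

# Pair each column label with its count in the first (and only) document.
# Assumption: m.terms is ordered the same way as the matrix columns.
dense = dtm(m, :dense)
for (j, term) in enumerate(m.terms)
    println(term, ": ", dense[1, j])
end
```

The two loops should list the same n-grams, so comparing them is a quick way to confirm the matrix was built from n-grams and not from single tokens.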
