Generating an ngram document term matrix with TextAnalysis.jl

Hello, I’m currently trying to use TextAnalysis.jl for the first time, and I can’t figure out this problem.

I can tokenize a string like so:

using TextAnalysis
test = "Hello, I like apples and oranges. I like ice-cream."
test_doc = TokenDocument(test)

And then, I can access unigrams as well as bigrams from the token object:

tokens(test_doc)
ngrams(test_doc, 2)

What I can’t figure out is how to create a bigram (or, more generally, n-gram) document term matrix from it. I know that I first have to convert it into a corpus object:

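crps = Corpus([test_doc])        # wrap the document in a corpus
update_lexicon!(crps)            # build the corpus lexicon (term => count)
m = DocumentTermMatrix(crps)     # one row per document, one column per lexicon term
dtm(m, :dense)                   # return the counts as a dense matrix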

What I can’t work out from the TextAnalysis documentation is whether I can choose the n-gram size used to build the DTM. It seems reasonable that I could, since the information is already available in the tokenized object.

I recently wanted to do the same thing. I solved it by using NGramDocument() instead of TokenDocument():

using TextAnalysis
test = "Hello, I like apples and oranges. I like ice-cream."
test_doc = NGramDocument(test, 2)

crps = Corpus([test_doc])
update_lexicon!(crps)
m = DocumentTermMatrix(crps)
dtm(m, :dense)
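
To check which column of the matrix corresponds to which n-gram, it can help to print the column labels next to the counts. The sketch below assumes that the DocumentTermMatrix object exposes its column labels through a field named terms; that field is not mentioned above, so treat it as an assumption and check it against your installed version. Whether NGramDocument(test, 2) stores only bigrams or also unigrams may likewise depend on the package version, so it is worth looking at the printed terms rather than relying on either behaviour.

```julia
using TextAnalysis

test = "Hello, I like apples and oranges. I like ice-cream."

# Same pipeline as above, built on an NGramDocument with n = 2
crps = Corpus([NGramDocument(test, 2)])
update_lexicon!(crps)
m = DocumentTermMatrix(crps)

# Dense count matrix: one row per document, one column per term
dense = dtm(m, :dense)

# Assumption: `m.terms` holds the column labels in column order
for (j, term) in enumerate(m.terms)
    println(rpad(term, 20), dense[1, j])
end
```

If the printed terms contain entries you do not want (for example single words), they can be dropped by selecting only the columns whose term contains a space.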

Hi, yep, I figured it out later too, but never followed up on my own post.
Thanks for pointing out the solution!