I recently wanted to do the same thing. I solved it by using `NGramDocument()` instead of `TokenDocument()`:
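```julia
using TextAnalysis

test = "Hello, I like apples and oranges. I like ice-cream."
test_doc = NGramDocument(test, 2)  # build the document from n-grams (here n = 2) instead of single tokens
crps = Corpus([test_doc])
update_lexicon!(crps)              # the lexicon must be populated before building the matrix
m = DocumentTermMatrix(crps)
dtm(m, :dense)                     # dense matrix: one row per document, one column per term
```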
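To see which n-grams actually became columns of the matrix, you can inspect the document and the matrix directly. The following is a minimal sketch that assumes the `ngrams` accessor and the `terms` field of `DocumentTermMatrix` behave as they do in the TextAnalysis.jl versions I have used. Whether `NGramDocument(test, 2)` keeps only the bigrams or also the unigrams may depend on the package version, so printing the terms is the quickest way to check on your setup.

```julia
using TextAnalysis

test = "Hello, I like apples and oranges. I like ice-cream."
doc = NGramDocument(test, 2)

# ngrams(doc) maps each n-gram string to its count;
# a bigram is stored as its two tokens joined by a single space.
for (gram, count) in sort(collect(ngrams(doc)))
    println(rpad(gram, 20), count)
end

crps = Corpus([doc])
update_lexicon!(crps)
m = DocumentTermMatrix(crps)

# Column j of the dense matrix corresponds to m.terms[j].
D = dtm(m, :dense)
col = findfirst(==("I like"), m.terms)
println("\"I like\" appears ", D[1, col], " times")
```

If you want unigrams and bigrams explicitly, recent versions appear to accept several orders as extra arguments, e.g. `NGramDocument(test, 1, 2)`; treat that as something to verify against your installed version.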