If you have a function that produces a vector of tokens from an input string, you can pass that token vector directly to a TokenDocument instead of passing the original string. You can then build your Corpus from a list of TokenDocuments.
# Tokenize the raw string yourself, then wrap the result
token_vector = my_tokenizer_function(a_document_string)
token_doc = TokenDocument(token_vector)
# A Corpus is built from a list of TokenDocuments
corpus = Corpus([token_doc])