I would like to ask whether there is a Dict implementation suitable for indexing a large collection of documents. Specifically, imagine that each document is composed of a set of words:
doc1 = [1, 2, 3, 4, 5]
doc2 = [3, 4, 5, 6]
doc3 = [1, 3, 4, 5]
I would like to store this in a dict containing
Dict(
    1 => [doc1, doc3],
    2 => [doc1],
    3 => [doc1, doc2, doc3],
    4 => [doc1, doc2, doc3],
    5 => [doc1, doc2, doc3],
    6 => [doc2],
)
But I need the solution to scale to tens of millions of documents, where most words are unique to a single document, and it has to operate on streams (documents arrive one at a time).
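To make the goal concrete, here is a minimal sketch of the kind of index I mean, written with the plain `Dict` from Base. The helper name `add_document!` and the use of integer ids for both words and documents are just assumptions for illustration, and I assume the words within a document are already distinct. It is a baseline to compare against, not a claim that it scales to the sizes above.

```julia
# Minimal inverted index: word id => ids of the documents containing that word.
# `add_document!` is a hypothetical helper, not part of any package.
function add_document!(index::Dict{Int,Vector{Int}}, doc_id::Int, words)
    for w in words
        # get! returns the existing posting list, or inserts and returns a new empty one.
        push!(get!(() -> Int[], index, w), doc_id)
    end
    return index
end

index = Dict{Int,Vector{Int}}()

# Stand-in for a stream: documents are added one at a time.
docs = [[1, 2, 3, 4, 5], [3, 4, 5, 6], [1, 3, 4, 5]]
for (doc_id, words) in enumerate(docs)
    add_document!(index, doc_id, words)
end
```

Because documents are appended one by one, the index never needs the whole collection in memory at once, only the posting lists built so far. With mostly unique words, though, this means one small `Vector` per key, which is the overhead I am hoping a better structure could avoid.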
At the moment, Dictionaries.jl (andyferris/Dictionaries.jl on GitHub) by @andyferris seems to me to be the best solution. Does anyone know of a better solution, or is anyone working on something like this?