There is also the memory tracking output and a sample dataset.
The function I am trying to optimize is sample_one_word, which is currently the bottleneck of my program.
Mainly, I don’t understand why memory is allocated when the program reads or writes a single element of a Vector; see, for example, lines 178, 272, and 278.
This program is about 2-3x slower than a C++ implementation, and I suspect memory allocation is one of the main bottlenecks.
At first glance, this looks like a type-stability issue. Could you update your code to generate random test data first, so that we can copy-paste it into the REPL?
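To illustrate what a type-stability problem of this kind can look like, here is a minimal sketch. It is not taken from the posted program: `LooseState`, `TightState`, `bump!` and `measure` are hypothetical names, and the assumption is that some container in the sampler has a non-concrete type. In that situation every element read returns a boxed value, which would show up as allocations on lines that only index into a Vector.

```julia
# Hypothetical stand-ins for the sampler state; not taken from the original program.
struct LooseState
    counts::Vector            # element type left open, so element reads are inferred as Any
end

struct TightState
    counts::Vector{Float64}   # fully concrete field type
end

# Same loop body for both structs: read one element, add to it, write it back.
function bump!(state, n)
    for i in 1:n
        idx = mod1(i, length(state.counts))
        state.counts[idx] += 1.0
    end
    return state
end

# Measure inside a function so global-scope effects stay out of the numbers.
measure(state, n) = @allocated bump!(state, n)

loose = LooseState(zeros(Float64, 16))
tight = TightState(zeros(Float64, 16))

# Warm up first so that compilation is not counted.
measure(loose, 1)
measure(tight, 1)

loose_bytes = measure(loose, 10_000)
tight_bytes = measure(tight, 10_000)
println("loose: ", loose_bytes, " bytes, tight: ", tight_bytes, " bytes")
```

If the real code behaves like `LooseState`, running `@code_warntype` on the hot function in the REPL should highlight the variables inferred as `Any` or as an abstract type.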
No, I didn’t precompile my code. The sample_one_word function is executed hundreds of thousands of times, but it’s JIT-compiled just once. It would be pretty shocking if compiling one function allocated 480MB of memory, right?
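One way to check how much of a measured total comes from compilation is to compare the first call of a function with a later one. The sketch below uses a hypothetical `sum_squares` function and random data, not the real `sample_one_word`; the exact byte counts depend on the Julia version.

```julia
# Hypothetical function used only to compare first-call and later-call allocations.
function sum_squares(xs::Vector{Float64})
    total = 0.0
    for x in xs
        total += x * x
    end
    return total
end

data = rand(1_000)

# The first call may include allocations made while the function is compiled.
first_call = @allocated sum_squares(data)

# A later call reflects only what the function itself allocates at run time.
second_call = @allocated sum_squares(data)

println("first call: ", first_call, " bytes, second call: ", second_call, " bytes")
```

For per-line output from `--track-allocation=user`, the usual approach is to run the workload once, call `Profile.clear_malloc_data()`, and then run it again, so that the recorded counts exclude compilation.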