I profiled my program by tracking user memory allocation. The program seems to allocate memory in places where I don’t expect it to.
My program is pretty long, so I have uploaded it to GitHub: https://github.com/jinliangwei/julia_mem/blob/master/serial_lda.jl
The repository also contains the memory tracking output and a sample dataset.
The function that I am trying to optimize is sample_one_word, which is currently the bottleneck of my program.
This program is about 2-3x slower than a C++ implementation, and I suspect that memory allocation is one of the main causes.