I profiled my program by tracking user memory allocation. The program seems to allocate memory in places I don't expect.
My program is pretty long, so I have uploaded it to GitHub: https://github.com/jinliangwei/julia_mem/blob/master/serial_lda.jl
There is also the memory tracking output and a sample dataset.
The function I am trying to optimize is sample_one_word, which is currently the bottleneck of my program.
Mainly, I don't understand why memory allocations happen when the program reads or writes a single element of a Vector; for example, see line 178, line 272, and line 278.
This program is about 2-3x slower than a C++ implementation, and I suspect memory allocation is one of the main bottlenecks.
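For reference, here is a minimal toy sketch (hypothetical, not taken from the actual LDA code) of how indexing into an abstractly typed Vector can allocate on every element read, while the same loop over a concretely typed Vector does not:

```julia
# Summing over a Vector{Any} forces each element read to return a boxed
# value, so even simple indexing allocates; a Vector{Float64} does not.
function sum_elements(v)
    s = 0.0
    for i in eachindex(v)
        s += v[i]   # with eltype Any, each read yields a boxed value
    end
    return s
end

abstract_v = Any[1.0, 2.0, 3.0]   # eltype Any: type-unstable reads
concrete_v = [1.0, 2.0, 3.0]      # eltype Float64: type-stable reads

# Warm up both methods so later measurements exclude compilation.
sum_elements(abstract_v)
sum_elements(concrete_v)

# Typically the Any version reports allocations on every call,
# while the Float64 version reports few or none.
@show @allocated sum_elements(abstract_v)
@show @allocated sum_elements(concrete_v)
```

If the elements your hot loop touches come from a container whose element type Julia cannot infer concretely, every access can allocate like the Any case above.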
This naively looks like a type-stability issue. Can you update your code to generate random test data first, so that we can copy-paste it into the REPL?
Have you precompiled your code first? The allocation at line 178 looks like compilation allocation. Note that the printed allocation is sometimes off by a line…
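The usual workflow for excluding compilation allocations from the report looks something like this (a sketch with a placeholder function; substitute your own code):

```julia
# Run as: julia --track-allocation=user track_alloc_demo.jl
using Profile

# Placeholder for the function being measured.
function work(v)
    s = 0.0
    for x in v
        s += x
    end
    return s
end

v = rand(10_000)
work(v)                       # first call triggers JIT compilation (allocates)
Profile.clear_malloc_data()   # discard allocation counts accumulated so far
work(v)                       # second call: only runtime allocations are recorded
# The .mem files written on exit now reflect the second call only.
```

Without the clear_malloc_data step, the .mem output mixes compilation allocations into the per-line counts.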
Thanks for your answer!
No, I didn't precompile my code. The sample_one_word function is executed hundreds of thousands of times, but it's JIT-compiled just once. It would be pretty shocking if compiling one function allocated 480MB of memory, right?
Using @code_warntype, I see that sample_all_words is not inferred and sample_one_word has an…
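For anyone following along, here is a toy example (not the actual LDA code) of the kind of instability @code_warntype flags, using a hypothetical struct with an abstractly typed field:

```julia
using InteractiveUtils  # provides @code_warntype

# A field with an abstract type makes every access type-unstable.
struct Unstable
    x::Real          # abstract field type: reads are not concretely inferred
end

struct Stable
    x::Float64       # concrete field type: reads infer to Float64
end

double(s) = 2 * s.x

# In the REPL, compare the inferred bodies; the Unstable version's
# return type is non-concrete and gets highlighted:
# @code_warntype double(Unstable(1.0))
# @code_warntype double(Stable(1.0))
```

The same kind of non-concrete inference in a hot loop would explain per-iteration allocations like the ones being reported.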
Thank you! This is very helpful!