I’m new to Julia. I ran my code and timed it using @time. The results show that gc time is 67.2%. From what I’ve seen in other threads, this is a high figure and is a drag on performance. However, it is unclear to me (even after reading the documentation) what garbage collection is, what the figure 67.2% means, and therefore what I might do to reduce it.
Not an actual expert on this, but garbage collection is basically the freeing of memory that your calculation used and no longer needs. A high GC fraction can be a sign that you are creating lots of unnecessary temporary arrays. Each of them has to be allocated and later freed, which slows down your code.
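A small illustration of the temporary-array pattern. The function names are hypothetical. The first version allocates a fresh result array for `x .^ 2 .+ 1` on every iteration. The second writes into one output array that is allocated once and reused via `.=`. The two `@time` lines should show very different allocation counts.

```julia
# Allocating version: every call creates a new result array, which the
# garbage collector must later reclaim.
f_alloc(x) = x .^ 2 .+ 1

# Non-allocating version: writes into an output array the caller
# allocated once and reuses.
f_inplace!(out, x) = (out .= x .^ 2 .+ 1; out)

function run_alloc(x, iters)
    s = 0.0
    for _ in 1:iters
        s += sum(f_alloc(x))          # a temporary array per iteration
    end
    return s
end

function run_inplace(x, iters)
    out = similar(x)                  # allocated once, outside the loop
    s = 0.0
    for _ in 1:iters
        s += sum(f_inplace!(out, x))  # reuses out every iteration
    end
    return s
end

x = rand(1_000)
run_alloc(x, 1); run_inplace(x, 1)    # warm-up, so compilation is not timed
@time run_alloc(x, 10_000)
@time run_inplace(x, 10_000)
```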
Garbage collection is how memory that your program no longer uses is recovered. It is a form of automatic memory management; languages like Python, R and Java also have automatic memory management. Without it, you need to make explicit calls in your code (e.g. `free` in C) to release memory you are done using. Languages like C and C++ do not have automatic memory management. You can read more here.
If your program is spending a lot of time on garbage collection, that often means you are allocating a lot of memory, e.g. by creating lots of large arrays. Sometimes you can reduce it by using in-place (mutating) functions, e.g. `sort!` instead of `sort`.
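Here is a quick sketch for checking the difference. Base's `@allocated` reports the bytes allocated while evaluating an expression. Wrapping it in small functions keeps global-variable effects out of the measurement. The names `alloc_sort` and `alloc_sort!` are only for illustration.

```julia
# Compare how much memory sort (returns a new sorted copy) and
# sort! (reorders the vector it is given) allocate.
alloc_sort(v)  = @allocated sort(v)
alloc_sort!(v) = @allocated sort!(v)

# Warm-up calls: the first call of a method also includes compilation.
alloc_sort(rand(10_000)); alloc_sort!(rand(10_000))

a = alloc_sort(rand(10_000))    # includes a full 10_000-element copy
b = alloc_sort!(rand(10_000))   # no copy of the input
println("sort: $a bytes, sort!: $b bytes")
```

`sort!` avoids the copy that `sort` makes. Depending on your Julia version and the sorting algorithm chosen, it may still allocate some scratch space internally.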
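Note that `@time` measures a single run, and on the first call of a function it also includes compilation time, so it may not be representative. You should probably use `@btime` from BenchmarkTools.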
It’s hard to give more specific advice without seeing your code; that’s the advantage of a minimal working example (MWE).
To be more specific: 67.2% of the run time was spent in the garbage collector. That time went to finding blocks of memory that your program no longer needed and making them available for new data you were allocating. Garbage collection can take a long time if, for example, it has to search through a lot of memory blocks or (in collectors that do this) rearrange them.
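Here is a sketch of where a figure like 67.2% comes from. Base's `@timed` (Julia 1.5 or later) returns a NamedTuple with fields including `value`, `time`, `bytes` and `gctime`, all times in seconds. The GC percentage that `@time` prints is essentially `100 * gctime / time`. `make_garbage` is a hypothetical, deliberately allocation-heavy function.

```julia
# A deliberately wasteful function: each iteration builds a temporary array.
function make_garbage(n)
    s = 0.0
    for i in 1:n
        tmp = collect(1.0:100.0)   # new 100-element array every iteration
        s += sum(tmp)
    end
    return s
end

make_garbage(10)                   # warm-up so compilation is not measured
stats = @timed make_garbage(100_000)
gc_percent = 100 * stats.gctime / stats.time
println("total: $(stats.time) s, GC: $(stats.gctime) s ($(round(gc_percent; digits=1))%)")
println("allocated: $(stats.bytes) bytes")
```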
Perhaps you can reduce the program to something small that still shows a large amount of GC time. This much garbage collection is somewhat unusual for idiomatic Julia code, but it is difficult to help without code to run.
If you are looking for an explanation of what GC is, the Wikipedia article may be a good starting point:
I had a method that, when timed with TimerOutputs.jl, reported a total allocation of several GiB. This did not make sense to me, as the method dealt only with very small arrays that were frequently rearranged. Then I discovered that `permute!` and `sort!` (with the `by` keyword) both allocated. My code was a heuristic that I had to reproduce exactly (i.e., I couldn’t just change the algorithm). It ran at least one million iterations, each calling `permute!` a small, constant number of times and `sort!` once. I reimplemented these functions by hand so that they take a preallocated buffer vector. Afterwards the total allocation dropped to a few KiB and the total time to about 15 to 20% of the original. So high GC time can come from a long, tight loop that calls Base functions which allocate internally.
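Here is a minimal sketch of the buffer-based approach, assuming the goal is simply to avoid allocating on every call inside a hot loop. The names `permute_with_buffer!` and `insertion_sort_by!` are hypothetical. They are not the poster's actual code and not part of Base. Insertion sort is a swapped-in technique that is only reasonable for very small vectors.

```julia
"""
    permute_with_buffer!(v, p, buf)

Hypothetical helper: reorder `v` in place so that `v[i]` becomes the old
`v[p[i]]` (the same result as `permute!(v, p)`), using the caller-supplied
`buf` instead of allocating a temporary.
"""
function permute_with_buffer!(v::AbstractVector, p::AbstractVector{<:Integer},
                              buf::AbstractVector)
    length(v) == length(p) == length(buf) ||
        throw(DimensionMismatch("v, p and buf must have the same length"))
    for i in eachindex(buf, p)
        buf[i] = v[p[i]]   # gather the permuted values into the buffer
    end
    copyto!(v, buf)        # write the result back into v
    return v
end

"""
    insertion_sort_by!(v, by)

Hypothetical helper: stable in-place insertion sort of `v` by the key
`by(x)`. O(n^2), so only sensible for very small vectors; it allocates
nothing when `by` is type-stable.
"""
function insertion_sort_by!(v::AbstractVector, by)
    for i in firstindex(v)+1:lastindex(v)
        x = v[i]
        kx = by(x)
        j = i - 1
        # shift elements with a larger key one slot to the right
        while j >= firstindex(v) && by(v[j]) > kx
            v[j+1] = v[j]
            j -= 1
        end
        v[j+1] = x
    end
    return v
end

v   = [10, 20, 30, 40]
p   = [3, 1, 4, 2]
buf = similar(v)                  # allocate the buffer once, outside the hot loop
permute_with_buffer!(v, p, buf)   # v is now [30, 10, 40, 20]

w = [3, -1, 2, -5]
insertion_sort_by!(w, abs)        # w is now [-1, 2, 3, -5]
```

In a real loop, `buf` would be created once before the loop and passed to every call. That reuse is what removes the per-iteration allocations.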