Is garbage collection bad for gpu programming?

I was listening to this podcast:

and at one point (around minute 56) the Swift guy said something to the effect that no language will replace Python unless it shares one particular quality of Python: not having garbage collection, which he said is important for performant GPU programming. In context, the remark seemed directed at Julia.

I am completely ignorant about these topics, but is there any reason to believe that Julia is unsuitable for high-performance GPU computing in any sense?


Not sure I understand his point. Both Python and Julia have garbage collection; they just go about it in different ways. I haven’t listened to the podcast, so I cannot really say more.


The part of this that is true is that garbage collection on a GPU really sucks. The Julia solution is to write code that only creates objects on the stack, so no GC is needed.
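As a minimal sketch of what "stack-only" code looks like (the `Vec3` type and `accumulate_n` function here are made up for illustration): immutable, plain-data structs never touch the heap, so a hot loop over them triggers no GC work at all.

```julia
struct Vec3          # immutable by default: eligible for stack allocation
    x::Float64
    y::Float64
    z::Float64
end

add(a::Vec3, b::Vec3) = Vec3(a.x + b.x, a.y + b.y, a.z + b.z)

function accumulate_n(n)
    acc = Vec3(0.0, 0.0, 0.0)
    for _ in 1:n
        acc = add(acc, Vec3(1.0, 2.0, 3.0))  # no heap allocation per iteration
    end
    acc
end

accumulate_n(1)               # compile first
@allocated accumulate_n(10^6) # 0 after warm-up: the loop never touches the heap
```

This is the same style GPU kernels written with CUDA.jl/KernelAbstractions rely on: as long as every value in the kernel is immutable and fixed-size, the GC never needs to run on the device.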


The main distinction between Python and Julia here is that many people feel that Python’s reference counting mechanism makes it a bit easier to regulate when GPU memory is freed than Julia’s GC.
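Julia code can recover some of that determinism by attaching finalizers and running them explicitly, instead of waiting for the GC to get around to it (as I understand it, CUDA.jl uses finalizers internally to release device memory). A minimal sketch with a hypothetical resource type:

```julia
mutable struct GpuBuffer              # hypothetical stand-in for a GPU allocation
    freed::Bool
    GpuBuffer() = finalizer(b -> (b.freed = true), new(false))
end

buf = GpuBuffer()
finalize(buf)   # release deterministically, rather than at some future GC pass
buf.freed       # → true
```

`finalizer` and `finalize` are both in Base; the point is that freeing can be made explicit at the call site, which is roughly what refcounting buys Python automatically.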


Does this mean something very specific in Julia? (Such as using static arrays, tuples, etc.?)

The concept isn’t Julia-specific (C has something similar, where malloc allocates on the heap and everything else goes on the stack). Julia currently stack allocates any immutable struct (or primitive type). That said, this behavior requires 1.5 or newer. The linked article gives a very good analysis of the change.
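The canonical beneficiary of the 1.5 change is `view`: a `SubArray` is an immutable wrapper holding a reference to its (heap-allocated) parent array, so before 1.5 creating one allocated, and from 1.5 on it doesn't. A small check (function names here are just for illustration):

```julia
const A = rand(100)                   # const so the benchmark is type-stable

slice_sum(A) = sum(view(A, 1:50))     # SubArray wraps A without copying

slice_sum(A)        # compile first
@allocated slice_sum(A)   # 0 on Julia ≥ 1.5; nonzero on 1.4, where the
                          # SubArray wrapper itself was heap allocated
```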


That is an excellent article. I am curious about one statement:

“As a result of this work, arbitrary immutable objects—regardless of whether they have fields that reference mutable objects or not—can now be stack allocated”

Does that mean that an array that is part of an immutable struct is treated differently than an array not within a struct? If so, can this difference be important for performance? (Would it make sense to declare a struct containing a single array just because of that?)

No. When a struct T has an Array as a field, it is storing a pointer to that array. Prior to 1.5, this would mean that T had to be heap allocated. In 1.5, the Array is still heap allocated, but the struct is stack allocated. There is no difference for the Array, just for things that reference it.
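To make that concrete, here is a sketch (the `Wrapper` type is invented for illustration): the `Vector` field is always a pointer to heap memory; what changed in 1.5 is only where the small wrapper struct itself lives.

```julia
struct Wrapper
    data::Vector{Float64}   # stored as a pointer to the heap-allocated Array
end

total(w::Wrapper) = sum(w.data)

function wrapped_sum(v::Vector{Float64})
    w = Wrapper(v)   # pre-1.5: w forced onto the heap; 1.5+: w can stay on the stack
    total(w)
end

wrapped_sum([1.0, 2.0, 3.0])  # → 6.0
```

Either way, `v` itself is allocated exactly once on the heap; wrapping it in a struct neither helps nor hurts the array.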


I would be interested in hearing more about this. Is this a practical solution in most cases? Isn’t Flux still allocating a whole lot for example?


No. On both versions the struct can be either heap or stack allocated; on 1.5 the struct will additionally be inlined.


On the other hand, it’s much harder to write Python code that doesn’t allocate in the first place, whereas Julia has lots of non-allocating in-place APIs.
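For example, `LinearAlgebra` exposes in-place variants (the `!` convention) and fused broadcast assignment reuses existing buffers, so a hot loop can run without allocating at all after setup:

```julia
using LinearAlgebra

A = rand(100, 100)
x = rand(100)
y = similar(x)       # allocate the output buffer once, up front

mul!(y, A, x)        # in-place A*x: writes into y, no new allocation
y .= 2 .* y .+ 1     # fused, in-place broadcast: reuses y's memory
```

The same pattern carries over to CUDA.jl arrays, which is part of why allocation-free Julia code maps well to GPUs.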


True, but I don’t think that matters as much for something like PyTorch, which I don’t think aspires to be a general purpose GPU language.


It would be really helpful for this discussion if someone could outline the differences between the approaches to garbage collection in Julia and Python, and the tradeoffs, advantages, and disadvantages of each (including with regard to GPUs).


These texts seem nice:


See this Julia GitHub issue for a related discussion.

According to Jeff and Keno, compile-time memory management seems to be a promising alternative.


Unfortunately, the implementation of this in CUDAnative.jl was not merged because (from what I’ve heard), the implementation relied on a GPU-to-CPU communication mechanism that was not performant enough for general usage.

Continual memory allocation/deallocation is what makes the GC work hard, so it’s the thing that needs to be tackled first. I think what will be necessary is for Flux to perform more efficient memory management, specifically trying to reuse previously-allocated buffers when possible. To my knowledge, Flux doesn’t worry about these things, in the interest of simplicity (I could of course be wrong).
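The buffer-reuse idea can be sketched in a few lines. This is a hypothetical toy (not Flux’s actual implementation, and ignoring thread safety and aliasing): cache arrays by size and hand the same one back on repeated requests.

```julia
# Toy buffer pool: allocate a given size once, then keep reusing it.
const POOL = Dict{Int, Vector{Float64}}()

function borrow!(n::Int)
    get!(POOL, n) do
        Vector{Float64}(undef, n)   # allocate only on the first request of this size
    end
end

a = borrow!(1024)
b = borrow!(1024)
a === b   # → true: the second request reused the first buffer, no new allocation
```

A real pool would also track which buffers are currently in use so that two live requests don’t alias the same memory, which is exactly the bookkeeping that makes this harder than it looks.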