and at one point (around minute 56) the Swift speaker said something to the effect that no language will replace Python unless it keeps one particular quality of Python, namely not having garbage collection, because that is important for performant GPU programming. In context, the remark seemed directed at Julia.
I am completely ignorant about these topics, but is there any reason to believe that Julia is not suitable for high-performance GPU computing in any sense?
I'm not sure I understand his point. Both Python and Julia have garbage collection; they just go about it in different ways. I haven't read the post, so I can't really say more.
The part of this that is true is that garbage collection on the GPU really sucks. The Julia solution is to write code that only creates objects on the stack, so the GC isn't needed.
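As a minimal sketch of that style (the function names are made up for illustration): the first version allocates a temporary array on every call, while the second works only with scalar values and never touches the heap, so there is nothing for the GC to do.

```julia
# Allocates a new temporary array (xs .^ 2) on every call -> GC pressure.
sum_of_squares_alloc(xs) = sum(xs .^ 2)

# Works only with scalars, so nothing escapes to the heap and the GC stays idle.
function sum_of_squares_noalloc(xs)
    s = zero(eltype(xs))
    for x in xs
        s += x * x
    end
    return s
end

xs = rand(10_000)
sum_of_squares_alloc(xs) ≈ sum_of_squares_noalloc(xs)   # same result, different allocation behavior
```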
The main distinction between Python and Julia here is that many people feel Python's reference counting makes it a bit easier to control when GPU memory is freed than Julia's GC does.
The concept isn't Julia specific (C has something similar, where malloc allocates on the heap and everything else goes on the stack). Julia stack allocates immutable structs (and primitive types); doing this for arbitrary immutables, including ones whose fields reference heap objects, is new in 1.5. Julia 1.5 Highlights gives a very good analysis of the change.
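As a small illustration (the struct and function names here are made up), an immutable struct of plain bits constructed in a hot loop never touches the GC:

```julia
struct Point          # immutable by default, made of plain bits
    x::Float64
    y::Float64
end

function centroid(n)
    sx, sy = 0.0, 0.0
    for _ in 1:n
        p = Point(rand(), rand())   # stack allocated, no GC involvement
        sx += p.x
        sy += p.y
    end
    return Point(sx / n, sy / n)
end

centroid(10)                  # compile first
@allocated centroid(10_000)   # should report 0 bytes
```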
That is an excellent article. I am curious about one statement:
“As a result of this work, arbitrary immutable objects—regardless of whether they have fields that reference mutable objects or not—can now be stack allocated”
Does that mean that an array that is part of an immutable struct is treated differently from an array not inside a struct? If so, can this difference matter for performance? (Would it make sense to declare a struct containing a single array just because of that?)
No. When a struct T has an Array as a field, it stores a pointer to that array. Prior to 1.5, this meant that T had to be heap allocated. In 1.5, the Array is still heap allocated, but the struct itself can be stack allocated. There is no difference for the Array, just for things that reference it.
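A rough sketch of what that means in practice (the names are hypothetical, and whether the wrapper actually stays on the stack depends on it not escaping the function):

```julia
struct Wrapped                 # immutable struct holding a reference to a heap object
    data::Vector{Float64}
end

function total(v::Vector{Float64})
    w = Wrapped(v)             # on 1.5+, the wrapper itself need not be heap allocated
    s = 0.0
    for x in w.data            # the Vector is heap allocated either way
        s += x
    end
    return s
end

total(rand(1_000))             # the only heap allocation here is the input vector
```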
On the other hand, it’s much harder to write Python code that doesn’t allocate in the first place, whereas Julia has lots of non-allocating in-place APIs.
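For instance (just standard-library examples, not GPU-specific), `mul!` and broadcast assignment write into buffers you have already allocated instead of creating new arrays:

```julia
using LinearAlgebra

A = rand(100, 100)
x = rand(100)
y = similar(x)        # allocate the output buffer once, up front

mul!(y, A, x)         # in-place matrix-vector product; no new array is created
y .= max.(y, 0.0)     # fused in-place broadcast; overwrites y without a temporary
```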
It would be really helpful for this discussion if someone could outline the differences between the approaches to garbage collection in Julia and Python and what the tradeoffs, advantages, and disadvantages of each are (including with regard to GPUs).
Unfortunately, the implementation of this in CUDAnative.jl was never merged because (from what I've heard) it relied on a GPU-to-CPU communication mechanism that wasn't performant enough for general use.
Continual memory allocation and deallocation is what makes the GC work hard, so that's what needs to be tackled first. I think what will be necessary is for Flux to do more efficient memory management, specifically reusing previously allocated buffers when possible. To my knowledge, Flux doesn't worry about these things, in the interest of simplicity (I could of course be wrong).
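To make the buffer-reuse idea concrete, here is a very rough sketch (this is not Flux's implementation; the cache and function names are hypothetical): keep one buffer per element type and shape, and hand it back out on repeated requests instead of allocating a fresh GC-tracked array each time.

```julia
# Hypothetical buffer cache: one array per (element type, shape).
const BUFFER_CACHE = Dict{Tuple{DataType,Dims}, Array}()

function get_buffer(T::DataType, dims::Dims)
    get!(BUFFER_CACHE, (T, dims)) do
        Array{T}(undef, dims)    # allocate only on the first request for this shape
    end
end

buf1 = get_buffer(Float64, (256, 256))
buf2 = get_buffer(Float64, (256, 256))
buf1 === buf2                    # true: the same memory is reused, not reallocated
```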