Is garbage collection bad for GPU programming?

lmiq · October 24, 2020, 4:04pm

I was listening to this podcast:

and at one point (around minute 56) the Swift guy said something to the effect that no language would replace Python unless it kept one of Python's qualities, namely not having garbage collection, because that is important for performant GPU programming. In context, the remark seemed directed at Julia.

I am completely ignorant about these topics, but is there any reason to believe that Julia is unsuitable for high-performance GPU computing in any sense?

DoktorMike · October 24, 2020, 5:32pm

Not sure I understand his point. Both Python and Julia have garbage collection; they just go about it in different ways. I haven't read the post, so I can't really say more.

Oscar_Smith · October 24, 2020, 6:11pm

The part of this that is true is that garbage collection on GPU really sucks. The Julia solution to that is to write code that only creates objects on the stack so GC isn’t needed.

johnmyleswhite · October 24, 2020, 7:35pm

The main distinction between Python and Julia here is that many people feel Python's reference counting mechanism makes it a bit easier than Julia's GC to control when GPU memory is freed.
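
To illustrate the point about reference counting, here is a minimal sketch assuming CPython (PyPy and other implementations do not free objects deterministically). `FakeDeviceBuffer` is a hypothetical class that only updates a counter standing in for device memory; real GPU libraries add their own allocator layers, so their actual release path differs.

```python
# Hypothetical stand-in for a GPU buffer; no real device memory is touched.
# The counter plays the role of "device memory currently in use".
device_bytes_in_use = 0


class FakeDeviceBuffer:
    """Pretends to own `nbytes` of device memory until it is freed."""

    def __init__(self, nbytes):
        global device_bytes_in_use
        self.nbytes = nbytes
        device_bytes_in_use += nbytes

    def __del__(self):
        # In CPython this runs as soon as the last reference disappears.
        global device_bytes_in_use
        device_bytes_in_use -= self.nbytes


buf = FakeDeviceBuffer(1024)
alias = buf                       # two references to the same buffer
del buf                           # one reference left: still alive
still_held = device_bytes_in_use  # 1024
del alias                         # last reference gone: __del__ runs now
released = device_bytes_in_use    # 0
print(still_held, released)       # CPython prints: 1024 0
```

With a tracing GC, the equivalent release happens only when the collector next runs, which is the timing difference being described.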

lmiq · October 24, 2020, 10:48pm

Does this mean something very specific in Julia? (Such as using static arrays, tuples, etc.?)

Oscar_Smith · October 24, 2020, 10:56pm

The concept isn't Julia-specific (C has something similar: malloc allocates on the heap, while local variables go on the stack). Julia currently stack allocates any immutable struct (or primitive type), though this behavior is new in 1.5. Julia 1.5 Highlights gives a very good analysis of the change.

lmiq · October 25, 2020, 12:56am

That is an excellent article. I am curious about one statement:

“As a result of this work, arbitrary immutable objects—regardless of whether they have fields that reference mutable objects or not—can now be stack allocated”

Does that mean that an array that is part of an immutable struct is treated differently from an array not within a struct? If so, can this difference matter for performance? (Would it make sense to declare a struct containing a single array just because of that?)

Oscar_Smith · October 25, 2020, 12:59am

No. When a struct T has an Array as a field, it is storing a pointer to that array. Prior to 1.5, this would mean that T had to be heap allocated. In 1.5, the Array is still heap allocated, but the struct is stack allocated. There is no difference for the Array, just for things that reference it.
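
A Python analogy for the "storing a pointer" point (an analogy only: Python puts every object on the heap, so this says nothing about where Julia places the struct). `Wrapper` is a hypothetical name. An immutable wrapper around a list holds a reference to that list, so wrapping neither copies nor changes the array; only the wrapper's own field is frozen.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Wrapper:
    """Immutable wrapper; the field stores a reference, not a copy."""
    data: list


arr = [1.0, 2.0, 3.0]
w = Wrapper(arr)

same_object = w.data is arr       # True: no copy was made
arr[0] = 10.0                     # mutate the array directly...
seen_through_wrapper = w.data[0]  # ...and the wrapper sees it: 10.0

try:
    w.data = [0.0]                # rebinding the field is forbidden
    rebind_blocked = False
except FrozenInstanceError:
    rebind_blocked = True

print(same_object, seen_through_wrapper, rebind_blocked)  # True 10.0 True
```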

jonathan-laurent · October 25, 2020, 1:40am

I would be interested in hearing more about this. Is this a practical solution in most cases? Isn’t Flux still allocating a whole lot for example?

yuyichao · October 25, 2020, 2:46pm

No. On both versions the struct can be either heap or stack allocated. On 1.5 the struct will be inlined.

StefanKarpinski · October 25, 2020, 3:58pm

On the other hand, it’s much harder to write Python code that doesn’t allocate in the first place, whereas Julia has lots of non-allocating in-place APIs.
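
A sketch of the two calling conventions, in plain Python with hypothetical names `add_alloc` and `add_into`. In Julia the second style corresponds to functions like `mul!(C, A, B)` or in-place broadcasting with `.=`. Note that in CPython even `add_into` creates a new float object per element, which is part of why avoiding allocation is harder in Python.

```python
def add_alloc(x, y):
    """Allocating style: builds and returns a brand-new list."""
    return [a + b for a, b in zip(x, y)]


def add_into(out, x, y):
    """In-place style: writes results into a caller-provided buffer."""
    for i in range(len(out)):
        out[i] = x[i] + y[i]
    return out


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

r1 = add_alloc(x, y)
r2 = add_alloc(x, y)
fresh_each_time = r1 is not r2           # True: a new list per call

out = [0.0] * len(x)                     # allocate the output once...
results = [add_into(out, x, y) for _ in range(3)]  # ...reuse it each call
reused = all(r is out for r in results) and out == [11.0, 22.0, 33.0]
print(fresh_each_time, reused)           # True True
```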

johnmyleswhite · October 25, 2020, 4:23pm

True, but I don’t think that matters as much for something like PyTorch, which I don’t think aspires to be a general purpose GPU language.

Azamat · October 25, 2020, 7:43pm

It would be really helpful for this discussion if someone could outline all the differences between the approaches to garbage collection in Julia and Python, and the tradeoffs, advantages, and disadvantages of each (including with regard to GPUs).

lmiq · October 25, 2020, 8:05pm

These texts seem nice:

Garbage Collection (.NET) vs. ARC (Swift) - Part 2.
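
One tradeoff commonly discussed when comparing tracing GC with reference counting, sketched in CPython (a minimal sketch; `Node` is a hypothetical class): reference counting frees most objects immediately, but on its own it cannot reclaim reference cycles, which is why CPython also ships a tracing cycle collector (the `gc` module).

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()                         # leave only reference counting active
try:
    a, b = Node(), Node()
    a.other, b.other = b, a          # a <-> b form a reference cycle
    probe = weakref.ref(a)           # weak refs don't keep objects alive
    del a, b                         # no names left, but each refcount is 1
    alive_after_del = probe() is not None      # True: refcounting can't free it
    gc.collect()                     # the tracing cycle collector can
    alive_after_collect = probe() is not None  # False
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)  # True False
```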

Akatz · October 25, 2020, 9:05pm

See this Julia GitHub issue for a related discussion.

According to Jeff and Keno, compile time memory management seems to be a promising alternative.

jpsamaroo · October 25, 2020, 11:34pm

Unfortunately, the implementation of this in CUDAnative.jl was not merged because (from what I’ve heard), the implementation relied on a GPU-to-CPU communication mechanism that was not performant enough for general usage.

Continual memory allocation/deallocation is the thing which causes the GC to work hard, so it’s the thing that needs to be tackled first. I think what will be necessary is for Flux to perform more efficient memory management, specifically trying to reuse previously-allocated buffers when possible. To my knowledge, Flux doesn’t worry about these things in the interest of simplicity (I could of course be wrong).
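
A minimal sketch of the buffer-reuse idea, in plain Python with a hypothetical `BufferPool` class and `bytearray` standing in for device memory. This is not Flux's or CUDA.jl's API, just the general pattern of keeping released buffers around and handing them back out instead of allocating again.

```python
from collections import defaultdict


class BufferPool:
    """Hypothetical size-keyed pool: reuse freed buffers instead of reallocating."""

    def __init__(self):
        self._free = defaultdict(list)  # nbytes -> list of idle buffers
        self.fresh_allocations = 0

    def acquire(self, nbytes):
        idle = self._free[nbytes]
        if idle:
            return idle.pop()           # reuse a previously released buffer
        self.fresh_allocations += 1
        return bytearray(nbytes)        # stand-in for a real device allocation

    def release(self, buf):
        self._free[len(buf)].append(buf)  # keep it for the next acquire()


pool = BufferPool()
for step in range(100):                 # e.g. one training step per iteration
    scratch = pool.acquire(4096)
    scratch[0] = step % 256             # ... use the buffer ...
    pool.release(scratch)

print(pool.fresh_allocations)           # 1: every step after the first reuses it
```

One design caveat: a reused buffer still holds its previous contents, so callers must overwrite or clear it before reading from it.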
