Request information about GC implementation

Hi All,

Could you please tell me: do you use the LLVM GC framework support to implement the garbage collector?

Also, I saw in some documentation that there is a method called GC.gc() to invoke the GC manually. Could you please point me to the source code of that method? (I tried to find it using grep.)

Thank you,
Kavindu

julia> @which GC.gc()
gc() in Base.GC at gcutils.jl:94

Is the GC used in Julia an accurate GC or a conservative GC?

Accurate, generational, non-concurrent (i.e. stop-the-world), non-relocating. It does not use the LLVM GC framework.


How does Julia identify the heap references on the current stack (i.e. the objects that will be marked in the mark phase)?

In the Julia code this is called "gc rooting". The general idea is that there is a second stack ("the shadow stack") that contains GC roots, i.e. local object references that must be visible to the GC and cannot be proven otherwise visible. This is implemented via a mix of C API (when you write C code that interfaces with the Julia runtime), custom LLVM passes, and code lowering.

To see this in action, consider

julia> function foo(r)
       @inbounds rr = r[1]
       push!(rr, 1)
       end;

Here, the push! will call into the runtime, and codegen/LLVM has no understanding of what this foreign call will actually do. Hence, we need to keep rr alive (r is kept alive by our caller, but it could be that the foreign function mutates r to remove the reference to rr and then calls the garbage collector).

julia> @code_llvm foo([[1]])
;  @ REPL[23]:1 within `foo'
define nonnull {}* @japi1_foo_309({}* %0, {}** %1, i32 %2) #0 {
top:
  %gcframe2 = alloca [3 x {}*], align 16
  %gcframe2.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 0
  %3 = bitcast [3 x {}*]* %gcframe2 to i8*
  call void @llvm.memset.p0i8.i32(i8* nonnull align 16 dereferenceable(24) %3, i8 0, i32 24, i1 false)
  %4 = alloca {}**, align 8
  store volatile {}** %1, {}*** %4, align 8
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #6
  %ptls_i8 = getelementptr i8, i8* %thread_ptr, i64 -32768
  %5 = bitcast [3 x {}*]* %gcframe2 to i64*
  store i64 4, i64* %5, align 16
  %6 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 1
  %7 = bitcast i8* %ptls_i8 to i64*
  %8 = load i64, i64* %7, align 8
  %9 = bitcast {}** %6 to i64*
  store i64 %8, i64* %9, align 8
  %10 = bitcast i8* %ptls_i8 to {}***
  store {}** %gcframe2.sub, {}*** %10, align 8
  %11 = bitcast {}** %1 to {}****
  %12 = load {}***, {}**** %11, align 8
;  @ REPL[23]:2 within `foo'
; ┌ @ array.jl:801 within `getindex'
   %13 = load {}**, {}*** %12, align 8
   %14 = load atomic {}*, {}** %13 unordered, align 8
   %.not = icmp eq {}* %14, null
   br i1 %.not, label %fail, label %pass

fail:                                             ; preds = %top
   call void @jl_throw({}* inttoptr (i64 140358161122320 to {}*))
   unreachable

pass:                                             ; preds = %top
   %15 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe2, i64 0, i64 2
   store {}* %14, {}** %15, align 16
; └
;  @ REPL[23]:3 within `foo'
; ┌ @ array.jl:929 within `push!'
; │┌ @ array.jl:884 within `_growend!'
    call void inttoptr (i64 140358397421246 to void ({}*, i64)*)({}* nonnull %14, i64 1)
; └└
; ┌ @ array.jl:930 within `push!'
; │┌ @ abstractarray.jl:347 within `lastindex'
; ││┌ @ abstractarray.jl:312 within `eachindex'
; │││┌ @ abstractarray.jl:109 within `axes1'
; ││││┌ @ abstractarray.jl:89 within `axes'
; │││││┌ @ array.jl:133 within `size'
        %16 = bitcast {}* %14 to {}**
        %17 = getelementptr inbounds {}*, {}** %16, i64 3
        %18 = bitcast {}** %17 to i64*
        %19 = load i64, i64* %18, align 8
; │└└└└└
; │┌ @ array.jl:839 within `setindex!'
    %20 = add nsw i64 %19, -1
    %21 = bitcast {}* %14 to { i8*, i64, i16, i16, i32 }*
    %22 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %21, i64 0, i32 1
    %23 = load i64, i64* %22, align 8
    %24 = icmp ult i64 %20, %23
    br i1 %24, label %idxend, label %oob

oob:                                              ; preds = %pass
    %25 = alloca i64, align 8
    store i64 %19, i64* %25, align 8
    call void @jl_bounds_error_ints({}* %14, i64* nonnull %25, i64 1)
    unreachable

idxend:                                           ; preds = %pass
    %26 = bitcast {}* %14 to i64**
    %27 = load i64*, i64** %26, align 8
    %28 = getelementptr inbounds i64, i64* %27, i64 %20
    store i64 1, i64* %28, align 8
    %29 = load i64, i64* %9, align 8
    store i64 %29, i64* %7, align 8
; └└
  ret {}* %14
}

The Julia GC has no read barriers, but it needs a write barrier (to support the case where an old object is mutated to contain a reference to a young object):

julia> bar(a, b) = begin a[]=b; nothing; end;

julia> @code_llvm bar(Ref([1]), [1])
;  @ REPL[30]:1 within `bar'
define nonnull {}* @japi1_bar_498({}* %0, {}** %1, i32 %2) #0 {
top:
  %3 = alloca {}**, align 8
  store volatile {}** %1, {}*** %3, align 8
  %4 = load {}*, {}** %1, align 8
  %5 = getelementptr inbounds {}*, {}** %1, i64 1
  %6 = load {}*, {}** %5, align 8
; ┌ @ refvalue.jl:57 within `setindex!'
; │┌ @ Base.jl:34 within `setproperty!'
    %7 = bitcast {}* %4 to {}**
    store atomic {}* %6, {}** %7 unordered, align 8
    %8 = bitcast {}* %4 to i64*
    %9 = getelementptr inbounds i64, i64* %8, i64 -1
    %10 = load atomic i64, i64* %9 unordered, align 8
    %11 = and i64 %10, 3
    %12 = icmp eq i64 %11, 3
    br i1 %12, label %13, label %20

13:                                               ; preds = %top
    %14 = bitcast {}* %6 to i64*
    %15 = getelementptr inbounds i64, i64* %14, i64 -1
    %16 = load atomic i64, i64* %15 unordered, align 8
    %17 = and i64 %16, 1
    %18 = icmp eq i64 %17, 0
    br i1 %18, label %19, label %20

19:                                               ; preds = %13
    call void @jl_gc_queue_root({}* nonnull %4)
    br label %20

20:                                               ; preds = %19, %13, %top
; └└
  ret {}* inttoptr (i64 140358107785408 to {}*)
}

Some effort has been spent on improving loop hoisting of the write barrier, but I am in no position to properly evaluate how well we do on this.

There is a known performance pitfall with respect to multithreading and GC: GC can only be started at safepoints (allocations, foreign function calls, explicit safepoints). The shadow stack only needs to be in a correct state at safepoints (this helps performance of loops). Tight optimized loops contain no safepoints.

You see the problem: One thread enters a tight highly optimized long loop that doesn't hit a safepoint for several seconds, until it has finished munching through gigabytes of computation; in the meantime all other threads stall, waiting for an opportunity where all threads hit a safepoint and GC can start.

edit: improved write barrier example

For future work, it would be super cool to switch to using stack unwinding instead, possibly hook into the register spilling logic, and generally get rid of safepoints (i.e. use an interrupt to stop the world). However, this is unlikely to happen in the next couple of years. Another interesting idea is what the JVM does: implement safepoints by a load instruction that traps if we are supposed to yield to GC (i.e. load from a global address; when we want to stop the world, mark the page as non-readable and wait until all threads have hit our fault handler). This has the advantage that safepoints become very cheap and can be sprinkled into a lot more places.


Is one of the options similar to the Go GC implementation, which is designed to ensure consistently low latency?

Can you point me to the source code that generates this second stack and puts heap references on it?