@Oscar_Smith, @GeorgeGkountouras Immix GC (./julia, 1.12.0-DEV.1745) is confidently 33% faster than Julia 1.11 with its default GC on this worst-case outlier from the Debian Benchmarks Game. That's with three threads, which is the optimal thread count for this program, at least with Immix. It's only 11% faster than 1.10, though: Julia 1.11 (default GC) is itself 17% slower than Julia 1.10 (default GC).
Since the Benchmarks Game uses four threads (all the cores), and I believe insists on the same config, -t 4, for all programs, is there a way to opt into fewer threads at runtime? I know you can’t yet ask for more at runtime (except by a hack, by calling C, and it adds…):
https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-julia-3.html
~/MMTk/julia$ killall -SIGSTOP firefox-bin
$ hyperfine './julia -t 3 binarytrees.julia-3.julia 21'
Benchmark 1: ./julia -t 3 binarytrees.julia-3.julia 21
Time (mean ± σ): 7.488 s ± 0.203 s [User: 18.221 s, System: 0.438 s]
Range (min … max): 7.215 s … 7.781 s 10 runs
$ hyperfine 'julia -t 3 binarytrees.julia-3.julia 21'
Benchmark 1: julia -t 3 binarytrees.julia-3.julia 21
Time (mean ± σ): 8.877 s ± 0.501 s [User: 13.945 s, System: 3.693 s]
Range (min … max): 8.144 s … 9.498 s 10 runs
$ hyperfine 'julia +1.11 -t 3 binarytrees.julia-3.julia 21'
Benchmark 1: julia +1.11 -t 3 binarytrees.julia-3.julia 21
Time (mean ± σ): 10.254 s ± 0.306 s [User: 16.358 s, System: 3.016 s]
Range (min … max): 9.609 s … 10.585 s 10 runs
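The percentages above depend on which hyperfine statistic (mean or min) and which formula you use. The small awk-based helper below prints both the time ratio (t_slow / t_fast - 1) and the time reduction (1 - t_fast / t_slow) for two timings. The helper name speedup is just for illustration. Fed the minimum times from the -t 3 runs, it gives 33% only as a ratio for 1.11 vs Immix, and 11% only as a reduction for the default julia vs Immix, so those two figures appear to use different conventions.

```shell
#!/usr/bin/env bash
# Compare two timings in seconds: $1 = slower run, $2 = faster run.
# ratio:     how much longer the slower run takes, relative to the faster one
# reduction: how much time the faster run saves, relative to the slower one
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN {
    printf "ratio=%.1f%% reduction=%.1f%%\n", (slow / fast - 1) * 100, (1 - fast / slow) * 100
  }'
}

# Minimum times from the -t 3 runs above
speedup 9.609 7.215   # julia +1.11 vs ./julia (Immix): ratio=33.2% reduction=24.9%
speedup 8.144 7.215   # julia vs ./julia (Immix):       ratio=12.9% reduction=11.4%
```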
Before that, I was accidentally benchmarking a non-threaded program (written for Distributed), and there Immix was confidently slower than Julia 1.10's GC (1.11's default GC is also slower):
~/MMTk/julia$ hyperfine './julia -t 16 ../../binarytrees.julia-4.julia 21'
Benchmark 1: ./julia -t 16 ../../binarytrees.julia-4.julia 21
Time (mean ± σ): 11.998 s ± 0.133 s [User: 28.054 s, System: 0.724 s]
Range (min … max): 11.777 s … 12.237 s 10 runs
$ hyperfine './julia -t 4 ../../binarytrees.julia-4.julia 21'
Benchmark 1: ./julia -t 4 ../../binarytrees.julia-4.julia 21
Time (mean ± σ): 11.586 s ± 0.120 s [User: 16.106 s, System: 0.355 s]
Range (min … max): 11.454 s … 11.804 s 10 runs
$ hyperfine './julia ../../binarytrees.julia-4.julia 21'
Benchmark 1: ./julia ../../binarytrees.julia-4.julia 21
Current estimate: 12.070 s   ETA 00:00:49
^C
$ hyperfine 'julia ../../binarytrees.julia-4.julia 21'
Benchmark 1: julia ../../binarytrees.julia-4.julia 21
Current estimate: 10.995 s   ETA 00:00:44
^C
Startup is a bit slower with Immix:
$ hyperfine './julia -e ""'
Benchmark 1: ./julia -e ""
Time (mean ± σ): 232.5 ms ± 17.5 ms [User: 309.9 ms, System: 82.7 ms]
Range (min … max): 203.5 ms … 248.3 ms 12 runs
$ hyperfine 'julia +1.11 -e ""'
Benchmark 1: julia +1.11 -e ""
Time (mean ± σ): 192.3 ms ± 14.5 ms [User: 281.3 ms, System: 65.8 ms]
Range (min … max): 166.9 ms … 210.2 ms 14 runs
$ hyperfine 'julia +1.10 -e ""'
Benchmark 1: julia +1.10 -e ""
Time (mean ± σ): 225.5 ms ± 18.5 ms [User: 207.2 ms, System: 141.6 ms]
Range (min … max): 198.3 ms … 247.1 ms 12 runs
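Startup times of around 200 ms are sensitive to filesystem caching, so it can help to run all three builds in one hyperfine invocation with a few warmup runs. A minimal sketch, assuming the same three commands as above; the compare_startup name is hypothetical, and --warmup 3 is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Benchmark Julia startup for several builds in a single hyperfine run.
# Each argument is a julia command; -e "" makes it exit right after startup.
compare_startup() {
  local cmds=() j
  for j in "$@"; do
    cmds+=("$j -e \"\"")
  done
  # Warmup runs are discarded and mainly fill the page cache before timing.
  hyperfine --warmup 3 "${cmds[@]}"
}
```

Usage: compare_startup './julia' 'julia +1.11' 'julia +1.10'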
$ ./julia
Version 1.12.0-DEV.1745 (2024-12-09)
upstream-ready/immix/4aeea613eb (fork: 27 commits, 8 days)
Side note: while compiling, I saw:
Precompiling packages ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━ 98/106
◓ Pkg -g2 -O3
◒ REPL -g2 -O3
◓ Pkg -g2 --check-bounds=yes -O3
◒ REPL -g2 --check-bounds=yes -O3
I believe we have duplicated pkgimages for Pkg and REPL there, and we could have just one, built with the most conservative option, --check-bounds=yes, at least for packages that are not speed critical.
EDIT2: Compiling Immix as we speak; I got past the segfault by changing:
$ (cd julia && git checkout dev && echo 'MMTK_PLAN=Immix' > Make.user)
to:
$ (cd julia && git checkout upstream-ready/immix && echo 'MMTK_PLAN=Immix' > Make.user) # upstream-ready/immix
[More info on how I compiled is in this post’s edit history; my experiments in getting it to compile, and in fixing the segfault while building Julia, are a distraction here.]
EDIT: if someone wants to help with building Julia: it segfaults… Building MMTk in the previous step worked, but it wasn’t used below, since you need to build MMTk AND then Julia with it, also from source.
I was hoping for even more speed with the upcoming GC PR (the new GC itself is disabled in this build, since I wasn’t building from source), but it’s only 4.8% faster, and only compared to 1.11 (comparing minimum times):
$ juliaup add pr56288
$ hyperfine 'julia +1.11 -t 4 binarytrees.julia-4.julia 21'
Benchmark 1: julia +1.11 -t 4 binarytrees.julia-4.julia 21
Time (mean ± σ): 12.591 s ± 0.304 s [User: 14.098 s, System: 0.530 s]
Range (min … max): 12.248 s … 13.252 s 10 runs
$ hyperfine 'julia +pr56288 -t 4 binarytrees.julia-4.julia 21'
Benchmark 1: julia +pr56288 -t 4 binarytrees.julia-4.julia 21
Time (mean ± σ): 12.059 s ± 0.246 s [User: 16.355 s, System: 0.559 s]
Range (min … max): 11.654 s … 12.539 s 10 runs
Both are regressions from 1.10, even when 1.10 runs single-threaded:
$ hyperfine 'julia +1.10 binarytrees.julia-4.julia 21'
Benchmark 1: julia +1.10 binarytrees.julia-4.julia 21
Time (mean ± σ): 11.352 s ± 0.244 s [User: 11.430 s, System: 0.765 s]
Range (min … max): 11.068 s … 11.925 s 10 runs
$ hyperfine 'julia +1.10 -t 4 binarytrees.julia-4.julia 21'
Benchmark 1: julia +1.10 -t 4 binarytrees.julia-4.julia 21
Current estimate: 10.957 s   ETA 00:01:17
^C
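Since these comparisons span several juliaup channels, hyperfine's --parameter-list can run them all in one invocation, substituting each comma-separated value for a placeholder. A minimal sketch: compare_channels is a hypothetical wrapper, and the channel names are the ones used above.

```shell
#!/usr/bin/env bash
# Run the same Julia benchmark under several juliaup channels in one hyperfine run.
# $1: comma-separated channel list; remaining arguments: julia arguments.
compare_channels() {
  local channels="$1"
  shift
  # hyperfine replaces {channel} with each value from the list in turn.
  hyperfine --parameter-list channel "$channels" "julia +{channel} $*"
}
```

Usage: compare_channels 1.10,1.11,pr56288 -t 4 binarytrees.julia-4.julia 21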
I’m assuming Immix is enabled by default after compiling from source, though I did have to change this line (building Julia as-is with it segfaulted):
(cd julia && git checkout dev && echo 'MMTK_PLAN=Immix' > Make.user) # or MMTK_PLAN=StickyImmix to use Sticky Immix
to:
(cd julia && git checkout upstream-ready/immix && echo 'MMTK_PLAN=Immix' > Make.user) # upstream-ready/immix
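The Make.user step above can be wrapped in a small helper, so switching between Immix and StickyImmix is one command and smart quotes copied from a web page can't end up inside the file. This is a hypothetical sketch: set_mmtk_plan is my own name, it only accepts the two plan names mentioned in the comment above, and MMTK_PLAN is the only variable it writes, as in the instructions. Anything else a given build needs is not covered.

```shell
#!/usr/bin/env bash
# Write DIR/Make.user selecting an MMTk plan (default: Immix).
# Like the original `echo ... > Make.user`, this overwrites any existing Make.user.
set_mmtk_plan() {
  local dir="$1" plan="${2:-Immix}"
  case "$plan" in
    Immix|StickyImmix) ;;
    *) echo "unknown MMTk plan: $plan" >&2; return 1 ;;
  esac
  # Plain ASCII output, so no curly quotes end up in the file.
  printf 'MMTK_PLAN=%s\n' "$plan" > "$dir/Make.user"
}
```

Usage: (cd julia && git checkout upstream-ready/immix) && set_mmtk_plan julia StickyImmix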
I’ve yet to try StickyImmix; do you know how it differs from Immix?
I’m looking into the new MMTk, and whether I need to tune it somehow:
I built it with the instructions from there (modified as explained above).
First:
$ sudo apt install cargo
For example, MMTk provides BumpPointer, which simply includes a cursor and a limit. In the following example, we embed one BumpPointer struct in the TLS.
Does Julia itself need to opt into something like post_alloc (in mmtk::memory_manager)?
I see in the code:
INLINE_FASTPATH_ALLOCATION
Something else to look into, which might be related to what I’m proposing: