Do Garbage Collectors necessarily make code slow?

A more serious question, while you’re around, is to ask you why C# is generally reckoned to be slower than C++, “because it has garbage collection”. In another thread you opined that garbage collectors don’t necessarily make code slow.

Thoughts, please.

2 posts were merged into an existing topic: Standalone “Hello World” Executable size comparison to Node.js

I have no idea. If you’re serious about wanting to know, I would try seeing if the premise is true or not: how does the speed of an identical program in C versus C# compare? There’s enough syntactic overlap to write many benchmarks in both languages with basically the same code.

Prefacing my comment, garbage collection (GC) performance should probably be split off to its own thread, possibly tagged off-topic if it’s not about Julia’s implementation specifically.

There’s really no definitive opinion on whether manual malloc/free or GC is faster, there are many confounding factors. Visually similar code in different languages can be compiled differently, and that skews comparisons with one language using more efficient instructions or allocating less for the same results (on cursory readings, true for C++ > C#). Garbage collection strategies also vary wildly and can’t really be lumped onto one side. A truly controlled comparison would be using the same language and compiler; Kareen and Blackburn wrote a review/experiment in 2022, but in all honesty the details go over my head.

This thread offers a variety of opinions and examples. One comments on a 1986 study by Andrew Appel demonstrating that a copying GC could be faster than stack allocation, basically it’s overall cheaper to copy reachable data than explicitly free unreachable data when physical memory is >7x the reachable data. Bear in mind that study does say that GCed languages are less efficient, and this predated many GC developments like the generational hypothesis.

The only clear thing, and something Julia users know well: reduce garbage.

4 Likes

I think this is actually the main difference between GC and non-GC languages: languages with GC tend to make it easy to allocate a lot and often, and, as in the case of Java (and Python), some even make it impossible not to allocate objects in many situations. Non-GC languages, on the other hand, almost always offer APIs that allow you to avoid allocation as much as possible—because dealing with dynamically allocated memory is a burden on the user, so letting them avoid it is natural. When the user doesn’t have to explicitly deal with allocation, it’s easier to think of it as “free” and do a lot of it “behind their back”.

That’s why I think the C# question is kind of a gotcha—it all depends on the kind of code you’re talking about. If you write C# in the style of C++ then I’d be surprised if the performance is substantially different. Even if you do a bit of dynamic allocation which is manually cleaned up in C++ and automatically collected in C#, it’s probably not going to matter and the GC will be just as efficient as the malloc library, maybe better. You can also write code in either language that does a ton of dynamic allocation and relies on the GC in C# and smart pointers in C++ to clean up. In that case I bet C# is not only easier to write but faster too—ref counting has a lot of overhead (it has more small pauses though). But this is artificially restricting to comparisons where you write similar code in C# and C++. The reason people think of C# as slower is because they don’t write the same kind of code in the two languages. In C# they use the features that rely on the GC because they’re really convenient, which puts more pressure on the GC than if you posted C++ code over directly, which in turn makes C# seem slower.

So where does Julia fall here? Julia does let the user write a lot of allocation free code. What tools does it provide for this? One big one is the ability to define user data types that are immutable and therefore value types, which the compiler can much more easily avoid heap allocating. Java doesn’t allow this; C# does but not if you want to use objects, dispatch, pass by sharing, etc. Another tool Julia provides is a lot of APIs for mutation and in-place non-allocating operations. This is less of a fundamental language design matter, and more of a commitment to offering these kinds of APIs. Between these two things, it’s very possible to write a lot of Julia code that doesn’t allocate at all or only a little. And note that this isn’t an all or nothing thing: code that does a bit of dynamic allocation will generally be fast without taxing the GC too much.

Given that, why do we continue to work on the GC so much? Because it can always be better and one area that our GC is still not good is keeping up with multithreaded code that generates garbage. We’re still really bad at this, and as a result of you want to write effective multithreaded code, you really need to avoid allocating. We need to lift that restriction—it should be fine to allocate some in multithreaded code.

16 Likes

I think you mean that as a pro of ref-counting (non-GC languages). But I believe it can mean arbitrarily long pauses (for a thread at least), think of a linked list. While a GC can be real-time, in same situation (even all I believe). When you drop the list, the GC could delete all the links with same con, just a better implemented GC would not. The scanning would still be a bit of overhead, but can also be amortized (and isn’t too much of a problem).

Also in the context of this discussion, a GC is a small code overhead, a tiny part of the Julia runtime (LLVM the largest, that can be dropped, often).

It’s the opposite: ref counting is low latency, high throughout overhead.

1 Like

Ref counting can actually have pretty high worst case latency if you aren’t very careful about the implementation (since you naively can free arbitrarily many objects at the same time). Average latency is much better than GC though.

5 Likes

There is a “downside” though, Julia is in a rare position of being one of the GCed languages that abstract away the pointers/stack/heap yet be a language where users have an indirect control by understanding the version-specific compiler. Repeatedly I see people struggle with the semantics of variable binding and instance mutability, especially if they didn’t come from Python. It’s easy to mistake a one-to-one correspondence between mutability and heap allocation.

I think there’s still work to do on communicating exactly what to do, even if it changes by version. 1) People make mutable types just to easily reassign fields stuff.x += 1, when they had no intention of having multiple variables pointing to the same data and thus could use Accessors.jl; I’m not even sure when exactly the compiler can optimize that to field data changes like variable reassignments are. 2) The compiler has very recently improved to move things from the heap to the stack, but I’m not sure when exactly that happens (I know that fixed size and not escaping its construction’s scope is important). 3) I’ve seen someone try to aggressively avoid mutables to avoid heap allocations, but they are reassigning them to multiple variables one by one, which was awkward and unnecessarily copies data. People did point out they could mutate an extra argument in-place, but we didn’t really have a good answer for “resetting” the instance easily, like some standard zero!.

I don’t think that’s a unique position. Every compiler tries to eliminate allocations and optimize things away. That’s true of Java, C#, C and C++. When you upgrade the compiler (or even change options), you sometimes get free performance. On the other hand, sometimes something that used to be fast regresses. The only languages with truly predictable performance are the ones with predictably bad performance. I have never seen an optimizing compiler that comes with clear rules for when it can and can’t optimize something away. Probably because optimizing compilers are a zany pile of sketchy heuristics. There’s kind of a rough mental model you get after using a tool for a bit, but the rules are as complex as the implementation.

5 Likes

I would say that you also get more predictable performance if the language allows you to be more expressive about describing the desired optimizations to the compiler. And low-level, explicit operations are generally more predictable than high-level operations.

For example, Julia’s dot-call loop fusion f.(g.(x)) is more expressive and hence more predictable than relying completely upon the compiler to fuse loops from vectorized functions (e.g. Fortran, Numba, …). Or, at a lower level, explicit SIMD instructions ala SIMD.jl are more predictable than auto-SIMD ala @simd.

8 Likes

imo explicit could be clearer for very small quick programs or specific hardware, but it gets miserable for general use. I’d almost always prefer an optimizing compiler to have a say on what goes on the stack or heap or what is copied or referenced. I’ve heard arguments that abstractions mean I have to learn rules anyway and with less guarantees, but it really cuts down on mental overhead and bugs.

Despite the lack of guarantees or stability, I still think it’s worth figuring out and communicating the implementation. Stable language design does have a large effect and can give good rules of thumb e.g. mutability is really about allowing variables to share data hence the pointing and tendency to heap-allocate for safety. And on the other hand, implementation does affect what code we write. v1.5 made a whole package UnsafeArrays.jl much less necessary. We often care about if something isbits.

1 Like

The compiler knows if it’s accessing the heap. Or more importantly if it’s allocating there. In D you can mark a function with @nogc and then that function will never trigger GC (meaning it can also only call such marked functions or code the compiler proves is safe), likely meaning no allocation on the heap will happen (the only thing that triggers the GC in Julia, and I presume in D). I would want that option in Julia, and while most code doesn’t need this, some functions really do, many or most on real-time embedded.

In Julia ! of func!(mutable_var) means you’re mutating (what’s already on the heap). While it’s really a hint, not meaningful, nor its absence, I think such functions might likely implicitly be @nogc. Adding such a macro/marking, and having it implicit for those would though be breaking… but good for 2.0?

A problem with that approach is that a method’s behavior depends on its arguments’ types e.g. zero(big"2.0") allocates but zero(2.0) does not. Function call signatures could be marked, but at that point it would seem safer to actually compile the program and find where heap allocations are happening, maybe a PrecompileTools-like routine.

I actually created a github discussion yesterday unaware of this thread entirely. That discussion is about directly allocating memory but the potential to do so in a user friendly way that doesn’t bomb performance seems to be complicated by the garbage collector. Some of the weird corner cases where there are suddenly lots of allocations could potentially be mediated using specialized implementations of things (such as big ints), but we’d need to find a way to give users a stable interface to this sort of thing first.

1 Like

I see your point for generic methods (the default in Julia), either you would fully type them, or possibly restrict to some type like Integer, that only allow immutable types. A problem would be that even if hat holds true now, I could make up a new subtype and it would no longer hold. I guess those methods marked @nogc would just have to take immutable types, and disallow other types at runtime…

Yes and no, the former is same as:

temp = big"2.0"  # Yes, this would allocate, but in your function, i.e. the caller (and at compile time?)
zero(temp)  # This function allocates currently, but doesn't need to, and for an arbitrary function might not need to

If zero gave you a mutable type back you might in all cases give out the same big"0.0", no allocation, but yes, currently it isn’t, and even if it gave you mutables, then this might save things:

temp = big"2.0"  # would it need to be temp=BigInt(2.0)?
temp = zero(temp)

Usually you want the result back in some variable, the same one as there, or not, and even if not the compiler could allocate that container for you at the caller site. Not sure, maybe you would need to inline…

One thing that is very nice is that if you have a piece of code that is GC bound and isn’t super complicated, please open PRs to GitHub - JuliaCI/GCBenchmarks. In order to make the GC better we need to have lots of information into what use cases it performs badly.

6 Likes

The Julia compiler has been doing this for years.

Broadcasting is a great example for relying on zany compiler heuristics, however, because it relies on loop unswitching, i.e. hoisting branches out of loops and making multiple versions, so that

for i = ...
    A[size(A,...) == 1 ? 1 : i]
end

becomes

if size(A,...) == 1
    for i = ...
        A[1]
    end
else
    for i = ...
        A[i]
    end
end

Given N arrays in a broadcast statement, we need 2^N versions of the loop…
FastBroadcast.jl gets around this by special casting the case where all sizes are equal, so that no array is actually broadcast. Thus, a fast loop for this case is always generated.

LLVM doesn’t always generate the fast SIMD loop the user wanted for a broadcast.

Broadcasting is nice syntax, but if someone wants consistent, reliable, performance, they’re best off writing a loop.

In the context of the discussion on not relying on memory optimizations, broadcasting does syntactically avoid the temporaries, so it’s an example of that.

As an aside, C++ does have mandatory copy and move elision in a small number of circumstances.

Most C++ objects will have RAII, or if you want smart pointers you can use std::unique_ptr which has RAII behavior. Reference counting should be uncommon.

Unless you view RAII as a special case of reference counting with 0 and 1 being the only legal counts.
I’d say they’re distinct; for one thing, you don’t maintain any counts with RAII; it is implicitly always “1”, so instead of decrementing and checking, you just free.

4 Likes

I got the impression that in the last few years the compiler became more able to do heap-to-stack in more situations, but I could be wrong, I don’t really know how to explore that myself.