My insane garbage collector idea. Why wouldn't it work?

This probably wouldn’t work but I can’t figure out why…

Garbage collectors can either stop the world, which gives the best throughput but ruins real-time performance, or run concurrently, and there are many variants of the latter. Concurrent collectors, however, may rely on locking, write barriers, and so on, which slow down processing. That machinery is needed to prevent all sorts of shenanigans: temporarily unreachable objects, race conditions, and the like.
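If I understand correctly, a write barrier means every single pointer store in the program runs a little extra code on the collector's behalf. Here's a minimal sketch of a Dijkstra-style insertion barrier in Python, with all names invented, just to show where the cost comes from:

```python
GRAY = set()  # hypothetical queue of objects the collector must still scan

def store_pointer(container, field, new_obj, gc_is_marking):
    # Extra work on *every* pointer store: if a concurrent mark is in
    # progress, shade the new target so the marker cannot miss it.
    if gc_is_marking and new_obj is not None:
        GRAY.add(new_obj)
    setattr(container, field, new_obj)  # the store the program actually wanted
```

It's that per-store tax I'd like to avoid.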

This was really disappointing to me, and it could be a major problem for the Julia language. So I decided to cook up some ideas. Not knowing much about GC, this one is probably stupid, but hey, I need to give it a try.

I propose a high-throughput, concurrent garbage collector based on mark-and-sweep. The first mechanism is probabilistic marking. Unlike traditional mark-and-sweep, which relies on stopping the world, locking, or other synchronization to ensure the reference graph doesn’t change during marking, the preliminary phase operates on an N-mark system: the collector runs the marking process several times, and only objects seen as unreachable N times in a row get declared dead. Then, before the actual sweep, the program enters a low-overhead verification period in which every deletion of a pointer must be accompanied by marking the referenced object as well. During this time, the GC thread performs a final mark. If that final mark, or any pointer deletion, touches an object the earlier rounds had declared dead, the GC aborts the sweep and the probabilistic marking begins again.
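Here is a minimal sketch of the whole cycle as I imagine it, in Python. `Heap`, `Obj`, `deletion_log`, and so on are all invented names, and in reality mutator threads would be racing against every marking pass:

```python
N = 3  # unreachable verdicts required in a row before an object is "dead"

class Obj:
    def __init__(self):
        self.references = []  # outgoing pointers to other Objs

class Heap:
    def __init__(self):
        self.objects = set()      # every allocated object
        self.roots = set()        # stacks, globals, etc.
        self.deletion_log = None  # mutators append here when it's a list

    def sweep(self, dead):
        self.objects -= dead

def mark(heap):
    """One unsynchronized marking pass; returns the objects it reached."""
    reached, stack = set(), list(heap.roots)
    while stack:
        obj = stack.pop()
        if obj not in reached:
            reached.add(obj)
            stack.extend(obj.references)
    return reached

def probabilistic_collect(heap):
    while True:
        # Phase 1: N marking passes with the mutator still running.
        dead = heap.objects - mark(heap)
        for _ in range(N - 1):
            dead &= heap.objects - mark(heap)
        if not dead:
            return  # no object was seen unreachable N times in a row

        # Phase 2: low-overhead verification period. Every pointer
        # deletion in the mutator also logs the referenced object.
        heap.deletion_log = []
        final_reached = mark(heap)  # the final marking pass
        logged = set(heap.deletion_log)
        heap.deletion_log = None

        # Abort if the final mark or a logged deletion hit a "dead" object.
        if dead & (final_reached | logged):
            continue  # misidentification detected; start over

        heap.sweep(dead)
        return
```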

This does need to assume a few things; for example, that a data race during marking only invalidates the value that was read and does not cause further undefined behavior. It would also need a lot of probabilistic reasoning to choose N, and so on. Also, programming while tolerating race conditions can be nightmarishly difficult. Finally, the idea probably wouldn’t work, or would need a lot of refinement, for reasons unknown to me, but I want to put it out there anyway.

Maybe those of you who know more about GC will see why it wouldn’t work, but on the off chance that it could, well, I’m happy to contribute something, even if it’s just an idea.

Being able to assume that the program’s and the heap’s state are not being modified during the GC cycle, except by the GC itself, saves a lot of work. If you can’t assume that, as with incremental or concurrent GC, you must do more work, which lowers throughput.

This is more work, and repeated marking alone isn’t guaranteed to get things right after any number of rounds. If the only risk is failing to free garbage, fine: we’ll find it eventually. But misidentifying a live object as garbage, say because the application happens to reassign the object elsewhere just before the GC checks each reference, will break the program after deallocation. A known way to address that is to write deallocated objects to storage so they can be reallocated if needed; this, of course, is more work. More work can be offset by allowing work to be skipped elsewhere; generational GC often skips work on older generations after taking the effort to identify them. You can’t eyeball these tradeoffs: you have to profile real programs and implement a GC model to test hypotheses. Nobody can tell you whether your hypothesis is justified or whether your GC model would work.
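Here’s a hypothetical demonstration of that failure mode in Python (all names invented): the object is live the entire time, yet an unsynchronized marker can miss it on any pass, because the mutator keeps moving the only reference between two root slots.

```python
import threading

class Roots:
    def __init__(self, obj):
        self.slot_a = obj   # the only reference to obj lives here...
        self.slot_b = None  # ...or here; it ping-pongs between the two

def mutator(roots, stop):
    # Perfectly legal program behavior: move a live reference around.
    while not stop.is_set():
        roots.slot_b, roots.slot_a = roots.slot_a, None  # a -> b
        roots.slot_a, roots.slot_b = roots.slot_b, None  # b -> a

def unsynchronized_mark(roots):
    # One marking pass that scans root slots one at a time, like each of
    # your N passes: it can read slot_a before the reference arrives and
    # slot_b after the reference has already left.
    reached = set()
    for slot in ("slot_a", "slot_b"):
        obj = getattr(roots, slot)
        if obj is not None:
            reached.add(obj)
    return reached

if __name__ == "__main__":
    obj, stop = object(), threading.Event()
    roots = Roots(obj)
    threading.Thread(target=mutator, args=(roots, stop), daemon=True).start()
    misses = sum(obj not in unsynchronized_mark(roots) for _ in range(100_000))
    stop.set()
    print(f"live object missed in {misses} of 100000 passes")
```

Requiring N misses in a row only changes the odds; it doesn’t make the misidentification impossible.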


There is the verification round: any misidentified object, if found, will not be cleaned up and will instead make the GC start over. The repeated marking phase has N rounds, where N depends on how many rounds are needed to make the result probabilistically correct.

I know, and I’m telling you that a running application can keep moving an object around so that it keeps getting misidentified. You don’t provide a mechanism that makes the final verification marking round any safer than the N rounds before it, let alone one that opens opportunities for saving work (increasing throughput).

Here, the verification round also marks any recently “deleted” object. This means that if an object is either reachable or was recently deleted (or pseudo-deleted, or moved), it would be found.
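Concretely, in terms of my earlier sketch, the mutator-side hook during the verification period would be something like this (again, all names invented):

```python
def delete_pointer(container, field, heap):
    # Hypothetical hook run for every pointer deletion while the
    # verification period is active: remember the old referent so the
    # final mark treats it as seen, even if it just became unreachable.
    old = getattr(container, field)
    setattr(container, field, None)
    if heap.deletion_log is not None and old is not None:
        heap.deletion_log.append(old)
```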

That doesn’t address the possible dangers, so it’s an assertion of memory safety, not a mechanism for it. It’s also not clear what you’re trying to say there, because of the unusual terminology (deleting “pointers”, “pseudo-deleted”, marking a “referenced object” as well as a “recently deleted object”).


Oh, I guess this does need some testing to see whether the verification round would work correctly (I think it would, but I’m not sure of all the dangers). Still, even if it works, the multiple marking rounds could eat up performance.

I have a love-hate relationship with comments telling me I need to test my ideas. In the past, people would disprove my ideas easily and I could move on to the next one. Nowadays, they need testing. That means my ideas are improving in soundness, but it also means I have to test things out myself if I want to know whether an idea would work.