Multithreaded memory usage

Yes, and I believe this is made for exactly your case; just read its docs:

Ideally a sufficiently smart compiler would notice allocated temporaries that go away within the loop and allocate them on the stack automatically for you, i.e. do what that package does without needing any code changes. I suppose it could in some cases, when the temporaries are non-huge, as in your case. When they are too big, or their size isn’t known at compile time, I guess the compiler couldn’t do this. [Does C++ (or any known compiler) do something like this automatically? I.e. without such a library, which I’m sure is available there.]
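To make the point about small temporaries concrete, here is a minimal sketch in plain Julia. The function names are hypothetical and this is not that package's API. The first version builds a small `Vector` on every iteration; the second uses a tuple, whose size is known at compile time, so it doesn't need the heap. How much the `Vector` version actually allocates depends on your Julia version, so just measure it:

```julia
# Hypothetical example functions, for illustration only.

# A small Vector temporary is created on every iteration.
function sum_with_vector_temp(n)
    total = 0
    for i in 1:n
        tmp = [i, 2i, 3i]      # Vector: normally a heap allocation
        total += sum(tmp)
    end
    return total
end

# Same computation, but the temporary is a tuple of fixed, known size.
function sum_with_tuple_temp(n)
    total = 0
    for i in 1:n
        tmp = (i, 2i, 3i)      # Tuple: size known to the compiler
        total += sum(tmp)
    end
    return total
end

# Warm up first so compilation is not counted by @allocated.
sum_with_vector_temp(1)
sum_with_tuple_temp(1)

println("Vector temporaries: ", @allocated(sum_with_vector_temp(10_000)), " bytes")
println("Tuple temporaries:  ", @allocated(sum_with_tuple_temp(10_000)), " bytes")
```

This only works when the temporary is small and its size is fixed, which is exactly the "in some cases" above.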

[I know that C++, or rather the LLVM C++ library, has a stack-allocated array type that falls back to the heap if it gets too big. That would work for C++, and since C++ has destructors the heap memory would even get deallocated eagerly. Since this array type isn’t in the C++ standard library (why not?), I’m not sure C++ compilers would do this for you without help.]

Ideally the garbage collector also wouldn’t let garbage pile up too high, and would take the physical RAM size into account. I think it actually does that and tries to stay below that ceiling. But memory pressure is a global property: only the OS knows how much memory is allocated across all processes, and I think it is difficult for any one process to guess or find out the combined RAM usage of the other processes. That number is constantly changing, so I think it’s not even tried, and even if possible it would likely be slow (a costly syscall). Julia, or any process that has a GC, can only know its own use, which it can track exactly and inexpensively. And I believe that’s done.
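For reference, here is a small sketch of what a Julia process can ask about memory from the inside. These calls exist in Base; how the GC itself uses such numbers is my guess above, not something this snippet shows. Note that the free-memory figure is only a snapshot and says nothing about what other processes will do next:

```julia
# What one Julia process can cheaply find out about memory.
total_bytes = Sys.total_memory()     # physical RAM (may reflect container limits)
free_bytes  = Sys.free_memory()      # free RAM reported by the OS at this instant
live_bytes  = Base.gc_live_bytes()   # bytes the GC currently considers live in this process
peak_rss    = Sys.maxrss()           # peak resident set size of this process

# Helper for readable output (hypothetical name, just for this sketch).
to_gib(bytes) = bytes / 2^30

println("total RAM:     ", round(to_gib(total_bytes), digits = 2), " GiB")
println("free RAM now:  ", round(to_gib(free_bytes),  digits = 2), " GiB")
println("GC live bytes: ", round(to_gib(live_bytes),  digits = 3), " GiB")
println("peak RSS:      ", round(to_gib(peak_rss),    digits = 3), " GiB")
```

Printing these inside your loop, say every few hundred iterations, would show whether the process really keeps growing or whether the GC is just collecting lazily.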

At least tracking allocations and deallocations for any one thread is very cheap, and it is done. It could be done for all threads separately, and likely is. The aggregate total across threads is a simple sum, but I’m not sure how fast it can be obtained; I guess locks are involved. Besides, as I said, RAM usage is a global property, so if you produce garbage I think this is an unsolvable problem to handle perfectly. At best you have some good heuristics.
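Just to illustrate the accounting idea (this is a toy, not how Julia's GC is implemented; the function and its parameters are hypothetical): each worker bumps only its own counter, which is cheap, and the total is a plain sum over the counters whenever someone asks for it.

```julia
using Base.Threads

# Toy model of per-worker allocation accounting.
# `nworkers`, `nallocs` and `bytes_each` are made-up parameters for this sketch.
function simulate_accounting(nworkers, nallocs, bytes_each)
    # One counter per worker, so workers never contend on the same slot.
    counters = [Atomic{Int}(0) for _ in 1:nworkers]

    @sync for w in 1:nworkers
        @spawn for _ in 1:nallocs
            atomic_add!(counters[w], bytes_each)   # record one "allocation"
        end
    end

    # The aggregate is just a sum over the per-worker counters.
    return sum(c[] for c in counters)
end

println(simulate_accounting(4, 1_000, 8), " bytes accounted for")
```

Here the sum is taken after all workers have finished. Reading it while they are still running gives only an approximate value, which is probably good enough for a heuristic.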

Even worse, the heap-size-hint gets completely ignored

That’s strange; it might have to do with the number of threads used. You could try running with just one thread to see if the hint is respected then.

The GC wasn’t multi-threaded, but it now is on 1.10-alpha1, so you could try with that; and maybe you need to fiddle with the --gcthreads=N,M option. The M part may not be in that release yet (maybe only on master), but I don’t think it’s relevant here. This is all quite new.
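When experimenting with these flags, it helps to confirm from inside the session what Julia actually picked up, e.g. after starting with something like `julia --threads=1 --heap-size-hint=2G script.jl` (the size and script name are just placeholders). A hedged sketch: the `heap_size_hint` field and `Threads.ngcthreads` only exist on newer versions as far as I know, so both are guarded here:

```julia
using Base.Threads

println("Julia version:   ", VERSION)
println("compute threads: ", nthreads())

# Command-line options as parsed by Julia. The field is only present on
# versions that support --heap-size-hint, hence the check.
opts = Base.JLOptions()
if hasproperty(opts, :heap_size_hint)
    println("heap size hint:  ", opts.heap_size_hint, " bytes (0 means not set)")
else
    println("heap size hint:  not supported on this version")
end

# Number of GC threads, only available where the GC is multi-threaded.
if isdefined(Threads, :ngcthreads)
    println("GC threads:      ", Threads.ngcthreads())
else
    println("GC threads:      not reported on this version")
end
```

If the hint shows up here with one thread but memory still blows past it with many, that would point at the threading interaction rather than at the flag being dropped.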

The example is oversimplified, in that you could rewrite it to work entirely in-place. This is not possible for the actual code.

I think you’re saying something like this will not work (but for others in a similar situation where it would work, it should help):