We should clarify that what @time reports is specifically heap allocations, which are typically needed for either mutable objects or for creating/growing variable-sized containers (such as Array or Dict, strings, or “type-unstable” objects whose type is only known at runtime). Allocating (or deallocating) such blocks of memory may require an expensive system call (e.g. via malloc in C), and they must be tracked for garbage collection. In contrast, immutable values like numbers (except bignums), tuples, and immutable structs can be stored much more cheaply, e.g. in stack or CPU-register memory, so one doesn’t typically worry about the performance cost of “allocating” them.
Unexpected memory allocation is almost always a sign of some problem with your code, usually a problem with type-stability or creating many small temporary arrays. Consequently, in addition to the allocation itself, it’s very likely that the code generated for your function is far from optimal. Take such indications seriously and follow the advice below.
ugh. I have read it, but perhaps I am even more confused now.
So, when we are worried about allocation we only means “allocation to the heap”, that is where an identifier is bound to a slow memory area whose content can be controlled by the programmer and in particular we can change its content by modifying the object bounded by the identifier.
Conversely when the identifier is bound to an immutable object, Julia “allocate” the memory, but on the fast “stack” area that we can not control and we don’t count it as an allocation, right ?
To actually understand all that one needs to go into the details of what are heap and stack memories (which physically are the same, by the way). I have only a partial understanding of that, actually.
I think the best intuitive image one can have is that of thinking of stack-allocated objects as just complicated numbers. One does not expect that this allocates, of course:
x = 0
since we are just modifying the content of what is in the position 1 of x. The thing is that numbers are not something special, they are just collections of bits. Thus, a complex object can be treated similarly, if it has a clearly defined memory layout:
x = A(0,0.0)
that doesn’t have any reason to allocate more or less than in the number case. We are just rewriting the bits of the memory associated to x.
However, if A was mutable, that means that we needed a way to access an instance of A independently, which means it has to have an address in the memory, and then it goes to the heap, where objects that need an address are organized. And that introduces an indirection level of reference to that object, and that object has to be individually allocated.
Not quite, no. From the perspective of performance and speed of memory access, stack & heap are equally fast. That’s not the issue.
What is an issue is that if a variable can be placed on the stack, there is no extra work that needs to be performed to clean up that variable and its memory when it is no longer in use and the function returns.
When we speak of an “allocation”, we mean asking the garbage collector for some memory, to store things in for longer than the current function runs. This “asking for memory” takes some time - usually a few microseconds - and means that the garbage collector has to keep track of which object is still in use, how likely it is to be used in the future and so on. In the same vain, when we no longer need it, the garbage collector has to figure out that we truly no longer need it - meaning that our code does not have any possible way to access that memory, at which point it will be given back to the OS.
So what’s slow is not the fact that it’s a piece of memory in some “slower” part of memory - all memory is equally fast - but the whole song & dance around managing that memory that’s slow. This is also not really related to whether a variable is mutable or not - there’s lots of cases where a mutable object can be allocated on the stack, if it doesn’t get used longer than the function it’s allocated in.
Generally speaking, immutable objects can have an easier time to end up on the stack (and thus don’t need to be managed by the GC explicitly), if they have a (to the compiler) fixed/known size, which is why they’re often recommended. There’s nothing making them inherently faster though.
One possible complication is that arrays of mutable objects generally have to be arrays of pointers, not arrays of the packed memory layout of the objects, because multiple positions of the array may reference to the same object, something that cannot happen with immutable ones. Thus the elements of the array (the objects) may become heap objects, and everything slows down.
(I’m not disagreeing with anything you said, @Sukera, just pointing that from performance perspective there any many cases where mutability and immutability get intertwined with how the memory is accessed, with performance implications).
Agreed, but those are a true consequence of the semantics of mutable elements in arrays, which is a bit different from performance loss due to excessive allocations, which is what’s usually a bottleneck long before these caching effects make a difference.
In particular, since the length of a String is only known at runtime, it must be heap-allocated (at least if it escapes local scope), so internally it is represented by an Array of bytes, and it is possible to mutate this if you dig deep enough, even though you’re not supposed to.
Another example of this are bignums (BigInt and BigFloat), which are accessed as if they were immutable, but their implementation generally requires a mutable heap-allocated data structure under the hood (since the number of digits is only known at runtime, and can be too large for stack allocation in any case).
(It’s possible that, in the future, some small arrays and strings may be stack-allocated if they are used only in a local scope: julia#43573.)
The conceptual model of the stack is that it’s a region that starts at some fixed place and goes continuously until the “top of stack pointer”. There is one stack per thread. To “allocate on the stack” you decide how big the new object is, and you just add that many bytes to the top of stack pointer. Voila the stack is n bytes bigger and the last n bytes are your new object.
Allocating on the heap however is far more involved. It’s similar to a filesystem on the disk… There are usually some “extents” in memory where the allocator can put stuff, and some data structures to keep track of which regions in these extents have allocated space. And there is a code path that allows for the creation of new extents. When memory gets full, the GC runs, marks all the regions that are still in use and then frees the regions that are not in use, if an entire extent is freed then that may be released to the OS etc.
Now contrast all that rigamarole with the process of adding n to a number (to extend the stack) and you see why if you can put stuff on the stack it’s faster. It’s the memory management that’s slow not the memory access.
I just found a video that, although based on Rust, the first 15 minutes are pretty general to any language, including Julia, and explains a lot the difference and the reasoning behind the stack and the heap. I’ll link it here if anyone is interested: Visualizing memory layout of Rust's data types - YouTube
I’m not sure, if this helps julia-people, who have not dug into other languages, much, but I sometimes find the discussion of heap vs. stack somewhat esoteric, even (not meaning to blame anyone for it, and not in this thread ), but…
stack and heap are (roughly speaking) abstractions - not of any particular piece of hardware, but rather algorithms / processes, dealing with memory. (I.e. there is no hardware, where one chip would be “the stack” and another one “the heap”).
It is nothing julia-specific, but every language needs to handle something like a stack (for quick, ad-hoc allocated memory, to be automatically freed, when some local scope is left) and a heap (for storing more long-term, not just “locally used” and / or dynamically growing or shrinking data-structures). Even when writing assembler, directly, there are registers (i.e. ESP = * stack-pointer, forgot what the e stands for), which have been named for this use-case, specifically (though they can be used for other things, also).
“Local” usage of memory, here, can mean different things, like being used only within a function / subroutine or within one thread, only (in case of multithreaded algorithms).
As many pointed out, already, heap-memory needs to be managed with much more overhead, including keeping track of different fragments of memory (which result from dynamically growing or shrinking datastructures), through a bunch of pointers, which need to be managed, in order to even keep data accessible. Also, in case of multithreading, synchronization of heap-memory has to happen, somehow, etc. (as heap-memory is accessible to all threads).
The details vary, across languages. Julia has the gc deal with the heap-memory-management (similar to java), with the advantage of automatically keeping track of allocations and making sure, memory is (eventually) freed, again, but at a bit of extra-cost in terms of performance. Non-gc’d languages (C or even assembler) need the user to take care of de-allocating / freeing heap-memory, with the adv. of better performance, but at the cost of memory-leaks, in case the developer doesn’t handle all cases, correctly, in terms of proper memory-management.
Heap allocations (not stack, they are cheap, thus not counted). Actually heap allocations are rather cheap too (and GC can also be rather efficient). I believe the allocations (or even GC) isn’t the most important factor in slowness. It’s just a symptom, indicating other problems type-instabilty, which are the cause of many problems. Heap allocations (malloc and free) aren’t that slow in e.g. C, and only avoided for real-time reasons, not performance reasons.
The slowness that can happen in Julia is likely fixable to some degree without forcing people to think about allocations, i.e. for code already written. They aren’t that much of a problem in GC-languages like C# (with its tiered compilation). Julia is faster then C# best case, but slower worst case, could be faster in all cases.
Reading this thread is really interesting in the difference between stack and heap allocations in any language. I understood why the stack allocation was much faster, and while the heap allocation was sometimes needed.
Could someone focus back on Julia and tell us what is really the “Allocation” counts returned by @btime in this framework ? Only heap allocations ?
Is there a logic which forces Julia to heap-allocate, or more interestingly for the programmer, a way of doing things that forces Julia to stack-allocates things ?
Julia stack-allocates when it can (it’s called escape analysis), I believe in all cases theoretically possible (though I recall some rare exceptions I didn’t really understand). There might be a style that allows it (or not) to happen.
And where you do (heap) allocate, there is a style of Julia programming to avoid that. See ! (ban) functions and similar.