Loops, allocations and helping the coder

ToucheSir · October 10, 2023, 2:34pm

A variant of this idea has been implemented as a stdlib in .NET. See ArrayPool<T> Class (System.Buffers) | Microsoft Learn for the API and Pooling large arrays with ArrayPool – Adam Sitnik – .NET Performance and Reliability for an explainer.

BioTurboNick · October 10, 2023, 3:57pm

This is a particular pain I keep running into.

maxfreu · October 10, 2023, 4:08pm

@Mason at the first glance, Bumper.jl reminds me of arena allocators - are they conceptually related or even equivalent?

Paul_Soderlind · October 10, 2023, 7:29pm

Thanks for taking part in this thread. I learned some news things, but I also think the discussion turned rather technical and somehow disregarded the usage patterns of less CS-savvy Julia users.

If I may, I would like to get back to the original issue. Background: (a) allocations cause performance problems; (b) many (most?) Julia users want to write simple, but yet reasonably performant, code.

If the “compiler” could help with that, I believe it would be a great selling point for Julia.

I currently hear students and colleagues saying that it’s easier to get performance (from simple code) in other languages and that Julia is “hard” to squeeze out performance from. It would be nice to prove them otherwise.

lmiq · October 10, 2023, 8:25pm

The issue is that the compiler, in Julia, sometimes does not have some information that the compiler in other (compiled) languages, has. For instance, a simple fortran code like this:

! main calls foo 100 times
program main
    double precision :: x
    do i = 1, 100
        call foo(x)
    end do
    print *, x
end program main

! foo requires a y array of 100 elements
subroutine foo(x) 
    integer :: i
    double precision :: x, y(100)
    do i = 1, 100
        y(i) = 1.0
        x = x + y(i)
    end do
end subroutine foo

The function foo is called within a loop, and inside it it requires a vector y of length 100. When this code is compiled, as a whole, a single instance of y is created and will be reused. That can happen because the compiler knows in advance how foo is called from within main.

In Julia a similar result requires allocating the array y outside foo, and either having it as a global constant buffer, or passing it always as a parameter, because the compiler does not have the information, in advance, on how foo will be used every time, so that it cannot reuse the buffer on every call to foo.

These are the cases where it seems simpler to obtain a performant code in other language than in Julia.

In this specific cases I think it is fair to say that other languages, like fortran, are easier to tune for performance.

But… this is very limited. Most times, even a compiled language like fortran, you have to resize the auxiliary arrays depending on the input. In these cases, we need to deal with global buffers (through shared modules, for instance in Fortran) or pass explicitly the auxiliary arrays as parameters to the inner functions. Just like in Julia. Since the compiler cannot help much even in static languages, I doubt much can be done in Julia to improve on this (except improving escape analysis, but this won’t solve the issue for large buffers if they have to be recreated on every function call).

If you have concurrent access to buffers, like the use of the Channels above, at least in Fortran your life would be much harder than in Julia, but then I guess we are talking of of more advanced features.

Mason · October 10, 2023, 9:04pm

Arena allocator is another name for a bump allocator (the namesake of Bumper.jl) so yeah it’s the exact same concept.

Mason · October 10, 2023, 9:10pm

Julia’s compiler is actually somewhat uniquely bad at helping with this sort of thing currently, and that’s why arrays need to be dealt with more consciously than in many other languages. However, there is definitely work happening right now in the compiler infrastructure that will greatly help with this use case. E.g. https://github.com/JuliaLang/julia/pull/51319 will give some modest speedups to intermediate array allocation, but will also be an important step towards us being able to teach the compiler how to better understand and optimize array allocations (like what we do with normal mutable structs)

Topic		Replies	Views
Why is Julia's performance more sensitive to memory allocations than Numpy's? Performance	19	1083	September 5, 2024
Non-escaping array buffer re-use and the sufficiently smart compiler Internals & Design	11	948	February 21, 2021
Common allocation mistakes Performance memory-allocation	47	7174	August 21, 2023
Reducing allocations to optimize (to match Java speed) Performance	22	1341	December 10, 2018
Increase in allocations with Julia v1.11-beta Internals & Design performance , arrays	46	3167	May 8, 2024

Loops, allocations and helping the coder

Related topics