Loops, allocations and helping the coder

A variant of this idea has been implemented as a stdlib in .NET. See ArrayPool<T> Class (System.Buffers) | Microsoft Learn for the API and Pooling large arrays with ArrayPool – Adam Sitnik – .NET Performance and Reliability for an explainer.

3 Likes

This is a particular pain I keep running into.

4 Likes

@Mason at the first glance, Bumper.jl reminds me of arena allocators - are they conceptually related or even equivalent?

Thanks for taking part in this thread. I learned some news things, but I also think the discussion turned rather technical and somehow disregarded the usage patterns of less CS-savvy Julia users.

If I may, I would like to get back to the original issue. Background: (a) allocations cause performance problems; (b) many (most?) Julia users want to write simple, but yet reasonably performant, code.

If the “compiler” could help with that, I believe it would be a great selling point for Julia.

I currently hear students and colleagues saying that it’s easier to get performance (from simple code) in other languages and that Julia is “hard” to squeeze out performance from. It would be nice to prove them otherwise.

4 Likes

The issue is that the compiler, in Julia, sometimes does not have some information that the compiler in other (compiled) languages, has. For instance, a simple fortran code like this:

! main calls foo 100 times
program main
    double precision :: x
    do i = 1, 100
        call foo(x)
    end do
    print *, x
end program main

! foo requires a y array of 100 elements
subroutine foo(x) 
    integer :: i
    double precision :: x, y(100)
    do i = 1, 100
        y(i) = 1.0
        x = x + y(i)
    end do
end subroutine foo

The function foo is called within a loop, and inside it it requires a vector y of length 100. When this code is compiled, as a whole, a single instance of y is created and will be reused. That can happen because the compiler knows in advance how foo is called from within main.

In Julia a similar result requires allocating the array y outside foo, and either having it as a global constant buffer, or passing it always as a parameter, because the compiler does not have the information, in advance, on how foo will be used every time, so that it cannot reuse the buffer on every call to foo.

These are the cases where it seems simpler to obtain a performant code in other language than in Julia.

In this specific cases I think it is fair to say that other languages, like fortran, are easier to tune for performance.

But… this is very limited. Most times, even a compiled language like fortran, you have to resize the auxiliary arrays depending on the input. In these cases, we need to deal with global buffers (through shared modules, for instance in Fortran) or pass explicitly the auxiliary arrays as parameters to the inner functions. Just like in Julia. Since the compiler cannot help much even in static languages, I doubt much can be done in Julia to improve on this (except improving escape analysis, but this won’t solve the issue for large buffers if they have to be recreated on every function call).

If you have concurrent access to buffers, like the use of the Channels above, at least in Fortran your life would be much harder than in Julia, but then I guess we are talking of of more advanced features.

4 Likes

Arena allocator is another name for a bump allocator (the namesake of Bumper.jl) so yeah it’s the exact same concept.

2 Likes

Julia’s compiler is actually somewhat uniquely bad at helping with this sort of thing currently, and that’s why arrays need to be dealt with more consciously than in many other languages. However, there is definitely work happening right now in the compiler infrastructure that will greatly help with this use case. E.g. https://github.com/JuliaLang/julia/pull/51319 will give some modest speedups to intermediate array allocation, but will also be an important step towards us being able to teach the compiler how to better understand and optimize array allocations (like what we do with normal mutable structs)

9 Likes