Why mem allocation may surge when function reads single scalar global variable?

Two functions:
pi_1( ) sums N terms in for loop; N is defined in global scope.
pi_2(nterms) same as pi_1( ) except that N is passed as a parameter.

Output (copied from notebook cells):

N = 10_000_000
# First run
@time pi_1()
    1.580675 seconds (80.00 M allocations: 1.416 GiB, 15.24% gc time)
    
3.1415925535897915


@btime pi_1()
    1.236 s (79998471 allocations: 1.42 GiB)
    
3.1415925535897915

#First run
@time pi_2(N)
  0.043618 seconds (3 allocations: 76.294 MiB, 2.37% gc time)

3.1415925535897915


@btime pi_2(N)

  27.620 ms (3 allocations: 76.29 MiB)

3.1415925535897915

Both functions and versioninfo() given below:

function pi_1()
    mysum = 0.0
    factor=1.0
    k=0
    
    for k=collect(0:N-1)
        factor = k % 2 == 0 ? 1.0 : -1.0
        mysum += factor/(1+(k+k))
    end
    4*mysum
end

function pi_2(nterms)
    mysum = 0.0
    factor=1.0
    k=0
    
    for k=collect(0:nterms-1)
        factor = k % 2 == 0 ? 1.0 : -1.0
        mysum += factor/(1+(k+k))
    end
    
    4*mysum
end
versioninfo()
Julia Version 1.10.8
Commit 4c16ff44be8 (2025-01-22 10:06 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × 12th Gen Intel(R) Core(TM) i5-1235U
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)

From what I understand, the global variable ‘N’ is read only once by collect() inside function pi_1(). What may cause this difference in memory behaviour ?

With N being a (non-const) global variable, the compiler can make no assumptions about what type it will be and it means that it must generate code defensively that can handle any type via indirections. What is worse is that this cascades; it doesn’t know what type 0:N-1 will be, neither what collect(0:N-1) will be and thus it doesn’t know what type k is inside the loop. As a consequence factor and mysum, as well as all subexpressions inside the loop must be compiled to something that can handle any types of values. This works and is resolved at runtime, but at the cost of time and memory, which is why you see so many allocations when type instabilities occur inside a hot loop.

5 Likes

While in this case you should just not use a global (and an untyped one, especially), one can limit these “instability cascades” using a function barrier.

In this case, the function barrier is as simple as defining another function pi_3() = pi_2(N). Even though N is an unknown global, the function dispatcher takes a look at it inside of pi_3 and calls pi_2 with the type of that global. pi_2, as always, is compiled for the specific type of N so does not have internal instability from accessing the global (though pi_3 still might not know what it returns and that instability would continue to propagate in pi_3 and possibly out its return type, so the global can still cause type instability at higher levels). This use of “dynamic dispatch” costs a little bit up front, but calling the outer function with an unknown type can be an improvement when it avoids a runtime dispatch in every expression of the inner function.

1 Like

There are several things in this function which is a bit non-julian. The first thing is what’s commented on above, the use of the global N. If the global has not been defined with a concrete type, either by N::Int = 10_000_000 or const N = 10_000_000, the compiler does not know the type of any value derived from it. The compiler is run whenever a function is called with arguments of a type it hasn’t been called with before. But N may have changed to e.g. ("foobar", 2.13, 3+2im) the next time you call pi_1(). It won’t be compiled again, because the types of the arguments haven’t changed (there are none).

That is, for what the compiler knows 0:N-1 might be a Set of Strings, and collect(0:N-1) might be a network connection. So the compiler does not even know that k will have the same type for every iteration in the loop. It will insert code for dynamically dispatching every function used inside the loop, i.e. %, ==, /, and +. This takes time, and requires allocations.

In general, a julia function which is performance critical should be written so that the type of every expression in it can be deduced from the types of the actual arguments to the function. That is, by the compiler.

Your other function, where nterms is an argument does not have this problem. When you call it with some argument, e.g. pi_2(N), then N has a value with a type. In your case an Int. The compiler then knows that nterms is an Int, that nterms-1 is an Int, that 0:nterms-1 is a UnitRange{Int}, that collect(0:nterms-1) will be a Vector{Int}, that k is an Int, that k % 2 is an Int, and so on. It will insert very efficient code.

Another thing I find non-julian is the initializations factor=1.0 and k=0. They serve no purpose in this function. A variable in julia is not a “box” which is kept the same, but with replaced content, throughout its scope. It’s just a name for a value. It’s the value which has a type, so the factor=1.0 and k=0 do not fix the type of the variables, they are simply forgotten as soon as you assign new values to factor and k.

The last thing, which is unusual, is the use of collect(0:N-1). It allocates an array with content [0, 1, 2, ..., N-1] which k is iterated over. However, if you write for k = 0:N-1, you will still iterate over the same values, without the extra allocation. There are cases where you want such an array, e.g. if you want to delete some of the entries, but in the example it’s superfluous.

2 Likes

@GunnarFarneback, @mikmoore and @sgaure Thank you all for insightful replies.