Why mem allocation may surge when function reads single scalar global variable?

Two functions:
pi_1( ) sums N terms in for loop; N is defined in global scope.
pi_2(nterms) same as pi_1( ) except that N is passed as a parameter.

Output (copied from notebook cells):

N = 10_000_000
# First run
@time pi_1()
    1.580675 seconds (80.00 M allocations: 1.416 GiB, 15.24% gc time)
    
3.1415925535897915


@btime pi_1()
    1.236 s (79998471 allocations: 1.42 GiB)
    
3.1415925535897915

#First run
@time pi_2(N)
  0.043618 seconds (3 allocations: 76.294 MiB, 2.37% gc time)

3.1415925535897915


@btime pi_2(N)

  27.620 ms (3 allocations: 76.29 MiB)

3.1415925535897915

Both functions and versioninfo() given below:

function pi_1()
    mysum = 0.0
    factor=1.0
    k=0
    
    for k=collect(0:N-1)
        factor = k % 2 == 0 ? 1.0 : -1.0
        mysum += factor/(1+(k+k))
    end
    4*mysum
end

function pi_2(nterms)
    mysum = 0.0
    factor=1.0
    k=0
    
    for k=collect(0:nterms-1)
        factor = k % 2 == 0 ? 1.0 : -1.0
        mysum += factor/(1+(k+k))
    end
    
    4*mysum
end
versioninfo()
Julia Version 1.10.8
Commit 4c16ff44be8 (2025-01-22 10:06 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × 12th Gen Intel(R) Core(TM) i5-1235U
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)

From what I understand, the global variable ‘N’ is read only once by collect() inside function pi_1(). What may cause this difference in memory behaviour ?

With N being a (non-const) global variable, the compiler can make no assumptions about what type it will be and it means that it must generate code defensively that can handle any type via indirections. What is worse is that this cascades; it doesn’t know what type 0:N-1 will be, neither what collect(0:N-1) will be and thus it doesn’t know what type k is inside the loop. As a consequence factor and mysum, as well as all subexpressions inside the loop must be compiled to something that can handle any types of values. This works and is resolved at runtime, but at the cost of time and memory, which is why you see so many allocations when type instabilities occur inside a hot loop.

4 Likes

While in this case you should just not use a global (and an untyped one, especially), one can limit these “instability cascades” using a function barrier.

In this case, the function barrier is as simple as defining another function pi_3() = pi_2(N). Even though N is an unknown global, the function dispatcher takes a look at it inside of pi_3 and calls pi_2 with the type of that global. pi_2, as always, is compiled for the specific type of N so does not have internal instability from accessing the global (though pi_3 still might not know what it returns and that instability would continue to propagate in pi_3 and possibly out its return type, so the global can still cause type instability at higher levels). This use of “dynamic dispatch” costs a little bit up front, but calling the outer function with an unknown type can be an improvement when it avoids a runtime dispatch in every expression of the inner function.