Trying to understand memory usage

BLI · June 14, 2019, 5:38pm

I’m trying to understand memory usage and efficient coding of functions in Julia. Based on:

using LinearAlgebra
using BenchmarkTools

consider the following 3 illustrative functions [illustrating ideas; not the ones I will use]:

Function 1

function func1()
    a = rand(100,1000)
    return norm(a,2)
end
;
@benchmark begin
    for i in 1:10000
        func1()
    end
end
#
BenchmarkTools.Trial: 
  memory estimate:  7.45 GiB
  allocs estimate:  20000
  --------------
  minimum time:     4.654 s (15.18% GC)
  median time:      4.662 s (15.23% GC)
  mean time:        4.662 s (15.23% GC)
  maximum time:     4.669 s (15.29% GC)
  --------------
  samples:          2
  evals/sample:     1

Comment 1: I assume that func1() allocates a new 10^2\times10^3 matrix in heap memory each of the 10^4 times the function is called. With 8 bytes pr. element, this is 7.45 GiB.

Function 2:

function func2(a)
    a .= rand(100,1000)
    return norm(a,2)
end
;
a = rand(100,1000)
@benchmark begin
    for i in 1:10000
        func2(a)
    end
end
#
BenchmarkTools.Trial: 
  memory estimate:  7.45 GiB
  allocs estimate:  30000
  --------------
  minimum time:     5.944 s (11.68% GC)
  median time:      5.944 s (11.68% GC)
  mean time:        5.944 s (11.68% GC)
  maximum time:     5.944 s (11.68% GC)
  --------------
  samples:          1
  evals/sample:     1

Comment 2: I’m a little surprised here… The input argument a for func2() is put in stack memory and is thus efficiently deleted every time the function is exited, so I thought the memory allocation would be small. However, I guess the reason is that memory is temporarily allocated in heap memory when calling rand(100,1000), leading to no gain. In fact, it is worse.

It turns out that if I replace a .= rand(100,1000) with a = rand(100,1000), func2() behaves just like func1(). It is not quite clear to me why a .=... is worse than a = ... here.

Function 3:

function func3(a,m,n)
    for i in 1:n
        for j in 1:m
            a[j,i] = rand()
        end
    end
    return norm(a,2)
end
;
a = rand(100,1000)
m,n = size(a)
@benchmark begin
    for i in 1:10000
        func3(a,m,n)
    end
end
#
BenchmarkTools.Trial: 
  memory estimate:  156.25 KiB
  allocs estimate:  10000
  --------------
  minimum time:     4.737 s (0.00% GC)
  median time:      4.816 s (0.00% GC)
  mean time:        4.816 s (0.00% GC)
  maximum time:     4.896 s (0.00% GC)
  --------------
  samples:          2
  evals/sample:     1

Comment 3: This is, by far, the most efficient code when it comes to memory usage. The input arguments a, m, n are placed in stack memory, and are efficiently deleted every time the function is exited. The loops ensure that only scalars are involved, and scalars are also put in stack memory and efficiently deleted, I assume.

QUESTIONS to ye Julia gurus:

Why is a .= ... worse than a = ... in this particular case (i.e. func2())?
Why is even 156.25 KiB allocated for func3(...)? Is memory allocated for calling norm(a,2)??
Is there an even more efficient way to do this?
Since I modify the input argument for func2() and func3(), should I really have named them func2!() and func3!()?
The reason I look into this, is that I’ve coded a loss() function for doing parameter estimation of an ODE model. On my workstation with 32 GB RAM, doing estimation allocates some 74 GiB memory – and the estimation takes close to 8 minutes. I see lots of potential for reducing the memory allocation. Still…with @benchmark estimating 74 GiB in a computer with 32 GB RAM… does that imply:
a. The computer starts to swap memory to the SSD disk, which really slows down things [would be horrible on a hard disk]?, or
b. The 74GiB is the cumulative allocation, and the Garbage Collection algorithm frees memory so that the code still runs in RAM?

Good advice and knowledgeable answers are highly appreciated!

rdeits · June 14, 2019, 7:24pm

I’m not sure what you mean by this, but it does not sound correct.

func1 does one thing: it allocates a new matrix of 100x1000 random floats. func2, on the other hand, does two things: it allcoates a new matrix of 100x1000 random floats, then it copies that matrix into a.

You are accessing a non-constant global variable (a) inside your benchmark. See: https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#interpolating-values-into-benchmark-expressions . You can avoid this by calling func2($a) instead, as explained in that section of the manual.

Furthermore, there is no reason for the for i in 1:1000 loop: @benchmark will already call your code multiple times to build up a precise estimate of its runtime.

There’s nothing wrong with a loop, and your func3 looks fine. You can avoid having to worry about which index order to choose by using eachindex:

for i in eachindex(a)
  a[i] = rand()
end

and you can eliminate bounds-checking for a little extra speed:

@inbounds a[i] = rand()

or you can use broadcasting to automatically call rand() for each element of a:

a .= rand.()

but I would expect all of those to perform comparably with your hand-written loop. Loops are fast in Julia, so don’t worry about using them when they seem appropriate to the task at hand.

It’s just a matter of style, but I would say yes.

The second one: that memory measurement is the cumulative sum of all the allocations. The total amount of memory consumed at any one point will be much lower, since the GC can free unused objects.

BLI · June 14, 2019, 8:50pm

Thanks for quick response. Let me explain the first of my questions that you address:

The input argument a for func2() is put in stack memory and is thus efficiently deleted every time the function is exited.

I’m not sure what you mean by this, but it does not sound correct.

OK – I read Stack-based memory allocation - Wikipedia, where it says that “In most modern computer systems, each thread has a reserved region of memory referred to as its stack. When a function executes, it may add some of its state data to the top of the stack; when the function exits it is responsible for removing that data from the stack.”

Maybe I over interpreted the info in the Wikipedia article… my interpretation was that input arguments to the function are placed on the stack. Maybe that is wrong?

rdeits · June 14, 2019, 9:06pm

Julia doesn’t really make a guarantee either way, at least not in a way that you would observe in Julia code (except indirectly by measuring allocations). But in an equivalent function in a language like C, the argument would be a pointer, and while that pointer would live on the stack, the data it points to might live on the heap. Passing an array to a function in Julia will certainly not guarantee that the array’s data lives on the stack.

BLI · June 14, 2019, 9:16pm

This trick reduced the memory estimate to zero! Neat.

This interpolation trick – I take it that that only works within the @benchmark macro?

rdeits · June 14, 2019, 9:36pm

Interpolation is a more general thing, but yes–this particular trick is most often seen with the macros from BenchmarkTools.jl.

BLI · June 14, 2019, 9:45pm

[Of course, interpolation (or “substitution”) it is used a lot in strings.]

kristoffer.carlsson · June 14, 2019, 9:52pm

This usage of $ in a macro is specific to BenchmarkTools though.

Topic		Replies	Views
@benchmark reporting (probably) wrong memory usage (BenchmarkTools.jl) Tooling	4	873	June 2, 2018
Threads memory allocations General Usage	2	482	January 22, 2020
Benchmarking questions General Usage benchmark	3	318	September 18, 2023
Are there any tools in julia that can test the peak memory consumption of the functions I wrote? Performance high-performance , memory-analysis	1	165	April 16, 2024
Doubled memory usage and computation time after update to 1.6.1 from 1.5.4 General Usage performance , memory-allocation	0	368	May 5, 2021

Trying to understand memory usage

Related topics