Track memory allocation not working correctly

I’m currently a bit perplexed by --track-allocation=user while writing a blog post about BenchmarkTools and profiling.
My test code is:

function get_primes(to)
    possible_primes = collect(2:to)
    # iterate over the primes and remove multiples
    i = 1
    while i <= length(possible_primes)
        prime = possible_primes[i]
        possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
        i += 1
    end
    return possible_primes
end

which is a very suboptimal solution to the problem and could easily be fixed by using filter!… I noticed that computing the primes up to 100,000 takes >700 MiB of allocations.
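For reference, the filter! variant I have in mind would look roughly like this (a sketch, not the code being profiled above); it shrinks the vector in place instead of allocating a fresh one on every pass:

```julia
function get_primes_inplace(to)
    possible_primes = collect(2:to)
    # iterate over the primes and remove multiples
    i = 1
    while i <= length(possible_primes)
        prime = possible_primes[i]
        # filter! mutates possible_primes in place, so no new array per pass
        filter!(n -> n <= prime || n % prime != 0, possible_primes)
        i += 1
    end
    return possible_primes
end
```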
I expected to see a big number next to the line:

possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)

when running it with julia --track-allocation=user

Unfortunately it shows:

        - function get_primes(to)
   800080     possible_primes = collect(2:to)
        -     # iterate over the primes and remove multiples
        -     i = 1
        0     while i <= length(possible_primes)
        0         prime = possible_primes[i]
        0         possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
        0         i += 1
        -     end
        0     return possible_primes
        - end

which obviously doesn’t explain the memory allocations.

Am I doing something wrong?

Appreciate your help and I might send you the blog post for proofreading when you’re able to help me out :grinning_face_with_smiling_eyes:

I always found that way of tracking allocations confusing; now I mostly do it by hand using @allocated. In this example, I would do:

julia> function get_primes(to)
           possible_primes = collect(2:to)
           # iterate over the primes and remove multiples
       a = @allocated begin
           i = 1
           while i <= length(possible_primes)
               prime = possible_primes[i]
               possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
               i += 1
           end
       end; a > 0 && println(a)
           return possible_primes
       end
get_primes (generic function with 1 method)

julia> get_primes(100_000);
739349184

There you can clearly see that the loop is allocating a lot, as expected.

That does not solve the issue you are seeing with --track-allocation, though. I wonder if that is a bug or if we just do not understand the output.

Yeah, the problem for bigger loops would then be that you do want to know the exact line, but you don’t want to use @allocated inside the loop, as it would print a lot :grinning_face_with_smiling_eyes:
Additionally, normally we don’t want to change the code just for profiling, but I agree that @allocated is a nice macro as well.

Indeed, I sometimes add a return there (and it only prints if it allocates, so that is not so bad). But of course, if the profiling tool clearly provided the same information, that would be great.
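One way to avoid the per-iteration printing (a sketch, assuming a recent Julia where @allocated evaluates its argument in place, so the assignment inside it is visible to the enclosing function) is to accumulate the byte counts and report once at the end:

```julia
function get_primes(to)
    possible_primes = collect(2:to)
    total_alloc = 0  # bytes allocated by the filter line, summed over all iterations
    i = 1
    while i <= length(possible_primes)
        prime = possible_primes[i]
        total_alloc += @allocated begin
            possible_primes = filter(n -> n <= prime || n % prime != 0, possible_primes)
        end
        i += 1
    end
    println("filter allocated $total_alloc bytes in total")
    return possible_primes
end
```

Note that the first call will also count compilation-related allocations, so run it once to warm up before trusting the number.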


I just found out that KristofferC/TimerOutputs.jl (formatted output of timed sections in Julia) can be useful for this.
It helps with the problem of measuring inside loops, but it still requires code changes.

As an example:

using TimerOutputs
const tmr = TimerOutput()

function get_primes(to)
    @timeit tmr "collect" possible_primes = collect(2:to)
    # iterate over the primes and remove multiples
    i = 1
    while i <= length(possible_primes)
        prime = possible_primes[i]
        @timeit tmr "filter"  possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
        i += 1
    end
    return possible_primes
end

and then:

julia> show(tmr)
 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations      
                   ──────────────────────   ───────────────────────
 Tot / % measured:      17.6s / 3.79%            777MiB / 90.9%    

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 filter     9.59k    665ms   100%  69.3μs    705MiB  100%   75.3KiB
 collect        1   78.6μs  0.01%  78.6μs    781KiB  0.11%   781KiB
 ──────────────────────────────────────────────────────────────────

It has one caveat: @timeit generates allocations of its own, so nested calls are slightly biased. It is not a big problem, but it can be a source of confusion when you have a nice, tight, allocation-free loop, but an outer @timeit shows that allocations exist.
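That overhead is easy to see by timing a section that does nothing (a sketch; the exact byte count depends on the TimerOutputs version):

```julia
using TimerOutputs

const to_demo = TimerOutput()

# the section body is literally `nothing`, so any allocations the
# "empty" row reports come from @timeit's own bookkeeping
noop() = @timeit to_demo "empty" nothing

noop()                  # warm up / compile first
reset_timer!(to_demo)   # discard compilation noise
noop()
show(to_demo)
```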


If you pass the array already generated, does the function show zero allocations on every line? I had some minor problems with allocation tracking in the past, but it seems very problematic if a function that allocates shows zero on every line.

        - function get_primes(to,possible_primes)
        -     # iterate over the primes and remove multiples
        -     i = 1
        0     while i <= length(possible_primes)
        0         prime = possible_primes[i]
        0         possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
        0         i += 1
        -     end
        0     return possible_primes
        - end
        - 

This occurs both with --track-allocation=user and --track-allocation=all.


Jumping on this thread because I came across a similar issue (profiling code much too big to paste as an example here), where the output from --track-allocation=user (or --track-allocation=all) seems to be ‘missing’ a lot of allocations. I realised that there are a bunch of .mem files under ~/.julia associated with the various packages being used, and another set in my Julia install (I’d compiled from source) for the standard library, etc. Even collecting all of those (using analyze_malloc() from Coverage.jl), it looks like most of the allocations reported by BenchmarkTools.@benchmark are missing.

It would be really nice if there was a way to pass a file (or list of files) to --track-allocation and have similar output, but just for those files, with all allocations triggered deeper in the call stack aggregated onto the lines in those files that triggered them. I have no idea how hard that would be to implement, but it would make --track-allocation much more useful to me right now!

Edit: I messed up when checking above and used --track-allocation=user instead of --track-allocation=all. Using --track-allocation=all, it turned out that the ‘missing’ allocations were all in base/multidimensional.jl. Now my problem is that I don’t know which part of my code caused the call to that function, so I stand by the comment above that aggregating allocations onto specified files of user code would be super useful!
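In case it helps anyone landing here later: on Julia 1.8+ the built-in allocation profiler (Profile.Allocs) records a stack trace for every sampled allocation, which is exactly what lets you attribute allocations happening deep in Base (e.g. base/multidimensional.jl) back to the user-code line that triggered them. A sketch, using the toy function from this thread:

```julia
using Profile

function get_primes(to)
    possible_primes = collect(2:to)
    i = 1
    while i <= length(possible_primes)
        prime = possible_primes[i]
        possible_primes = filter(n -> n <= prime || n % prime != 0, possible_primes)
        i += 1
    end
    return possible_primes
end

get_primes(100)  # warm up so compilation allocations don't dominate

# sample_rate=1 records every allocation (slow, but complete)
Profile.Allocs.@profile sample_rate=1 get_primes(10_000)

results = Profile.Allocs.fetch()
println(length(results.allocs), " allocation events recorded")
# Each entry in results.allocs carries a stack trace, so allocations can be
# aggregated onto caller lines; PProf.jl's PProf.Allocs.pprof(results) gives
# a browsable flame-graph view of exactly that.
```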