Track memory allocation not working correctly

I’m currently a bit perplexed by --track_allocation=user while I’m writing a blog post about BenchmarkTools and profiling.
My test code is:

function get_primes(to)
    possible_primes = collect(2:to)
    # iterate over the primes and remove multiples
    i = 1
    while i <= length(possible_primes)
        prime = possible_primes[i]
        possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
        i += 1
    end
    return possible_primes
end

which is a very suboptimal solution to the problem and can be easily fixed with using filter!… So I encountered that it takes >700MiB of allocations for computing the primes up to 100,000.
I assumed to see a big number next to the line:

possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)

when running it with julia --track-allocation=user

Unfortunately it shows:

        - function get_primes(to)
   800080     possible_primes = collect(2:to)
        -     # iterate over the primes and remove multiples
        -     i = 1
        0     while i <= length(possible_primes)
        0         prime = possible_primes[i]
        0         possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
        0         i += 1
        -     end
        0     return possible_primes
        - end

which obviously doesn’t explain the memory allocations.

Am I doing something wrong?

Appreciate your help and I might send you the blog post for proofreading when you’re able to help me out :grinning_face_with_smiling_eyes:

I always found that way of tracking allocations confusing, now I mostly do that by hand using @allocated. In this example, I would do:

julia> function get_primes(to)
           possible_primes = collect(2:to)
           # iterate over the primes and remove multiples
       a = @allocated begin
           i = 1
           while i <= length(possible_primes)
               prime = possible_primes[i]
               possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
               i += 1
           end
       end; a > 0 && println(a)
           return possible_primes
       end
get_primes (generic function with 1 method)

julia> get_primes(100_000);
739349184

Where you can see clearly that the loop is allocating a lot, as expected.

That does not solve the issue you are seeing with Profile. I wander if that is a bug or if we do not understand the output.

Yeah the problem for bigger loops would then be that you don’t want to know the exact line but don’t want to use @allocated inside the loop as it would print a lot :grinning_face_with_smiling_eyes:
Additionally we don’t want to change the code for profiling normally but I agree that @allocated is a nice macro as well.

Indeed, I sometimes add a return there (and it only prints if allocates, so that is not so bad). It is actually not that bad. But of course if the profiling tool provided clearly the same information, that would be great.

1 Like

I just found out that GitHub - KristofferC/TimerOutputs.jl: Formatted output of timed sections in Julia can be useful for this.
It helps with the problem of measuring it inside loops but it still needs code changes to measure it,

As an example:

using TimerOutputs
const tmr = TimerOutput()

function get_primes(to)
    @timeit tmr "collect" possible_primes = collect(2:to)
    # iterate over the primes and remove multiples
    i = 1
    while i <= length(possible_primes)
        prime = possible_primes[i]
        @timeit tmr "filter"  possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
        i += 1
    end
    return possible_primes
end

and then:

julia> show(tmr)
 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations      
                   ──────────────────────   ───────────────────────
 Tot / % measured:      17.6s / 3.79%            777MiB / 90.9%    

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 filter     9.59k    665ms   100%  69.3μs    705MiB  100%   75.3KiB
 collect        1   78.6μs  0.01%  78.6μs    781KiB  0.11%   781KiB
 ──────────────────────────────────────────────────────────────────
2 Likes

It has one caveat, @timeit generates allocations on its own. So, nested calls are slightly biased. It is not a big problem, but can be source of confusion, when you have nice tight allocationless loop, but outer @timeit shows you that allocations exists.

4 Likes

If you pass the array already generated, the function has zero allocation in all lines? I had some problems with tracking allocation in the past, but they were minor, it seems very problematic if a function that allocates something has zero on every line.

1 Like
        - function get_primes(to,possible_primes)
        -     # iterate over the primes and remove multiples
        -     i = 1
        0     while i <= length(possible_primes)
        0         prime = possible_primes[i]
        0         possible_primes = filter(n->n <= prime || n % prime != 0, possible_primes)
        0         i += 1
        -     end
        0     return possible_primes
        - end
        - 

This occurs both with --track-allocation=user and --track-allocation=all.

1 Like

Jumping on this thread because I came across a similar issue (profiling a code much to big to paste as an example here), where the output from --track-allocations=user (or --track-allocations=all) seems to be ‘missing’ a lot of allocations. I realised that there are a bunch of .mem files under ~/.julia associated with the various packages that are being used, and another set in my julia install (I’d compiled from source) for the standard library, etc. Even collecting all of those (using analyze_malloc() from Collect.jl), it looks like most of the allocations reported by BenchmarkTools.@benchmark are missing.

It would be really nice if there was a way to pass a file (or list of files) to --track-allocation and have similar output, but just for those files, with all allocations triggered deeper in the call stack aggregated onto the lines in those files that triggered them. I have no idea how hard that would be to implement, but it would make --track-allocation much more useful to me right now!

Edit: I messed up when checking above and used --track-allocation=user instead of --track-allocation=all. Using --track-allocation=all it turned out that the ‘missing’ allocations for me were all in base/multidimensional.jl. Now my problem is that I don’t know which part of my code caused the call to that function, so I stand by the comment above that aggregating allocations onto specified files of user code would be super useful!