Memory allocation and @profile

@time shows that my program is doing a lot of memory allocation, much of which I could probably avoid. For example:

@time b2 = MyFirst.mysolve(f2, nhsub, nhsub.wtfast16yr)
1.998780 seconds (6.27 M allocations: 479.828 MiB, 2.69% gc time)

I’m trying to understand where it’s happening and how much of the time it’s taking. The gc time seems relatively small, but I assume memory allocation takes time too.

The hierarchical Profile.print() has a left-most column labelled Overhead. What is this supposed to represent? Memory allocation time? JIT time? sampler time?

https://docs.julialang.org/en/v1.5/manual/profile/ does not discuss the Overhead column, and the sample outputs there don’t even include it.

That same page does say

specific lines triggering allocation can often be inferred from profiling via the cost of garbage collection that these lines incur.

How exactly can one tell the cost of garbage collection? Is it the overhead field? Will it appear as an explicit call in the profile stack?

More substantively, I thought the places and times GC is triggered are somewhat random, and not necessarily the places where the memory was allocated.

And, to repeat, what if I want to measure allocation directly? Do I need to go to the --track-allocation option, which apparently requires a restart of Julia?
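
In the meantime, `@allocated` and `@timed` (which on Julia 1.5 returns a named tuple including a `bytes` field) measure allocation for a single call without restarting Julia. A minimal sketch, using a hypothetical stand-in `f` rather than your `mysolve`:

```julia
# Hypothetical allocating function standing in for the real workload.
f(n) = sum(rand(n))        # rand(n) allocates one temporary array

f(10)                      # warm up: compile first so JIT allocations aren't counted

# @allocated returns the number of bytes allocated while evaluating the expression.
bytes = @allocated f(10_000)
println("f allocated $bytes bytes")

# @timed returns value, time, bytes, gctime, and gcstats in one named tuple.
stats = @timed f(10_000)
println("time = $(stats.time) s, bytes = $(stats.bytes)")
```

This doesn't tell you *which line* allocated (that's what `--track-allocation` is for), but it's enough to bisect a function by wrapping suspicious sub-expressions.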

I’m using Julia 1.5.1 on MS Windows 10.


You would only see allocations there; GC only happens (possibly) as part of those. If you have many threads, I believe you could see no allocations in the thread you're tracking while some other thread allocates, and the GC would still slow you down.

You are right, it is stochastic so you need to run many times. I usually rely on BenchmarkTools to do this:

julia> using BenchmarkTools

julia> silly_example() = sum(randn(1000))
silly_example (generic function with 1 method)

julia> @benchmark silly_example()
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     3.347 μs (0.00% GC)
  median time:      3.903 μs (0.00% GC)
  mean time:        4.679 μs (10.34% GC)
  maximum time:     428.693 μs (92.27% GC)
  --------------
  samples:          10000
  evals/sample:     8

I don’t see anything that looks like a call to memory allocation in my profile. Possibilities:

  1. I don’t know what to look for.
  2. It is too infrequent to have been caught by my thresholds (see below).
  3. I need to enable tracing into C.

I looked at the profile with

Profile.print(; maxdepth=25, noisefloor=5)
Profile.print(;format=:flat, mincount=1000)

There were about 57k top-level samples.
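
On possibility 3: `Profile.print` takes a `C` keyword, and with `C = true` the C frames are included in the tree, so allocator calls (e.g. `jl_gc_...` frames) can become visible. A sketch, with a hypothetical allocating `workload` in place of the real code:

```julia
using Profile

# Hypothetical workload that allocates a fresh array on every iteration.
workload() = sum(sum(rand(1000)) for _ in 1:1000)

workload()                 # warm up: compile first so JIT frames aren't profiled
Profile.clear()
@profile workload()

# Include C frames; allocation-related frames then appear in the tree.
Profile.print(C = true, mincount = 10)
```

Whether the allocator frames actually show up still depends on how often the sampler lands inside them, so a short-running function may need to be profiled in a loop.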

Also, I’m still hoping someone can explain what the overhead column is in the default (tree) profile print.

I can’t help you with the overhead column (I never use Profile.print), but I would recommend trying to view profiles with ProfileView.jl. In my opinion, it’s much easier to look at than Profile.print.

When viewing the profile, look for orange and red bars. The orange ones show where allocations happen and the red ones show where dynamic dispatch happens. Dynamic dispatch can also be a source of allocations and can sometimes be very bad for performance.

Since your function only takes about two seconds to complete, maybe try running it a few times to get a better view of the profile.
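
To make that concrete, the usual workflow is to collect a profile with the standard-library `Profile` module and then view it; ProfileView.jl (assumed installed here) renders the flame graph. A minimal sketch with a hypothetical `demo` function:

```julia
using Profile

# Hypothetical function standing in for the real two-second workload.
demo() = sum(sum(randn(1000)) for _ in 1:5000)

demo()                     # warm up so compilation isn't in the profile
Profile.clear()
@profile demo()            # run it once (or several times) under the sampler

# With ProfileView.jl installed, open the flame graph:
#   using ProfileView
#   ProfileView.view()     # orange bars ≈ allocation, red bars ≈ dynamic dispatch
```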


What do I do to find memory allocations?

  1. @profiler (I’m using Juno) has the same colors as ProfileView.jl. It doesn’t directly show allocations, but if the allocations result in slow code, profiling should help focus the search.
  2. @code_warntype to find memory allocations related to type-instability.
  3. For --track-allocation, I write a script like:
function g(n, T)
    sum([rand(T) for _ in 1:n])
end
@time g(1, Float64)          # warm up: compile so JIT allocations aren't tracked
using Profile
Profile.clear_malloc_data()  # discard allocations recorded so far
@time g(100, Float64)        # the call whose allocations get written to the .mem file
exit()                       # under --track-allocation, stop here (and avoid re-running the lines below)
file = @__FILE__
run(`julia --track-allocation=user $file`)

The last lines are because I’m using Juno and too lazy to open a terminal.

  4. Sprinkle @time inside the suspicious function to see if the allocations also result in time spent. This only works one level at a time, because @time @time results in many allocations. Somehow this is the most precise method I know to measure memory allocations.
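
For point 2, a tiny illustration of the kind of function `@code_warntype` flags (these toy functions are made up for the example; in the REPL the problematic `Union`/`Any` annotations print in red):

```julia
# Type-unstable: the return type depends on a runtime value,
# so the compiler infers Union{Float64, Int}.
unstable(x) = x > 0 ? 1.0 : 1

# Type-stable version: always returns Float64.
stable(x) = x > 0 ? 1.0 : 1.0

# In the REPL, inspect the inferred types:
#   @code_warntype unstable(2)   # look for Union{Float64, Int64} in the output
#   @code_warntype stable(2)     # everything inferred as Float64
```

When such instabilities sit inside hot loops, the boxed values they force can be a significant source of the allocations `@time` reports.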