Way to show where memory allocations occur?

I wonder if there is a nice way to show where in the code memory allocations happen.

@benchmark from BenchmarkTools.jl is useful for measuring the number of allocations, and in many cases I can identify where memory is allocated just by reading the code (e.g., where new arrays are created). Sometimes, though, I find it difficult to pinpoint where the allocations happen. It would be nice to have a tool that reports this information.

Use the --track-allocation=user option when starting Julia.

See https://github.com/JuliaCI/Coverage.jl

Be aware that sometimes the allocations are not attributed properly so you may get some strange-looking results.

@dmbates, thanks for the answer! I tried your method, but I am still having difficulty getting the information I want.

As a simple test, I created testalloc.jl, whose contents are

function test(A)
    θ = linspace(0, 2π, 50)
    X = [cos(θ) sin(θ)]'
    Z = A*X

    return Z
end

After starting Julia (v0.5) with julia --track-allocation=user, I executed the following commands in the REPL:

julia> include("testalloc.jl")
test (generic function with 1 method)

julia> A = rand(2,2);

julia> test(A);

julia> Profile.clear_malloc_data()

julia> test(A);

This created testalloc.jl.mem in the current directory. I then quit Julia, restarted it without the --track-allocation=user option, and executed the following commands in the REPL:

julia> using Coverage

julia> analyze_malloc(".")
3-element Array{Coverage.MallocInfo,1}:
 Coverage.MallocInfo(0,"./testalloc.jl.mem",4)
 Coverage.MallocInfo(0,"./testalloc.jl.mem",6)
 Coverage.MallocInfo(144368,"./testalloc.jl.mem",3)
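The per-line byte counts can also be read straight from the .mem file. The following is a minimal sketch written for current Julia (1.x) rather than v0.5. It assumes that each line of the .mem file is the original source line prefixed with either a byte count or a "-" for lines with no tracked code. The helper name top_allocations is hypothetical and not part of Coverage.jl.

```julia
# Hypothetical helper (not part of Coverage.jl): list the lines of a .mem file
# that allocated memory, largest first.
# Assumption: every line starts with a right-aligned byte count, or with "-"
# when the line carries no tracked code.
function top_allocations(path::AbstractString)
    results = Tuple{Int,Int,String}[]          # (bytes, line number, source text)
    for (lineno, line) in enumerate(eachline(path))
        m = match(r"^\s*(\d+)\s?(.*)$", line)
        m === nothing && continue              # "-" lines do not match and are skipped
        bytes = parse(Int, m.captures[1])
        if bytes > 0
            push!(results, (bytes, lineno, String(strip(m.captures[2]))))
        end
    end
    return sort!(results, by = first, rev = true)
end
```

Calling top_allocations("testalloc.jl.mem") would then return tuples of bytes, line number, and source text for the allocating lines only.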

In comparison, I get the following benchmark result:

julia> using BenchmarkTools

julia> A = rand(2,2);

julia> include("testalloc.jl");

julia> @benchmark test($A)
BenchmarkTools.Trial:
  memory estimate:  3.91 KiB
  allocs estimate:  13

Now, here are my questions:

  1. The analyze_malloc() result reports a 0-byte allocation on line 4 of testalloc.jl, which is Z = A*X. I don’t think that is true, because this line clearly allocates memory for the array Z. How can I make sense of this result?

  2. The @benchmark result reports 13 allocations while running test(A). I would like to know exactly how these 13 allocations are distributed over the lines of the function test. I guess each of the first three lines inside the function accounts for a portion of these allocations because the variables θ, X, and Z are created there, but how many of the 13 allocations are used to create each of θ, X, and Z? Is there a way to find out?

(Edited) Later, I figured that I might be able to get an answer to the 2nd question above by commenting out the lines of my test() function and then using @benchmark. Specifically,

  • If I comment out all the lines inside the body of test() except for the first line (where θ is created), then @benchmark would report the number of allocations used in the first line.
  • Then, if I uncomment the second line (where X is created) and use @benchmark, then @benchmark would report the number of allocations used in the first and second lines.
  • By subtracting the former from the latter, I would be able to obtain the number of allocations used in the second line.
  • Repeat this procedure to get the number of allocations used in the subsequent lines.
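The subtraction procedure above can be sketched without editing the source file, by writing cumulative versions of the function body and measuring each one. This is a rough sketch for current Julia (1.x), not the v0.5 code from this thread: range and the dotted cos./sin. replace linspace and the vectorized cos/sin, the names upto_θ, upto_X, upto_Z, and allocated_bytes are hypothetical, and Base's @allocated reports bytes rather than the allocation count that @benchmark shows.

```julia
# Cumulative versions of the body of test(): each one stops one line later
# and returns its last value so the work is actually used.
upto_θ()  = range(0, stop = 2π, length = 50)
upto_X()  = (θ = upto_θ(); [cos.(θ) sin.(θ)]')
upto_Z(A) = A * upto_X()

# Hypothetical helper: bytes allocated by one call, measured after a warm-up
# call so that compilation is not included.
function allocated_bytes(f, args...)
    f(args...)
    return @allocated f(args...)
end

A  = rand(2, 2)
b1 = allocated_bytes(upto_θ)       # first line only
b2 = allocated_bytes(upto_X)       # first and second lines
b3 = allocated_bytes(upto_Z, A)    # all three lines

# Subtract consecutive cumulative totals to get a per-line estimate.
println("θ line: ", b1, " bytes")
println("X line: ", b2 - b1, " bytes")
println("Z line: ", b3 - b2, " bytes")
```

The differences are only estimates, since the compiler may optimize the cumulative variants differently from the full function.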

This procedure indeed revealed that no allocations were used in creating θ and X, and that all 13 allocations were consumed in creating Z! I can understand the result for θ, because it is a LinSpace (a lazy range object) rather than an Array, but I still don’t understand why creating X uses zero allocations…

Also, I don’t understand why the third line of the body of test() (where Z is created) consumes as many as 13 allocations. Don’t we need just one allocation to create Z and fill it with the result of A*X?

You need to allocate memory to perform A*X. Since the multiplication is on that line, the memory allocations made inside the actual function call to A_mul_B(A, X) will be attributed to the Z = A*X line.
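One way to check that the allocation comes from inside the multiplication is to compare A*X with an in-place product that writes into a preallocated buffer. This is a minimal sketch for current Julia (1.x), where LinearAlgebra.mul! is the in-place multiply (in v0.5 it was called A_mul_B!). The helper names and array sizes are illustrative assumptions.

```julia
using LinearAlgebra

A = rand(2, 2)
X = rand(2, 50)
Z = similar(X)                     # preallocated 2×50 output buffer

# Measure inside functions so that global-variable overhead is not counted.
alloc_out_of_place(A, X) = @allocated A * X          # allocates a new result array
alloc_in_place(Z, A, X)  = @allocated mul!(Z, A, X)  # writes the result into Z

# Warm-up calls so that compilation is not measured.
alloc_out_of_place(A, X)
alloc_in_place(Z, A, X)

println("A * X         : ", alloc_out_of_place(A, X), " bytes")
println("mul!(Z, A, X) : ", alloc_in_place(Z, A, X), " bytes")
```

If the in-place version reports far fewer bytes, the difference is the memory that the out-of-place call allocates for its result, which the tracker charges to the calling line.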

@pkofod, but A*X itself uses only two allocations in Julia v0.5:

julia> VERSION
v"0.5.1-pre+31"

julia> A = rand(2,2); X = rand(2,50);

julia> @benchmark $A*$X
BenchmarkTools.Trial:
  memory estimate:  928 bytes
  allocs estimate:  2

Why is the number of allocations here so different from the 13 allocations reported for Z = A*X inside my test() function?

I am beginning to suspect that @benchmark is buggy in BenchmarkTools for Julia v0.5. Some numbers just don’t add up. In Julia v0.6, the above @benchmark $A*$X correctly reports just 1 allocation, so @benchmark has probably been improved in BenchmarkTools for Julia v0.6.

Assuming @benchmark is more reliable in Julia v0.6, I would like to use it there to measure the number of allocations in test($A), but currently it generates a huge number of warnings and eventually reports a ridiculously large number of allocations, such as 152. To use @benchmark in Julia v0.6, I guess I will need to wait until a stable version of Julia v0.6 is released and BenchmarkTools supports it.

Is it conceivable to have something like @profile in Juno but for allocations instead of computing time?

TimerOutputs.jl (KristofferC/TimerOutputs.jl on GitHub; formatted output of timed sections in Julia) is pretty OK for showing allocations, but you need to annotate where it should measure.
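As a rough illustration of what that annotation could look like for the test() function from this thread, here is a sketch written for current Julia (1.x) with the TimerOutputs package installed. The section labels are arbitrary, and range and the dotted cos./sin. stand in for the v0.5 linspace and vectorized cos/sin.

```julia
using TimerOutputs   # third-party package; must be installed first

const to = TimerOutput()

function test(A)
    # Each @timeit section records time and allocations under its label
    # and returns the value of the wrapped expression.
    θ = @timeit to "make θ" range(0, stop = 2π, length = 50)
    X = @timeit to "make X" [cos.(θ) sin.(θ)]'
    Z = @timeit to "A * X" A * X
    return Z
end

A = rand(2, 2)
test(A)              # warm-up call, includes compilation
reset_timer!(to)     # discard the warm-up measurements
test(A)
print_timer(to)      # prints a table with per-section time and allocations
```

The printed table attributes allocations to the labeled sections, so the granularity is whatever you choose to annotate.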
