@dmbates, thanks for the answer! I tried your method, but I am still having difficulty getting the information I want.
As a simple test, I created testalloc.jl, whose contents are:
function test(A)
    θ = linspace(0, 2π, 50)
    X = [cos(θ) sin(θ)]'
    Z = A*X
    return Z
end
After starting Julia (v0.5) with julia --track-allocation=user, I executed the following commands in the REPL:
julia> include("testalloc.jl")
test (generic function with 1 method)
julia> A = rand(2,2);
julia> test(A);
julia> Profile.clear_malloc_data()
julia> test(A);
This created testalloc.jl.mem in the current directory. Then I quit Julia, restarted it without the --track-allocation=user option, and executed the following commands in the REPL:
julia> using Coverage
julia> analyze_malloc(".")
3-element Array{Coverage.MallocInfo,1}:
Coverage.MallocInfo(0,"./testalloc.jl.mem",4)
Coverage.MallocInfo(0,"./testalloc.jl.mem",6)
Coverage.MallocInfo(144368,"./testalloc.jl.mem",3)
In comparison, I get the following benchmark result:
julia> using BenchmarkTools
julia> A = rand(2,2);
julia> include("testalloc.jl");
julia> @benchmark test($A)
BenchmarkTools.Trial:
memory estimate: 3.91 KiB
allocs estimate: 13
Now, here are my questions:
- The analyze_malloc() result reports a 0-byte allocation on line 4 of testalloc.jl, which is Z = A*X. I don’t think that is true, because this line clearly allocates memory for the array Z. How can I make sense of this result?
- The @benchmark result reports 13 allocations while running test(A). I would like to know exactly how these 13 allocations are distributed over the lines of the function test. I guess each of the first three lines inside the function consumes a portion of these allocations because the variables θ, X, and Z are created there, but how many of the 13 allocations are used to create each of θ, X, and Z? Is there a way to know these details?
(Edited) Later, I figured that I might be able to get an answer to the 2nd question above by commenting out the lines of my test() function and then using @benchmark. Specifically,
- If I comment out all the lines inside the body of test() except for the first line (where θ is created), then @benchmark would report the number of allocations used in the first line.
- Then, if I uncomment the second line (where X is created) and run @benchmark again, it would report the number of allocations used in the first and second lines combined.
- By subtracting the former from the latter, I would be able to obtain the number of allocations used in the second line.
- Repeat this procedure to get the number of allocations used in the subsequent lines.
This procedure indeed revealed that no allocations were used in creating θ and X, and all 13 allocations were consumed in creating Z! I can understand the result for θ, because linspace returns a lazy range object (a LinSpace in v0.5) rather than an Array, but I still don’t understand why creating X uses 0 allocations…
Also, I don’t understand why the third line of the body of test() (where Z is created) consumes as many as 13 allocations. Don’t we need just one allocation to create Z and fill its contents with A*X?
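A possible alternative to commenting lines in and out is to wrap each statement of test() in its own small function and measure it with Base's @allocated macro. The following is only a rough sketch under several assumptions: the helper names make_θ, make_X, and make_Z are hypothetical and not part of testalloc.jl, θ is built with range arithmetic instead of linspace so that the snippet does not depend on the Julia version, and cos.(θ) uses dot-broadcast syntax instead of cos(θ). Note that @allocated reports bytes rather than the number of allocations, and measurements taken at global scope can include some overhead, so the numbers should be read as indicative only.

```julia
# Sketch: measure the bytes allocated by each statement of test() separately.
# make_θ, make_X and make_Z are hypothetical helpers, not part of testalloc.jl.
make_θ() = 2π * (0:49) / 49            # 50 evenly spaced points on [0, 2π]
make_X(θ) = [cos.(θ) sin.(θ)]'         # 2×50 array of (cos, sin) pairs
make_Z(A, X) = A * X                   # 2×2 times 2×50

A = rand(2, 2)

# Call every helper once first so that compilation is not measured.
θ = make_θ()
X = make_X(θ)
Z = make_Z(A, X)

# @allocated returns the number of bytes allocated while evaluating the call.
bytes_θ = @allocated make_θ()
bytes_X = @allocated make_X(θ)
bytes_Z = @allocated make_Z(A, X)

println("θ: ", bytes_θ, " bytes")
println("X: ", bytes_X, " bytes")
println("Z: ", bytes_Z, " bytes")
```

Measuring each step in isolation like this avoids the risk that a partially commented-out function behaves differently from the full one, although it still does not break a single line down into its individual allocations.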