@dmbates, thanks for the answer! I tried your method, but I am still having difficulty getting the information I want.
As a simple test, I created testalloc.jl with the following contents:
```julia
function test(A)
    θ = linspace(0, 2π, 50)
    X = [cos(θ) sin(θ)]'
    Z = A*X
    return Z
end
```
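For readers on Julia 1.x: `linspace` no longer exists and `cos`/`sin` no longer apply elementwise to a range without a dot, so the function above will not run as written there. Below is a minimal sketch of the same computation in 1.x syntax, assuming `range` with keyword arguments and dot-broadcasting. Note that in 1.x `'` produces a lazy `Adjoint` wrapper instead of a copy, so allocation figures will differ from the v0.5 numbers in this post.

```julia
# Julia 1.x version of test() (a sketch; the code above targets v0.5).
function test(A)
    θ = range(0, stop=2π, length=50)  # lazy range, replaces linspace(0, 2π, 50)
    X = [cos.(θ) sin.(θ)]'            # 50×2 matrix, lazily adjointed to 2×50
    Z = A*X                           # 2×50 result matrix
    return Z
end

A = rand(2, 2)
Z = test(A)
```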
After starting Julia (v0.5) with julia --track-allocation=user, I executed the following commands in the REPL:
```julia
julia> include("testalloc.jl")
test (generic function with 1 method)
julia> A = rand(2,2);
julia> test(A);
julia> Profile.clear_malloc_data()
julia> test(A);
```
This created testalloc.jl.mem in the current directory. I then quit Julia, restarted it without the --track-allocation=user option, and executed the following commands in the REPL:
```julia
julia> using Coverage
julia> analyze_malloc(".")
3-element Array{Coverage.MallocInfo,1}:
Coverage.MallocInfo(0,"./testalloc.jl.mem",4)
Coverage.MallocInfo(0,"./testalloc.jl.mem",6)
Coverage.MallocInfo(144368,"./testalloc.jl.mem",3)
```
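To look at the per-line numbers directly instead of through `analyze_malloc`, the `.mem` file can also be read line by line. The sketch below assumes that each line of the `.mem` file starts with either a byte count or a `-` (for lines that were not tracked), followed by the original source text. The names `LineAlloc`, `parse_mem_lines`, and `parse_mem_file` are hypothetical helpers, not part of Coverage.jl. It keeps the file's line order, whereas the `analyze_malloc` output above appears to be sorted by byte count.

```julia
# Hypothetical helpers (not part of Coverage.jl) for reading a .mem file.
# Assumption: each line starts with a byte count, or "-" for untracked lines,
# followed by the original source text.
struct LineAlloc
    line::Int        # 1-based line number in the source file
    bytes::Int       # bytes attributed to that line
    source::String   # source text with indentation stripped
end

function parse_mem_lines(lines)
    result = LineAlloc[]
    for (i, ln) in enumerate(lines)
        s = lstrip(ln)
        isempty(s) && continue
        parts = split(s; limit=2)           # count field, then the rest of the line
        field = parts[1]
        all(isdigit, field) || continue     # skip "-" (untracked) lines
        src = length(parts) == 2 ? String(strip(parts[2])) : ""
        push!(result, LineAlloc(i, parse(Int, field), src))
    end
    return result
end

parse_mem_file(path) = parse_mem_lines(readlines(path))
```

Calling `parse_mem_file("testalloc.jl.mem")` should then list each tracked line with its byte count and source text side by side, which makes it easier to check which statement a given count was attributed to.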
For comparison, this is the benchmark result I get:
```julia
julia> using BenchmarkTools
julia> A = rand(2,2);
julia> include("testalloc.jl");
julia> @benchmark test($A)
BenchmarkTools.Trial:
memory estimate: 3.91 KiB
allocs estimate: 13
```
Now, here are my questions:
- The analyze_malloc() result reports a 0-byte allocation on line 4 of testalloc.jl, which is Z = A*X. I don’t think that is true, because this line clearly allocates memory for the array Z. How can I make sense of this result?
- The @benchmark result reports 13 allocations while running test(A). I would like to know exactly how these 13 allocations are distributed over the lines of test(). I would guess that each of the first three lines in the function body accounts for some of them, because the variables θ, X, and Z are created there, but how many of the 13 allocations go to creating each of θ, X, and Z? Is there a way to find out?
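One way to approach the second question without editing test() is to put each statement in its own small function and measure it with `@allocated` after a warm-up call, so that compilation is not counted. This is a sketch in Julia 1.x syntax; `step1`, `step2`, `step3`, and `per_line_bytes` are hypothetical names. Note that `@allocated` reports bytes rather than the number of allocations, and splitting the body into separate functions may change what the compiler does, so the figures are only indicative and will not match the v0.5 numbers above.

```julia
# Hypothetical per-statement functions mirroring the body of test() (Julia 1.x).
step1() = range(0, stop=2π, length=50)   # builds θ
step2(θ) = [cos.(θ) sin.(θ)]'            # builds X
step3(A, X) = A*X                        # builds Z

# Bytes allocated by each statement, measured separately.
function per_line_bytes(A)
    θ = step1(); X = step2(θ); step3(A, X)   # warm-up so compilation is not measured
    b1 = @allocated step1()
    b2 = @allocated step2(θ)
    b3 = @allocated step3(A, X)
    return (θ = b1, X = b2, Z = b3)
end

println(per_line_bytes(rand(2, 2)))
```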
(Edited) Later, I realized that I might be able to answer the second question above by commenting out lines of my test() function and rerunning @benchmark. Specifically:
- If I comment out every line in the body of test() except the first (where θ is created), @benchmark should report the number of allocations made by the first line.
- If I then uncomment the second line (where X is created) and rerun @benchmark, it should report the number of allocations made by the first and second lines together.
- Subtracting the former from the latter gives the number of allocations made by the second line.
- Repeating this procedure gives the number of allocations made by each subsequent line.
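The comment-out-and-subtract procedure can also be automated by writing one variant of test() per step and differencing the cumulative totals. This is a sketch in Julia 1.x syntax that uses `@allocated` (bytes) instead of `@benchmark` so that it needs no packages; `v1`, `v2`, `v3`, and `cumulative_bytes` are hypothetical names. Each variant returns its last value, so the work is less likely to be optimized away as unused.

```julia
# Hypothetical variants of test(): v_k runs the first k lines of the body and
# returns the last value it computes (Julia 1.x syntax).
v1(A) = range(0, stop=2π, length=50)

function v2(A)
    θ = range(0, stop=2π, length=50)
    return [cos.(θ) sin.(θ)]'
end

function v3(A)
    θ = range(0, stop=2π, length=50)
    X = [cos.(θ) sin.(θ)]'
    return A*X
end

# Bytes allocated by lines 1..k of the body, for k = 1, 2, 3.
function cumulative_bytes(A)
    v1(A); v2(A); v3(A)          # warm-up calls so compilation is not measured
    t1 = @allocated v1(A)
    t2 = @allocated v2(A)
    t3 = @allocated v3(A)
    return [t1, t2, t3]
end

totals = cumulative_bytes(rand(2, 2))
per_line = diff([0; totals])     # line k = total(k) - total(k-1)
println("cumulative: ", totals, "  per line: ", per_line)
```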
This procedure indeed revealed that no allocations were made in creating θ and X, and that all 13 allocations were made in creating Z! I can understand the result for θ, because linspace returns a LinSpace (a lazy range object) rather than an Array, but I still don’t understand why creating X uses zero allocations…
Also, I don’t understand why the third line of the body of test() (where Z is created) accounts for as many as 13 allocations. Shouldn’t creating Z and filling it with the contents of A*X need just one allocation?