Inconsistency in Allocs reported by track-allocs vs BenchmarkTools

I have tried to write a low-overhead function that adds a number of Lorentzian peaks to a grid of intensities. Under @benchmark it reports “0 allocations”, but when it runs within my program, a large number of allocations is reported inside this function:

        - # puts Lorentzians of weights Snm at energies Enm
        - function broadened_peaks!(
        - 	Sqω::Union{Vector{ComplexF64}, Vector{Float64}},
        - 	Snm::Union{Matrix{ComplexF64}, Matrix{Float64}},
        - 	Enm::Matrix{Float64},
        - 	Egrid::Vector{Float64},
        - 	dE::Float64
        - 	)
        - 
  4476632 	for (E,S) in zip(Enm,Snm)
192964896 		for (i,e) in enumerate(Egrid)
690203472 			Sqω[i] += S*Lorentzian(e-E, dE)
884581976 		end
 12304264 	end
        - end

Why are such high allocations being reported here?

This is likely due to compilation.

That’s why the manual recommends running the workload once and then using Profile.clear_malloc_data() to reset the counters.
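As a rough sketch of that workflow, assuming the script is started with `julia --track-allocation=user`: the function `work!` and its inputs below are hypothetical stand-ins for the real workload, not code from this thread.

```julia
using Profile

# Hypothetical stand-in for the real workload.
function work!(acc::Vector{Float64}, xs::Vector{Float64})
    for (i, x) in enumerate(xs)
        acc[i] += 2x
    end
    return acc
end

acc = zeros(4)
xs = [1.0, 2.0, 3.0, 4.0]

work!(acc, xs)                # warm-up call: compilation happens here
Profile.clear_malloc_data()   # discard whatever was counted so far
fill!(acc, 0.0)               # reset the accumulator after the warm-up
work!(acc, xs)                # only calls made from here on show up in the .mem files
```

The `.mem` files are written when Julia exits, so the counts they contain cover everything that ran after the `Profile.clear_malloc_data()` call.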

Sorry, I don’t understand - shouldn’t compilation be a ‘run once’ operation? Does this mean that for whatever reason, broadened_peaks! is being recompiled at every iteration?

For further context, this was not run in the REPL - there is a driver script that in essence just calls this function several million times. I also tested with different numbers of repetitions - the number of allocations changes depending on the number of repetitions.

Are you sure BenchmarkTools calls your function with the same combination of parameter types as when executed by your driver?

I’ve tested for all four possible combinations of types under BenchmarkTools - one gives InexactError, the other three show zero allocations.
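For reference, here is a minimal sketch of how the four combinations can be exercised. The `Lorentzian` definition is an assumption (the original is not shown in this thread), the argument types are left unannotated for brevity, and the small test matrices are made up.

```julia
# Hypothetical Lorentzian; the thread does not show the original definition.
Lorentzian(x::Float64, dE::Float64) = dE / (π * (x^2 + dE^2))

function broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)
    for (E, S) in zip(Enm, Snm)
        for (i, e) in enumerate(Egrid)
            Sqω[i] += S * Lorentzian(e - E, dE)
        end
    end
    return Sqω
end

Egrid = collect(range(-1.0, 1.0; length = 11))
Enm = [0.0 0.5; -0.5 0.25]
dE = 0.1

# Record what happens for each (eltype of Sqω, eltype of Snm) pair.
outcomes = Dict{Tuple{DataType,DataType},Any}()
for TS in (Float64, ComplexF64), TW in (Float64, ComplexF64)
    Sqω = zeros(TS, length(Egrid))
    Snm = TW == Float64 ? ones(2, 2) : fill(1.0 + 0.5im, 2, 2)
    outcomes[(TS, TW)] = try
        broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)
        :ok
    catch err
        err   # keep the exception so it can be inspected afterwards
    end
end
```

The failing combination is a `Float64` output vector with complex weights: storing a value with a nonzero imaginary part into a `Vector{Float64}` cannot be done exactly, hence the InexactError.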

To be extra sure, I rewrote the function as two methods without any Union types, but several billion allocations are still being attributed to this function call:

        - function broadened_peaks!(
        - 	Sqω::Vector{ComplexF64},
        - 	Snm::Matrix{ComplexF64},
        - 	Enm::Matrix{Float64},
        - 	Egrid::Vector{Float64},
        - 	dE::Float64
        - 	)
        - 
  6748880 	for (E,S) in zip(Enm,Snm)
315413912 		for (i,e) in enumerate(Egrid)
1343077008 			Sqω[i] += S*Lorentzian(e-E, dE)
1565108288 		end
 20662600 	end
        - end

        - 
        - # puts Lorentzians of weights Snm at energies Enm
        - function broadened_peaks!(
        - 	Sqω::Vector{Float64},
        - 	Snm::Matrix{Float64},
        - 	Enm::Matrix{Float64},
        - 	Egrid::Vector{Float64},
        - 	dE::Float64
        - 	)
        - 
  2891376 	for (E,S) in zip(Enm,Snm)
133055928 		for (i,e) in enumerate(Egrid)
389021376 			Sqω[i] += S*Lorentzian(e-E, dE)
710797752 		end
  8618144 	end
        - end


The allocations are likely caused by --track-allocation itself. From the manual I linked above:

--track-allocation changes code generation to log the allocations, and so the allocations may be different than what happens without the option. We recommend using the allocation profiler instead.

Loop constructs such as zip and enumerate rely heavily on inlining and subsequent compiler optimizations to avoid allocations. I think --track-allocation might simply hinder that with the extra code it generates to track allocations per line.
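A minimal sketch of the allocation profiler the manual recommends, assuming Julia 1.8 or newer where `Profile.Allocs` is available. The function `work!` is a hypothetical stand-in for `broadened_peaks!`, and `sample_rate=1` records every allocation, which is only practical for short runs.

```julia
using Profile

# Hypothetical stand-in for the function under investigation.
function work!(acc::Vector{Float64}, xs::Vector{Float64})
    for (i, x) in enumerate(xs)
        acc[i] += 2x
    end
    return acc
end

acc = zeros(1000)
xs = rand(1000)

work!(acc, xs)                 # warm up so compilation is not profiled

Profile.Allocs.clear()         # drop any previously recorded samples
Profile.Allocs.@profile sample_rate=1 work!(acc, xs)
results = Profile.Allocs.fetch()

# Each recorded allocation carries its type, size and a stack trace.
println("recorded allocations: ", length(results.allocs))
```

Unlike --track-allocation, this runs the normally compiled code, so the result should be closer to what BenchmarkTools reports.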
