Inconsistency in Allocs reported by track-allocs vs BenchmarkTools

Spuriosity1 · June 24, 2024, 10:57pm

I have tried to write a low-overhead function that adds a number of Lorentzian peaks to a grid of intensities. When run under @benchmark, it reports “0 allocations”, but when run within my program, I get a large number of allocations reported inside this function:

        - # puts Lorentzians of weights Snm at energies Enm
        - function broadened_peaks!(
        - 	Sqω::Union{Vector{ComplexF64}, Vector{Float64}},
        - 	Snm::Union{Matrix{ComplexF64}, Matrix{Float64}},
        - 	Enm::Matrix{Float64},
        - 	Egrid::Vector{Float64},
        - 	dE::Float64
        - 	)
        - 
  4476632 	for (E,S) in zip(Enm,Snm)
192964896 		for (i,e) in enumerate(Egrid)
690203472 			Sqω[i] += S*Lorentzian(e-E, dE)
884581976 		end
 12304264 	end
        - end

Why are such high allocations being reported here?

abraemer · June 25, 2024, 6:18pm

This is likely due to compilation.

That’s why the manual recommends running the workload once and then using Profile.clear_malloc_data() to reset the counters.
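A minimal sketch of that workflow as a standalone script. The thread never shows the real Lorentzian, so the normalized definition below, like the input sizes, is a hypothetical stand-in:

```julia
# Run as: julia --track-allocation=user this_script.jl
# Per-line byte counts are written to .mem files next to each source file when Julia exits.
using Profile

# Hypothetical Lorentzian (not from the thread): normalized, half-width Γ.
Lorentzian(x, Γ) = (Γ / π) / (x^2 + Γ^2)

function broadened_peaks!(
    Sqω::Union{Vector{ComplexF64}, Vector{Float64}},
    Snm::Union{Matrix{ComplexF64}, Matrix{Float64}},
    Enm::Matrix{Float64},
    Egrid::Vector{Float64},
    dE::Float64,
)
    for (E, S) in zip(Enm, Snm)
        for (i, e) in enumerate(Egrid)
            Sqω[i] += S * Lorentzian(e - E, dE)
        end
    end
end

# Hypothetical inputs, sized arbitrarily.
Enm = rand(8, 8)
Snm = rand(8, 8)
Egrid = collect(range(0.0, 1.0; length = 100))
Sqω = zeros(length(Egrid))

broadened_peaks!(Sqω, Snm, Enm, Egrid, 0.05)  # first call: compilation allocations are logged too
Profile.clear_malloc_data()                   # reset the per-line counters
broadened_peaks!(Sqω, Snm, Enm, Egrid, 0.05)  # only this call's allocations remain in the .mem files
```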

Spuriosity1 · June 25, 2024, 11:18pm

Sorry, I don’t understand - shouldn’t compilation be a ‘run once’ operation? Does this mean that for whatever reason, broadened_peaks! is being recompiled at every iteration?

For further context, this was not run in the REPL - there is a driver script that in essence just calls this function several million times. I also tested with different numbers of repetitions - the number of allocations changes depending on the number of repetitions.
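One way to separate compilation from per-call allocations is to warm the function up and then count the bytes allocated by many calls in an ordinary run (no --track-allocation), using Base's @allocated. A sketch, again with a hypothetical Lorentzian and made-up inputs: if the loop really is allocation-free, the count should stay at zero however many calls are made, so growth with the number of repetitions in the .mem files would point at the instrumentation rather than at the function itself.

```julia
# Hypothetical Lorentzian (not from the thread): normalized, half-width Γ.
Lorentzian(x, Γ) = (Γ / π) / (x^2 + Γ^2)

function broadened_peaks!(
    Sqω::Union{Vector{ComplexF64}, Vector{Float64}},
    Snm::Union{Matrix{ComplexF64}, Matrix{Float64}},
    Enm::Matrix{Float64},
    Egrid::Vector{Float64},
    dE::Float64,
)
    for (E, S) in zip(Enm, Snm)
        for (i, e) in enumerate(Egrid)
            Sqω[i] += S * Lorentzian(e - E, dE)
        end
    end
end

# Warm up once (so compilation is excluded), then count bytes allocated by n further calls.
function allocated_over(n, Sqω, Snm, Enm, Egrid, dE)
    broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)
    return @allocated for _ in 1:n
        broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)
    end
end

# Hypothetical inputs.
Enm = rand(8, 8)
Snm = rand(ComplexF64, 8, 8)
Egrid = collect(range(0.0, 1.0; length = 100))
Sqω = zeros(ComplexF64, length(Egrid))

for n in (1, 10, 1_000)
    println(n, " calls: ", allocated_over(n, Sqω, Snm, Enm, Egrid, 0.05), " bytes")
end
```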

ufechner7 · June 25, 2024, 11:48pm

Are you sure BenchmarkTools calls your function with the same combination of parameter types as when executed by your driver?
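One way to check is to have the driver report the concrete argument types at the call site and confirm which method they select. Julia compiles a separate specialization for each concrete type combination even when the signature uses Union, so that combination is what the benchmark should reproduce. A sketch with a hypothetical stub (peaks_stub! and its arguments are made up for illustration):

```julia
# Hypothetical stub with the same Union-typed arguments as the original;
# the body is irrelevant here, only dispatch matters.
peaks_stub!(Sqω::Union{Vector{ComplexF64}, Vector{Float64}},
            Snm::Union{Matrix{ComplexF64}, Matrix{Float64}}) = nothing

# Hypothetical driver-side arguments: substitute whatever the driver actually passes.
Sqω = zeros(ComplexF64, 10)
Snm = rand(Float64, 3, 3)

# The concrete type combination Julia dispatches on and specializes for at this call.
argtypes = Tuple{typeof(Sqω), typeof(Snm)}
println(argtypes)

# The method that combination selects; benchmark with arguments of exactly these types.
println(which(peaks_stub!, argtypes))
```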

Spuriosity1 · June 26, 2024, 12:16am

I’ve tested all four possible combinations of types under BenchmarkTools: one throws an InexactError, the other three show zero allocations.
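The InexactError presumably comes from the Float64 output with ComplexF64 weights: Sqω[i] += ... has to store a complex result back into a Float64 array, and that conversion only succeeds when the imaginary part is zero. A small sketch with made-up values:

```julia
# Real-valued output buffer with a complex-valued weight (made-up values).
Sqω = zeros(Float64, 3)
S = 0.5 + 0.25im

Sqω[1] += 0.0 + 0.0im   # works: a complex value with zero imaginary part converts to Float64

threw = try
    Sqω[2] += S         # setindex! must convert(Float64, ...), which fails for a nonzero imaginary part
    false
catch err
    err isa InexactError
end
println("InexactError thrown: ", threw)
```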

To be extra sure, I rewrote the function as two methods without any Union types, and several billion bytes of allocations are still attributed to this function call:

        - function broadened_peaks!(
        - 	Sqω::Vector{ComplexF64},
        - 	Snm::Matrix{ComplexF64},
        - 	Enm::Matrix{Float64},
        - 	Egrid::Vector{Float64},
        - 	dE::Float64
        - 	)
        - 
  6748880 	for (E,S) in zip(Enm,Snm)
315413912 		for (i,e) in enumerate(Egrid)
1343077008 			Sqω[i] += S*Lorentzian(e-E, dE)
1565108288 		end
 20662600 	end
        - end

        - 
        - # puts Lorentzians of weights Snm at energies Enm
        - function broadened_peaks!(
        - 	Sqω::Vector{Float64},
        - 	Snm::Matrix{Float64},
        - 	Enm::Matrix{Float64},
        - 	Egrid::Vector{Float64},
        - 	dE::Float64
        - 	)
        - 
  2891376 	for (E,S) in zip(Enm,Snm)
133055928 		for (i,e) in enumerate(Egrid)
389021376 			Sqω[i] += S*Lorentzian(e-E, dE)
710797752 		end
  8618144 	end
        - end


abraemer · June 26, 2024, 5:19am

The allocations are likely caused by --track-allocation itself. From the manual I linked above:

--track-allocation changes code generation to log the allocations, and so the allocations may be different than what happens without the option. We recommend using the allocation profiler instead.

The loop constructs rely heavily on inlining and subsequent compiler optimizations to avoid allocations. I think --track-allocation might just hinder that with the extra code it inserts to track allocations per line.
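A minimal sketch of the allocation profiler the manual recommends (Profile.Allocs, available since Julia 1.8), assuming the same hypothetical Lorentzian and made-up inputs as stand-ins for the real ones. sample_rate=1 records every allocation, which is fine for a single call but slow for a whole driver run:

```julia
using Profile

# Hypothetical Lorentzian (not from the thread): normalized, half-width Γ.
Lorentzian(x, Γ) = (Γ / π) / (x^2 + Γ^2)

function broadened_peaks!(
    Sqω::Union{Vector{ComplexF64}, Vector{Float64}},
    Snm::Union{Matrix{ComplexF64}, Matrix{Float64}},
    Enm::Matrix{Float64},
    Egrid::Vector{Float64},
    dE::Float64,
)
    for (E, S) in zip(Enm, Snm)
        for (i, e) in enumerate(Egrid)
            Sqω[i] += S * Lorentzian(e - E, dE)
        end
    end
end

function profile_call(Sqω, Snm, Enm, Egrid, dE)
    broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)   # warm-up so compilation is not profiled
    Profile.Allocs.clear()
    Profile.Allocs.@profile sample_rate=1 broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)
    return Profile.Allocs.fetch()
end

# Hypothetical inputs.
Enm = rand(8, 8)
Snm = rand(ComplexF64, 8, 8)
Egrid = collect(range(0.0, 1.0; length = 100))
Sqω = zeros(ComplexF64, length(Egrid))

results = profile_call(Sqω, Snm, Enm, Egrid, 0.05)
println("allocations recorded: ", length(results.allocs))
for a in results.allocs
    println(a.type, ": ", a.size, " bytes")   # each entry also carries a.stacktrace
end
```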
