I have tried to write a low-overhead function that adds a number of Lorentzian peaks to a grid of intensities. When run under `@benchmark`, it reports “0 allocations”, but when run within my program, I get a large number of allocations reported inside this function:
```
- # puts Lorentzians of weights Snm at energies Enm
- function broadened_peaks!(
- Sqω::Union{Vector{ComplexF64}, Vector{Float64}},
- Snm::Union{Matrix{ComplexF64}, Matrix{Float64}},
- Enm::Matrix{Float64},
- Egrid::Vector{Float64},
- dE::Float64
- )
-
4476632 for (E,S) in zip(Enm,Snm)
192964896 for (i,e) in enumerate(Egrid)
690203472 Sqω[i] += S*Lorentzian(e-E, dE)
884581976 end
12304264 end
- end
```
Why are such high allocations being reported here?
This is likely due to compilation.
That’s why the manual recommends running the workload once and then using `Profile.clear_malloc_data()` to reset the counters.
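A minimal sketch of that workflow, assuming the script is launched with `julia --track-allocation=user`. The definition of `Lorentzian` is not shown in this thread, so the one below is only a placeholder, and the array sizes are made up to stand in for the real driver:

```julia
using Profile

# Placeholder: Lorentzian is not shown in the thread, so this form is an assumption.
Lorentzian(x, Γ) = (Γ / π) / (x^2 + Γ^2)

# Same loop as in the question, with the type annotations dropped for brevity.
function broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)
    for (E, S) in zip(Enm, Snm)
        for (i, e) in enumerate(Egrid)
            Sqω[i] += S * Lorentzian(e - E, dE)
        end
    end
    return Sqω
end

# Hypothetical small workload standing in for the driver script.
Egrid = collect(range(-1.0, 1.0; length = 101))
Enm = rand(4, 4)
Snm = rand(4, 4)
Sqω = zeros(length(Egrid))

broadened_peaks!(Sqω, Snm, Enm, Egrid, 0.05)  # warm-up call, triggers compilation
Profile.clear_malloc_data()                   # reset the per-line counters
fill!(Sqω, 0.0)
broadened_peaks!(Sqω, Snm, Enm, Egrid, 0.05)  # only calls from here on are counted
```

Anything executed before `Profile.clear_malloc_data()` is dropped from the `.mem` files that are written when Julia exits.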
Sorry, I don’t understand - shouldn’t compilation be a ‘run once’ operation? Does this mean that for whatever reason, broadened_peaks! is being recompiled at every iteration?
For further context, this was not run in the REPL - there is a driver script that in essence just calls this function several million times. I also tested with different numbers of repetitions - the number of allocations changes depending on the number of repetitions.
Are you sure BenchmarkTools calls your function with the same combination of parameter types as when executed by your driver?
I’ve tested for all four possible combinations of types under BenchmarkTools - one gives InexactError, the other three show zero allocations.
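For reference, a sketch of the kind of check I mean, using `@allocated` from Base in place of BenchmarkTools. `Lorentzian` is written here as a placeholder, the sizes are made up, and I am assuming the failing combination is the real accumulator with complex weights:

```julia
# Placeholder definition, assumed for this example only.
Lorentzian(x, Γ) = (Γ / π) / (x^2 + Γ^2)

function broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)
    for (E, S) in zip(Enm, Snm)
        for (i, e) in enumerate(Egrid)
            Sqω[i] += S * Lorentzian(e - E, dE)
        end
    end
    return Sqω
end

# Measuring inside a function avoids counting global-scope overhead.
count_bytes(Sqω, Snm, Enm, Egrid, dE) =
    @allocated broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)

Egrid = collect(range(-1.0, 1.0; length = 51))
Enm = rand(3, 3)
S_real = rand(3, 3)
S_cplx = rand(ComplexF64, 3, 3)

# The three combinations that run: (accumulator eltype, weights).
for (T, S) in ((Float64, S_real), (ComplexF64, S_real), (ComplexF64, S_cplx))
    acc = zeros(T, length(Egrid))
    count_bytes(acc, S, Enm, Egrid, 0.05)  # warm-up, compiles this combination
    println(T, " accumulator, ", eltype(S), " weights: ",
            count_bytes(acc, S, Enm, Egrid, 0.05), " bytes")
end

# The fourth combination: a complex value cannot be stored in a Float64 slot.
threw_inexact = try
    broadened_peaks!(zeros(length(Egrid)), S_cplx, Enm, Egrid, 0.05)
    false
catch err
    err isa InexactError
end
println("real accumulator, complex weights throws InexactError: ", threw_inexact)
```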
To be extra sure, I rewrote it as two methods without any Union types; several billion allocations are still being attributed to this function call:
```
- function broadened_peaks!(
- Sqω::Vector{ComplexF64},
- Snm::Matrix{ComplexF64},
- Enm::Matrix{Float64},
- Egrid::Vector{Float64},
- dE::Float64
- )
-
6748880 for (E,S) in zip(Enm,Snm)
315413912 for (i,e) in enumerate(Egrid)
1343077008 Sqω[i] += S*Lorentzian(e-E, dE)
1565108288 end
20662600 end
- end
-
- # puts Lorentzians of weights Snm at energies Enm
- function broadened_peaks!(
- Sqω::Vector{Float64},
- Snm::Matrix{Float64},
- Enm::Matrix{Float64},
- Egrid::Vector{Float64},
- dE::Float64
- )
-
2891376 for (E,S) in zip(Enm,Snm)
133055928 for (i,e) in enumerate(Egrid)
389021376 Sqω[i] += S*Lorentzian(e-E, dE)
710797752 end
8618144 end
- end
```
The allocations are likely caused by `--track-allocation` itself. From the manual I linked above: “`--track-allocation` changes code generation to log the allocations, and so the allocations may be different than what happens without the option. We recommend using the allocation profiler instead.”
The loop constructs rely heavily on inlining and subsequent compiler optimizations to avoid allocations. I think `--track-allocation` might just hinder that with the extra code that tries to track allocations per line.
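A rough sketch of using the allocation profiler instead, assuming Julia 1.8 or newer (where `Profile.Allocs` is available) and a normal launch without `--track-allocation`. The `Lorentzian` definition and the array sizes are placeholders, since neither is shown in this thread:

```julia
using Profile

# Placeholder definition, assumed for this example only.
Lorentzian(x, Γ) = (Γ / π) / (x^2 + Γ^2)

function broadened_peaks!(Sqω, Snm, Enm, Egrid, dE)
    for (E, S) in zip(Enm, Snm)
        for (i, e) in enumerate(Egrid)
            Sqω[i] += S * Lorentzian(e - E, dE)
        end
    end
    return Sqω
end

# Hypothetical workload standing in for the driver script.
Egrid = collect(range(-1.0, 1.0; length = 101))
Enm = rand(4, 4)
Snm = rand(4, 4)
Sqω = zeros(length(Egrid))

broadened_peaks!(Sqω, Snm, Enm, Egrid, 0.05)  # warm-up so compilation is not profiled

Profile.Allocs.clear()
# sample_rate=1.0 records every allocation; lower it for long-running workloads.
Profile.Allocs.@profile sample_rate=1.0 broadened_peaks!(Sqω, Snm, Enm, Egrid, 0.05)

results = Profile.Allocs.fetch()
println("recorded allocations: ", length(results.allocs))
println("recorded bytes: ", sum(a -> a.size, results.allocs; init = 0))
```

Each entry in `results.allocs` carries a `type`, a `size`, and a `stacktrace`, so any allocation that does show up can be traced to the line that made it.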