Hi!
I have a function AND that takes a pre-allocated vector and writes the result of some operations into it. Following the Julia naming convention, I wanted to rename it to AND! so that this behavior is reflected in the name, in contrast to the versions of AND that do not take a pre-allocated result vector as input. I then noticed that performance drops significantly when I make this change, and likewise when I change the name to anything else, e.g. WOOHOO. I do not understand why renaming a function results in decreased performance and increased allocation, nor do I know what kinds of things to check to see what I am doing incorrectly.
I posted a video of this on Slack; the effect is shown after 1:05, following the initial compilation of the code: Slack
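To make the naming question concrete, here is a minimal sketch of the two flavors I mean. The names and_demo and and_demo! and the use of min are made up for illustration; they are not my actual robustness code.

```julia
# Hypothetical stand-ins (not the real AND): the first allocates its result,
# the second writes into a pre-allocated vector and follows the `!` convention.
and_demo(a::Vector{Float64}, b::Vector{Float64}) = min.(a, b)

function and_demo!(res::Vector{Float64}, a::Vector{Float64}, b::Vector{Float64})
    # eachindex with several arrays checks that they share the same indices
    @inbounds for k in eachindex(res, a, b)
        res[k] = min(a[k], b[k])
    end
    return nothing
end

a = [1.0, -2.0, 3.0]
b = [0.5, 4.0, -1.0]
res = similar(a)        # pre-allocated result vector
and_demo!(res, a, b)    # fills `res` in place
```

Both versions should produce the same values; only where the result is stored differs.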
Does anyone have any ideas?
Best,
-Tusike
Some additional notes:
- I was advised to check @code_warntype; it reports that everything is fine.
- I know I am not interpolating variables in @btime, but I have since checked, and it changes nothing.
- --track-allocation shows that after the name change, significant allocations take place:
- function WOOHOO(resTrace::T, rm::AbstractRobustnessMetric, childTraces::T...) where {T<:AbstractTrace}
3200 resBuffer = Zygote.bufferfrom(getTrajectory(resTrace))
11200 childTrajectories = [getTrajectory(c) for c in childTraces]
0 @inbounds for k in eachindex(resBuffer)
38404800 resBuffer[k] = AND(rm, map(t -> t[k], childTrajectories))
- end
0 copyWithNewData(resTrace, copy(resBuffer))
0 return nothing
- end
compared to the original:
- function AND(resTrace::T, rm::AbstractRobustnessMetric, childTraces::T...) where {T<:AbstractTrace}
0 resBuffer = Zygote.bufferfrom(getTrajectory(resTrace))
0 childTrajectories = [getTrajectory(c) for c in childTraces]
0 @inbounds for k in eachindex(resBuffer)
0 resBuffer[k] = AND(rm, map(t -> t[k], childTrajectories))
- end
- copyWithNewData(resTrace, copy(resBuffer))
0 return nothing
- end
(Note that there is no recursion here and the inner AND function that is being called has different parameters than the one posted here.)
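In case it helps anyone reproduce the measurement, the following is a rough sketch of how allocations can be checked after a warm-up call, using only Base. Here woohoo_demo! is a hypothetical stand-in, not the function above, so it will not necessarily show the same behavior.

```julia
# Hypothetical stand-in for the renamed in-place function.
function woohoo_demo!(res::Vector{Float64}, a::Vector{Float64}, b::Vector{Float64})
    @inbounds for k in eachindex(res, a, b)
        res[k] = min(a[k], b[k])
    end
    return nothing
end

a = [1.0, -2.0, 3.0]
b = [0.5, 4.0, -1.0]
res = similar(a)

# The first call includes compilation, so it is excluded from the measurement.
woohoo_demo!(res, a, b)

# Bytes allocated by a second, already compiled call.
bytes = @allocated woohoo_demo!(res, a, b)
println("bytes allocated: ", bytes)
```

With BenchmarkTools, the corresponding check would be @btime woohoo_demo!($res, $a, $b), with the arguments interpolated. For the per-line numbers shown above, Julia is started with --track-allocation=user, and calling Profile.clear_malloc_data() after the warm-up call keeps compilation out of the counts.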