Using `julia --track-allocation` identifies the top hotspots in my code for memory allocation:
```
          - function zSQdensity(z::Float64, wa::WorkArea)
          0     ev = wa.evaluator
          0     dat::DataFrame = wa.dat
          0     objective::Objective = wa.objective
          0     if objective == justZ || objective == just1
          0         d = exp(-0.5 * z^2)
          -     else
 1356970560         d = exp(-0.5 * (1.0 - 2.0 * ev.λ) * z^2)
          -     end
  190084928     for i in wa.i_start:wa.i_end
 2062343712         Y = dat.Y[i]
          -
          -         # conditional Y=1 | z
          -         # next line gets most of the CPU time
11654907264         cd = logistic(z*ev.σ + ev.k)
          0         if Y
  748204864             d *= cd
          -         else
 4331043904             d *= (1.0-cd)
          -         end
          -
          0     end
          0     if objective == justZ || objective == WZ
  442565376         d *= z
          -     end
          0     return d
          - end
```
`logistic` is from StatsFuns and is `1/(1+e^{-x})`. I am baffled that any allocation is going on. The top spot is `cd = logistic(z*ev.σ + ev.k)`, and all the quantities involved are scalars. I figured the compiler would be doing the math in registers.
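For comparison, here is a minimal standalone sketch of that line (with `logistic` defined locally so the snippet is self-contained, and with all argument types concrete). In this form I'd expect zero allocation once it's compiled, which is why the numbers above surprise me:

```julia
# Self-contained stand-in for StatsFuns.logistic.
logistic(x) = 1 / (1 + exp(-x))

# Standalone, fully typed version of the hot line.
f(z::Float64, σ::Float64, k::Float64) = logistic(z * σ + k)

f(0.5, 2.0, 0.1)            # warm up so compilation isn't measured
@allocated f(0.5, 2.0, 0.1) # pure Float64 math: I'd expect 0 here
```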
Is it just that `d`, `Y`, and `cd` are allocated fresh at each iteration? But `Y` and `cd` are allocated the same number of times and yet have much different total bytes allocated. For that matter, `d *= cd` multiplies an existing number. Why does that cause any new memory allocation?
Or maybe this is the profiling version of a Heisenbug, in which I'm actually tracking the overhead of the profiler?
These functions are in an inner loop that is called a lot, so it's not surprising they account for much of the load. But I don't see how they are allocating memory, and so I don't see how to reduce it.
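In case it's type instability rather than profiler overhead, here is a hypothetical sketch (`LooseEval`/`TightEval` are made-up names, not my actual evaluator type) of how an abstractly typed field could make exactly this kind of scalar line allocate:

```julia
# Hypothetical: if the evaluator had untyped (Any) fields, every
# arithmetic result involving them would be boxed, allocating per call.
struct LooseEval
    σ          # Any-typed field
    k
end
struct TightEval
    σ::Float64 # concrete field: no boxing
    k::Float64
end

g(ev, z) = z * ev.σ + ev.k

loose = LooseEval(2.0, 0.5)
tight = TightEval(2.0, 0.5)
g(loose, 1.0); g(tight, 1.0)  # compile both methods first

@allocated g(loose, 1.0)  # nonzero: intermediate results get boxed
@allocated g(tight, 1.0)  # 0: stays in registers
```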
It may be relevant that the work is going on within a task, although `julia` was running single-threaded for the test. That is, `Threads.nthreads()` was 1, but `Threads.@spawn` started some tasks within which the computation above ran.
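One task-related allocation source I'm aware of, sketched here with a toy function (not my actual code), is that a variable captured and reassigned inside a `@spawn`'d closure gets boxed (`Core.Box`), so touching it allocates even with one thread:

```julia
# Toy example: `acc` is captured by the task body and reassigned
# inside it, so Julia boxes it; reads and writes then allocate.
function spawned_sum(n::Int)
    acc = 0.0
    t = Threads.@spawn begin
        for i in 1:n
            acc += 1.0    # mutation of a captured variable
        end
        acc
    end
    fetch(t)
end

spawned_sum(10)  # → 10.0
```

I don't know whether anything like that applies to `zSQdensity` itself, since its arguments are passed in explicitly.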
Thanks.
Ross