Hello all,
I’m new to Julia and as an exercise I’ve ported a medium-complex Python code to Julia. Even without particular optimizations, the Julia code is 4 to 5 times faster than Python (nice!), but I’ve decided to try additional optimizations, at least the low-hanging fruit. Using the excellent Performance Tips from the manual and the Profile module I’ve been able to achieve an overall 50% improvement in performance (nice!), but I’m now stuck in a somewhat annoying situation: the profiler shows that in my “hot path” most of the time is actually being spent in materialize (39% of the runtime, versus 21% for the function that does the computations I care about). If the profiler is to be trusted in identifying the source code lines correctly, it is being called indirectly, somewhere around the findall calls.
I wouldn’t mind it as much, if not for the fact that it feels “unfair” that the code is spending more time on administrative tasks, so to say, than on what I actually want it to do.
I have already managed to replace findall with simpler looping constructs in a couple of instances (with measurable performance benefits), but for the remaining ones, replacing findall with something different doesn’t seem to offer any advantage (and in fact sometimes gives worse performance).
The general structure for the remaining uses of findall is something like this:
    for xy in findall(A .== constant) # or A .> constant or similar
        work_on!(B, C, A, xy)
    end
where A, B and C are (same-sized) arrays. There is also an A[findall(A .> 0)] .-= 1, which I’m sure can’t be improved any further. The reason why this approach is used is that on each turn the set of elements where work needs to be done is sparse and scattered throughout the original matrices (using a “dense” approach is an order of magnitude slower).
My understanding is that materialize isn’t called by findall directly, but is a byproduct of the broadcast, so I was thinking that an iterator-like argument to findall could help (I’ve seen some discussions in the forum about using broadcasted for that), but IME these approaches are consistently worse, and so are things like iterating over enumerate(A) and skipping unwanted items.
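To make the comparison concrete, here is a rough sketch of the kind of alternatives meant above, next to the original form. All function names and the body of work_on! are hypothetical, and the loops use CartesianIndices rather than enumerate so that the index type matches what findall returns for a matrix.

```julia
# Hypothetical stand-in for the real per-element work.
work_on!(B, C, A, xy) = (B[xy] += A[xy]; C[xy] = 2 * B[xy]; nothing)

# Variant 1: the original form, with a materialized mask and index vector.
function with_findall!(B, C, A, constant)
    for xy in findall(A .== constant)
        work_on!(B, C, A, xy)
    end
end

# Variant 2: visit every index and skip the ones that do not match.
# No temporaries, but the test runs inside the hot loop.
function with_skip!(B, C, A, constant)
    for xy in CartesianIndices(A)
        A[xy] == constant || continue
        work_on!(B, C, A, xy)
    end
end

# Variant 3: a lazy filter over the indices instead of an eager findall.
function with_filter!(B, C, A, constant)
    for xy in Iterators.filter(i -> A[i] == constant, CartesianIndices(A))
        work_on!(B, C, A, xy)
    end
end

# Run all three on the same input and keep the outputs for comparison.
A = rand(0:3, 50, 50)
results = map((with_findall!, with_skip!, with_filter!)) do f!
    B = zeros(Int, size(A))
    C = zeros(Int, size(A))
    f!(B, C, A, 3)
    (B, C)
end
```

The three variants only give the same result as long as work_on! does not change which elements of A match while the loop is running: findall takes a snapshot of the matching indices up front, whereas the other two test each element at the moment it is visited.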
Is there a way out of this that I’m not seeing or am I stuck with ~40% of my runtime going on “finding where to do the work”?