Very commonly, I want to define some function mainfunc and then evaluate it many times in a loop where some of the arguments are constant during looping but others vary with each iteration.
Here is an example of what I mean.
using BenchmarkTools
using StaticArrays
using LinearAlgebra: (⋅)
mainfunc(r, a, b, c) =
(r⋅a)*(r⋅b)*(r⋅c)/3 + (a⋅b)*(r⋅c) - (a⋅c)*(r⋅b) - (b⋅c)*(r⋅a)
fabricate_vectors(n) = [rand(SVector{3,Float64}) for _ ∈ 1:n]
const r = rand(SVector{3,Float64})
const as = fabricate_vectors(100)
const bs = fabricate_vectors(100)
const cs = fabricate_vectors(100)
const buffer = Array{Float64}(undef, (length(as), length(bs), length(cs)));
function loopfunc1!(out)
    for (k,c) ∈ enumerate(cs)
        for (j,b) ∈ enumerate(bs)
            for (i,a) ∈ enumerate(as)
                out[i,j,k] = mainfunc(r, a, b, c)
            end
        end
    end
    return out
end;
In general, all (sub)expressions within mainfunc need to be evaluated anew for every innermost loop iteration, since the output of mainfunc might depend on and/or mutate global state whenever it is called. However, I am interested in the case where mainfunc is pure, in the sense that it does none of these things and its output depends only on its input values. I have tried to ensure this is the case for mainfunc, since I don’t expect “mathematical” functions such as +, *, or ⋅ to have any side effects for the kinds of inputs (simple vectors of numbers) I am interested in.
Then, it seems that an efficient strategy might be as follows, at least in concept.
function loopfunc2!(out)
    for (k,c) ∈ enumerate(cs)
        f1(a,b) = mainfunc(r,a,b,c)
        for (j,b) ∈ enumerate(bs)
            f2(a) = f1(a,b)
            for (i,a) ∈ enumerate(as)
                out[i,j,k] = f2(a)
            end
        end
    end
    return out
end;
Here, I hoped that the compiler would recognize that in defining f1, sub-expressions within the wrapped mainfunc that contain only r and c become constant and don’t need to be evaluated again and again in the inner loops that follow. Thus, to save work, these sub-expressions could be evaluated just once per outermost iteration, right when f1 is defined. Similarly, in defining f2, only the sub-expressions of the (doubly) wrapped mainfunc that involve a remain non-constant in the innermost loop.
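Concretely, the specialization I was hoping for would look something like the following hand-staged sketch (make_f1 and make_f2 are hypothetical names of mine, just to illustrate the idea), where each closure caches the dot products that are invariant at its loop level:
function make_f1(r, c)
    rc = r⋅c                # constant while c is fixed
    rc_div_3 = rc/3
    function make_f2(b)
        rb = r⋅b            # constant while b (and c) are fixed
        bc = b⋅c
        return a -> (r⋅a)*rb*rc_div_3 + (a⋅b)*rc - (a⋅c)*rb - bc*(r⋅a)
    end
    return make_f2
end
The returned closures capture the cached values, so the innermost call only does the work that genuinely depends on a.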
Unfortunately, my attempt to get the compiler to infer what I want by using wrapper functions doesn’t make a difference.
julia> @btime loopfunc1!(buffer);
8.531 ms (0 allocations: 0 bytes)
julia> @btime loopfunc2!(buffer);
8.536 ms (0 allocations: 0 bytes)
Therefore, I am wondering if there is a way to get the Julia compiler to make such optimizations when they are possible.
A manual solution would be to have mainfunc take vectors of vectors as arguments, move the loops into mainfunc, and manually pre-compute the subexpressions at each loop level.
function mainfunc_explicit_loops!(out, r, as, bs, cs)
    # (r⋅a)*(r⋅b)*(r⋅c)/3 + (a⋅b)*(r⋅c) - (a⋅c)*(r⋅b) - (b⋅c)*(r⋅a)
    for (k,c) ∈ enumerate(cs)
        rc = r⋅c
        rc_div_3 = rc/3
        for (j,b) ∈ enumerate(bs)
            rb = r⋅b
            bc = b⋅c
            for (i,a) ∈ enumerate(as)
                ra = r⋅a
                ab = a⋅b
                ac = a⋅c
                out[i,j,k] = ra*rb*rc_div_3 + ab*rc - ac*rb - bc*ra
            end
        end
    end
    return out
end
julia> @btime mainfunc_explicit_loops!(buffer, r, as, bs, cs);
4.788 ms (0 allocations: 0 bytes)
We get nearly a 2x speedup doing things by hand.
However, there are some disadvantages. For one, being able to define something like the non-explicit mainfunc is very convenient, since the code looks exactly like the mathematical expression it represents, and looping patterns over combinations of input values, preallocation of the result array, etc. are kept out of the definition of mainfunc itself, allowing a more orthogonal design.
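For reference, a middle ground I sketched (again using the hypothetical make_f1 from above, so this is only an illustration) keeps the looping pattern generic and lets the staged closures carry the cached values:
function loopfunc_staged!(out)
    for (k,c) ∈ enumerate(cs)
        f1 = make_f1(r, c)      # r⋅c and rc/3 computed once per k
        for (j,b) ∈ enumerate(bs)
            f2 = f1(b)          # r⋅b and b⋅c computed once per j
            for (i,a) ∈ enumerate(as)
                out[i,j,k] = f2(a)
            end
        end
    end
    return out
end;
This still duplicates mainfunc’s expression in a second, staged form, though, which is the kind of thing I would like to avoid.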
I am also interested in solutions that don’t depend on the compiler but are maybe cleaner than my mainfunc_explicit_loops! solution. Thanks!