Reducing allocations for long expressions

Hi, I have a question about how to reduce the allocations for long expressions, here is my example

function rel(a, b, c, f, delta, ZB, s, t0, y0)
    term = 0.0
    for i = 1:length(s)
        term += s[i] * (f[1, 1] * a[i] * exp(1im * (delta[i, 1] + ZB[1]) * t0) * conj(y0[89]) + f[2, 1] * b[i] * exp(1im * (delta[i, 2] + ZB[1]) * t0) * conj(y0[93]) + f[3, 1] * c[i] * exp(1im * (delta[i, 3] + ZB[1]) * t0) * conj(y0[97]) + f[4, 1] * b[i] * exp(1im * (delta[i, 4] + ZB[1]) * t0) * conj(y0[101]) + f[5, 1] * c[i] * exp(1im * (delta[i, 5] + ZB[1]) * t0) * conj(y0[105]) + f[6, 1] * b[i] * exp(1im * (delta[i, 6] + ZB[1]) * t0) * conj(y0[109]) + f[7, 1] * a[i] * exp(1im * (delta[i, 7] + ZB[1]) * t0) * conj(y0[113]) + f[9, 1] * c[i] * exp(1im * (delta[i, 9] + ZB[1]) * t0) * conj(y0[121]) + f[10, 1] * b[i] * exp(1im * (delta[i, 10] + ZB[1]) * t0) * conj(y0[125]) + f[11, 1] * a[i] * exp(1im * (delta[i, 11] + ZB[1]) * t0) * conj(y0[129]) +
                        conj(f[1, 1] * a[i]) * exp(-1im * (delta[i, 1] + ZB[1]) * t0) * y0[89] + conj(f[2, 1] * b[i]) * exp(-1im * (delta[i, 2] + ZB[1]) * t0) * y0[93] + conj(f[3, 1] * c[i]) * exp(-1im * (delta[i, 3] + ZB[1]) * t0) * y0[97] + conj(f[4, 1] * b[i]) * exp(-1im * (delta[i, 4] + ZB[1]) * t0) * y0[101] + conj(f[5, 1] * c[i]) * exp(-1im * (delta[i, 5] + ZB[1]) * t0) * y0[105] + conj(f[6, 1] * b[i]) * exp(-1im * (delta[i, 6] + ZB[1]) * t0) * y0[109] + conj(f[7, 1] * a[i]) * exp(-1im * (delta[i, 7] + ZB[1]) * t0) * y0[113] + conj(f[9, 1] * c[i]) * exp(-1im * (delta[i, 9] + ZB[1]) * t0) * y0[121] + conj(f[10, 1] * b[i]) * exp(-1im * (delta[i, 10] + ZB[1]) * t0) * y0[125] + conj(f[11, 1] * a[i]) * exp(-1im * (delta[i, 11] + ZB[1]) * t0) * y0[129] +
                        f[1, 2] * b[i] * exp(1im * (delta[i, 1] + ZB[2]) * t0) * conj(y0[90]) + f[2, 2] * c[i] * exp(1im * (delta[i, 2] + ZB[2]) * t0) * conj(y0[94]) + f[4, 2] * c[i] * exp(1im * (delta[i, 4] + ZB[2]) * t0) * conj(y0[102]) + f[6, 2] * c[i] * exp(1im * (delta[i, 6] + ZB[2]) * t0) * conj(y0[110]) + f[7, 2] * b[i] * exp(1im * (delta[i, 7] + ZB[2]) * t0) * conj(y0[114]) + f[10, 2] * c[i] * exp(1im * (delta[i, 10] + ZB[2]) * t0) * conj(y0[126]) + f[11, 2] * b[i] * exp(1im * (delta[i, 11] + ZB[2]) * t0) * conj(y0[130]) + f[12, 2] * a[i] * exp(1im * (delta[i, 12] + ZB[2]) * t0) * conj(y0[134]) +
                        conj(f[1, 2] * b[i]) * exp(-1im * (delta[i, 1] + ZB[2]) * t0) * y0[90] + conj(f[2, 2] * c[i]) * exp(-1im * (delta[i, 2] + ZB[2]) * t0) * y0[94] + conj(f[4, 2] * c[i]) * exp(-1im * (delta[i, 4] + ZB[2]) * t0) * y0[102] + conj(f[6, 2] * c[i]) * exp(-1im * (delta[i, 6] + ZB[2]) * t0) * y0[110] + conj(f[7, 2] * b[i]) * exp(-1im * (delta[i, 7] + ZB[2]) * t0) * y0[114] + conj(f[10, 2] * c[i]) * exp(-1im * (delta[i, 10] + ZB[2]) * t0) * y0[126] + conj(f[11, 2] * b[i]) * exp(-1im * (delta[i, 11] + ZB[2]) * t0) * y0[130] + conj(f[12, 2] * a[i]) * exp(-1im * (delta[i, 12] + ZB[2]) * t0) * y0[134] +
                        f[1, 3] * a[i] * exp(1im * (delta[i, 1] + ZB[3]) * t0) * conj(y0[91]) + f[2, 3] * b[i] * exp(1im * (delta[i, 2] + ZB[3]) * t0) * conj(y0[95]) + f[3, 3] * c[i] * exp(1im * (delta[i, 3] + ZB[3]) * t0) * conj(y0[99]) + f[4, 3] * b[i] * exp(1im * (delta[i, 4] + ZB[3]) * t0) * conj(y0[103]) + f[5, 3] * c[i] * exp(1im * (delta[i, 5] + ZB[3]) * t0) * conj(y0[107]) + f[6, 3] * b[i] * exp(1im * (delta[i, 6] + ZB[3]) * t0) * conj(y0[111]) + f[7, 3] * a[i] * exp(1im * (delta[i, 7] + ZB[3]) * t0) * conj(y0[115]) + f[9, 3] * c[i] * exp(1im * (delta[i, 9] + ZB[3]) * t0) * conj(y0[123]) + f[10, 3] * b[i] * exp(1im * (delta[i, 10] + ZB[3]) * t0) * conj(y0[127]) + f[11, 3] * a[i] * exp(1im * (delta[i, 11] + ZB[3]) * t0) * conj(y0[131]) +
                        conj(f[1, 3] * a[i]) * exp(-1im * (delta[i, 1] + ZB[3]) * t0) * y0[91] + conj(f[2, 3] * b[i]) * exp(-1im * (delta[i, 2] + ZB[3]) * t0) * y0[95] + conj(f[3, 3] * c[i]) * exp(-1im * (delta[i, 3] + ZB[3]) * t0) * y0[99] + conj(f[4, 3] * b[i]) * exp(-1im * (delta[i, 4] + ZB[3]) * t0) * y0[103] + conj(f[5, 3] * c[i]) * exp(-1im * (delta[i, 5] + ZB[3]) * t0) * y0[107] + conj(f[6, 3] * b[i]) * exp(-1im * (delta[i, 6] + ZB[3]) * t0) * y0[111] + conj(f[7, 3] * a[i]) * exp(-1im * (delta[i, 7] + ZB[3]) * t0) * y0[115] + conj(f[9, 3] * c[i]) * exp(-1im * (delta[i, 9] + ZB[3]) * t0) * y0[123] + conj(f[10, 3] * b[i]) * exp(-1im * (delta[i, 10] + ZB[3]) * t0) * y0[127] + conj(f[11, 3] * a[i]) * exp(-1im * (delta[i, 11] + ZB[3]) * t0) * y0[131] +
                        f[2, 4] * a[i] * exp(1im * (delta[i, 2] + ZB[4]) * t0) * conj(y0[96]) + f[3, 4] * b[i] * exp(1im * (delta[i, 3] + ZB[4]) * t0) * conj(y0[100]) + f[4, 4] * a[i] * exp(1im * (delta[i, 4] + ZB[4]) * t0) * conj(y0[104]) + f[5, 4] * b[i] * exp(1im * (delta[i, 5] + ZB[4]) * t0) * conj(y0[108]) + f[6, 4] * a[i] * exp(1im * (delta[i, 6] + ZB[4]) * t0) * conj(y0[112]) + f[8, 4] * c[i] * exp(1im * (delta[i, 8] + ZB[4]) * t0) * conj(y0[120]) + f[9, 4] * b[i] * exp(1im * (delta[i, 9] + ZB[4]) * t0) * conj(y0[124]) + f[10, 4] * a[i] * exp(1im * (delta[i, 10] + ZB[4]) * t0) * conj(y0[128]) +
                        conj(f[2, 4] * a[i]) * exp(-1im * (delta[i, 2] + ZB[4]) * t0) * y0[96] + conj(f[3, 4] * b[i]) * exp(-1im * (delta[i, 3] + ZB[4]) * t0) * y0[100] + conj(f[4, 4] * a[i]) * exp(-1im * (delta[i, 4] + ZB[4]) * t0) * y0[104] + conj(f[5, 4] * b[i]) * exp(-1im * (delta[i, 5] + ZB[4]) * t0) * y0[108] + conj(f[6, 4] * a[i]) * exp(-1im * (delta[i, 6] + ZB[4]) * t0) * y0[112] + conj(f[8, 4] * c[i]) * exp(-1im * (delta[i, 8] + ZB[4]) * t0) * y0[120] + conj(f[9, 4] * b[i]) * exp(-1im * (delta[i, 9] + ZB[4]) * t0) * y0[124] + conj(f[10, 4] * a[i]) * exp(-1im * (delta[i, 10] + ZB[4]) * t0) * y0[128])
    end
    return term
end

a = rand(5)
b = rand(5)
c = rand(5)
s = rand(5)
f = rand(12, 4)
delta = rand(5, 12)
ZB = rand(4)
y0 = rand(136)
t0 = 0.1
@time rel(a, b, c, f, delta, ZB, s, t0, y0)

I also find something interesting, when I only use the first 3 line of term, it only cause 1 allocations, but if I consider the 4th line, it suddenly cause hundreads of allocations. Since in my real program, this sub-function is put into a for loop, that means it will cause a lot of allocations which decided by the total number of for loop~

Are you sure you can’t express that as a some coupled loops? Looking at that quickly it doesn’t seem to be something that should be allocating, but the chances of having some bug there are way too large…

(Or at least split that into smaller intermediate calculations)

Yes, the problem is the formula is too long, and I cannot make it into coupled loops because there is no apparent regularity for the formula, and split into small parts seems good way, I just wondering whether can have better way

This is a small thing, but cis(x) will be faster and more accurate than exp(im*x)

Thanks, I don’t know that, I will change that in all my program

With suggested XXL-formula splitting below we get 0-allocations:

@btime rel2($a, $b, $c, $f, $delta, $ZB, $s, $t0, $y0)
  9.200 μs (0 allocations: 0 bytes)
14.687328014022057 + 0.0im

The original XXL without the splitting allocates:

@btime rel($a, $b, $c, $f, $delta, $ZB, $s, $t0, $y0)
  18.600 μs (365 allocations: 16.95 KiB)
Code
function rel2(a, b, c, f, delta, ZB, s, t0, y0)
    term = 0.0 + 0.0im
    for i = 1:length(s)
        t = 0.0 + 0.0im
        t += f[1, 1] * a[i] * exp(1im * (delta[i, 1] + ZB[1]) * t0) * conj(y0[89]) + f[2, 1] * b[i] * exp(1im * (delta[i, 2] + ZB[1]) * t0) * conj(y0[93]) + f[3, 1] * c[i] * exp(1im * (delta[i, 3] + ZB[1]) * t0) * conj(y0[97]) + f[4, 1] * b[i] * exp(1im * (delta[i, 4] + ZB[1]) * t0) * conj(y0[101]) + f[5, 1] * c[i] * exp(1im * (delta[i, 5] + ZB[1]) * t0) * conj(y0[105]) + f[6, 1] * b[i] * exp(1im * (delta[i, 6] + ZB[1]) * t0) * conj(y0[109]) + f[7, 1] * a[i] * exp(1im * (delta[i, 7] + ZB[1]) * t0) * conj(y0[113]) + f[9, 1] * c[i] * exp(1im * (delta[i, 9] + ZB[1]) * t0) * conj(y0[121]) + f[10, 1] * b[i] * exp(1im * (delta[i, 10] + ZB[1]) * t0) * conj(y0[125]) + f[11, 1] * a[i] * exp(1im * (delta[i, 11] + ZB[1]) * t0) * conj(y0[129]) 
        t += conj(f[1, 1] * a[i]) * exp(-1im * (delta[i, 1] + ZB[1]) * t0) * y0[89] + conj(f[2, 1] * b[i]) * exp(-1im * (delta[i, 2] + ZB[1]) * t0) * y0[93] + conj(f[3, 1] * c[i]) * exp(-1im * (delta[i, 3] + ZB[1]) * t0) * y0[97] + conj(f[4, 1] * b[i]) * exp(-1im * (delta[i, 4] + ZB[1]) * t0) * y0[101] + conj(f[5, 1] * c[i]) * exp(-1im * (delta[i, 5] + ZB[1]) * t0) * y0[105] + conj(f[6, 1] * b[i]) * exp(-1im * (delta[i, 6] + ZB[1]) * t0) * y0[109] + conj(f[7, 1] * a[i]) * exp(-1im * (delta[i, 7] + ZB[1]) * t0) * y0[113] + conj(f[9, 1] * c[i]) * exp(-1im * (delta[i, 9] + ZB[1]) * t0) * y0[121] + conj(f[10, 1] * b[i]) * exp(-1im * (delta[i, 10] + ZB[1]) * t0) * y0[125] + conj(f[11, 1] * a[i]) * exp(-1im * (delta[i, 11] + ZB[1]) * t0) * y0[129] 
        t += f[1, 2] * b[i] * exp(1im * (delta[i, 1] + ZB[2]) * t0) * conj(y0[90]) + f[2, 2] * c[i] * exp(1im * (delta[i, 2] + ZB[2]) * t0) * conj(y0[94]) + f[4, 2] * c[i] * exp(1im * (delta[i, 4] + ZB[2]) * t0) * conj(y0[102]) + f[6, 2] * c[i] * exp(1im * (delta[i, 6] + ZB[2]) * t0) * conj(y0[110]) + f[7, 2] * b[i] * exp(1im * (delta[i, 7] + ZB[2]) * t0) * conj(y0[114]) + f[10, 2] * c[i] * exp(1im * (delta[i, 10] + ZB[2]) * t0) * conj(y0[126]) + f[11, 2] * b[i] * exp(1im * (delta[i, 11] + ZB[2]) * t0) * conj(y0[130]) + f[12, 2] * a[i] * exp(1im * (delta[i, 12] + ZB[2]) * t0) * conj(y0[134]) 
        t += conj(f[1, 2] * b[i]) * exp(-1im * (delta[i, 1] + ZB[2]) * t0) * y0[90] + conj(f[2, 2] * c[i]) * exp(-1im * (delta[i, 2] + ZB[2]) * t0) * y0[94] + conj(f[4, 2] * c[i]) * exp(-1im * (delta[i, 4] + ZB[2]) * t0) * y0[102] + conj(f[6, 2] * c[i]) * exp(-1im * (delta[i, 6] + ZB[2]) * t0) * y0[110] + conj(f[7, 2] * b[i]) * exp(-1im * (delta[i, 7] + ZB[2]) * t0) * y0[114] + conj(f[10, 2] * c[i]) * exp(-1im * (delta[i, 10] + ZB[2]) * t0) * y0[126] + conj(f[11, 2] * b[i]) * exp(-1im * (delta[i, 11] + ZB[2]) * t0) * y0[130] + conj(f[12, 2] * a[i]) * exp(-1im * (delta[i, 12] + ZB[2]) * t0) * y0[134] 
        t += f[1, 3] * a[i] * exp(1im * (delta[i, 1] + ZB[3]) * t0) * conj(y0[91]) + f[2, 3] * b[i] * exp(1im * (delta[i, 2] + ZB[3]) * t0) * conj(y0[95]) + f[3, 3] * c[i] * exp(1im * (delta[i, 3] + ZB[3]) * t0) * conj(y0[99]) + f[4, 3] * b[i] * exp(1im * (delta[i, 4] + ZB[3]) * t0) * conj(y0[103]) + f[5, 3] * c[i] * exp(1im * (delta[i, 5] + ZB[3]) * t0) * conj(y0[107]) + f[6, 3] * b[i] * exp(1im * (delta[i, 6] + ZB[3]) * t0) * conj(y0[111]) + f[7, 3] * a[i] * exp(1im * (delta[i, 7] + ZB[3]) * t0) * conj(y0[115]) + f[9, 3] * c[i] * exp(1im * (delta[i, 9] + ZB[3]) * t0) * conj(y0[123]) + f[10, 3] * b[i] * exp(1im * (delta[i, 10] + ZB[3]) * t0) * conj(y0[127]) + f[11, 3] * a[i] * exp(1im * (delta[i, 11] + ZB[3]) * t0) * conj(y0[131]) 
        t += conj(f[1, 3] * a[i]) * exp(-1im * (delta[i, 1] + ZB[3]) * t0) * y0[91] + conj(f[2, 3] * b[i]) * exp(-1im * (delta[i, 2] + ZB[3]) * t0) * y0[95] + conj(f[3, 3] * c[i]) * exp(-1im * (delta[i, 3] + ZB[3]) * t0) * y0[99] + conj(f[4, 3] * b[i]) * exp(-1im * (delta[i, 4] + ZB[3]) * t0) * y0[103] + conj(f[5, 3] * c[i]) * exp(-1im * (delta[i, 5] + ZB[3]) * t0) * y0[107] + conj(f[6, 3] * b[i]) * exp(-1im * (delta[i, 6] + ZB[3]) * t0) * y0[111] + conj(f[7, 3] * a[i]) * exp(-1im * (delta[i, 7] + ZB[3]) * t0) * y0[115] + conj(f[9, 3] * c[i]) * exp(-1im * (delta[i, 9] + ZB[3]) * t0) * y0[123] + conj(f[10, 3] * b[i]) * exp(-1im * (delta[i, 10] + ZB[3]) * t0) * y0[127] + conj(f[11, 3] * a[i]) * exp(-1im * (delta[i, 11] + ZB[3]) * t0) * y0[131]
        t += f[2, 4] * a[i] * exp(1im * (delta[i, 2] + ZB[4]) * t0) * conj(y0[96]) + f[3, 4] * b[i] * exp(1im * (delta[i, 3] + ZB[4]) * t0) * conj(y0[100]) + f[4, 4] * a[i] * exp(1im * (delta[i, 4] + ZB[4]) * t0) * conj(y0[104]) + f[5, 4] * b[i] * exp(1im * (delta[i, 5] + ZB[4]) * t0) * conj(y0[108]) + f[6, 4] * a[i] * exp(1im * (delta[i, 6] + ZB[4]) * t0) * conj(y0[112]) + f[8, 4] * c[i] * exp(1im * (delta[i, 8] + ZB[4]) * t0) * conj(y0[120]) + f[9, 4] * b[i] * exp(1im * (delta[i, 9] + ZB[4]) * t0) * conj(y0[124]) + f[10, 4] * a[i] * exp(1im * (delta[i, 10] + ZB[4]) * t0) * conj(y0[128])
        t += conj(f[2, 4] * a[i]) * exp(-1im * (delta[i, 2] + ZB[4]) * t0) * y0[96] + conj(f[3, 4] * b[i]) * exp(-1im * (delta[i, 3] + ZB[4]) * t0) * y0[100] + conj(f[4, 4] * a[i]) * exp(-1im * (delta[i, 4] + ZB[4]) * t0) * y0[104] + conj(f[5, 4] * b[i]) * exp(-1im * (delta[i, 5] + ZB[4]) * t0) * y0[108] + conj(f[6, 4] * a[i]) * exp(-1im * (delta[i, 6] + ZB[4]) * t0) * y0[112] + conj(f[8, 4] * c[i]) * exp(-1im * (delta[i, 8] + ZB[4]) * t0) * y0[120] + conj(f[9, 4] * b[i]) * exp(-1im * (delta[i, 9] + ZB[4]) * t0) * y0[124] + conj(f[10, 4] * a[i]) * exp(-1im * (delta[i, 10] + ZB[4]) * t0) * y0[128]
        term += s[i] * t
    end           
    return term
end
1 Like

Aren’t most of the allocations due to the fact you’re referring non-constant globals?

1 Like

What do you mean? Which one is the non-constant globals? could you say more details?

Looks very cool, I will try~ thanks

All of them. It’s the second entry of the performance tips. Also, you initialise term as a Float64 but then you add to it complex numbers. That’s another well known pit-fall. This is fixed in @rafael.guerra code above.

Yeah, I know that, I just didn’t pay more attention to it, I just start to use julia not too long ago from MATLAB, so, I am not that sensitive about the type, but I realized that, thanks

It seems like all the global variables were passed in as arguments in the original method, so the compiler should still specialize on the concrete types upon the method call. I can’t really make out anything in that massive expression, maybe there’s a global there.

Also tried just changing term = 0.0 to term = 0.0 + 0.0im in the original rel method to make term type stable, and the 300+ allocations still happen.

1 Like

There are performance issues with functions with very large (> 16, I think?) numbers of arguments, which can be an issue because a + b + c + .... turns into a single call to +(a, b, c, ....). You might want to try using a macro like this to split those up into nested calls to the two-argument + and *:

using MacroTools: @capture, prewalk

function _binarize(expr)
	if @capture(expr, f_(args__))
		if f ∈ (:+, :*)
			if length(args) > 2
				return :($f($(args[1]), $f(($(args[2:end]...)))))
			end
		end
	end
	return expr
end


macro binarize(expr)
	prewalk(_binarize, expr)
end
julia> @macroexpand @binarize 1 + 2 + 3 + 4 * 5 * 6
:(1 + (2 + (3 + 4 * (5 * 6))))
1 Like