How to remove extra allocations when doing point-wise assignment?

Hello, I found that when the loop body performs an assignment, the point-wise loop introduces lots of additional allocations (on Julia 1.0):

julia> function f1(a, b, c)
           @. t1 = a * b / c
       end
f1 (generic function with 1 method)

julia> function f2(a, b, c)
           for j = 1: size(a, 2), i = 1: size(a, 1)
               t1[i,j] = a[i,j] * b[i,j] / c[i,j]
           end
       end
f2 (generic function with 1 method)

julia> t1, x1, x2, x3 = [rand(10000, 5000) for _ in 1: 4];

julia> f1(x1, x2, x3); f2(x1, x2, x3);

julia> @time f1(x1, x2, x3);
  0.095687 seconds (8 allocations: 256 bytes)

julia> @time f2(x1, x2, x3);
  2.002025 seconds (142.34 M allocations: 2.121 GiB, 3.33% gc time)

In comparison, the point-wise loop beats broadcast fusion when there is no assignment:

julia> function g1(a, b, c)
           @. a * b / c
       end
g1 (generic function with 1 method)

julia> function g2(a, b, c)
           for j = 1: size(a, 2), i = 1: size(a, 1)
               a[i,j] * b[i,j] / c[i,j]
           end
       end
g2 (generic function with 1 method)

julia> g1(x1, x2, x3); g2(x1, x2, x3);

julia> @time g1(x1, x2, x3);
  0.299063 seconds (6 allocations: 381.470 MiB, 9.51% gc time)

julia> @time g2(x1, x2, x3);
  0.033901 seconds (4 allocations: 160 bytes)
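One caveat about this second comparison: g2 never uses the values it computes, so the compiler is free to drop most of that arithmetic, while g1 materializes a brand-new 10000×5000 Float64 array (the 381 MiB reported above is 10000 × 5000 × 8 bytes). A fairer test makes both versions consume their results. The sketch below does that by summing the element-wise results; the names g1_sum and g2_sum and the small test sizes are illustrative choices, not part of the original code.

```julia
# Broadcast version: still materializes a temporary array before summing it.
g1_sum(a, b, c) = sum(@. a * b / c)

# Loop version: accumulates into a scalar, so no temporary array is needed
# and every computed value is actually used.
function g2_sum(a, b, c)
    s = zero(eltype(a))
    for j in axes(a, 2), i in axes(a, 1)
        s += a[i, j] * b[i, j] / c[i, j]
    end
    return s
end

# Small hypothetical inputs; c is shifted away from zero to keep values tame.
a, b, c = rand(100, 50), rand(100, 50), rand(100, 50) .+ 1
g1_sum(a, b, c) ≈ g2_sum(a, b, c)   # same total, up to summation order
```

Timed with BenchmarkTools.jl, only g1_sum should report an array-sized allocation, which is the real cost difference the original g1/g2 comparison was hinting at.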

Is there anything I am doing improperly? I may need the point-wise loop for implementing SharedArray-based parallelism. Thank you!

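You should generally avoid modifying a global variable (your t1) within a function. Because t1 is an untyped, non-constant global, the compiler cannot infer its type inside f2, so every t1[i,j] = ... goes through dynamic dispatch with boxed arguments, which accounts for the millions of allocations. Also, it's better to use BenchmarkTools.jl's @btime to benchmark your functions' performance.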
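If t1 really has to stay global, a minimal sketch of the usual alternative is to declare it const: its type is then fixed, so the compiler can specialize the loop and the per-element dispatch and boxing go away. The function names f2_const and alloc_bytes and the small array sizes are hypothetical. Note that a const binding cannot be reassigned to a different array, but its contents can still be modified in place.

```julia
# A const global has a fixed type, so accesses inside functions are type-stable.
const t1 = zeros(100, 50)

function f2_const(a, b, c)
    for j in axes(a, 2), i in axes(a, 1)
        t1[i, j] = a[i, j] * b[i, j] / c[i, j]
    end
    return nothing
end

# Measure inside a function so the measurement is not polluted by the
# untyped global arguments defined below.
alloc_bytes(a, b, c) = @allocated f2_const(a, b, c)

x1, x2, x3 = rand(100, 50), rand(100, 50), rand(100, 50) .+ 1
alloc_bytes(x1, x2, x3)            # first call includes compilation
println(alloc_bytes(x1, x2, x3))   # expected to print 0 once compiled
```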


Avoid the global t1, as @carstenbauer pointed out, and the two versions perform essentially the same:

using BenchmarkTools

function f1(t1, a, b, c)
  @. t1 = a * b / c
end

function f2(t1, a, b, c)
  for i in eachindex(a)
    t1[i] = a[i] * b[i] / c[i]
  end
end

function ff()
  t1, x1, x2, x3 = [rand(10000,5000) for _ in 1:4]
  @btime f1($t1, $x1, $x2, $x3)
  @btime f2($t1, $x1, $x2, $x3)
end

Running it:

ff()

   100.161 ms (0 allocations: 0 bytes)
   100.087 ms (0 allocations: 0 bytes)
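Since this f2 takes the output array as an argument, the same point-wise loop can be aimed at a SharedArray for the parallel use case mentioned in the question. The sketch below assumes the Distributed and SharedArrays standard libraries; the function name f2_shared! and the array sizes are hypothetical. Without addprocs, @distributed simply runs on the main process; after adding workers (before creating the arrays), each worker handles a chunk of the index range and writes into the shared memory directly.

```julia
using Distributed, SharedArrays

# Each iteration writes exactly one element, so chunks of the index range can
# be handled by different processes without any coordination.
function f2_shared!(t1::SharedArray, a::SharedArray, b::SharedArray, c::SharedArray)
    @sync @distributed for i in 1:length(t1)
        t1[i] = a[i] * b[i] / c[i]
    end
    return t1
end

# Hypothetical sizes; call addprocs(n) before this point to actually use workers.
dims = (100, 50)
t1s = SharedArray{Float64}(dims)
sa, sb, sc = (SharedArray{Float64}(dims) for _ in 1:3)
sa .= rand(dims...)
sb .= rand(dims...)
sc .= rand(dims...) .+ 1   # keep the divisor away from zero
f2_shared!(t1s, sa, sb, sc)
```

Making the inputs SharedArrays too (rather than plain Arrays) avoids serializing a full copy of each input to every worker.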
