Confusion on performance when using the broadcasting macro @. vs explicit . operators

Note that full broadcasting isn’t always the fastest option. With every operation dotted, the fused loop computes 2*N^2 squared differences and N^2 exps. It can get away with roughly N times fewer of each: since exp(a + b) = exp(a) * exp(b), the Gaussian factors into a term that depends only on θ and a term that depends only on ϕ, so each needs just N squared differences and N exps. The only thing it needs N^2 of is multiplications. Your first version (with the missing .) partially realized this, which is why it was faster. Here’s a version that goes further, spending a little extra memory to save a lot of computation:

function get_f3(θ, ϕ)
  α0 = 1.5
  θ1 = π/4
  ϕ1 = π
  σ = 0.2

  expΔθ²σ² = exp.(abs2.(θ .- θ1) ./ (-2 * σ ^ 2))
  expΔϕ²σ² = exp.(abs2.(ϕ .- ϕ1) ./ (-2 * σ ^ 2))

  return α0 .* expΔθ²σ² .* expΔϕ²σ² # could factor the α0 into one of the other terms to save a tiny bit more
end

julia> @btime get_f($θ, $ϕ');
  31.066 μs (7 allocations: 80.02 KiB)

julia> @btime get_f1($θ, $ϕ');
  52.665 μs (3 allocations: 78.20 KiB)

julia> @btime get_f3($θ, $ϕ');
  4.783 μs (7 allocations: 80.03 KiB)
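
As a sanity check that the factoring changes only the amount of work and not the result, here is a minimal sketch comparing a fully dotted version against the factored one. Note the assumptions: `get_f_full` is my reconstruction of what a fully broadcast version would look like (the original `get_f` and `get_f1` aren't reproduced in this post), `get_f_factored` is just `get_f3` under another name, and the grids with N = 100 points are picked for illustration only.

```julia
# Hypothetical fully broadcast version, reconstructed from get_f3 above.
# Every call is dotted, so exp runs once per (θ, ϕ) pair: N^2 exps.
function get_f_full(θ, ϕ)
    α0, θ1, ϕ1, σ = 1.5, π/4, π, 0.2
    return @. α0 * exp(-((θ - θ1)^2 + (ϕ - ϕ1)^2) / (2 * σ^2))
end

# Factored version, same arithmetic as get_f3 above.
function get_f_factored(θ, ϕ)
    α0, θ1, ϕ1, σ = 1.5, π/4, π, 0.2
    eθ = exp.(abs2.(θ .- θ1) ./ (-2 * σ^2))  # N exps, shaped like θ
    eϕ = exp.(abs2.(ϕ .- ϕ1) ./ (-2 * σ^2))  # N exps, shaped like ϕ
    return α0 .* eθ .* eϕ                    # N^2 multiplications
end

# Assumed inputs: a column of θ values and a row of ϕ values.
θ = range(0, π; length = 100)
ϕ = range(0, 2π; length = 100)

A = get_f_full(θ, ϕ')
B = get_f_factored(θ, ϕ')

println(size(A) == size(B) == (100, 100))  # both broadcast to an N×N matrix
println(A ≈ B)                             # equal up to floating-point rounding
```

The two intermediate vectors are the "little extra memory" mentioned above: 2N extra numbers next to an N×N result.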