Note that full broadcasting isn’t always the fastest option. With every operation dotted, it computes 2*N^2 squared differences and N^2 exps. It can get away with roughly N times fewer of each (2*N squared differences and 2*N exps, because the exponential of a sum factors into a product of exponentials); the only thing it needs N^2 of is multiplications. Your first version (with the missing .) partially realized this, which is why it was faster. Here’s a version that goes further, spending a little extra memory to save a lot of computation:
function get_f3(θ, ϕ)
    α0 = 1.5
    θ1 = π/4
    ϕ1 = π
    σ = 0.2
    expΔθ²σ² = exp.(abs2.(θ .- θ1) ./ (-2 * σ ^ 2))
    expΔϕ²σ² = exp.(abs2.(ϕ .- ϕ1) ./ (-2 * σ ^ 2))
    return α0 .* expΔθ²σ² .* expΔϕ²σ² # could factor the α0 into one of the other terms to save a tiny bit more
end
julia> @btime get_f($θ, $ϕ');
31.066 μs (7 allocations: 80.02 KiB)
julia> @btime get_f1($θ, $ϕ');
52.665 μs (3 allocations: 78.20 KiB)
julia> @btime get_f3($θ, $ϕ');
4.783 μs (7 allocations: 80.03 KiB)
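If you want to convince yourself that the factored form gives the same answer, here is a minimal sketch. The function `full_broadcast_reference` is hypothetical: it is my reconstruction of what a fully dotted version would look like, inferred from `get_f3`, and is not necessarily identical to your `get_f1`. The name `factored`, the grid size `N = 100`, and the ranges are likewise just assumptions for illustration.

```julia
# Hypothetical fully dotted reference (an assumption, reconstructed from get_f3).
# With θ a column and ϕ a row, exp runs over the whole N×N grid: N^2 calls.
function full_broadcast_reference(θ, ϕ; α0 = 1.5, θ1 = π/4, ϕ1 = π, σ = 0.2)
    return α0 .* exp.((abs2.(θ .- θ1) .+ abs2.(ϕ .- ϕ1)) ./ (-2 * σ^2))
end

# Same idea as get_f3: exp(a + b) == exp(a) * exp(b), so each exp is taken
# over a length-N vector (2N calls in total) and only the final product is N×N.
function factored(θ, ϕ; α0 = 1.5, θ1 = π/4, ϕ1 = π, σ = 0.2)
    eθ = exp.(abs2.(θ .- θ1) ./ (-2 * σ^2))
    eϕ = exp.(abs2.(ϕ .- ϕ1) ./ (-2 * σ^2))
    return α0 .* eθ .* eϕ
end

N = 100                                   # assumed grid size, illustration only
θ = collect(range(0, π; length = N))      # assumed sample points
ϕ = collect(range(0, 2π; length = N))

A = full_broadcast_reference(θ, ϕ')       # ϕ' is a 1×N row, so the result is N×N
B = factored(θ, ϕ')

@assert size(A) == size(B) == (N, N)
@assert A ≈ B                             # equal up to floating-point rounding
```

The two results agree only up to rounding, since the operations happen in a different order, which is why the check uses `≈` rather than `==`.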