I have recently tried writing some code and need help with vectorization. I like that Julia’s loops are efficient, but sometimes I want to write less verbose code. How to do it in this case? What are the best practices? Below is an example of what I mean:

Imagine you have a vector a and you are interested in knowing the sum of a[i]/(1 + a[i]). If you want to get it done, you might be tempted to write:

function f1(a)
return sum( a ./ (1. .+ a))
end

But you might also write

function f2(a)
Val = 0.0
for i in each index(a)
Val += a[i] / (1. + a[i])
end
return value
end

Ignoring any typing issues, that is, a is an array of Float64.

Then you have

using BenchmarkTools
a = rand(9000000)
@btime f1(a)
# 22.670 ms (3 allocations: 68.66 MiB)
@btime f2(a)
# 15.136 ms (1 allocation: 16 bytes)

Which is a slight difference, but it quickly adds up.

How to make the f1 as efficient memory and time-wise?

mapreduce likely uses optimizations along the lines of

julia> function f2(a)
vOut = 0.0
@simd for a1 in a
vOut += a1 / (1.0 + a1)
end
return vOut
end
f2 (generic function with 1 method)
julia> @btime f2($a)
2.114 ms (0 allocations: 0 bytes)
2.76175963830546e6

If you want to stick to one liners and vectorized operations, I suggest using LoopVectorization and LazyArrays. When you have to sum a broadcasted operation, LazyArrays is usually the best option, it achieves almost the same speed as mapreduce.

Specifically:

using BenchmarkTools
a = rand(1_000)
#baseline
f1(a) = sum( a ./ (1. .+ a))
@btime f1($a)
#with a generator
f2(a) = sum(x / (1. + x) for x in a)
@btime f2($a)
# with lazyarray (notice @~ in the broadcasted part)
using LazyArrays
f3(a) = sum(@~ a ./ (1. .+ a))
@btime f3($a)
#with @turbo
using LoopVectorization
f4(a) = sum(@turbo a ./ (1. .+ a))
@btime f4($a)
# with mapreduce
f5(a) = mapreduce(x -> x/(1+x), +, a)
@btime f5($a)