# What is idiomatic way of vectorizing longer operations in Julia?

Hello.

I have recently tried writing some code and need help with vectorization. I like that Julia’s loops are efficient, but sometimes I want to write less verbose code. How to do it in this case? What are the best practices? Below is an example of what I mean:

Imagine you have a vector `a` and you are interested in knowing the sum of `a[i]/(1 + a[i])`. If you want to get it done, you might be tempted to write:

``````function f1(a)
return sum( a ./ (1. .+ a))
end
``````

But you might also write

``````function f2(a)
Val = 0.0
for i in each index(a)
Val += a[i] / (1. + a[i])
end
return value
end
``````

Ignoring any typing issues, that is, `a` is an array of Float64.

Then you have

``````using BenchmarkTools

a = rand(9000000)

@btime f1(a)
# 22.670 ms (3 allocations: 68.66 MiB)

@btime f2(a)
# 15.136 ms (1 allocation: 16 bytes)
``````

Which is a slight difference, but it quickly adds up.

How to make the `f1` as efficient memory and time-wise?

``````julia> @btime mapreduce(x -> x/(1+x), +, a)
2.042 ms (1 allocation: 16 bytes)
2.7611581397048105e6
``````

`mapreduce` seems pretty good. probably there is more wizardry to be done for another factor of 2 or so if you’re willing to try a lot harder

1 Like

``````function f3(a)
return sum(x -> (x / (1+x)), a)
end

@btime f3(\$a)
2.043 ms (0 allocations: 0 bytes)
``````

But your question is probably more general. `Tullio.jl` is one way of writing complex loops in a very compact format.

What??? Very cool, thank you!

Why is `mapreduce` faster than the for loop?

Are there other functions like that I should know about?

Also I feel it loses a bit of the readability that the vectorization has

`mapreduce` likely uses optimizations along the lines of

``````julia> function f2(a)
vOut = 0.0
@simd for a1 in a
vOut += a1 / (1.0 + a1)
end
return vOut
end
f2 (generic function with 1 method)

julia> @btime f2(\$a)
2.114 ms (0 allocations: 0 bytes)
2.76175963830546e6
``````

Now it’s the same speed at f3

1 Like

If you want to stick to one liners and vectorized operations, I suggest using `LoopVectorization` and `LazyArrays`. When you have to sum a broadcasted operation, LazyArrays is usually the best option, it achieves almost the same speed as `mapreduce`.

Specifically:

``````using BenchmarkTools
a = rand(1_000)

#baseline
f1(a) =  sum( a ./ (1. .+ a))
@btime f1(\$a)

#with a generator
f2(a) = sum(x / (1. + x) for x in a)
@btime f2(\$a)

# with lazyarray (notice @~ in the broadcasted part)
using LazyArrays
f3(a) =  sum(@~ a ./ (1. .+ a))
@btime f3(\$a)

#with @turbo
using LoopVectorization
f4(a) = sum(@turbo a ./ (1. .+ a))
@btime f4(\$a)

# with mapreduce
f5(a) = mapreduce(x -> x/(1+x), +, a)
@btime f5(\$a)
``````

with time

``````  693.377 ns (1 allocation: 7.94 KiB) # baseline
889.583 ns (0 allocations: 0 bytes) # generator
517.188 ns (0 allocations: 0 bytes) # Lazy Array option
551.875 ns (1 allocation: 7.94 KiB) # LoopVectorization
454.315 ns (0 allocations: 0 bytes) # mapreduce
``````

I wrote a week ago how to keep a balance between speed and clear code, with emphasis on vectorized operations. See here