I’m computing a derivative from a summation. I thought vectorizing the function would be faster than a loop, but it isn’t the case.
using BenchmarkTools, Random

ωc = 0.05
chi = 0.01
const k = ωc^2
const couple = sqrt(2 / ωc^3) * chi

# Fit coefficients: column i holds (amplitude, frequency, phase) of the i-th sine term.
const coeff = reshape([1.359e+00, 8.143e-01, 3.142e+00, 7.304e-02, 3.391e+00,
                       3.142e+00, 3.099e-01, 2.009e+00, -3.142e+00, 1.714e-02, 4.847e+00,
                       3.142e+00, 4.495e-03, 6.291e+00, 3.142e+00], 3, 5)

# Fitted curve: a sum of five sine terms.
function fitted(x::T) where T<:Real
    u = 0.0
    for i in 1:5
        u += coeff[1, i] * sin(coeff[2, i] * x + coeff[3, i])
    end
    return u
end

# Loop version: accumulate fitted(x[i]) over all but the last element.
function ∂U∂q1(x::AbstractVector{T}) where T<:Real
    ∑μ = 0.0
    for i in 1:length(x)-1
        ∑μ += fitted(x[i])
    end
    return k * (couple * ∑μ + x[end])
end

# "Vectorized" version: broadcast fitted over a view and sum the result.
function ∂U∂q2(x::AbstractVector{T}) where T<:Real
    ∑μ = sum(fitted.(view(x, 1:length(x)-1)))
    return k * (couple * ∑μ + x[end])
end

n = 100
x = Random.rand(n)

@btime ∂U∂q1(x)
@btime ∂U∂q2(x)
I am no expert on this, but I think many of us (me included) assume that vectorization makes things faster, because that is the case in certain languages such as R.
To quote the Julia docs:
In R, performance requires vectorization. In Julia, almost the opposite is true: the best performing code is often achieved by using devectorized loops.
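To make the point concrete: a broadcasted expression like sin.(v) materializes a temporary array before it is consumed, while a plain loop just accumulates into a scalar. A minimal sketch of the two styles (the function names here are only illustrative, not from the thread):

```julia
v = rand(10_000)

# "Vectorized" style: sin.(v) allocates a temporary array of 10_000 elements.
vectorized(v) = sum(sin.(v))

# Devectorized style: a plain loop, no temporary array.
function devectorized(v)
    s = 0.0
    for x in v
        s += sin(x)
    end
    return s
end
```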
One thing that inevitably comes up in these threads is that your naive summing loop is different from what Julia's sum uses, which is a more accurate floating-point summation algorithm (pairwise summation). If you are certain that your problem won't run into any of the errors that plague simple straight summation, then that is fine. If you might encounter catastrophic cancellation or anything like that, go with Base's sum.
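For the curious, here is a small, self-contained illustration of that accuracy point (not from the thread; Float32 just makes the effect easy to see):

```julia
xs = fill(0.1f0, 10_000_000)  # the exact sum is about 1.0f6

# Straight left-to-right accumulation, like the loop in ∂U∂q1.
function naive_sum(v)
    s = zero(eltype(v))
    for x in v
        s += x
    end
    return s
end

naive_sum(xs)  # drifts noticeably away from 1.0f6
sum(xs)        # Base's pairwise summation stays much closer
```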
I think it's unlikely to happen. It might happen for this trivial example code.
But in my real code, x[2:end-1] are all around a certain value, so they give pretty much the same output from the fitted function.
Yup, “vectorizing”, in the sense that MATLAB and NumPy use it, is generally neither necessary nor useful in Julia. If you need a loop, just use a loop!
Both indeed get rid of the large allocation. And yes, they are slightly slower, but surely more readable than what I wrote.
I think view’s allocation problem is solved in 1.5, but on 1.4.1 there are still 1 or 2 allocations.
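If you want to check this on your own version, something along these lines should show it (a sketch; sum_head is just a made-up name, and the exact allocation counts depend on the Julia version, as noted above):

```julia
using BenchmarkTools

# Sums all but the last element through a view, like the broadcasted ∂U∂q2 does.
sum_head(x) = sum(view(x, 1:length(x)-1))

v = rand(100)
@btime sum_head($v)  # compare the reported allocations on 1.4.x vs 1.5+
```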
As others said, sum is more accurate because it takes care of rounding (it uses pairwise summation). Also, you don’t need broadcasting and views at all: you can avoid allocations completely because sum accepts a function as its first argument, so you can simply do:
function ∂U∂q2(x::AbstractVector{T}) where T<:Real
    # ∑μ = sum(fitted.(view(x, 1:length(x)-1)))
    ∑μ = sum(fitted, x[i] for i = 1:length(x)-1)
    return k * (couple * ∑μ + x[end])
end
which, when timed in a script, can yield the same speed as the loop version:
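A sketch of such a timing script (results are machine- and version-dependent, so no numbers are shown here):

```julia
using BenchmarkTools, Random

x = Random.rand(100)

# Interpolate x with $ so the non-constant global doesn't distort the timing.
@btime ∂U∂q1($x)
@btime ∂U∂q2($x)
```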