# pre-allocating array is slow

I tried three ways of calculating the first difference of a series. To my disappointment, the 3rd method is 100 times as slow as the other two. So, in such a case, using the first two methods are better?

``````using BenchmarkTools
using Random

A=randn(1000)

println("B1")
@btime B1=A[2:end]-A[1:end-1]

println("B2")
@btime B2=@view(A[2:end])-@view(A[1:end-1])

println("B3")

@btime begin
B3=Array{Float64}(undef, length(A)-1)
@inbounds @simd for i in 1:length(A)-1
B3[i]=A[i+1]-A[i]
end
end
``````

####################
output:

``````B1
1.330 μs (8 allocations: 23.92 KiB)
B2
646.524 ns (8 allocations: 8.14 KiB)
B3
121.100 μs (5467 allocations: 108.97 KiB)
``````
2 Likes

Welcome! Two things:

• Please mark your code as such with ````` backtick blocks so it formats nicely here
• Take a look at the Performance Tips manual page and in particular the very first section: You want to ensure the code you benchmark is inside a function. Do that and it should be the fastest of all three options.

What Matt said:

``````julia> function test(A)
B = Array{Float64}(undef, length(A) - 1)
@inbounds @simd for i ∈ length(A) - 1
B[i] = A[i+1] - A[i]
end
return B
end
test (generic function with 1 method)

julia> @btime test(\$A);
111.547 ns (1 allocation: 7.94 KiB)
``````
2 Likes

But also note that `B2` is pretty much the same speed if it’s in a function, too. Pre-allocating here would only be a big win if you can do it outside of your performance-critical (or benchmarked) section of code.

1 Like

All tests should be inside functions and the arguments should be interpolated (i.e., the `\$` in `@btime test(\$A);`).

Ok. Thanks.

I tested the following four functions:

``````A=randn(1000)

function test1(A)
B1=A[2:end]-A[1:end-1]
return B1
end

function  test2(A)
B2=@view(A[2:end])-@view(A[1:end-1])
return B2
end

function test3(A)
B = Array{Float64}(undef, length(A) - 1)
@inbounds @simd for i ∈ length(A) - 1
B[i] = A[i+1] - A[i]
end
return B
end

function test4(A)
B = Array{Float64}(undef, length(A) - 1)
for i ∈ length(A) - 1
B[i] = A[i+1] - A[i]
end
return B
end
``````

Then I tested each of them using commands such as @btime test1(\$A). The results are:

julia> @btime test1(\$A)
1.200 μs (3 allocations: 23.81 KiB)

@btime test2(\$A)
512.315 ns (1 allocation: 7.94 KiB)

julia> @btime test3(\$A)
162.712 ns (1 allocation: 7.94 KiB)

julia> @btime test4(\$A)
160.805 ns (1 allocation: 7.94 KiB)

So, the pre-allocating one is the fastest.
It looks like

``````@inbounds @simd
``````

doesn’t improve performance.

2 Likes

By the way, there is a bug in the loop functions. The `for i in x` syntax needs something iterable (which numbers are in Julia, so the code actually runs, but the output is wrong, i.e. `test1(A) == test2(A) != test3(A) != test4(A)`):

``````julia> for i ∈ 100
println(i)
end

100
``````

After replacing it with `for i in eachindex(B)` I get timings for `test3` and `test4` which are a lot closer to `test2` and also to `diff(A)`, which is exactly what the function does, no?

2 Likes

You are right. After I change code

``````    @inbounds @simd for i ∈ length(A) - 1
``````

to

``````    @inbounds @simd for i ∈ 1: length(A) - 1
``````

The results are:
julia> @btime test1(\$A)
1.150 μs (3 allocations: 23.81 KiB)

@btime test2(\$A)
479.899 ns (1 allocation: 7.94 KiB)

julia> @btime test3(\$A)
623.626 ns (1 allocation: 7.94 KiB)

julia> @btime test4(\$A)
602.162 ns (1 allocation: 7.94 KiB)

Then I tried

``````c:\> julia -t 2   #changing 2 to 8 doesn't make any difference
``````

Function test4 doesn’t improve, but test3 is as fast as test2.

That doesn’t have to do with `-t 2`, simd does not depend on multi-threading.