Hi everyone!
I am new to Julia and so far, I have mostly exprimented basic things. However I realised that the way the arrays are defined may have a strong influence on the performance.
Here is the situation: let say that I have a loop, at each iteration I define a local array which may depend on global variables and then I perform some elementary array operations. Since the loop is typically big, I would like to know the fastest way to define the local array. For the application I have in mind, there are two nested loops but I don’t think that it has an influence.
Reading the Julia doc, I realised that there are many possible ways to define the same array, using semicolons, comma, spaces, etc. I initially thought that all are equivalent but it turns out that they are not and I do not fully understand why (and I did not find anything about performance in the doc). Below is a minimal working example.
First I import some packages and define some global variables.
using StaticArrays
using BenchmarkTools
a = 1.
b = 2.
c = 3.
d = 4.
x = 5.
y = 6.
z = 7.
N = 3000 # Loop size
Then, I tested seven ways to do the same thing.
- First the simplest one, using only spaces and tabs, would be
function standard_array()
for i in 1:N
for j in 1:N
A = [a b 0.5*i
c d 0.
(x*y-z) (x*y+z) (j+1.)] # The local array
u = [b,c,z*y-x^2] # Some local vector
v = A*u # A simple operation
end
end
end
- The same function can be implemented using semicolons as follows.
function semicolon_concat_array()
for i in 1:N
for j in 1:N
A = [[a ;; b ;; 0.5*i];
[c ;; d ;; 0.];
[x*y-z ;; x*y+z ;; j+1.]]
u = [b,c,z*y-x^2]
v = A*u
end
end
end
- Since the elements of the array have a cumbersome expression, I may prefer to define the columns first and then concatenate the result. Again it can be done using spaces …
function column_simple_concat_array()
for i in 1:N
for j in 1:N
e1 = [a, c, x*y-z]
e2 = [b, d, x*y+z]
e3 = [0.5*i, 0., j+1.]
A = [e1 e2 e3]
u = [b,c,z*y-x^2]
v = A*u
end
end
end
- …or semicolons.
function column_semicolon_concat_array()
for i in 1:N
for j in 1:N
e1 = [a, c, x*y-z]
e2 = [b, d, x*y+z]
e3 = [0.5*i, 0., j+1.]
A = [e1 ;; e2 ;; e3]
u = [b,c,z*y-x^2]
v = A*u
end
end
end
- If the elements are very cumbersome, then maybe even an elementwise definition would be easier to handle.
function element_array()
for i in 1:N
for j in 1:N
A = zeros(3,3)
A[1,1] = a
A[2,1] = c
A[3,1] = x*y-z
A[1,2] = b
A[2,2] = d
A[3,2] = x*y+z
A[1,3] = 0.5*i
A[2,3] = 0.
A[3,3] = j+1.
u = [b,c,z*y-x^2]
v = A*u
end
end
end
- Finally, there are also two more exotic definitions: using static arrays…
function static_smatrix()
for i in 1:N
for j in 1:N
A = @SMatrix [a b 0.5*i
c d 0.
(x*y-z) (x*y+z) (j+1.)]
u = SVector(b,c,z*y-x^2)
v = A*u
end
end
end
- …or tuples (with a byhand multiplication):
function byhand_tuple()
for i in 1:N
for j in 1:N
a11,a12,a13,a21,a22,a23,a31,a32,a33=a,b,0.5*i,c,d,0.,x*y-z,x*y+z,j+1.
u1,u2,u3 = b,c,z*y-x^2
v1 = a11*u1 + a12*u2 + a13*u3
v2 = a21*u1 + a22*u2 + a23*u3
v3 = a31*u1 + a32*u2 + a33*u3
v = v1,v2,v3
end
end
end
I used @btime
to benchmark these seven functions and here is what I obtained:
[1] Standard array ("standard_array")
7.252 s (140944979 allocations: 4.78 GiB)
[2] Semicolon concatenation ("semicolon_concat_array")
86.949 s (725944979 allocations: 23.42 GiB)
[3] Column concatenation ("column_simple_concat_array")
10.158 s (167944979 allocations: 6.39 GiB)
[4] Column concatenation with semicolon ("column_semicolon_concat_array")
30.007 s (266944979 allocations: 9.88 GiB)
[5] Elementwise definition ("element_array")
14.359 s (131944979 allocations: 4.11 GiB)
[6] Static SMatrix ("static_smatrix")
7.273 s (149944979 allocations: 3.84 GiB)
[7] Tuples ("byhand_tuple")
4.333 s (212944979 allocations: 3.31 GiB)
There are a few things I think I understand: the function [7] with tuples is the fastest one, as I expect since the tuples are immutables. Also [1] and [3] look similar but [1] is faster than [3] because it has less allocations, ok. But I still do not get the following points.
-
The definitions [2] and [4] involving semicolons are very bad, 7 to 20 times slower than the fastest definition [7]. Obviously there is something I do not understand with the semicolon operator.
-
What slows down the elementwise definition [5] ? I would expect it to be similar to [1] (with a comparable number of allocations) but it is 2 times slower.
-
Static arrays [6] do not seem to perform better than standard arrays [1]. I would expect them to be similar to tuples [7] (immutable objects and comparable numbers of allocations).
Sorry for the long post and thank you in advance for your help!