I am new to Julia and so far, I have mostly exprimented basic things. However I realised that the way the arrays are defined may have a strong influence on the performance.
Here is the situation: let say that I have a loop, at each iteration I define a local array which may depend on global variables and then I perform some elementary array operations. Since the loop is typically big, I would like to know the fastest way to define the local array. For the application I have in mind, there are two nested loops but I don’t think that it has an influence.
Reading the Julia doc, I realised that there are many possible ways to define the same array, using semicolons, comma, spaces, etc. I initially thought that all are equivalent but it turns out that they are not and I do not fully understand why (and I did not find anything about performance in the doc). Below is a minimal working example.
First I import some packages and define some global variables.
using StaticArrays using BenchmarkTools a = 1. b = 2. c = 3. d = 4. x = 5. y = 6. z = 7. N = 3000 # Loop size
Then, I tested seven ways to do the same thing.
- First the simplest one, using only spaces and tabs, would be
function standard_array() for i in 1:N for j in 1:N A = [a b 0.5*i c d 0. (x*y-z) (x*y+z) (j+1.)] # The local array u = [b,c,z*y-x^2] # Some local vector v = A*u # A simple operation end end end
- The same function can be implemented using semicolons as follows.
function semicolon_concat_array() for i in 1:N for j in 1:N A = [[a ;; b ;; 0.5*i]; [c ;; d ;; 0.]; [x*y-z ;; x*y+z ;; j+1.]] u = [b,c,z*y-x^2] v = A*u end end end
- Since the elements of the array have a cumbersome expression, I may prefer to define the columns first and then concatenate the result. Again it can be done using spaces …
function column_simple_concat_array() for i in 1:N for j in 1:N e1 = [a, c, x*y-z] e2 = [b, d, x*y+z] e3 = [0.5*i, 0., j+1.] A = [e1 e2 e3] u = [b,c,z*y-x^2] v = A*u end end end
- …or semicolons.
function column_semicolon_concat_array() for i in 1:N for j in 1:N e1 = [a, c, x*y-z] e2 = [b, d, x*y+z] e3 = [0.5*i, 0., j+1.] A = [e1 ;; e2 ;; e3] u = [b,c,z*y-x^2] v = A*u end end end
- If the elements are very cumbersome, then maybe even an elementwise definition would be easier to handle.
function element_array() for i in 1:N for j in 1:N A = zeros(3,3) A[1,1] = a A[2,1] = c A[3,1] = x*y-z A[1,2] = b A[2,2] = d A[3,2] = x*y+z A[1,3] = 0.5*i A[2,3] = 0. A[3,3] = j+1. u = [b,c,z*y-x^2] v = A*u end end end
- Finally, there are also two more exotic definitions: using static arrays…
function static_smatrix() for i in 1:N for j in 1:N A = @SMatrix [a b 0.5*i c d 0. (x*y-z) (x*y+z) (j+1.)] u = SVector(b,c,z*y-x^2) v = A*u end end end
- …or tuples (with a byhand multiplication):
function byhand_tuple() for i in 1:N for j in 1:N a11,a12,a13,a21,a22,a23,a31,a32,a33=a,b,0.5*i,c,d,0.,x*y-z,x*y+z,j+1. u1,u2,u3 = b,c,z*y-x^2 v1 = a11*u1 + a12*u2 + a13*u3 v2 = a21*u1 + a22*u2 + a23*u3 v3 = a31*u1 + a32*u2 + a33*u3 v = v1,v2,v3 end end end
@btime to benchmark these seven functions and here is what I obtained:
 Standard array ("standard_array") 7.252 s (140944979 allocations: 4.78 GiB)  Semicolon concatenation ("semicolon_concat_array") 86.949 s (725944979 allocations: 23.42 GiB)  Column concatenation ("column_simple_concat_array") 10.158 s (167944979 allocations: 6.39 GiB)  Column concatenation with semicolon ("column_semicolon_concat_array") 30.007 s (266944979 allocations: 9.88 GiB)  Elementwise definition ("element_array") 14.359 s (131944979 allocations: 4.11 GiB)  Static SMatrix ("static_smatrix") 7.273 s (149944979 allocations: 3.84 GiB)  Tuples ("byhand_tuple") 4.333 s (212944979 allocations: 3.31 GiB)
There are a few things I think I understand: the function  with tuples is the fastest one, as I expect since the tuples are immutables. Also  and  look similar but  is faster than  because it has less allocations, ok. But I still do not get the following points.
The definitions  and  involving semicolons are very bad, 7 to 20 times slower than the fastest definition . Obviously there is something I do not understand with the semicolon operator.
What slows down the elementwise definition  ? I would expect it to be similar to  (with a comparable number of allocations) but it is 2 times slower.
Static arrays  do not seem to perform better than standard arrays . I would expect them to be similar to tuples  (immutable objects and comparable numbers of allocations).
Sorry for the long post and thank you in advance for your help!