Does this mean that storing vectors as x(n,) rather than x(n,1) has performance benefits?
I’ll try again: Is there any advantage in having 1-D arrays as x(n,) over x(n,1)? Matlab programmers use x(n,1) by instinct. Are these (we) people paying a performance penalty for that?
(I think you’re confusing people by using Fortran notation here.)
The memory storage format is exactly the same for a `Vector` (1d `Array`) or a `Matrix` (2d `Array`) with one column, and in both cases Julia allows you to do 1d “linear” indexing `x[i]` with no performance penalty. In fact, vectors as well as 1-column matrices in Julia can also be indexed as `x[i,1]`, though this could be very slightly slower because of the extra index computation (which may get optimized away by the compiler in the `Vector` case, however). But there is a useful semantic distinction between 1d arrays and matrices that is erased in Matlab, where nearly everything (including scalars!) is a matrix.
(Julia only allows you to resize a 1d `Array`, but it is possible to implement resizable multidimensional arrays; see ElasticArrays.jl.)
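A minimal sketch of the points above (variable names are mine): a `Vector` and an n×1 `Matrix` share the same column-major memory layout and both support linear indexing, but only the 1d `Array` is resizable.

```julia
x = rand(5)                  # Vector{Float64}, size (5,)
X = reshape(copy(x), 5, 1)   # Matrix{Float64}, size (5, 1)

@assert x[3] == X[3] == X[3, 1]  # linear and cartesian indexing agree
@assert vec(X) == x              # same elements, same memory order

push!(x, 1.0)                    # resizing works for the 1d Array...
# push!(X, 1.0)                  # ...but throws a MethodError for the Matrix
```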
Yes, except when they use `1xn` by instinct. The inconsistent use of `1xn` and `nx1` “vectors” in Matlab is a major source of confusion and frustration (and bugs), which you avoid completely by consistently using a proper vector type. I highly recommend it, independently of any performance considerations.
Is there any advantage in having 1-D arrays as x(n,) over x(n,1)?
For both Julia and Matlab, there shouldn’t be any performance penalty for a 1-D array, as everything is contiguous in memory. So even in Matlab, it doesn’t really matter whether it’s Nx1 vs. 1xN. (Note also that in Matlab, `1:N` defaults to a 1xN.)
I suspect you are getting at multidimensional arrays, where this matters more. Memory access is slow, and CPU caches usually fetch contiguous memory. You gain more cache benefit by accessing contiguous memory, which for both Matlab and Julia would be column order. So an NxM array might be quicker to access as columns than rows, and you might design the array or algorithm accordingly.
Note that this depends on what you’re storing. Julia can have `Array{Any,1}` with elements of varying type and size. Speed of memory access can be trickier, e.g. when a single element exceeds cache size. But if we’re talking about N-dimensional arrays of uniform, “typical” (e.g., `Float64` and other `isbits`) elements, it is helpful to anticipate column ordering.
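As a sketch of the column-ordering point (function names are mine, not from the thread), iterating down columns walks contiguous memory, while iterating across rows strides through it; both compute the same sum:

```julia
A = rand(1000, 1000)

# Fast: inner loop varies the first index, touching contiguous memory.
function sum_by_columns(A)
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    return s
end

# Slower: inner loop varies the second index, striding across columns.
function sum_by_rows(A)
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)
        s += A[i, j]
    end
    return s
end

sum_by_columns(A) ≈ sum_by_rows(A)   # same result, different access pattern
```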
Nah. I was asking something far more mundane. I’m a pretty decent Matlab programmer, and it never occurred to me that (n,) was any different from (n,1) in Julia. This thread, as the Zen people say, gave rise to doubt. So I asked, got a pretty clear no, and have walked away happy.
One thing I liked about the older versions of Matlab was that the only data type was the double-precision complex array. This is Fortran-like brutality, which some of us appreciate.
May all your constants be Hollerith,
— Tim
But they are different.
The compiler has more information about a vector than a matrix: it knows the second dimension is 1. This can yield better performance in cases such as:
```julia
julia> x = rand(128); X = reshape(x, (length(x), 1));

julia> typeof.((x, X))
(Array{Float64,1}, Array{Float64,2})

julia> A = rand(32,128); B = similar(A);

julia> @benchmark @. $B = $A + $x'
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     773.423 ns (0.00% GC)
  median time:      784.433 ns (0.00% GC)
  mean time:        792.276 ns (0.00% GC)
  maximum time:     1.226 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     104

julia> @benchmark @. $B = $A + $X'
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     805.562 ns (0.00% GC)
  median time:      816.124 ns (0.00% GC)
  mean time:        821.296 ns (0.00% GC)
  maximum time:     1.146 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     89
```
LoopVectorization is more extreme:

```julia
julia> using LoopVectorization

julia> @benchmark @avx @. $B = $A + $x'
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     592.760 ns (0.00% GC)
  median time:      594.268 ns (0.00% GC)
  mean time:        597.365 ns (0.00% GC)
  maximum time:     800.821 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     179

julia> @benchmark @avx @. $B = $A + $X'
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.022 μs (0.00% GC)
  median time:      1.027 μs (0.00% GC)
  mean time:        1.028 μs (0.00% GC)
  maximum time:     3.066 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10
```
LLVM will often do “multiversioning”, meaning it creates extra loops for special cases that can be optimized more aggressively, such as the matrix being Nx1. When LLVM does create code optimized for that special case, the performance difference might not be so large.
LoopVectorization doesn’t multiversion, hence the larger difference there.
But you have to pick and choose the cases that perform well.
I think LLVM will be faster than LoopVectorization for this next example if you don’t have AVX512 (I’m working on something that should let me address this in the next few weeks!), but if you do:
```julia
julia> C = rand(128,32);

julia> @benchmark @. $B = $A + $C'
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.534 μs (0.00% GC)
  median time:      2.543 μs (0.00% GC)
  mean time:        2.546 μs (0.00% GC)
  maximum time:     3.700 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

julia> @benchmark @avx @. $B = $A + $C'
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.299 μs (0.00% GC)
  median time:      1.308 μs (0.00% GC)
  mean time:        1.309 μs (0.00% GC)
  maximum time:     3.969 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10
```
My point being that there are a lot of trade-offs in the kinds of cases you optimize for.
Giving the compiler more information, e.g. by using a `Vector` instead of a matrix where one axis equals 1, will help it.
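A small sketch of why the extra information is visible to the compiler (variable names are mine): the adjoint of a `Vector` encodes “exactly one row” in its type, while the adjoint of an n×1 `Matrix` only knows its row count at runtime, even though both have the same size.

```julia
x = rand(4)
X = reshape(copy(x), 4, 1)

# The wrapped types differ, so dispatch and codegen can differ too:
# typeof(x') == Adjoint{Float64, Vector{Float64}}  -- one row, known statically
# typeof(X') == Adjoint{Float64, Matrix{Float64}}  -- row count only known at runtime
@assert typeof(x') != typeof(X')

@assert size(x') == (1, 4)
@assert size(X') == (1, 4)
```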