Size vs length

I wonder why size is so much faster than length when it comes to matrices:

julia> a = rand(100,2)
100×2 Array{Float64,2}:
 0.47076    0.605881
 0.0432887  0.318102
 0.12271    0.121367
 0.0737559  0.885239
 0.389849   0.164043
 0.468373   0.703576
 0.943903   0.189336
 0.9317     0.852962
 0.58266    0.830349
 0.440795   0.436561
 0.504023   0.76458
 ⋮
 0.0640017  0.0812839
 0.985068   0.228459
 0.94799    0.547918
 0.951806   0.6971
 0.848779   0.19903
 0.450803   0.315992
 0.155962   0.0482648
 0.488436   0.954154
 0.163412   0.36779
 0.837305   0.675525
 0.798766   0.257736

julia> @benchmark length(a[:,1])
BenchmarkTools.Trial:
  memory estimate:  912 bytes
  allocs estimate:  2
  --------------
  minimum time:     439.081 ns (0.00% GC)
  median time:      466.611 ns (0.00% GC)
  mean time:        493.373 ns (2.84% GC)
  maximum time:     4.766 μs (89.33% GC)
  --------------
  samples:          10000
  evals/sample:     198

julia> @benchmark size(a)
BenchmarkTools.Trial:
  memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     15.753 ns (0.00% GC)
  median time:      16.885 ns (0.00% GC)
  mean time:        19.617 ns (7.53% GC)
  maximum time:     2.197 μs (99.08% GC)
  --------------
  samples:          10000
  evals/sample:     998

It has nothing to do with size. The extra time comes from a[:, 1] creating a temporary array. If you want the size of only the first dimension, you should write

size(a, 1)
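For example, both expressions give the same answer on the array from above; the difference is that the slicing version allocates a temporary column copy (a quick sketch):

```julia
a = rand(100, 2)

n1 = size(a, 1)       # length of the first dimension, no temporary
n2 = length(a[:, 1])  # same answer, but a[:, 1] copies the column first

@assert n1 == n2 == 100
```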

size and length have identical performance for arrays:

julia> using BenchmarkTools

julia> @btime length(x) setup=(x=rand(10,12));
  1.647 ns (0 allocations: 0 bytes)

julia> @btime size(x, 1) setup=(x=rand(10,12));
  1.647 ns (0 allocations: 0 bytes)

julia> @btime size(x) setup=(x=rand(10,12));
  1.647 ns (0 allocations: 0 bytes)

BTW: in case you brought this habit with you from another programming language, you probably shouldn't do this in that other language either. In MATLAB you should write size(a, 1), exactly like in Julia, and in Python it should be numpy.size(a, 0) or a.shape[0].
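To illustrate the Python side of this (a small sketch using NumPy):

```python
import numpy as np

a = np.random.rand(100, 2)

# Length of the first dimension, without slicing out a column first:
print(np.size(a, 0))   # 100
print(a.shape[0])      # 100

# Slicing also works, but goes through an intermediate array object:
print(len(a[:, 0]))    # 100
```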

Wouldn’t creating a view fix this?

julia> @benchmark length(@view a[:,1])
BenchmarkTools.Trial:
  memory estimate:  64 bytes
  allocs estimate:  2
  --------------
  minimum time:     279.348 ns (0.00% GC)
  median time:      284.824 ns (0.00% GC)
  mean time:        293.145 ns (0.86% GC)
  maximum time:     6.973 μs (95.24% GC)
  --------------
  samples:          10000
  evals/sample:     296


as views do not allocate?

It still takes time: creating the view itself also has a cost. And anyway, it is non-idiomatic and a bit pointless when there is already a correct way to do it.

Do you then suggest always using size(a, dim) over slicing the array and then taking length?

Oh, yes, absolutely.

Sometimes the performance can be the same, if the compiler realizes that it doesn't actually need to create the view and skips straight to the result. But you cannot always rely on that, and besides, it is un-idiomatic and confusing to readers.
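A quick way to see that the two spellings agree (a sketch; the calls are wrapped in functions, where the compiler has the best chance to elide the view):

```julia
a = rand(100, 2)

col_len(x) = length(@view x[:, 1])  # the view may be elided entirely
dim_len(x) = size(x, 1)             # idiomatic: no view needed at all

@assert col_len(a) == dim_len(a) == 100
```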

Also, I should note: views do allocate; the allocation is just sometimes elided. If you want views that are guaranteed not to allocate, use UnsafeArrays.jl.

Or just wait with great anticipation for Julia 1.5!

Or clone the repo today and just use it :wink:

When using BenchmarkTools, you want to be careful about how you refer to variables. The key thing to remember is that BenchmarkTools is trying to measure the performance as that snippet would behave inside a function. You want to use a $ to flag the a as being a local variable instead of a global (and thus type-unstable) reference:

julia> @benchmark length(@view a[:,1])
BenchmarkTools.Trial:
  memory estimate:  64 bytes
  allocs estimate:  2
  --------------
  minimum time:     138.676 ns (0.00% GC)
  median time:      142.613 ns (0.00% GC)
  mean time:        148.707 ns (0.66% GC)
  maximum time:     1.140 μs (87.63% GC)
  --------------
  samples:          10000
  evals/sample:     842

julia> @benchmark length(@view $a[:,1])
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.034 ns (0.00% GC)
  median time:      2.048 ns (0.00% GC)
  mean time:        2.074 ns (0.00% GC)
  maximum time:     16.413 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

The above is on Julia 1.5, where views are now always non-allocating, but even on Julia 1.4 the compiler was able to see that the view wasn't actually needed when you're only computing the length.

I can’t find anything about this in the NEWS for 1.5. Do you have a reference to the pull request?

The issue for non-allocating array views was https://github.com/JuliaLang/julia/issues/14955 which was closed by the more general PR below.

I think adding a note to the NEWS entry for the above PR (that it enables non-allocating array views) would be a helpful thing to create a pull request for. :slight_smile: It would have to target the release-1.5 branch, as I understand it; or maybe the backports-release-1.5 branch?

Thanks! I probably messed something up, but see here: https://github.com/JuliaLang/julia/pull/35851/files

I know Jeff is planning on talking about that and has been working on the release notes and maybe even a blog post… note that this is the direct result of the bullet point quoted below!

  • Immutable structs (including tuples) that contain references can now be allocated on the stack, and allocated inline within arrays and other structs (#33886). This significantly reduces the number of heap allocations in some workloads. Code that requires assumptions about object layout and addresses (usually for interoperability with C or other languages) might need to be updated; for example any object that needs a stable address should be a mutable struct.

The most exciting part is that it’s far more general than just array views.
