Local arrays in for loops

I’m pretty confused about type stability. The documentation makes it sound like it’s sometimes good to tell the compiler what type a variable is, but that you shouldn’t do it all the time.
All my functions and variables already have fixed types. So why should I, or shouldn’t I, specify the type of every variable and argument in my code?
Thanks a lot!


I have to admit that you reached a case where solving those instabilities relied on changes that were subtle and non-intuitive to me. Using ntuple there to solve that part is hard to understand (although it appears here from time to time).

The general answer to your question here is that specifying, or annotating, types does not necessarily solve the problem of type instability. The instability is a property of the values the variables assume at run time.
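
As a minimal sketch (with hypothetical functions, not from this thread): annotating an argument does not remove an instability that comes from the values assigned inside the body:

function unstable(flag)
  x = flag ? 1 : 2.5    # x is an Int or a Float64 depending on a run-time value
  return x + 1
end

function annotated_but_still_unstable(flag::Bool)
  x = flag ? 1 : 2.5    # the same instability, regardless of the argument annotation
  return x + 1
end

function stable(flag::Bool)
  x = flag ? 1.0 : 2.5  # x is always a Float64, so the return type is inferable
  return x + 1
end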

Annotating the types required by a function can make your code fail with a MethodError if the types are not what you were expecting. That can help to track errors and indicate that something was not created with the expected type, but type stability has to be guaranteed at the moment each value of each variable is created.

For example, take the temp1c function (for which the compiler was able to infer the types, and which does not allocate):

function temp1c(vertexList,elements)
  elementLength = length(elements[1])
  v = SVector(ntuple(i->i,elementLength))
  vNext = v .% elementLength .+ 1
  temp1b(vertexList, elements, v, vNext)
end

You could define it as:

function temp1c(vertexList::Vector{SVector{N,Float64}},
                elements::Vector{SVector{N,Int}}) where N

which constrains the types of the variables you feed to the function. That will raise an error if the variables are not of those types, but it brings no performance gain at all (annotating or not, the compiler will know what the types of the variables are).

Still, that can be useful to detect type instabilities. For example, if you had generated the elements vector in a type-unstable way, the function would become horribly slow:

julia> elements = Any[ SVector(rand(1:N),rand(1:N),rand(1:N)) for i in 1:N ];

julia> @btime temp1c($vertexList, $elements)
  314.053 μs (11006 allocations: 672.13 KiB)

But if you had used the annotated version of the function, you would just get an error:

julia> @btime temp1c($vertexList, $elements)
ERROR: MethodError: no method matching temp1c(::Array{SArray{Tuple{3},Float64,1,3},1}, ::Array{Any,1})

which might help you to find where the instability happened (if it happened before that function call).

One more thing: we could have figured out that there was a type instability in your version, which used v = SVector((1:elementLength)...), with the @code_warntype macro, which is always your friend:

julia> @code_warntype temp1c(vertexList,elements)
Variables
  #self#::Core.Compiler.Const(temp1c, false)
  vertexList::Array{SArray{Tuple{3},Float64,1,3},1}
  elements::Array{SArray{Tuple{3},Int64,1,3},1}
  elementLength::Int64
  v::Any
  vNext::Any

Body::SArray{Tuple{3},Float64,1,3}
1 ─ %1 = Base.getindex(elements, 1)::SArray{Tuple{3},Int64,1,3}
│        (elementLength = Main.length(%1))
│   %3 = (1:elementLength::Core.Compiler.Const(3, false))::Core.Compiler.Const(1:3, false)
│        (v = Core._apply_iterate(Base.iterate, Main.SVector, %3))
│   %5 = Base.broadcasted(Main.:%, v, elementLength::Core.Compiler.Const(3, false))::Any
│   %6 = Base.broadcasted(Main.:+, %5, 1)::Any
│        (vNext = Base.materialize(%6))
│   %8 = Main.temp1b(vertexList, elements, v, vNext)::SArray{Tuple{3},Float64,1,3}
└──      return %8


Note that v and vNext are inferred to be of type Any, which is bad (and shown in red in the REPL). Both are solved by changing that line to the ntuple version (figuring out how to solve that type instability would be another problem, but at that point one could ask a much more specific question in the forum).


“Statically-sized” types like SVectors or tuples require some care to ensure type stability.

Vector from the standard library does not include its length in the type signature, so collect(1:n) is always a Vector regardless of the value of n.

On the other hand, calling SVector(1:n...) produces different types for different values of n, because the length of a StaticArray is included in the type.
Ideally, if n is a compile-time constant, it should be propagated and the type of SVector(1:n...) should be inferable, but the Julia compiler currently is unable to do that :frowning:
ntuple(f, n), on the other hand, can be inlined for small n if n is known at compile time.
The Julia compiler can detect that elementLength = length(elements[1]) is a compile-time constant if elements[1] is a static array (with length <= 16) and, hence, infer the return type of ntuple statically.
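
As a small illustration of the two points above (StaticArrays assumed to be loaded; not from the original post):

using StaticArrays

v  = collect(1:3)      # typeof(v)  == Vector{Int64}     -- the length is only a run-time value
sv = SVector(1, 2, 3)  # typeof(sv) == SVector{3, Int64} -- the length 3 is a type parameter

# Because the length of a static vector is part of its type, length(el) below is a
# compile-time constant, so ntuple(i -> i, length(el)) can be inferred and inlined:
ids(el::SVector{N}) where {N} = SVector(ntuple(i -> i, length(el)))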

Now, whether to specify types or not depends on the specific use case. Here we rely heavily on vertexList and elements containing static vectors to get rid of allocations, so specifying the types is justified.
You may also write a type-generic variant for other cases, which might be less performant.


This is complicated =P.
Ok, so how can I add this to the code and avoid allocations?

function temp1c(vertexList,elements)
  elementLength=size(elements[1],1)
  v=SVector(ntuple(i->i,elementLength))
  vNext = v .% elementLength .+ 1
  randvec = SA_F64[1,2,3]
  for k in 1:size(elements,1)
    vertices = vertexList[elements[k]];
    edges = vertices[vNext]-vertices[v];
    edgesNorm = norm.(edges);
    foo = SVector(ntuple(i->dot(edges[i],randvec),elementLength))
  end
end

Yeah, I’m surprised that the SVector(ntuple(...)) isn’t working in this case. One workaround, which ends up being simpler as well, is to use broadcasting:

function temp1d(vertexList,elements)
  elementLength=size(elements[1],1)
  v=SVector(ntuple(i->i,elementLength))
  vNext = v .% elementLength .+ 1
  randvec = SA_F64[1,2,3]
  acc = zero(eltype(vertexList))
  for k in 1:size(elements,1)
    vertices = vertexList[elements[k]];
    edges = vertices[vNext]-vertices[v];
    edgesNorm = norm.(edges);
    foo = dot.(edges, Ref(randvec))
    acc += foo
  end
  acc
end
julia> @btime temp1d($vertexList, $elements)
  2.816 μs (0 allocations: 0 bytes)

(I added a running total of the foos because otherwise the compiler outsmarts the code and decides to skip most of it).
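
As an aside, a hypothetical sketch (invented functions, not from the thread) of why such an accumulator is needed when benchmarking: work whose result is never used may simply be deleted by the compiler.

function useless_loop(n)
  for i in 1:n
    s = i * i          # never used afterwards, so the whole loop may be optimized away
  end
  return nothing
end

function useful_loop(n)
  acc = 0
  for i in 1:n
    acc += i * i       # feeds the return value, so the work cannot be skipped
  end
  return acc
end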

The relevant change here is to do:

dot.(edges, Ref(randvec))

which gives the same result as the SVector(ntuple(i -> dot(edges[i], randvec), elementLength)) from your version, but is much more efficient thanks to the careful work by the authors of the StaticArrays package. The Ref wrapper tells broadcasting to treat randvec as a scalar; that is, it tells the code to compute dot(edges[i], randvec) rather than dot(edges[i], randvec[i]).
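
As a small, self-contained illustration of the Ref trick (the data here is made up, not from the original code):

using LinearAlgebra, StaticArrays

edges   = [SVector(1.0, 0.0, 0.0), SVector(0.0, 2.0, 0.0), SVector(0.0, 0.0, 3.0)]
randvec = SVector(1.0, 1.0, 1.0)

# Ref(randvec) makes broadcasting treat randvec as a single object, so each
# edges[i] is dotted with the whole vector:
dot.(edges, Ref(randvec))   # == [1.0, 2.0, 3.0]

# A 1-element tuple does the same job:
dot.(edges, (randvec,))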

Here’s a simpler reduction of the issue you were seeing with ntuple in your code. I’m curious if we can figure out why it’s so slow:

julia> edges = SVector{3}(rand(SVector{3, Float64}, 3));

julia> vec = rand(SVector{3, Float64});

julia> function f1(edges, vec)
         dot.(edges, Ref(vec))
       end
f1 (generic function with 1 method)

julia> function f2(edges, vec::SVector{N}) where {N}
         SVector(ntuple(i -> dot(edges[i], vec), N))
       end
f2 (generic function with 1 method)

julia> @btime f1($edges, $vec)
  5.734 ns (0 allocations: 0 bytes)
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
 3.9926691014563733
 0.9619779502040817
 2.3644451633295263

julia> @btime f2($edges, $vec)
  62.905 ns (2 allocations: 64 bytes)
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
 3.9926691014563733
 0.9619779502040817
 2.3644451633295263

The good news is that the upcoming Julia 1.6 is 10x faster in the slow case and has no allocations:

julia> @btime f1($edges, $vec)
  2.640 ns (0 allocations: 0 bytes)
3-element SVector{3, Float64} with indices SOneTo(3):
 0.9928522616018985
 1.0852293528864778
 0.5121857972242163

julia> @btime f2($edges, $vec)
  6.210 ns (0 allocations: 0 bytes)
3-element SVector{3, Float64} with indices SOneTo(3):
 0.9928522616018985
 1.085229352886478
 0.5121857972242163

julia> versioninfo()
Julia Version 1.6.0-rc1
Commit a58bdd9010 (2021-02-06 15:49 UTC)

Not perfect, but definitely an improvement.


@rdeits Cool, I didn’t know of the Ref command!
This fixed my problem and my real code now works! Thanks a lot!
I might have to post stuff on occasion to ask how to avoid allocations, since all of the solutions (I’d call them workarounds) mentioned in this long topic are non-intuitive and I wouldn’t have come up with any of them by myself!
Thanks again, everyone!


I think you could replace this with a static range:

v = SOneTo(elementLength) 
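
For reference, a small sketch of what SOneTo gives you (assuming StaticArrays is loaded; not from the original post):

using StaticArrays

r = SOneTo(3)    # a statically-sized range equivalent to 1:3
typeof(r)        # SOneTo{3} -- the length is part of the type
collect(r)       # [1, 2, 3]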

Looking at error messages, I’ve come to a cleaner idea without ntuple:

function temp1d(vertexList,elements)
  L = elementLength = size(elements[1],1)
  v=SVector{L}(1:L) # type magically gets properly inferred
  vNext = v .% elementLength .+ 1
  randvec = SA_F64[1,2,3]
  acc = zero(eltype(vertexList))
  for k in 1:size(elements,1)
    vertices = vertexList[elements[k]];
    edges = vertices[vNext]-vertices[v];
    edgesNorm = norm.(edges);
    foo = map(x -> dot(x, randvec), edges)
    acc += foo
  end
  acc
end

We of course hope that things will only “improve” with time. But could these options (using ntuple, or the range, or others) depend on implementation details that may change (and regress) from one Julia (or StaticArrays) version to another? Or does someone see a deeper rationale behind these options?

I was also left with the impression that we had to resort to more or less arbitrary tricks here to deal with type stability. I would be more comfortable with alternative, more “rational” solutions, even if they relied on lower-level code.

I’m not sure I agree that these are tricks, or that it’s arbitrary or unpredictable when the code is type-stable. Not here, anyway. Tuples have their lengths as part of their type, Arrays and ranges don’t.

So when you pass the length through a type that doesn’t encode size, it gets ‘lost’. It seems pretty predictable and explainable.
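
A quick illustration of that point:

typeof((1, 2, 3))   # Tuple{Int64, Int64, Int64} -- the length is visible in the type
typeof([1, 2, 3])   # Vector{Int64}              -- the length is only known at run time
typeof(1:3)         # UnitRange{Int64}           -- same: the length is a run-time value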

And the SVector{L}(1:L) is the author telling the compiler what the length is, so that makes sense too.


Ok, that is my point: I want to understand what was going on, so I can rationalize what happened and learn. For example:

test1(n) = SVector((1:n)...) 
#  test1(3)  -> 460.546 ns (5 allocations: 160 bytes)

test2(n) = SVector{n}((1:n)...)
# test2(3) -> 3.071 μs (25 allocations: 1.47 KiB)

test3(n) = SVector{n}(1:n)
# test3(3) ->   0.026 ns (0 allocations: 0 bytes)

test4(n) = SVector(1:n)
# test4(n) -> ERROR: The size of type `SArray{Tuple{S},T,1,S} where T where S` is not known.


In particular, it is surprising to me that test2 is worse than test1.

Also, the same strategy has to be tuned if one wanted a static array of floats:

julia> test(n) = SVector{n,Float64}(1.0:Float64(n))
test (generic function with 1 method)

julia> @btime test(3)
  1.149 μs (11 allocations: 672 bytes)

The ntuple approach works, though:

julia> test5(n) = SVector{n}(ntuple(i->Float64(i),n))
test5 (generic function with 1 method)

julia> @btime test5(3)
  0.026 ns (0 allocations: 0 bytes)

In all of your tests, the performance depends on inlining.
Try benchmarking with

julia> @btime test(n[]) setup=(n = Ref(3))

You need the size as a compile-time constant to eliminate allocations. That means none of these functions is type-stable in general; if they happen to be, it is due to compiler inlining and constant propagation, so they may be type-stable in some contexts and not in others. You actually see a similar behavior with struct access: an expression like a.x is in fact a function call, getproperty(a, :x), and is not type-stable in general. But if that :x is hardcoded, the compiler propagates it as a constant and can infer the return type statically.
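
A hypothetical sketch of the getproperty point (the struct and names are invented for illustration):

struct Mixed
  a::Int
  b::Float64
end

m = Mixed(1, 2.0)

m.a  # lowered to getproperty(m, :a); the field name :a is a literal constant,
     # so the compiler knows the result is an Int

# With a run-time Symbol the same call cannot be inferred to a single concrete type:
get_by_name(x, s::Symbol) = getproperty(x, s)   # the return type depends on the value of s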

test(v::SVector{N}) where N = SVector{N}(1:N) has N encoded in the argument type, not value, hence it’s type stable. Interestingly, it is faster than test(v::SVector{N}) where N = SVector{N}(SOneTo(N)) on my machine.
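
For completeness, a runnable sketch of that pattern (the example call is made up):

using StaticArrays

test(v::SVector{N}) where {N} = SVector{N}(1:N)

test(SVector(10.0, 20.0, 30.0))   # returns SVector(1, 2, 3); N comes from the argument type, so it is inferable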


That didn’t help :frowning:. Actually, the range option got worse.

julia> test1(n) = SVector((1:n)...)
test1 (generic function with 1 method)

julia> @btime test1(n[]) setup=(n=Ref(3))
  466.612 ns (6 allocations: 192 bytes)
3-element SArray{Tuple{3},Int64,1,3} with indices SOneTo(3):
 1
 2
 3

julia> test2(n) = SVector{n}((1:n)...)
test2 (generic function with 1 method)

julia> @btime test2(n[]) setup=(n=Ref(3))
  3.897 μs (35 allocations: 2.06 KiB)
3-element SArray{Tuple{3},Int64,1,3} with indices SOneTo(3):
 1
 2
 3

julia> test3(n) = SVector{n}(1:n)
test3 (generic function with 1 method)

julia> @btime test3(n[]) setup=(n=Ref(3))
  875.059 ns (11 allocations: 640 bytes)
3-element SArray{Tuple{3},Int64,1,3} with indices SOneTo(3):
 1
 2
 3

That is what I understand :slight_smile:, and that is why all my efforts to avoid allocations were in the direction of putting the length of the elements as a parameter of the types of the functions. But, in the end, that was not the best solution.

I don’t think the timing was supposed to get faster, but more accurate.


Timings are identical for me.