Local arrays in for loops

Ribeiro · February 28, 2021, 10:47pm

Hello,
Is there a nice way to use vectors in arrays without spending too much time on allocations?
I have some parallel for loops and if I set 3 scalar variables to certain values, it seems to perform better than if I set one array with 3 elements (and it works in CUDA kernels). I don’t really know why the two are different at all.
It would make my code neater if I could use arrays. I understand StaticArrays are different from regular arrays for some reason. However, I would need to mutate them and have Vectors of StaticArrays.
Here’s an example of what I do in the for loop:

point1 = points[element[k][1]]; point2 = points[element[k][2]]; point3 = points[element[k][3]];
e1 = point2-point1; e2 = point3-point2; e3 = point1-point3;
E1 = norm(e1); E2 = norm(e2); E3 = norm(e3);

Note that each point is a StaticArray. I also do more complicated stuff and function calls.
I’d love to do:

vertices = points[panels[k]];
for v in 1:panelType
  vnext = v%numVertices+1;
  e[v] = vertices[vnext]-vertices[v];
  E[v] = norm(e[v]);
end

Thanks a lot!

Mason · February 28, 2021, 10:56pm

Is there a reason you can’t use a multidimensional array?

Ribeiro · February 28, 2021, 11:05pm

@Mason you mean within my for loop have one variable per vertex per element? Because that would take up way too much memory and most of those variables are temporary.
Thanks!

lmiq · February 28, 2021, 11:07pm

You can do that with a vector of static arrays. In that code e[v] does not seem necessáry as an array (coukd be only a temporary vector). But that depends if you will use it later.

Mason · February 28, 2021, 11:10pm

I don’t think your question is phrased very clearly and I’m having some trouble understanding. I am suggesting that instead of writing code like

v = [[1, 2, 3], [4, 5, 6]]

and accessing elements like

v[i][j]

you should instead write

m = [1 4
     2 5
     3 6]

and access elements as

m[i, j]

This will have the same (well, slightly less) memory requirements than an array of arrays, but will be more performant. If you’re using a vector of staticarrays, it’s usually better to just use https://github.com/mateuszbaran/HybridArrays.jl

Mason · February 28, 2021, 11:11pm

If you show us an actually runnable snippet of code that demonstrates a simplified version of your problem, we might be able to offer more specific advice.

lmiq · February 28, 2021, 11:24pm

I might be wrong, but maybe this note I have may be helpful: Immutable variables · JuliaNotes.jl

Ribeiro · March 1, 2021, 8:59am

@Mason I started with a regular matrix and when I switched to a Vector of StaticArrays my code became 10x faster.
@lmiq Thanks for the link. I learned some stuff =].

I thought this was a common enough problem that I wouldn’t need to make a full example, but here’s one:

N=1000;
vertexList = Vector{StaticArrays.SVector{3,Float64}}(undef,N);
elements = Vector{StaticArrays.SVector{3,Int}}(undef,N);
for i in 1:1000
    vertexList[i] = SA_F64[rand(),rand(),rand()];
    elements[i] = SA[rand(1:N),rand(1:N),rand(1:N)];
end
elementLength = length(elements[1]);
v = collect(1:elementLength);
vNext = v.%elementLength .+ 1;

function temp1(vertexList,elements,N)
    for k in 1:N
        vertices = vertexList[elements[k]];
        edges = vertices[vNext]-vertices[v];
        edgesNorm = norm.(edges);
    end
end
function temp2(vertexList,elements,N)
    for k in 1:N
        vertex1= vertexList[elements[k][1]]; vertex2= vertexList[elements[k][2]]; vertex3= vertexList[elements[k][3]];
        edge1 = vertex2-vertex1; edge2 = vertex3-vertex2; edge3 = vertex1-vertex3;
        edgeNorm1 = norm(edge1); edgeNorm2 = norm(edge2); edgeNorm3 = norm(edge3);
    end
end
@btime temp1(vertexList,elements,N); # 1.054 ms (8000 allocations: 765.63 KiB)
@btime temp2(vertexList,elements,N); # 991.583 ns (0 allocations: 0 bytes)

So temp2 is waaaay better, but way less elegant. It also locks me in a certain elementLength.
Any tips?
Thanks a lot!

lmiq · March 1, 2021, 9:21am

You mean temp2 is better right?

Note that N=length(vertexList), you don’t need to pass it as a parameter.

In temp1 you should passa v and vNext as parameters, otherwise they are global variables.

I think you can do something on the line you want. But then I have to get to the computer to try.

Have you considered something like temp2 but with an additional loop on elementLength?

Ribeiro · March 1, 2021, 9:43am

Sorry, I just edited that mistake.
I wrote the sample code real quick, so some stuff is odd.
But the changes you mentioned don’t make a difference.
Thanks!

I’m messing around a bit and this helps a lot:

edges = @SVector [vertices[v%elementLength+1]-vertices[v] for v in 1:elementLength];

Because before edges was an array of StaticArrays. Now it’s a StaticArray of StaticArrays. It still allocates, but now the results are more comparable:

@btime temp1(vertexList,elements,N); # 1.060 μs (2 allocations: 224 bytes)
@btime temp2(vertexList,elements,N); # 991.583 ns (0 allocations: 0 bytes)

Any way I can get that down to zero?

lmiq · March 1, 2021, 9:45am

Probably this is allocating something. Can you make it a static vector as well?

Just a curiosity: I have never seen the % notation before. Is that from some package? Where did you find it?

Small things: while you may not see a difference in one specific example, it is strongly advised to avoid global variables. Thus, also compute the lengths of the vectors inside the function (elementLength, for example). That avoids having to pass them as parameters while guaranteeing that the function remains generic.

Also @btime requires interpolating the input variable ($). Otherwise you may get wrong results.

Ribeiro · March 1, 2021, 10:07am

No, that is a StaticArray.
% is the integer division remainder, as in most other languages. That v%len+1 thing is just a way to loop over the vertices so that v is 1,2,3 and v%len+1 is 2,3,1.
I do set the length inside my functions. Again, I wrote that example real quick.
Can you point me to some material explaining the usage of $? I’ve seen it in some code and never figured out what it does. The documentation isn’t clear enough for me.
Thanks!

lmiq · March 1, 2021, 10:12am

Yes, of course I have seen it It got me confused because of the context and because in Fortran that is used to refer to fields of custom types.

Not really. It has something to do with how variables are parsed in macros. But I could not find a clear explanation (for me) either. But one needs to add the $ to get proper results. Otherwise there might be some benchmark associated to parsing the variable in the macro which should not be counted.

DNF · March 1, 2021, 10:48am

Neither of your functions, temp1 or temp2, return anything, and they don’t appear to have any side effects, so in theory the compiler can decide to not do any work at all. You should make sure that each function actually returns something.

Also, as far as I can tell, each iteration is independent from the previous, so the compiler could also decide to just run the last iteration of the loop, since the previous iterations have no observable effect.

Ribeiro · March 1, 2021, 10:59am

N=1000;
vertexList = Vector{StaticArrays.SVector{3,Float64}}(undef,N);
elements = Vector{StaticArrays.SVector{3,Int}}(undef,N);
for i in 1:1000
    vertexList[i] = SA_F64[rand(),rand(),rand()];
    elements[i] = SA[rand(1:N),rand(1:N),rand(1:N)];
end

function temp1(vertexList,elements)
    elementLength = length(elements[1]);
    v = collect(1:elementLength);
    vNext = v.%elementLength .+ 1;
    acc = SA_F64[0,0,0];
    for k in 1:size(elements,1)
        vertices = vertexList[elements[k]];
        edges = @SVector [vertices[v%elementLength+1]-vertices[v] for v in 1:elementLength];
        edgesNorm = norm.(edges);
        acc += edgesNorm;
    end
    return acc;
end
function temp2(vertexList,elements)
    acc1 = 0.0;
    acc2 = 0.0;
    acc3 = 0.0;
    for k in 1:size(elements,1)
        vertex1= vertexList[elements[k][1]]; vertex2= vertexList[elements[k][2]]; vertex3= vertexList[elements[k][3]];
        edge1 = vertex2-vertex1; edge2 = vertex3-vertex2; edge3 = vertex1-vertex3;
        edgeNorm1 = norm(edge1); edgeNorm2 = norm(edge2); edgeNorm3 = norm(edge3);
        acc1 += edgeNorm1; acc2 += edgeNorm2; acc3 += edgeNorm3;
    end
    return acc1, acc2, acc3;
end
@btime temp1($vertexList,$elements); # 5.050 μs (2 allocations: 224 bytes)
@btime temp2($vertexList,$elements); # 4.886 μs (0 allocations: 0 bytes)

DNF · March 1, 2021, 10:59am

Maybe you can update the example. It’s not clear if you have fixed the global variables issue or not.

Edit: Aha! Simulpost

DNF · March 1, 2021, 11:03am

You will never get down to zero allocations, since v and vNext are allocated. If you remove the collect, it will go down to one allocation.

(BTW you should never collect ranges, just leave them alone.)

Ribeiro · March 1, 2021, 11:09am

Oh, that was dumb. I don’t need v and vNext. I kept them in due to the previous version.
Yeah, removing those gets rid of the 2 allocations. Thanks!
Ok, using StaticArrays basically got rid of most of my problems. I got some other points in my real code where there’s some questionable stuff happening, but I can’t reproduce it too easily in a cutdown, for some reason.
Thanks a lot for the help, folks!

Ribeiro · March 1, 2021, 11:11am

I did it due to the vNext vector. (1:3).%3.+1 doesn’t work.

DNF · March 1, 2021, 12:04pm

jl>  (1:3) .% 3 .+ 1
3-element Vector{Int64}:
 2
 3
 1

The reason it didn’t work was parsing ambiguity of 3.+1. It could mean 3 .+ 1 or 3. + 1. The error message explains it: ERROR: syntax: invalid syntax "3.+"; add space(s) to clarify

(Never collect ranges )

Topic		Replies	Views
Avoiding allocations of small but non-trivial arrays (work array alternative?) Performance question	38	4483	November 17, 2022
Help with allocations (variable length least squares) New to Julia memory-allocation	10	1159	March 25, 2021
Common allocation mistakes Performance memory-allocation	47	7137	August 21, 2023
Memory allocations when returning vectors General Usage array , memory-allocation	15	1491	June 6, 2018
Creating a Static Array with a loop General Usage	10	1004	April 12, 2022

Local arrays in for loops

Related topics