Nested static arrays perform worse than nested arrays

anneJulia · November 4, 2021, 8:10am

I am performing array operations, where I retrieve elements of an array (this array is nested). The functions below are minimal working examples of what I am trying to do.

using TimerOutputs
using StaticArrays
using StatsBase

const track = TimerOutput()

########## normal array: 

function nestedNormalArray(input)

    output = zeros(20)
    for i in 1:20
        output[i] = input[I][rand(1:5)]
    end
    return output
end

@timeit track "arrayNestedInput" for j in 1:10000000
    
    temp = Vector{Any}(undef,20) 
    for k in 1:20 
        random = digits(rand(1:((2^5)-1)), base=2, pad=5)
        temp[k] = random
    end

    nestedNormalArray(temp)
    
end


########## static array: 

function nestedStaticArray(input)

    output = zeros(MVector{20,Int32})
    for i in 1:20
        output[i] = input[I][rand(1:5)]
    end
    return output
end

@timeit track "staticArrayNestedInput" for j in 1:10000000
    
    temp = Vector{Any}(undef,20) 
    for k in 1:20 
        random = digits(rand(1:((2^5)-1)), base=2, pad=5)
        temp[k] = MVector{5}(random)
    end

    inside = SVector{20}(temp)
    nestedStaticArray(inside)
    
end

I used the package TimerOutputs to get the allocated memory and the time both functions needed. Surprisingly the version with the static arrays needed a lot more allocations and time.

 ─────────────────────────────────────────────────────────────────────────────────
                                          Time                   Allocations      
                                  ──────────────────────   ───────────────────────
         Tot / % measured:              128s / 74.9%           67.5GiB / 100%     

 Section                  ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────
 staticArrayNestedInput        1    50.1s  52.2%   50.1s   39.2GiB  58.1%  39.2GiB
 arrayNestedInput              1    45.8s  47.8%   45.8s   28.3GiB  41.9%  28.3GiB
 ─────────────────────────────────────────────────────────────────────────────────

Does anybody know how to reduce the number of allocations and the time?

DNF · November 4, 2021, 9:12am

You are benchmarking on global variables, and using Vector{Any}. That’s basically the two worst things you can do performance-wise

Make sure to put your code in functions, and then benchmark it using the BenchmarkTools package. And avoid Vector{Any} at all cost, that will kill performance completely.

Take a look at the performance tips section of the manual: Performance Tips · The Julia Language

DNF · November 4, 2021, 9:16am

I’m not sure about the consequences, but it looks risky to overwrite in

anneJulia · November 4, 2021, 9:17am

Right, I am definitely going to change this

DNF · November 4, 2021, 9:20am

rand, too is a Base function that is better to leave alone here. In fact, you may just use rand(1:5) instead of sample(1:5).

anneJulia · November 4, 2021, 9:23am

Thank you, also going to change this

anneJulia · November 4, 2021, 9:34am

I put my code into functions, so that temp and inside are no longer global, and I changed Vector{Any} into Vector{Vector{Int64}} .

using TimerOutputs
using StaticArrays


const track = TimerOutput()

########## normal array: 

function nestedNormalArray(input)

    output = zeros(20)
    for i in 1:20
        output[i] = input[i][sample(1:5)]
    end
    return output
end

function makeNormalNestedArray()
    temp = Vector{Vector{Int64}}(undef,20) 
    for k in 1:20 
        random = digits(rand(1:((2^5)-1)), base=2, pad=5)
        temp[k] = random
    end
    return temp
end

@timeit track "arrayNestedInput" for j in 1:10000000
    nestedNormalArray(makeNormalNestedArray())
end


########## static array: 

function nestedStaticArray(input)

    output = zeros(MVector{20,Int32})
    for i in 1:20
        output[i] = input[i][sample(1:5)]
    end
    return output
end

function makeNestedStaticArray()

    temp = Vector{Vector{Int64}}(undef,20) 
    for k in 1:20 
        random = digits(rand(1:((2^5)-1)), base=2, pad=5)
        temp[k] = MVector{5}(random)
    end

    inside = SVector{20}(temp)
    return inside 
end

@timeit track "staticArrayNestedInput" for j in 1:10000000
    nestedStaticArray(makeNestedStaticArray())
end

And this is the output I get:

 ─────────────────────────────────────────────────────────────────────────────────
                                          Time                   Allocations      
                                  ──────────────────────   ───────────────────────
         Tot / % measured:             47.9s / 90.4%           88.1GiB / 100%     

 Section                  ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────
 staticArrayNestedInput        1    27.3s  63.0%   27.3s   59.8GiB  67.9%  59.8GiB
 arrayNestedInput              1    16.0s  37.0%   16.0s   28.3GiB  32.1%  28.3GiB
 ─────────────────────────────────────────────────────────────────────────────────

Even more allocations are made.

DNF · November 4, 2021, 9:42am

I’m not sure about the end goal here, but can you do this?

using StaticArrays, BenchmarkTools

function nestedStaticArray(input)
    output = MVector{20,Int32}(undef)
    for i in eachindex(output)
        output[i] = input[i][rand(1:5)]
    end
    return output
end

temp = rand(SVector{20, SVector{5, Bool}})

jl> @btime nestedStaticArray($temp)
  121.540 ns (1 allocation: 96 bytes)
20-element MVector{20, Int32} with indices SOneTo(20):
 1
 1
 0
 0
 0
 1
 0
 1
 0
 1
 1
 1
 1
 1
 1
 1
 1
 0
 0
 1

If it’s all zeros and ones, output could also use Bools instead of Int32.

DNF · November 4, 2021, 9:46am

It may be nicer to use

output[i] = rand(input[i])

That will randomly select one of the elements of input[i], and not rely on having to hardcode the index range.

And you could, in principle just write the whole thing as

temp = rand(SVector{20, SVector{5, Bool}})
output = rand.(temp)

That would return an SVector{20, Bool}, but you could change this around if you like and absolutely need MVectors and Int32s.

anneJulia · November 4, 2021, 9:49am

In my actual code I’m using specific boolean vectors, I don’t use rand(), for the MWE I just wanted to give my function different boolean vectors.

DNF · November 4, 2021, 9:52am

Not sure I understand. Does this make a difference? rand(input[i]) will select a random element of input[i], is that not what is wanted? Maybe you can add some more detail to your MWE?

anneJulia · November 4, 2021, 10:03am

I will try to give my MWE more detail. In general, my goal is to give my function nestedStaticArray(input, idx) a boolean vector input and a nested vector idx which has (inside each nested vector) the indexes which I want to retrieve from input with a for loop.

anneJulia · November 4, 2021, 10:50am

using TimerOutputs
using StaticArrays

const track = TimerOutput()

function nestedStaticArray(input, idx, reference)
    output = zeros(MVector{20,Int32})
    for i in 1:20
        tempi = input[idx[i]]
        t = 0
        for u in 0:(length(tempi)-1)
            t = t+ 2^u * tempi[u+1]
        end
        output[i] = reference[t+1] 
    end
    return output
end

function makeNestedStaticArray()
    temp = Vector{Vector{Int64}}(undef,20) 
    for k in 1:20 
        random = rand(1:5, 5)
        temp[k] = MVector{5}(random)
    end
    inside = SVector{20}(temp)
    return inside 
end

function makeReferenceStatic()
    output = zeros(MVector{2^5,Int32})
    for i in 1:(2^5)
        if i % 2 == 0
            output[i] = 1
        else
            output[i] = 0
        end
    end
    return output
end

@timeit track "staticArrayNestedInput" for j in 1:10000000
    nestedStaticArray(digits(rand(1:((2^20)-1)), base=2, pad=20),makeNestedStaticArray(), makeReferenceStatic() )
end

I am giving my function nestedStaticArray a input vector of length 20 (boolean), in the ith iteration of the for loop I want to create a smaller boolean vector tempi consisting only of a subarray of input, namely input at the indexes of idx[i], the boolean vector tempi is then turned into a number of base 10 and output now gets the value of reference at this computed index. (The function makeReferenceStatic() is normally more complicated, but for the sake of this MWE I constructed it this way, and I also don’t do this: digits(rand(1:((2^20)-1)), base=2, pad=20), but I give my function a different boolean vector of length 20 each time I call the function)

 ─────────────────────────────────────────────────────────────────────────────────
                                          Time                   Allocations      
                                  ──────────────────────   ───────────────────────
         Tot / % measured:             62.0s / 91.2%           94.2GiB / 100%     

 Section                  ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────
 staticArrayNestedInput        1    56.5s   100%   56.5s   94.2GiB  100%   94.2GiB
 ─────────────────────────────────────────────────────────────────────────────────

DNF · November 4, 2021, 8:00pm

Not sure if this does the same as yours, but it’s faster at least:

makeref() = iseven.(SOneTo(32))
rand5() = SVector{5}(rand(1:5) for _ in 1:5)
makenested() = SVector{20}(rand5() for _ in 1:20)

function nestedsarray(input, ind, ref)
    output = MVector{20, Bool}(undef)
    for i in eachindex(ind)
        tempi = input[ind[i]]
        t, u2 = 0, 1
        for u in eachindex(tempi)
            t += u2 * tempi[u]
            u2 *= 2
        end
        output[i] = ref[t+1]
    end
    return SVector{20,Bool}(output)
end

nestedsarray(rand(SVector{20,Bool}), makenested(), makeref())

I recommend using BenchmarkTools instead of running your code 10 million times:

@btime nestedsarray(rand(SVector{20,Bool}), makenested(), makeref())
  1.000 μs (1 allocation: 48 bytes)

anneJulia · November 12, 2021, 11:02am

Thank you
I will use @btime in the future.

Topic		Replies	Views
Loop over array of static arrays Performance question , staticarrays	12	156	October 26, 2024
Inexplicable allocations when summing `StaticArrays` Performance	8	1124	December 3, 2018
Static array over 1,024 elements is slow to compile? General Usage	11	3119	March 25, 2021
Local arrays in for loops New to Julia	55	1820	March 3, 2021
Alternative to very time-consuming array operations Performance question	9	581	November 4, 2021

Nested static arrays perform worse than nested arrays

Related topics