Variation in computation time for execution of the exact same code

Tamas_Papp · March 24, 2020, 2:35pm

https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#interpolating-values-into-benchmark-expressions

DNF · March 24, 2020, 3:40pm

But stepSource returns a 2d array, not a 3d one. What I’m saying is that inside stepSource you create a 3d array, and then create a 2d slice which you return. The fact that it started with a 3d array has no effect outside the function, except that it causes an extra memory allocation.

jleman · March 24, 2020, 5:24pm

That makes sense. My recollection is that that using slices forced me into extra dimensions due to later calculations with other 3D elements. However, I am reworking the code with the StaticArray format you suggested. As an example here is what I have attempted for the three-phase sine source. Each sub-vector stores three voltages (one for each phase of a three-phase voltage source) at a given time step. This seems to output the right numerical data. One item that puzzles me is illustrated in the print statements. The first two make sense. In the third I am trying a quick way to get all the first elements of each vector for later plotting purposes. I could do this with another variable and loop, but is there an easier way?

using Plots
using LinearAlgebra
using StaticArrays

function threePhaseSine(θ)
    Δt = 2.0e-3; ω = 2.0*π*60.0  #
    Aphase(nt) = sqrt(2/3)*cos.(ω * Δt * nt .+ θ)  #nt is time step
    Bphase(nt) = sqrt(2/3)*cos.(ω * Δt * nt .- 2*π/3 .+ θ)
    Cphase(nt) = sqrt(2/3)*cos.(ω * Δt * nt .+ 2*π/3 .+ θ)
    Vsin = [MVector{3,Float64}([Aphase(nt), Bphase(nt), Cphase(nt)]) for nt in 1:10]

    return Vsin 
end

Vabc = threePhaseSine(0)

display(Vabc)
println("First vector of Vabc = ",Vabc[1])
println("First element of vector 1 of Vabc = ",Vabc[1][1])
println("First element of all vectors of Vabc = ",Vabc[:][1])  #this is unexpected

plot(Vabc[:][1]) #attempt to plot the voltage for Phase A as a function of time (doesn't work)

DNF · March 24, 2020, 6:20pm

Vabs[:] is just a copy of Vabs, so Vabs[:][1] is the same as writing Vabs[1], except you make an extra copy.

You probably want first.(Vabs).

Drop the square brackets around the phase functions here, this allocates a normal array and then converts it to an MVector.

In fact, it’s better to drop the {3,Float64} since the former is automatically understood from the number of inputs, and the element type is inferred, and will then allow also other types than Float64, so:

Vsin = [MVector(Aphase(nt), Bphase(nt), Cphase(nt)) for nt in 1:10]

Also consider whether you can use an SVector instead.

BTW: The dots you put inside the Aphase etc. functions don’t really make any sense, since threePhaseSine only works for scalar input. You can, however, broadcast the outer function over vector input, as threePhaseSine.(θ).

jleman · March 24, 2020, 6:59pm

Roger on the dots. I see there is no second.(Vabs) (i.e. for accessing the second or third elements of each vector). Is there a more general way to access?

Likewise, if I am using MVector or Mmatrix elements (because they have to be updated), how do I index certain blocks and update in place? For example in the code below, Vout is a vector of vector of vectors (?? multiple conductors, space, and time). I want to write a small array of Voltages (trappedChargeVoltage) to each conductor (all nc), each spatial step (all NDZ), but only the first time step (NDT = 1).

using StaticArrays
using LinearAlgebra

nc = 3
NDZ = 5
NDT = 10

VV = [MVector{nc,Float64}(zeros(nc)) for _ in 1:NDZ] #initialize vector of vectors to store update voltages at eachs spatial node
Vout = [[MVector{nc,Float64}(zeros(nc)) for _ in 1:NDZ] for _ in 1:NDT] #initialize array to store all output variables
#Vout contains voltages for conductors (nc) at at each spatial step (NDZ), and each time step (NDT)

trappedChargeV = [SVector{3,Float64}(10.0,0.0,-10.0)]
VV .= trappedChargeV
first.(Vout) .= trappedChargeV
display(first.(Vout))

DNF · March 24, 2020, 7:22pm

There is first and last, otherwise use getindex.(Vabs, n) to get the nth index.

You can update with an SVector too, but then you have to replace the entire subvector, for example:

VV = [zero(SVector{nc, Float64}) for _ in 1:NDZ]
# or 
VV = Vector{SVector{nc, Float64}}(undef, NDZ)

trappedChargeV = SVector(10.0,0.0,-10.0)
VV .= (trappedChargeV,)

Note that again you create unnecessary temporary arrays with MVector{nc,Float64}(zeros(nc)) since zeros(nc) creates an ordinary array which is converted to a StaticArray. Use unstead zero(MVector{3,Float64}), etc.

But here you may as well make it easier on yourself by just writing

trappedChargeV = SVector(10.0,0.0,-10.0)  # <- no need for {3, Float64}
VV = fill(trappedChargeV, NDZ)

This will not work:

first.(Vout) .= trappedChargeV

The left side creates a temporary array and assigns to that, and it disappears without touching Vout.

I would do this:

for v in Vout
    v[1] = trappedChargeV
end
# or
setindex!.(Vout, (trappedChargeV,), 1)

Notice that this works for SVectors.

DNF · March 24, 2020, 8:36pm

Note that keeping nc = 3 as a variable is a performance issue.

If you can, write them as literal values. Compare the following three functions. foo is yours, bar is the same but with literal instead of nc, and baz uses SVector. Also note that the first two don’t do what you (probably) want.

using BenchmarkTools, StaticArrays

function foo()
    nc = 3
    NDZ = 5
    NDT = 10

    VV = [MVector{nc,Float64}(zeros(nc)) for _ in 1:NDZ]
    
    trappedChargeV = [SVector{3,Float64}(10.0,0.0,-10.0)]
    VV .= trappedChargeV
    
    Vout = [[MVector{nc,Float64}(zeros(nc)) for _ in 1:NDZ] for _ in 1:NDT]
    first.(Vout) .= trappedChargeV
     
    return Vout, VV
end

function bar()
    NDZ = 5
    NDT = 10

    VV = [MVector{3,Float64}(zeros(3)) for _ in 1:NDZ]
    
    trappedChargeV = [SVector{3,Float64}(10.0,0.0,-10.0)]
    VV .= trappedChargeV
    
    Vout = [[MVector{3,Float64}(zeros(3)) for _ in 1:NDZ] for _ in 1:NDT]
    first.(Vout) .= trappedChargeV
     
    return Vout, VV
end

function baz()
    NDZ = 5
    NDT = 10

    trappedChargeV = SVector(10.0, 0.0, -10.0)
    VV = fill(trappedChargeV, NDZ)

    Vout = [[zero(SVector{3, Float64}) for _ in 1:NDZ] for _ in 1:NDT]
    setindex!.(Vout, (trappedChargeV,), 1)
     
    return Vout, VV
end

Benchmarks:

julia> @btime foo();
  74.461 μs (665 allocations: 45.28 KiB)

julia> @btime bar();
  3.046 μs (140 allocations: 10.03 KiB)

julia> @btime baz();
  642.392 ns (14 allocations: 2.58 KiB)

baggepinnen · March 24, 2020, 11:00pm

It’s surprising to me that constant propagation didn’t handle the foo case…

jleman · March 25, 2020, 8:05pm

Is there a good way to make something like this work, or is it best just to go to MVector.

vec1 = SVector{3}(rand(3))
for i = 1:4    
    vec1 .+= SVector{3}(rand(3))
end

kristoffer.carlsson · March 25, 2020, 8:09pm

Just do vec1 += .

jleman · March 25, 2020, 8:17pm

It updates in-place, correct?

DNF · March 25, 2020, 8:19pm

Yeah, there’s no need to update in-place. You can think of StaticArrays as values instead of containers.

No, it creates a new instance. But it will be super-fast, since it’s a stack-allocated value type.

Also, instead of

SVector{3}(rand(3))

write

rand(SVector{3})

then you will avoid creating an unnecessary temporary array.

Observe:

julia> function foo()
           vec1 = rand(MVector{3})
           for i = 1:4    
               vec1 .+= rand(MVector{3})
           end
           return vec1
       end

julia> function bar()
           vec1 = rand(SVector{3})
           for i = 1:4    
               vec1 += rand(SVector{3})  # <- no dot!
           end
           return vec1
       end

julia> @btime foo();
  88.801 ns (5 allocations: 160 bytes)

julia> @btime bar();
  74.856 ns (0 allocations: 0 bytes)  # <- no allocations!

jleman · March 26, 2020, 6:52am

Almost done implementing the staticarray approach and will post some figures showing the performance gains. Two more items that are stumping me.

I need to perform an elementwise absolute value operation on a result in the form of that shown below. I think I understand why what I have won’t work. But is there a straightforward way to do this?

vec1 = [[SVector{3,Float64}(-i,-i,-i) for i in 1:4] for _ in 1:10]
abs.(vec1)

I need to save the vec1 static-array output to a file. I normally save matrix results out as a csv file using writedlm, but that seems tedious as it keeps all the brackets. Any suggestions for a convenient way to store out the data? The intent would be to read the data back in for later post processing and plotting. Or is there a convenient way to convert the data back to a non-static array form where manipulating contents is a bit easier.

Elrod · March 26, 2020, 8:21am

May be worth pointing out that with a couple modifications, MArrays don’t allocate either:

julia> @inline inlinerand(::Type{<:MVector{3}}) = MVector(rand(),rand(),rand())
inlinerand (generic function with 1 method)

julia> function foo2()
           vec1 = inlinerand(MVector{3})
           for i = 1:4
               vec1 .+= inlinerand(MVector{3})
           end
           return SArray(vec1)
       end
foo2 (generic function with 1 method)

julia> @btime foo2()
  47.758 ns (0 allocations: 0 bytes)
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
 2.4573027826314195
 2.1526819466919456
 2.4786087715660896

DNF · March 26, 2020, 8:41am

Yes, this doesn’t work since vec1 is a vector of vectors of vectors. It’s a bit awkward to perform broadcasting to arbitrarily deep levels, but if you dig around on this forum you might find some nice solutions. Here’s one, I don’t think it’s the best one:

myabs(x) = abs(x)
myabs(x::AbstractArray) = myabs.(x)

Then myabs(vec1) should work.

BTW FYI, you can make a matrix of SVectors, if you want:

julia> vec1 = [SVector{3,Float64}(-i,-i,-i) for i in 1:4, j in 1:10]
4×10 Array{SArray{Tuple{3},Float64,1,3},2}:
 [-1.0, -1.0, -1.0]  [-1.0, -1.0, -1.0]  [-1.0, -1.0, -1.0]  …  [-1.0, -1.0, -1.0]  [-1.0, -1.0, -1.0]
 [-2.0, -2.0, -2.0]  [-2.0, -2.0, -2.0]  [-2.0, -2.0, -2.0]     [-2.0, -2.0, -2.0]  [-2.0, -2.0, -2.0]
 [-3.0, -3.0, -3.0]  [-3.0, -3.0, -3.0]  [-3.0, -3.0, -3.0]     [-3.0, -3.0, -3.0]  [-3.0, -3.0, -3.0]
 [-4.0, -4.0, -4.0]  [-4.0, -4.0, -4.0]  [-4.0, -4.0, -4.0]     [-4.0, -4.0, -4.0]  [-4.0, -4.0, -4.0]

Or maybe you could define a recursive helper function, like this:

rdot(f, x) = f(x)  # recursive dot
rdot(f, x::AbstractArray) = rdot.(f, x)

and then call

vec2 = rdot(abs, vec1)

This would work with other functions beside abs as well.

jleman · March 27, 2020, 6:57pm

Results update

It took me a couple of days to re-write the old allocation heavy code per the static array suggestions above, but got it working and here are the results:

Old Code (allocation heavy due to use of base array and slicing approach):

455.657741 seconds (585.01 M allocations: 73.176 GiB, 88.12% gc time)

Revised Code (making use of static vector/matricies (StaticArrays.jl))

23.023161 seconds (71.79 M allocations: 3.910 GiB, 66.59% gc time)

Between the above improvements and those discussed on a related thread that’s about a 1200X improvement to my original ugly code. Thanks to all who contributed comments and suggestions!

Topic		Replies	Views
Weird fluctuation in running time Performance	2	227	March 20, 2024
Opinions about code performance General Usage	1	238	July 3, 2022
Why sum of the measured times of each function is not equal to the total run time? General Usage	20	820	August 21, 2020
Why does my program's performance vary so much from run to run? Can it be fixed? Performance question , memory-allocation , optimization	11	1539	January 20, 2021
How much is it normal that @time differs in time? New to Julia	20	2360	June 28, 2017

Variation in computation time for execution of the exact same code

Related topics