Help in understanding efficiently handling multidimensional arrays

Hi!

I have a function simulation( p ) which returns a 2D (n by T) array of states, the state for each simulated time step. ‘p’ just stands for some parameters.

I also have to run a multitude of N simulations and I want to do it efficiently. So, I want to use a function simulation(p, N) which returns a 3D (n by T by N) array of the states. My question is, what is the most efficient way to reuse the code I already have in simulation(p)?

Basic solution:

function solution(p, N)
   x = Array{Float64,3}(n,T,N)
   for i = 1:N 
      x[1:n,1:T,i] = simulation(p)
   end
   return x
end

My issue here is that as far as I know, there is a lot of overhead due to simulation(p) allocating each x_i generated trajectory, which is then copied into x[1:n,1:T,i].

Preferred solution:
I would instead use a solution!(p, x) function which modifies the preallocated x array where the result would be stored. Then I would hope for something along the lines of:

function solution(p, N)
   x = Array{Float64,3}(n,T,N)
   for i = 1:N 
      simulation!(p, x[1:n,1:T,i])
   end
   return x
end

However, as far as I understand, putting in x[1:n,1:T,i] as the argument to simulation! actually copies that part of the array, so x itself will not be overwritten and in this form this does not work.

Unpreferred but working solution:
I can create the function solution!(p, x, i), which accepts a 3 dimensional array and modifies only the part where the last component has index i. Then I can use:

function solution(p, N)
   x = Array{Float64,3}(n,T,N)
   for i = 1:N 
      simulation!(p, x, i)
   end
   return x
end

which should do the trick. My problem with this solution, is that I also just want to be able to use the simulation!(p, x) function without a 2D array, and I do not know how to have simulation!(p, x) and simulation!(p, x, i) boil down to the same code, due to the different way x has to be accessed and treated in them. This is what I could avoid by the preferred solution, where I still only have the simulation!(p, x) function operating on a 2D array, without supplying any additional index i.

Any help or suggestion on how this is usually done is appreciated!

Best,
Tusike

1 Like

I see three ways to solve the problem, which one you choose might depend on your problem.

  1. Think about if your result really has to be a 3D array or if a vector of matrices would work for your purposes as well. Then some code like:
function solution(p, N)
   map(i -> simulation(p),1:N) 
end

would work and return a Vector of Matrices. You could use the RecursiveArrayTools package to make them look and index like a 3D Array.

  1. You can use view instead of copies in your preferred solution. You have to be careful, because indexing into views might be a bit slower than into plain arrays, so if you see a performance drop, use another solution:
function solution(p, N)
   x = Array{Float64,3}(n,T,N)
   for i = 1:N 
      simulation!(p, view(x,1:n,1:T,i))
   end
   return x
end
  1. You can pre-allocate a matrix that you pass to your simulation! function and copy the result into the 3D array after each simulation:
function solution(p, N)
   x = Array{Float64,2}(n,T,N)
   xT = zeros(n,T)
   for i = 1:N 
      simulation!(p, xT)
      x[1:n,1:T,i] = xT
      fill!(xT,0.0)
   end
   return x
end

Hope this helps…

2 Likes

Thanks for the help!

I considered a vector of matrices, but after some googling I found some performance metrics indicating it is slower than using 3D arrays. (Speed is mostly what I care about for my use case, which is why I will probably avoid the view). I guess I could still try and see which performs better in my code.

Your third method is so far the cleanest and yet still efficient that I have now, though I remain hopeful that a similar syntax could get the efficiency of my third approach, which should still be the fastest.

IMHO you should still try and benchmark a view-based solution before deciding to avoid it. It is really easy to implement, and in your case (where you’ll be indexing contiguous data) it should not induce much performance penalty (if any).

While there are probably situations in which 3D arrays can be faster than a vector of matrices, I would hesitate to draw any conclusions from the Discourse post you linked, as there were a bunch of confounding factors there, like the use of non-const global variables, indexing in the wrong order, and comparing things that aren’t actually equivalent.

For what it’s worth, I think the vector-of-matrices option and the view-based option both sound like great ideas. It should be pretty straightforward to benchmark both of them (feel free to post the results here if they’re surprising or suspicious) and figure out which one is better for your case.

2 Likes

I think you can just write x[1:n,1:T,i] == x[:,:,i] for these slices.

Two other packages you may want to look at:

  • JuliennedArrays which will efficiently produce a vector of views. Although this appears to be going through a change of notation…

  • If the slices are small, like n*T < 100, then it may be fast to use StaticArrays. These can be made by reinterpreting the original array.

I meant the benchmarks posted in the replies of that link, where as far as I could tell those issues were resolved and 3D arrays came out faster by far.

But I agree, there could be differences to those benchmarks, and it is worth it to just give the different recommended ideas a shot in my application. I will post the results here when I’m done! (though it will take some time before I set up the rest of the project and can actually run the simulation as part of it).

Thanks again for all the help!

You know, I did want to ask about the x[1:n,1:T,i] == x[:,:,i] notation at some point, I just thought it was less relevant for now. Sometime earlier, I got a substantial improvement in speed by explicitly giving the bounds 1:n and 1:T (also in other parts of my code). I realize I’m just referring to all the elements and in terms of readability, using simply : would have been much more preferable, but I guess it takes extra computation to figure out the actual bounds if they are not explicitly given?

That shouldn’t be the case. If anything, I would expect : to be faster, since it doesn’t require constructing the unit range. Please post a benchmark if you find a case in which : is noticeably slower, and perhaps we can help figure out what’s wrong.

There were a bunch of other confounding factors in those benchmarks that made them ineffective for drawing any conclusions. In particular, the benchmark that @stillyslalom posted compared doing x .+= 1 (which operates in place) on the 3d array with doing x += 1 (which allocates an entire new matrix) on each element of the vector-of-matrices, which explains why the latter appeared to be much slower and allocate lots of memory. Furthermore, the benchmark that @PetrKryslUCSD posted used a non-concrete element type (Array{Matrix} instead of Array{Matrix{Float64}} ) for the array-of-matrices approach, which also explains the huge difference in performance and memory allocation that he saw.

I’m sorry that I’m just posting critiques of other people’s work without numbers to back it up (I’m on mobile right now and can’t actually run the benchmarks), but I would encourage you to do some benchmarking on your own too in order to learn more about what actually performs well in Julia.

1 Like

Here’s an example that tries to avoid the two issues I mentioned with the other benchmarks. In this case, i construct a 3D array of size 100x100x100 and also a vector of 100 100x100 matrices. I then benchmark adding one to each element, using the in-place version in both cases:

julia> function f(x::Array{T, 3}) where T
         x .+= 1
       end
f (generic function with 2 methods)

julia> function f(x::Array{Matrix{T}}) where T
         for matrix in x
           matrix .+= 1
         end
       end
f (generic function with 2 methods)

julia> x_array = zeros(N, N, N);

julia> x_vec_mat = [zeros(N, N) for i in 1:N];

julia> using BenchmarkTools

julia> @btime f($x_array);
  609.838 μs (0 allocations: 0 bytes)

julia> @btime f($x_vec_mat);
  505.012 μs (0 allocations: 0 bytes)

Both implementations are quite fast, and their performance on this task is comparable. In fact, the entire difference between the vector-of-matrices and the 3D array approach shown in the previous thread can be attributed to the two issues I referenced above (not updating the matrices in-place and working with non-concrete element types).

I see! I also did a similar benchmark to check the performance for indexing using : and 1:n, but saw essentially no difference in performance as you predicted. I was even more of a beginner in Julia earlier though, so there might have been other issues with my code at play which led me to change all my : indexes to the form of 1:n… But I’m curious too, and will share anything I find once I reach the point when I can comfortably test these out using my entire code!

So, I’ve finally reached the point where I can start comparing the two implementations, one using 3D arrays, the other breaking it down into vectors of vectors of vectors, but I first want to make sure I’ve optimized both codes to the best of my abilities before posting any results.

Is there a good guide for this? For example, I had a line:

for n = 1:N
     sp.C[t] += (PN[n]*(θN[n][t] - sp.θ[t]))*(θN[n][t] - sp.θ[t])'
end

which became significantly faster after putting @. in the front, but I couldn’t find the documentation for what that does, or figure out how I could achieve the same performance without using it. I’m assuming it nicely does the operations in-place, but it would be good to see how I myself could achieve the same results.

Check out More Dots: Syntactic Loop Fusion in Julia for some information on what @. is and why it’s useful.

1 Like

Going back to the : vs 1:n indexing (sorry I missed it earlier) — the two should perform exactly identically when indexing into Arrays and most array types. In fact, :s get converted to the appropriate 1:n range behind the scenes.

You may see differences with some other array types though — notably SubArray! There can be a big difference between the performance of @view A[:, :] and @view A[1:n, 1:n]… but the : will always be faster than the explicit range because it gives the SubArray more information by its type alone. Whereas ranges might select any number of elements, : will always select the entire dimension. Julia is able to exploit this extra information and can create a much more efficient view in response.

In short: use : if that’s what your intention is.

4 Likes

Thanks! The documentation for @. helped a lot in trying to optimize everything.

In the end, I am almost getting the same performance in my two implementations. The one using higher dimensional arrays is still better due to the reason that for the same array, I sometimes have to iterate through different dimensions, and I assume this might be more efficient than using vectors of vectors of vectors… I also noticed more % of runtime allocated to garbage collection, so it is also probably problematic coding on my part.

For example, I have a 3D array which I iterate through in terms of states and time steps for each N simulations. But, while populating this array before doing the simulations, I have the code:

function sampleParameters!(sp::StochasticParameters, N::Int, θN::Vector{Vector{Vector{Float64}}})
    samples = Matrix{Float64}(undef, length(sp.θ[1]), N)
    for t = 1:sp.T
        Distributions.rand!(Distributions.MvNormal(sp.θ[t], sp.C[t]), samples)
        for i = 1:N
            @. θN[i][t] = sp.predθ[t] + samples[:,i]
        end
    end
end

as opposed to the other implementation (which I haven’t even started optimizing):

function sampleParameters!(sp::StochasticParameters, N::Int, θN::Array{Float64, 3})
    for t = 1:sp.T
        θN[:,t,:] = sp.predθ[:,t]*ones(1,N) + rand(Distributions.MvNormal(sp.θ[:,t], sp.C[t]), N)
    end
end

The main difference seems to stem from this function call.