Indexible structures like Matlab?

I am trying to generate bunch of matrices or vectors, but the number I need varies. In Matlab, I could do something like this:

for k = 1:5
X(k).A = rand(2,2) 
X(k).B = rand(1,1) 
X(k).C = rand(2,3)
end

which would generate a structure X. Later I want to update them with another matrix, which may not be the same dimensions as before, e.g:

X(3).A = rand(3,1) 
X(3).B = rand(2,3) 

What would be a simple way to do something similar in Julia?

Also, can I generate a new variable by defining it from an index before defining its size? i.e. define a variable, then fill it up with elements later? I know its against the rules, but is there a work around? eg:

x=[]
julia> x[1]=rand(2,2)
ERROR: BoundsError: attempt to access 0-element Array{Any,1} at index [1]
Stacktrace:
 [1] setindex!(::Array{Any,1}, ::Any, ::Int64) at .\essentials.jl:452
 [2] top-level scope at REPL[33]:1

Dictionary? And, you could use push!.

3 Likes

Something like this?

mutable struct ABC
    A
    B
    C
end       

X = [ABC(rand(2,2), rand(1,1), rand(2,3)) for k=1:5]

X[3].A = rand(3,1)
X[3].B = rand(2,3)

Edit: yes dictionary is better.

1 Like

I think the OP wants to add an arbitrary number of such matrices. Which is easily accommodated in Matlab, since the number of fields of a struct is not limited to the point of definition. (Which is convenient, but I think a dictionary is a cleaner solution.)

1 Like

For the first question, mutating things and especially changing the size is generally not optimal for performance AFAIU. But if you insist, you could create a mutable struct and use that. Below I create Foo as mutable type with fields A, B, and C, and then build X with a comprehension. I then check one of the elements, mutate it, and check it again:

julia> mutable struct Foo
           A
           B
           C
       end

julia> X = [Foo(rand(2,2), rand(1,1), rand(2,3)) for _ in 1:5]
5-element Vector{Foo}:
 Foo([0.6336484247574032 0.8779150180449113; 0.8998338040653455 0.27731611118206834], [0.6019491573654319], [0.44309305965467294 0.835445212887368 0.37173250938093627; 0.16745344879737623 0.30174534110261075 0.16485393037959706])
 Foo([0.6815944035748012 0.7091238931725286; 0.364759060102372 0.12560844587661246], [0.3473175584242554], [0.9272139520938611 0.8232564184631979 0.2829347646825151; 0.9085998059974083 0.8418665032348402 0.5318129985859041])
 Foo([0.7472536797853224 0.3342281250377974; 0.365138172581275 0.06889689084996697], [0.8667753425436879], [0.7349141877461014 0.20402614145227504 0.6047100042777549; 0.7986589723918815 0.3077295381003953 0.6799131516475592])
 Foo([0.6410987270662076 0.7658444194422644; 0.8760550715783828 0.7369830451086339], [0.5920630294672924], [0.7872685970873206 0.9791993704398225 0.8274853744382098; 0.39602112148215785 0.9614004575374897 0.7604701818469493])
 Foo([0.6702939351739794 0.30185003085636475; 0.4378845771156916 0.050980577051804365], [0.893243351404829], [0.21139611319275287 0.19059993695232325 0.7270900731352916; 0.9472706288162847 0.9207022702820218 0.678729012683486])

julia> X[3].A
2×2 Matrix{Float64}:
 0.747254  0.334228
 0.365138  0.0688969

julia> X[3].A = rand(3,1)
3×1 Matrix{Float64}:
 0.7080470430838635
 0.0690464669584705
 0.7310833891810355

julia> X[3].A
3×1 Matrix{Float64}:
 0.7080470430838635
 0.0690464669584705
 0.7310833891810355

For the second question, you can use push! as @PetrKryslUCSD said:

julia> x = []
Any[]

julia> push!(x, rand(2,2))
1-element Vector{Any}:
 [0.4485394897649224 0.6921317344806841; 0.35031302306558687 0.8553205180302579]

julia> x
1-element Vector{Any}:
 [0.4485394897649224 0.6921317344806841; 0.35031302306558687 0.8553205180302579]

(Edit: But in general, it’s better if you know ahead of time what type the elements of x are, so you could have done better by initializing x with x = Matrix{Float64}[])

1 Like
x = Array{Float64}(undef, 3)
x[1] = 1.2
x[2] = 3.4
x[3] = 5.6
1 Like

An array (or vector) of arrays gives you all of this functionality except the named indexing, I think. Those aren’t readily available in Matlab, but in Julia it is totally cool to have a vector of matrices. Perhaps you could even get better indexing using one of the zillion named array/dimension packages we have.

Thanks you all for you elegant solutions. From your suggestions I think this seems to work fine, but is there an advantage of defining a struct as suggested by @Adriel and @briochemc (thanks for the worked example btw) over the following approach:

k=3
A = Array{Any}(undef, k)
B = Array{Any}(undef, k)
C = Array{Any}(undef, k)
for i = 1:k
    A[i] = rand(2,2)
    B[i] = rand(1,2)
    C[i] = rand(3,2)
end 

julia> A[1]
2×2 Array{Float64,2}:
 0.376011   0.200972
 0.0130828  0.0578326

julia> A[1]= rand(1)
1-element Array{Float64,1}:
 0.2504132962082102

It seems that using a Dict is supposed to be better. I tired this as follows:

julia> for i = 1:k
       push!(A, i => rand(2,2))
       push!(B, i => rand(1,2))
       push!(C, i => rand(3,2))
       end

julia> A
3-element Array{Any,1}:
 1 => [0.6126923215520705 0.1497722038592495; 0.9998693487927839 0.9305777873157031]
 2 => [0.322463651408176 0.1825382265259281; 0.6514312813494405 0.27864426592987046]
 3 => [0.9785911063688968 0.7348436450688112; 0.2084956387475594 0.5552325011955841]

julia> A[1]
1 => [0.6126923215520705 0.1497722038592495; 0.9998693487927839 0.9305777873157031]

But if I now want to multiply two matrices together, it’s a bit clunky because I have to call the 2nd element as follows:

julia> B[1][2]*A[1][2]
1×2 Array{Float64,2}:
 0.716473  0.415339

My questions are:

  1. which is better: a) constructing the mutable struct, b) the undef arrays, c) using a Dict?
  2. Is there a better solution using a dictionary that I have come up with? One that could result in a similar format that can be achieved with the mutable struct solutions above (i.e. with X and A:C in the variable name, e.g. X[1].A)?

The answer depends on what operations you would like to perform on your data and whether the dimensions and types will be the same. If you want to dynamically add elements of different types with different dimensions, a Dict is a viable option.

The following code is the most similar to what you described above in Matlab. It is an array of Dictionaries. It assumes that each dictionary contains two dimensional arrays of Float64. This means you can add new key-value pairs, but only of the same type. (e.g. x[1][:d] = rand(30, 3)). Otherwise, using type Any will incur a performance penalty, which is fine if you need to have key-value pairs of different types.

k = 3

keys = [:a,:b,:c]

x = [Dict(key=>fill(0.0,1,1) for key in keys) for _ in 1:k]

x[1][:a] = rand(3, 2)

Aside from that, we would need to know more information about how you plan to use your data structure.

3 Likes

I agree with this @Christopher_Fisher. It’s hard to say without more details. If you can explain exactly what you are doing, you will get finely tuned help!

If you are forced to use the X[1].A syntax, then you cannot use a dict or an array of arrays, since their syntax requires X[1][:A] or similar. I personally prefer the nice and short .A notation too, so I would recommend you explore packages in the ecosystem as suggested by @tbeason. For example, you could use LabelledArrays.jl from SciML:

julia> using LabelledArrays

julia> labels = (:A, :B, :C)
(:A, :B, :C)

julia> X = [(@LArray [rand(2,2), rand(1,1), rand(2,3)] labels) for _ in 1:5] ;

julia> X[1].A
2×2 Array{Float64,2}:
 0.465824  0.888751
 0.180011  0.864779

julia> X[1].A = rand(5,5)
5×5 Array{Float64,2}:
 0.0424362  0.39973    0.761517   0.0222621  0.232605
 0.172972   0.881108   0.0333937  0.125635   0.135457
 0.395889   0.0872991  0.482909   0.201748   0.566823
 0.502437   0.725862   0.704653   0.477646   0.0560697
 0.816672   0.416991   0.867721   0.788992   0.387657

julia> X[1].A
5×5 Array{Float64,2}:
 0.0424362  0.39973    0.761517   0.0222621  0.232605
 0.172972   0.881108   0.0333937  0.125635   0.135457
 0.395889   0.0872991  0.482909   0.201748   0.566823
 0.502437   0.725862   0.704653   0.477646   0.0560697
 0.816672   0.416991   0.867721   0.788992   0.387657
2 Likes

Thanks all for your suggestions. I am avoiding the use of packages (I want to run it on my mobile phone). I did some testing with three approaches using rand() and fill() thinking that it might be quicker not to generate random numbers. Interesting results:

  1. Mutable struct (consistently faster to initialise with rand rather than fill):
julia> @time begin
       mutable struct Foo
           A
           B
           C
       end
       k = 30
       X = [Foo(rand(2,2), rand(1,2), rand(3,2)) for _ in 1:k]
       X[1].A = rand(3,2)
       X[1].B = rand(2,2)
       X[1].A *X[1].B
       end
  0.033101 seconds (66.55 k allocations: 3.532 MiB)
3×2 Array{Float64,2}:
 0.526554   0.52724
 0.0632389  0.159635
 0.0985826  0.862068

julia> @time begin
       mutable struct Foo
           A
           B
           C
       end
       k = 30
       X = [Foo(fill(2,2), fill(1,2), fill(3,2)) for _ in 1:k]
       X[1].A = rand(3,2)
       X[1].B = rand(2,2)
       X[1].A *X[1].B
       end
  0.044417 seconds (76.86 k allocations: 3.938 MiB)
3×2 Array{Float64,2}:
 0.427698  0.169891
 0.177593  0.0979178
 0.407794  0.303872
  1. Dict approach. Note I couldn’t initialise with fill(), so had to use rand(). This approach was the slowest by far:
julia> @time begin
       k = 30
       keys = [:a,:b,:c]
       x = [Dict(key=>fill(0.0,1,1) for key in keys) for _ in 1:k]
       for i = 1:k
       x[k][:a] = rand(2,2)
       x[k][:b] = rand(1,2)
       x[k][:c] = rand(3,2)
       end
       x[1][:a]= rand(3,2)
       x[1][:b]= rand(2,2)
       x[1][:a]*x[1][:b]
       end
  0.095423 seconds (188.87 k allocations: 9.871 MiB)
3×2 Array{Float64,2}:
 0.922558  0.830295
 0.963615  0.82152
 0.765743  0.685305
  1. undefined array approach (fastest outcome):
julia> @time begin
       k=30
       A = Array{Any}(undef, k)
       B = Array{Any}(undef, k)
       C = Array{Any}(undef, k)
       for i = 1:k
           A[i] = rand(2,2)
           B[i] = rand(1,2)
           C[i] = rand(3,2)
       end
       A[1]= rand(3,2)
       B[1]= rand(2,2)
       A[1]*B[1]
       end
  0.000037 seconds (127 allocations: 12.156 KiB)
3×2 Array{Float64,2}:
 0.445011  0.346964
 0.821819  0.64577
 0.476496  0.376766

julia> @time begin
julia> @time begin
       k=30
       A = Array{Any}(undef, k)
       B = Array{Any}(undef, k)
       C = Array{Any}(undef, k)
       for i = 1:k
           A[i] = fill(2,2)
           B[i] = fill(1,2)
           C[i] = fill(3,2)
       end
       A[1]= rand(3,2)
       B[1]= rand(2,2)
       A[1]*B[1]
       end
  0.000031 seconds (127 allocations: 10.750 KiB)
3×2 Array{Float64,2}:
 1.33985   0.496884
 0.549631  0.0953787
 1.33206   0.450659

Thanks all for your suggestions, I learned lots from you all!

I am not sure I understand the connection between these two statements.

As @Christopher_Fisher and others pointed out above, it would be better to understand the specs of your problem (field names known in advance? heterogeneous types? a large or small number of them? etc). All approaches involve various trade-offs in speed and flexibility.

1 Like

Note that fill and rand are generating different output based on how you are calling them. I recommend wrapping each example in a function when testing and using @btime from BenchmarkTools. BenchmarkTools will provide more accurate results for microbenchmarking.

using BenchmarkTools 

function f1()
    k=30
    A = Array{Any}(undef, k)
    B = similar(A)
    C = similar(A)
    for i = 1:k
      A[i] = fill(0.0, 2, 2)
      B[i] = fill(0.0, 1, 2)
      C[i] = fill(0.0, 3, 2)
    end
    A[1]= rand(3,2)
    B[1]= rand(2,2)
    A[1]*B[1]
end

function f2()
    k = 30
    keys = [:a,:b,:c]
    x = [Dict(key=>fill(0.0,1,1) for key in keys) for _ in 1:k]
    for i = 1:k
      x[k][:a] = rand(2,2)
      x[k][:b] = rand(1,2)
      x[k][:c] = rand(3,2)
    end
    x[1][:a]= rand(3,2)
    x[1][:b]= rand(2,2)
    x[1][:a]*x[1][:b]
end


function f3()
  k=30
  A = Vector{Array{Float64,2}}(undef, k)
  B = similar(A)
  C = similar(A)
  for i = 1:k
      A[i] = fill(0.0, 2, 2)
      B[i] = fill(0.0, 1, 2)
      C[i] = fill(0.0, 3, 2)
  end
  A[1] = rand(3, 2)
  B[1] = rand(2, 2)
  A[1]*B[1]
end

@btime f1()

@btime f2()

@btime f3()

Results:

  2.377 μs (96 allocations: 11.19 KiB)
  10.510 μs (305 allocations: 36.89 KiB)
  2.349 μs (96 allocations: 11.19 KiB)

Typically, something like f3 will be much more performant because the arrays are concrete rather than of type Any. In this case, it seems like type inference is working well, but that is not a guaranteed result in general.

1 Like