Represent a multidimensional array at a higher dimension

Suppose I have a 2×3 array called mat1.

I would like to pretend that this 2×3 array is actually a 2×3×4 array called mat1_3d_representation, such that

mat1_3d_representation[i, j, k] == mat1[i, j]

for all valid scalar i, j, k. And,

mat1_3d_representation[i, j, :] == repeat([mat1[i, j]], 4)

Context:

Variables are stored at arbitrary, differing units of observation, e.g. person-time, person-firm-time-transaction, time, firm-person-time, etc.

In a model, it might make sense to add (or multiply, etc.) a variable v_t at one unit of observation (say, time) to a variable x_i_t_j at a different, superset unit of observation (say, time-person-firm); assume that I have good reasons not to be using dataframe or SQL joins. These variables are represented as multidimensional arrays whose indices correspond to integer encodings of the units of observation.

I could repeat v_t I * J times and then add the arrays, but this shouldn't be necessary. Any operation defined on two arrays of size (a, b, c, …) can be redefined on an array of size (a, b, c, …) and an array of the same size but missing some dimensions. A type that represents an array at the higher dimension would eliminate ambiguity about what the operations mean and would allow multiple dispatch to apply the correct ones.
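For illustration, a minimal sketch of what such a wrapper type might look like (all names here are hypothetical, not from any package): an AbstractArray that stores the target size plus a map recording which of the higher dimensions the parent actually varies along.

# Hypothetical sketch: present a low-dimensional array as a higher-dimensional one
struct HigherDimArray{T, N, A<:AbstractArray{T}} <: AbstractArray{T, N}
    parent::A
    dims::NTuple{N, Int}      # size in the higher-dimensional index space
    keep::Tuple{Vararg{Int}}  # which of the N dimensions the parent provides
end

Base.size(h::HigherDimArray) = h.dims
Base.getindex(h::HigherDimArray{T, N}, I::Vararg{Int, N}) where {T, N} =
    h.parent[ntuple(d -> I[h.keep[d]], ndims(h.parent))...]

v = collect(1.0:4.0)                        # a variable observed only over time
v_pad = HigherDimArray(v, (2, 3, 4), (3,))  # pretend it is indexed by (i, j, t)
v_pad[2, 1, 4] == v[4]                      # true
ones(2, 3, 4) .+ v_pad                      # broadcasts like a full 2×3×4 array

Because the wrapper is an ordinary AbstractArray, broadcasting and dispatch apply to it directly, with no repeat.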

Maybe there is a simple way to unambiguously apply broadcasting here? Broadcasting works in special cases. Combining it with TransmuteDims.transmute can extend it to other cases, but it gets very messy, which suggests that another layer of abstraction would be beneficial.

E.g.:

obs = I, J, T = 2, 3, 4  # sizes of each index
x_i_j_t = ones(Float64, obs)
w_i = collect(1:I)
h_i_t = reshape(collect(1:(I * T)), (I, T))
v_t = collect(1:T)

x_i_j_t .+ w_i  # correct 

# x_i_j_t .+ v_t  # DimensionMismatch: broadcasting aligns leading dimensions
using TransmuteDims
x_t_i_j = transmute(x_i_j_t, (3, 1, 2))  # move t to the front
x_t_i_j .+ v_t  # works

# x_i_j_t .+ h_i_t  # also a DimensionMismatch
x_i_t_j = transmute(x_i_j_t, (1, 3, 2))  # reorder to (i, t, j)
x_i_t_j .+ h_i_t  # works


βₓ, β_w, βᵥ, βₕ = 0.2, 0.3, 0.4, 0.5
# y_i_j_t = @. βₓ * x_i_j_t + β_w*w_i + βᵥ*v_t + βₕ*h_i_t  # lol, no... though wouldn't it be great if I could represent w_i, v_t, and h_i_t as though they were all of (i, j, t) and this equation worked? 

# to use transmute and broadcasting to evaluate y_i_j_t without an additional abstraction layer, I would need to call transmute multiple times and it would be difficult to read.  
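For concreteness, a sketch of what that transmute-based evaluation might look like (the padded names are mine; transmute accepts nothing in the permutation to insert size-1 dimensions, which then broadcast):

v_1_1_t = transmute(v_t, (nothing, nothing, 1))  # size (1, 1, 4)
h_i_1_t = transmute(h_i_t, (1, nothing, 2))      # size (2, 1, 4)
y_i_j_t = βₓ .* x_i_j_t .+ β_w .* w_i .+ βᵥ .* v_1_1_t .+ βₕ .* h_i_1_t

It works, but every new variable needs its own hand-written padding.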

Basically I am hoping that there exists a package that provides a layer of abstraction so I can avoid the above mess. This is totally feasible; if it doesn’t exist I will write functions that execute the above in a more organized fashion, but I’d rather rely on existing packages to the extent that I can.

Does anyone know of existing packages that allow a multidimensional array to be represented as though it were of higher dimension?

I think it depends on what you are doing with the “padded” array afterwards.

Using TensorCast.jl:

using TensorCast
mat1 = [1 2 3; 4 5 6]
@cast mat1_3d[i, j, k] := mat1[i, j] (k in 1:4)

This is at least more efficient than materializing the repetition with:

repeat(mat1, 1, 1, 4)
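A quick check (added for clarity) that the result matches the eager version:

mat1_3d == repeat(mat1, 1, 1, 4)  # true
mat1_3d[1, 2, 3] == mat1[1, 2]    # true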

The fact that you want something like this suggests you might need to rethink the algorithm instead, to be honest.

Edited original question.

You can also just do this with a view, if I’m understanding correctly:

julia> A = rand(2, 3)
2×3 Matrix{Float64}:
 0.856466  0.126871  0.36166
 0.850967  0.117943  0.311555

julia> A_3d = @view A[:, :, fill(end, 4)]
2×3×4 view(::Array{Float64, 3}, :, :, [1, 1, 1, 1]) with eltype Float64:
[:, :, 1] =
 0.856466  0.126871  0.36166
 0.850967  0.117943  0.311555

[:, :, 2] =
 0.856466  0.126871  0.36166
 0.850967  0.117943  0.311555

[:, :, 3] =
 0.856466  0.126871  0.36166
 0.850967  0.117943  0.311555

[:, :, 4] =
 0.856466  0.126871  0.36166
 0.850967  0.117943  0.311555
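A quick check (added here) that the view matches the requested semantics; note that no copy of A is made:

julia> all(A_3d[i, j, k] == A[i, j] for i in 1:2, j in 1:3, k in 1:4)
true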

It seems like xarray can do what you want; unfortunately, there does not seem to be a comparable Julia package.
The rank concept from APL/J also gives similar functionality. My experimental JJ.jl library illustrates the idea, but does not read as nicely:

julia> eplus = rank"0 + 0"  # elementwise plus, i.e., like .+ but matching dimensions from the back
JJ.RankedDyad{typeof(+), 0, 0}(+)

julia> using Chain

julia> y_i_j_t = @chain βₓ .* x_i_j_t begin
           rank"1 eplus 1"(β_w .* w_i)
           eplus(βᵥ .* v_t)
           rank"1 eplus 1"(βₕ .* h_i_t)
       end

I think extending from this gives me the right solution. I knew I was overcomplicating it…

Thank you!

Any thoughts on how to use this programmatically?

i.e.

higher_dim_view(arr::AbstractArray, extra_dims_size::NTuple{N, Integer} where N) =
    @view arr[ntuple(_ -> Colon(), ndims(arr))..., [fill(end, i) for i in extra_dims_size]...]

[fill(end, i) for i in extra_dims_size] will error because of how ‘end’ works.

Thank you again!

Never mind, the above does actually work…
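A quick usage check (sizes chosen arbitrarily, and assuming the end-inside-splat indexing behaves as concluded above):

A = rand(2, 3)
B = higher_dim_view(A, (4, 5))
size(B)                   # (2, 3, 4, 5)
B[1, 2, 3, 4] == A[1, 2]  # true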

This comment is more or less why TensorCast.jl exists. You know that all the _t indices need to line up, but don’t care to think about what permutation will put them all in the 3rd place (let alone debug the result).

It digests something very similar to this underscore notation, and inserts many calls to transmute (also written for this purpose):

julia> using TensorCast

julia> let x = x_i_j_t, w = w_i, v = v_t, h = h_i_t

          @cast y[i,j,t] := βₓ * x[i,j,t] + β_w * w[i] + βᵥ * v[t] + βₕ * h[i,t]

          summary(y)
       end
"2×3×4 Array{Float64, 3}"

julia> @pretty @cast y[i,j,t] := βₓ * x[i,j,t] + β_w * w[i] + βᵥ * v[t] + βₕ * h[i,t]
begin
    @boundscheck ndims(x) == 3 || throw(ArgumentError("expected a 3-tensor x[i, j, t]"))
    @boundscheck axes(x, 1) == axes(w, 1) || throw(DimensionMismatch("range of index i must agree"))
    # ...
    local manatee = transmute(v, Val((nothing, nothing, 1)))
    local kangaroo = transmute(h, Val((1, nothing, 2)))
    y = @__dot__(βₓ * x + β_w * w + βᵥ * manatee + βₕ * kangaroo)
end

Indeed. There was some discussion of whether NamedDims.jl should automatically re-arrange dimensions in broadcasting, or merely check that you haven’t produced a mismatch. My NamedPlus.jl has some experiments in this direction, so that something like @named y{i,j,t} = βₓ * x + β_w * w + βᵥ * v + βₕ * h will use the names of each NamedDimsArray to find the right permutation. (The package may have rotted by now.)
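For reference, attaching the per-array names with NamedDims.jl looks roughly like this (a minimal sketch of its constructor; the automatic rearrangement itself is the NamedPlus.jl experiment):

using NamedDims
v = NamedDimsArray{(:t,)}(collect(1.0:4.0))  # a vector whose one dimension is named :t
dimnames(v)  # (:t,)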

However, I didn’t like this as much as I thought I would. Mainly because the names are only visible once you run the code, while the ones in the TensorCast.jl expression are visible in the source, and hence easy to check visually.
