Indexing on a pseudo-dimension

Suppose I have a K x M array, my_arr, but I want to pretend that it is a K x M x N array, my_pseudo_arr, so that my_pseudo_arr[k, m, n] equals my_arr[k, m] for any n in 1:N, my_pseudo_arr[k, m, :] equals repeat([my_arr[k, m]], N), and my_pseudo_arr[k, m, N + 1] throws a BoundsError.

Obviously, I could construct a new, higher-dimensional array in which my_arr is repeated N times. But it seems computationally wasteful to construct that when I could solve the problem simply by reinterpreting the getindex function.
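For reference, the materialised version I would like to avoid looks something like this (sizes are just an example):

my_arr = randn(4, 5)                     # K × M
my_pseudo_arr = repeat(my_arr, 1, 1, 3)  # K × M × N, stores N full copies of my_arr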

Can someone point me in the right direction? Is there existing functionality in Base or some other package that makes this trivial?

Thanks!

You can create a new type that subtypes AbstractArray, wraps the existing array, and defines its own getindex. I am not sure whether there is a package that already does that.

Also, as an exercise for myself (I didn't find a package):

struct DimPlusOneArray{T, N, AA} <: AbstractArray{T, N}
    parent::AA
    lengthD::Int                # length of the extra trailing dimension
    size::NTuple{N, Int}
    function DimPlusOneArray(A, lengthD)
        sz = ntuple(i -> i ≤ ndims(A) ? size(A, i) : lengthD,
                    Val(ndims(A) + 1))
        new{eltype(A), ndims(A) + 1, typeof(A)}(A, lengthD, sz)
    end
end

@inline function Base.getindex(M::DimPlusOneArray{T, N, AA},
                               I::Vararg{Int, N}) where {T, N, AA}
    @boundscheck checkbounds(M, I...)
    # drop the last index; it only matters for bounds checking
    return Base.getindex(M.parent, I[begin:end-1]...)
end

Base.size(A::DimPlusOneArray{T, N, AA}) where {T, N, AA} = A.size

It works, but there is a single allocation per element access. Not sure where it comes from:

julia> @time A = DimPlusOneArray(randn((2,2)), 3)
  0.000003 seconds (2 allocations: 144 bytes)
2×2×3 DimPlusOneArray{Float64, 3, Matrix{Float64}}:
[:, :, 1] =
 -0.358432  2.17364
 -0.174398  0.48638

[:, :, 2] =
 -0.358432  2.17364
 -0.174398  0.48638

[:, :, 3] =
 -0.358432  2.17364
 -0.174398  0.48638

julia> @time A[1, 2, 2]
  0.000004 seconds (1 allocation: 16 bytes)
2.1736430112129765

julia> @time A[1, 2, 1]
  0.000005 seconds (1 allocation: 16 bytes)
2.1736430112129765

julia> @time A[1, 1, 1]
  0.000004 seconds (1 allocation: 16 bytes)
-0.35843209612481847

julia> @time A[1, 2, 5]
ERROR: BoundsError: attempt to access 2×2×3 DimPlusOneArray{Float64, 3, Matrix{Float64}} at index [1, 2, 5]
Stacktrace:
 [1] throw_boundserror(A::DimPlusOneArray{Float64, 3, Matrix{Float64}}, I::Tuple{Int64, Int64, Int64})
   @ Base ./abstractarray.jl:744
 [2] checkbounds
   @ ./abstractarray.jl:709 [inlined]
 [3] getindex(::DimPlusOneArray{Float64, 3, Matrix{Float64}}, ::Int64, ::Int64, ::Int64)
   @ Main ~/.julia/dev/RepeatedArrays.jl/RepeatedArrays.jl:16
 [4] top-level scope
   @ ./timing.jl:273 [inlined]
 [5] top-level scope
   @ ./REPL[13]:0

I would guess that it comes from actually creating an intermediate array from the slice? Did you try using @view (not sure whether it works with Vararg)? Or maybe this needs a generated function so the code is guaranteed to specialize the splat using the N information.

That’s a tuple, not an array.
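For reference, slicing a tuple of indices just produces a smaller tuple, so no array (and no @view) is involved:

I = (1, 2, 3)
I[begin:end-1]   # (1, 2) — still a tuple
Base.front(I)    # (1, 2) — equivalent, and generally friendlier to inference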


There are certainly packages, e.g. mine (which lists alternatives in its readme):

julia> mat = [1 2 3; 4 5 6];

julia> using LazyStack

julia> lazystack((mat, mat))
2×3×2 lazystack(::Tuple{Matrix{Int64}, Matrix{Int64}}) with eltype Int64:
[:, :, 1] =
 1  2  3
 4  5  6

[:, :, 2] =
 1  2  3
 4  5  6

julia> lazystack(fill(mat, 10))
2×3×10 lazystack(::Vector{Matrix{Int64}}) with eltype Int64:
[:, :, 1] =
 1  2  3
 4  5  6
...

However, operations consuming this lazy array will sometimes be much less optimised than the same operations on an Array. What are you doing next?

It is almost always better (IMO) to adjust the next operation to accept the data you really have, rather than wrap the data in something like this.
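For example, if the next operation is elementwise arithmetic against a K × M × N array, broadcasting already treats a singleton trailing dimension as if it were repeated, with no copy (sizes below are just an illustration):

my_arr = randn(4, 5)                  # K × M
B = randn(4, 5, 3)                    # K × M × N
C = reshape(my_arr, 4, 5, 1) .+ B     # K × M × N result; my_arr is never copied N times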


I think that’s a measurement problem; @btime $A[1, 2, 2] shows zero allocations.


Same behaviour for ShiftedArrays:

julia> M2 = ShiftedArrays.circshift(M, (1,2,3))
2×2×2 CircShiftedArray{Float64, 3, Array{Float64, 3}}:
[:, :, 1] =
 -0.140213   0.981398
  0.593498  -1.45221

[:, :, 2] =
 -0.870943  -1.08996
  1.00483    1.72976

julia> using BenchmarkTools

julia> @time M2[1,1,1]
  0.000008 seconds (1 allocation: 16 bytes)
-0.14021348397210173

julia> @time M2[1,1,1]
  0.000017 seconds (1 allocation: 16 bytes)
-0.14021348397210173

julia> @btime $M2[1,1,1]
  3.617 ns (0 allocations: 0 bytes)
-0.14021348397210173

True, but an intermediate tuple should not allocate, so maybe a lack of inference caused the getindex (or the splat) to be dynamically dispatched? The problem is that @code_typed gave me no clue about any inference issue.

Minimal examples; the only difference is the use of a parent accessor function:

1. Allocation
struct S{T, n, A <: AbstractArray{T}} <: AbstractArray{T, n}
  parent::A
  function S{T,n,A}(a::A) where {T, n, m, A <: AbstractArray{T, m}}
    new{T, n, A}(a)
  end
end

S(a::A) where {T, m, A <: AbstractArray{T, m}} = S{T, m + 1, A}(a)

# getindex accesses the field directly via a.parent
Base.getindex(a::S{<:Any, n}, I::Vararg{Int, n}) where {n} = (a.parent)[Base.front(I)...]

Base.size(a::S) = (size(a.parent)..., 5)

const arr = S(rand(2))

@allocated arr[1, 2]
@allocated arr[1, 2]  # nonzero
2. No allocation
struct S{T, n, A <: AbstractArray{T}} <: AbstractArray{T, n}
  parent::A
  function S{T,n,A}(a::A) where {T, n, m, A <: AbstractArray{T, m}}
    new{T, n, A}(a)
  end
end

S(a::A) where {T, m, A <: AbstractArray{T, m}} = S{T, m + 1, A}(a)

parent(a::S) = a.parent

# getindex goes through the parent accessor function instead of the field
Base.getindex(a::S{<:Any, n}, I::Vararg{Int, n}) where {n} = parent(a)[Base.front(I)...]

Base.size(a::S) = (size(a.parent)..., 5)

const arr = S(rand(2))

@allocated arr[1, 2]
@allocated arr[1, 2]  # zero

I will report a bug now. EDIT: there are possibly related existing bug reports.


BTW, regarding the original question, the documentation is here (the first two sections can be skipped):

https://docs.julialang.org/en/v1/manual/interfaces/
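The essential read-only interface from that page is just Base.size and Base.getindex; a rough sketch of what it could look like for this use case (type and field names here are made up, not from any package):

struct PseudoDim3{T, P <: AbstractMatrix{T}} <: AbstractArray{T, 3}
    parent::P
    n::Int                      # length of the pretend third dimension
end

Base.size(A::PseudoDim3) = (size(A.parent)..., A.n)

function Base.getindex(A::PseudoDim3, k::Int, m::Int, n::Int)
    @boundscheck checkbounds(A, k, m, n)
    return A.parent[k, m]       # the third index only affects bounds checking
end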