Virtual (or lazy) representation of a repeated array

Is there an array type that allows me to create a virtual (or lazy) representation of the following array x?

sz = (64, 64, 2, 10589)
tmp = reshape(range(-1,1,64),(:,1,1,1));
x = repeat(tmp,1,sz[2],sz[3],sz[4]);

It would be OK to materialize the 64 elements of the range in memory, but I would like to avoid storing all 64*64*2*10589 elements.

I tried LazyArrays, but the result does not seem to be an AbstractArray, so I cannot use it with Flux.DataLoader, for example:

using Flux
data = randn(64, 64, 2, 10589)
tmp = reshape(range(-1,1,64),(:,1,1,1));
x = repeat(tmp,1,sz[2],sz[3],sz[4]);
dataloader = Flux.DataLoader((data,x); batchsize=128)
first(dataloader)
# ok
using LazyArrays
x_lazy = @~ repeat(tmp,1,sz[2],sz[3],sz[4]);
dataloader = Flux.DataLoader((data,x_lazy); batchsize=128)

I get the error:

ERROR: MethodError: no method matching length(::Applied{LazyArrays.DefaultApplyStyle, typeof(repeat), Tuple{Base.ReshapedArray{…}, Vararg{…}}})
The function `length` exists, but no method is defined for this combination of argument types.

Any ideas?

Maybe a combination of FillArrays and BlockArrays?

sz = (64, 64, 2, 10589)
tmp = reshape(range(-1,1,64),(:,1,1,1));
x = repeat(tmp,1,sz[2],sz[3],sz[4]);
using FillArrays, BlockArrays
y = mortar(Fill(tmp, (1,sz[2:4]...)))
x == y #true

This is almost purely virtual I think.
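
To illustrate what a fully virtual version could look like without extra packages, here is a minimal sketch of a hand-written AbstractArray subtype. The name TiledArray and its fields are hypothetical (they come from no package), the constructor does no validation, and this has not been benchmarked or tested against Flux.DataLoader. It only stores the small parent array and the repetition counts, and maps each index back into the parent with mod1, which mirrors what repeat(A, counts...) does.

```julia
# Hypothetical type: a read-only, lazily tiled view of `parent`.
struct TiledArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A               # the small array that is actually stored
    counts::NTuple{N,Int}   # number of repetitions along each dimension
end

# Virtual size: parent size times the repetition count, per dimension.
Base.size(t::TiledArray) = size(t.parent) .* t.counts

# Scalar indexing wraps every index back into the parent array.
function Base.getindex(t::TiledArray{T,N}, I::Vararg{Int,N}) where {T,N}
    @boundscheck checkbounds(t, I...)
    J = ntuple(d -> mod1(I[d], size(t.parent, d)), Val(N))
    return t.parent[J...]
end

sz = (64, 64, 2, 10589)
tmp = reshape(range(-1, 1, 64), (:, 1, 1, 1))
x_virtual = TiledArray(tmp, (1, sz[2], sz[3], sz[4]))

size(x_virtual) == sz                 # true
batch = x_virtual[:, :, :, 1:128]     # only this slice is materialized
batch == repeat(tmp, 1, sz[2], sz[3], 128)   # true
```

Because it subtypes AbstractArray and defines size and getindex, generic slicing falls back to Base's default methods, so anything that only indexes along the last dimension should in principle accept it.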

My first thought is simply Iterators.cycle. From the docs:

An iterator that cycles through iter forever. If n is specified, then it cycles through iter that many times. When iter is empty, so are cycle(iter) and cycle(iter, n).

Iterators.cycle(iter, n) is the lazy equivalent of Base.repeat(vector, n), while Iterators.repeated(iter, n) is the lazy Base.fill(item, n).
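
A small sketch of the equivalence quoted above, using only Base. The vector v is just an illustration. Iterators.take is used here instead of the two-argument cycle, since as far as I know cycle(iter, n) requires Julia 1.11 or later. Note that these are iterators rather than AbstractArrays, so they cannot be indexed or sliced like x.

```julia
v = [1, 2, 3]

# Cycle through v twice without allocating the repeated vector.
# (On Julia 1.11 and later, Iterators.cycle(v, 2) should express the same thing.)
lazy_rep = Iterators.take(Iterators.cycle(v), 2 * length(v))
collect(lazy_rep) == repeat(v, 2)     # true

# Lazy counterpart of fill(item, n).
lazy_fill = Iterators.repeated(0.5, 4)
collect(lazy_fill) == fill(0.5, 4)    # true

# Neither result is an array type.
lazy_rep isa AbstractArray            # false
```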

This seems like a bit of an XY problem. Why create this virtual array at all? You seem to want to circumvent the Flux.DataLoader and return the same data each time. Why not just make that array once, something like x1 = repeat(tmp, 1, sz[2], sz[3], batchsize), and use it every time instead of pulling it from the DataLoader?
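
Roughly, that idea could look like the sketch below. The sizes are deliberately smaller than in the question so it runs quickly, and the names x1, xb and batch_sizes are only illustrative. It assumes the DataLoader default of keeping a smaller final batch, which is why a view of x1 is trimmed to the current batch length.

```julia
using Flux

# Smaller sizes than in the question (assumption, for a quick run).
sz = (8, 8, 2, 10)
batchsize = 4

data = randn(Float32, sz...)
tmp = reshape(range(-1f0, 1f0, sz[1]), (:, 1, 1, 1))

# Materialize the constant array once, for a single batch only.
x1 = repeat(tmp, 1, sz[2], sz[3], batchsize)

batch_sizes = Int[]
for d in Flux.DataLoader(data; batchsize=batchsize)
    n = size(d, 4)                  # the last batch may be smaller
    xb = view(x1, :, :, :, 1:n)     # reuse x1 without copying
    @assert size(xb) == size(d)
    push!(batch_sizes, n)
end
```

Only data goes through the DataLoader here; x1 costs the memory of one batch regardless of the number of samples.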

That said, I am a bit surprised that LazyArrays fails, but there are probably dozens of other packages that would let you create such a virtual array. Here's one, with a smaller example.

julia> tmp = rand(Int8, 4)
4-element Vector{Int8}:
 109
  92
  72
  90

julia> x = repeat(tmp, 1, 5)
4×5 Matrix{Int8}:
 109  109  109  109  109
  92   92   92   92   92
  72   72   72   72   72
  90   90   90   90   90

julia> using LazyStack

julia> lazystack(fill(tmp, 5))
4×5 lazystack(::Vector{Vector{Int8}}) with eltype Int8:
 109  109  109  109  109
  92   92   92   92   92
  72   72   72   72   72
  90   90   90   90   90

julia> using Flux

julia> Flux.DataLoader(lazystack(fill(tmp, 5)))
5-element DataLoader(lazystack(::Vector{Vector{Int8}}))
  with first element:
  4×1 Matrix{Int8}

julia> first(ans)
4×1 Matrix{Int8}:
 109
  92
  72
  90

In my case, I want the dataloader to be decoupled from the training loop. The dataloader just provides a batch (as a tuple of arrays), and in the training loop all arrays in the tuple are concatenated channel-wise. For most of these arrays the data varies along all axes, but for some it does not. Typically, the training loop would live in a generic package, while the dataloader would be case-specific.
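
As a rough illustration of the channel-wise concatenation step described here (the array sizes are made up, and dimension 3 is assumed to be the channel dimension, as in the width × height × channel × batch layout used in this thread):

```julia
# Stand-in batch: a tuple of arrays that agree in every dimension except channels.
batch = (randn(Float32, 8, 8, 2, 4), ones(Float32, 8, 8, 1, 4))

# The generic training loop only needs to splat the tuple into cat.
input = cat(batch...; dims=3)

size(input)   # (8, 8, 3, 4)
```

Since cat accepts any AbstractArray, a lazy array in the tuple would only be materialized for the current batch.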

Thank you all for the suggestions about FillArrays, BlockArrays and LazyStack. They all work great!