In my case, I want the dataloader to be decoupled from the training loop. The dataloader just provides a batch (as a tuple of arrays), and the training loop concatenates all arrays in that tuple channel-wise. Most of these arrays vary along every axis, but some are constant along one or more axes. Ideally, the training loop would live in a generic package, while the dataloader is case-specific.
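A minimal sketch of that split follows. It assumes a WHCN layout (channels along dimension 3, as Flux uses) and uses `Fill` from FillArrays for an array that is constant along every axis. `load_batch` and `concat_channels` are hypothetical names for illustration, not part of any package.

```julia
using FillArrays  # provides Fill, a lazy array that stores only one value

# Hypothetical case-specific dataloader: returns a tuple of WHCN arrays.
function load_batch(batchsize)
    x = rand(Float32, 8, 8, 3, batchsize)     # varies along all axes
    mask = Fill(1f0, 8, 8, 1, batchsize)      # constant: no per-element storage
    return (x, mask)
end

# Hypothetical generic training-loop helper: it only knows it gets a tuple
# of arrays and concatenates them along the channel dimension (dim 3 here).
concat_channels(batch::Tuple) = cat(batch...; dims=3)

batch = load_batch(4)
input = concat_channels(batch)
size(input)  # (8, 8, 4, 4)
```

In this sketch `cat` builds a new array, so the `Fill` saves memory only inside the dataloader. Keeping the concatenation itself lazy is where block-array approaches such as BlockArrays come in.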
Thank you all for the suggestions about FillArrays, BlockArrays and LazyStack. They all work great!