Custom reshape() function for specific usage

Hi there! Is there any way to reduce the memory allocations of the following custom function graded_reshape() (or even achieve the same performance as the built-in Julia reshape function)?

Suppose we start from the usual Julia reshape function, which takes an array like rand(10, 10, 10, 10) and does the following reshaping:

    function usual_reshape(A)
        C = reshape(A, (10, 100, 10))
        return C
    end

But instead of the naive reshaping above, we want to achieve the following graded_reshape() function:

    function graded_reshape(A)
        # Slice A into four blocks over dimensions 2 and 3
        B1 = A[:, 1:6, 1:6, :]
        B2 = A[:, 7:10, 7:10, :]
        B3 = A[:, 1:6, 7:10, :]
        B4 = A[:, 7:10, 1:6, :]
        # Fuse dimensions 2 and 3 of each block
        C1 = reshape(B1, (10, 36, 10))
        C2 = reshape(B2, (10, 16, 10))
        C3 = reshape(B3, (10, 24, 10))
        C4 = reshape(B4, (10, 24, 10))
        # Concatenate the fused blocks along the new dimension 2
        C = cat(C1, C2, C3, C4; dims=2)
        return C
    end

The latter function has many more memory allocations due to the cat function:

    using BenchmarkTools

    A = rand(10, 10, 10, 10)

    @btime usual_reshape($A)
    #   25.148 ns (2 allocations: 96 bytes)

    @btime graded_reshape($A)
    #   23.406 μs (61 allocations: 158.50 KiB)

Note that the size after reshaping is the same for reshape() and graded_reshape(), i.e. (10, 100, 10); the latter just rearranges the elements within the fused dimension.
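For example, a quick check (my own, not part of the original post):

    A = rand(10, 10, 10, 10)
    size(usual_reshape(A)) == size(graded_reshape(A)) == (10, 100, 10)   # true
    usual_reshape(A) == graded_reshape(A)   # false in general: elements are ordered differently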

Any good ideas? Thanks in advance!

"Or even achieve the same performance as the Julia reshape function?"

A Julia array is a blob of memory, plus some metadata that gives meaning to that data.

The Julia reshape function is so fast because no data is copied: we are just wrapping different metadata around the same blob of memory. Since the reshaped array references the same data, it changes whenever the original array is modified.
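A small illustration of that sharing (my own example, not from the original post):

    A = rand(10, 10, 10, 10)
    C = reshape(A, (10, 100, 10))
    A[1, 1, 1, 1] = 42.0
    C[1, 1, 1] == 42.0   # true: C sees the change, since no data was copied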

So you should not think in terms of multi-dimensional arrays / tensors (4 indices), but rather in terms of one-dimensional arrays (Vectors / blobs of memory), with convenience functions that compute a 1-d index from the 4-d index. In other words, the fast reshape is effectively a no-op / view.

For example, for 4-tensors,

    A[i+1, j+1, k+1, ell+1] == A[1 + i + size(A,1)*j + size(A,1)*size(A,2)*k + size(A,1)*size(A,2)*size(A,3)*ell]
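As a quick sanity check (a small example of my own, not from the reply), the identity can be verified on a concrete array via linear indexing:

    A = rand(10, 10, 10, 10)
    i, j, k, ell = 3, 4, 5, 6   # arbitrary zero-based offsets
    A[i+1, j+1, k+1, ell+1] ==
        A[1 + i + size(A,1)*j + size(A,1)*size(A,2)*k + size(A,1)*size(A,2)*size(A,3)*ell]   # true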

So you need to think about whether that can work for you: if the input and output memory layouts are the same, then there is a fast reshape; otherwise you need to allocate and copy. This is not a limitation of the Julia language, it is a limitation of linear memory addressing (special-purpose hardware with multi-dimensional memory addressing is relatively rare).

2 Likes

Note that in Julia, slicing like A[:, 1:6, 1:6, :] makes a copy of the array. Maybe you could avoid some of the allocations by using views instead, e.g. try:

    function graded_reshape(A)
        B1 = @view A[:, 1:6, 1:6, :]
        B2 = @view A[:, 7:10, 7:10, :]
        B3 = @view A[:, 1:6, 7:10, :]
        B4 = @view A[:, 7:10, 1:6, :]
        C1 = reshape(B1, (10, 36, 10))
        C2 = reshape(B2, (10, 16, 10))
        C3 = reshape(B3, (10, 24, 10))
        C4 = reshape(B4, (10, 24, 10))
        C = cat(C1, C2, C3, C4; dims=2)
        return C
    end

Alternatively, maybe you could choose a better data representation that allows for this transformation more naturally than using a single array.
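For example, one direction (a sketch of mine, not part of the reply; graded_blocks is a hypothetical name) would be to skip the cat entirely and keep the four fused blocks as lazy reshaped views, materializing a dense (10, 100, 10) array only when it is really needed:

    # Sketch: keep the four fused blocks as lazy reshaped views,
    # avoiding the copies made by slicing and by cat.
    function graded_blocks(A)
        B1 = @view A[:, 1:6, 1:6, :]
        B2 = @view A[:, 7:10, 7:10, :]
        B3 = @view A[:, 1:6, 7:10, :]
        B4 = @view A[:, 7:10, 1:6, :]
        return (reshape(B1, (10, 36, 10)),
                reshape(B2, (10, 16, 10)),
                reshape(B3, (10, 24, 10)),
                reshape(B4, (10, 24, 10)))
    end

    # If a dense array is needed later: cat(graded_blocks(A)...; dims=2)

Whether this helps depends on how the result is consumed downstream; if the dense layout is required anyway, the copy has to happen somewhere.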

1 Like

Thanks for your kind reply. The use of @view is indeed crucial when slicing. But as you and @foobar_lv2 both mentioned, I guess a better design of the underlying data representation is what my problem really needs, so I will think about it more carefully.

I guess TensorKit.jl has some optimizations closely related to my problem (there they fuse two indices with Z2 symmetry), so I need to take some time to read its implementation.