# Partial transpose / block transpose of matrix: transpose!() overwritten

I’m writing a function to compute the partial transpose of a matrix. Mathematically, consider a d1-by-d1 block matrix whose blocks are each d2-by-d2, so that the whole matrix is n-by-n with n = d1*d2. The partial transpose I want transposes each of the d2-by-d2 blocks; for example, the partial transpose of this block matrix

``````
0.473994  0.283237   │  0.43439    0.342265
0.937677  0.0781005  │  0.0204182  0.260954
─────────────────────┼─────────────────────
0.55918   0.0600351  │  0.753252   0.25546
0.275325  0.936636   │  0.373879   0.329917
``````

would be

``````
0.473994   0.937677   │  0.43439   0.0204182
0.283237   0.0781005  │  0.342265  0.260954
──────────────────────┼─────────────────────
0.55918    0.275325   │  0.753252  0.373879
0.0600351  0.936636   │  0.25546   0.329917
``````

This function works as expected:

``````
using BlockArrays
using LinearAlgebra

function partial_transpose(rho, d1, d2)
    idx = [d2 for i = 1:d1]
    blkm = BlockArray(rho, idx, idx)
    for i = 1:d1
        for j = 1:d1
            bfm = blkm[Block(i, j)]
            trm = transpose(bfm)
            blkm[Block(i, j)] = trm
        end
    end
    Array(blkm)
end
``````

e.g. with

``````
r = rand(4,4)
partial_transpose(r,2,2)
``````

In order to speed things up and cut down allocations, since the function will potentially be used with big matrices, I’m opting to use `transpose!()` with pre-allocated buffers. However, this version does not give the correct result:

``````
function partial_transpose_np(rho, d1, d2)
    idx = [d2 for i = 1:d1]
    blkm = BlockArray(rho, idx, idx)
    bfm = Array{eltype(rho)}(undef, d2, d2)
    trm = Array{eltype(rho)}(undef, d2, d2)
    for i = 1:d1
        for j = 1:d1
            bfm = blkm[Block(i, j)]
            transpose!(trm, bfm)
            blkm[Block(i, j)] = trm
        end
    end
    Array(blkm)
end
``````

e.g.

``````
julia> rho = rand(4,4)
4×4 Array{Float64,2}:
0.549134  0.87116   0.180123  0.619095
0.949157  0.221436  0.915494  0.0948207
0.607774  0.835062  0.549872  0.901475
0.313521  0.25584   0.117484  0.0186971

julia> partial_transpose(rho,2,2)
4×4 Array{Float64,2}:
0.549134  0.949157  0.180123  0.915494
0.87116   0.221436  0.619095  0.0948207
0.607774  0.313521  0.549872  0.117484
0.835062  0.25584   0.901475  0.0186971

julia> partial_transpose_np(rho,2,2)
4×4 Array{Float64,2}:
0.549872  0.117484   0.549872  0.117484
0.901475  0.0186971  0.901475  0.0186971
0.549872  0.117484   0.549872  0.117484
0.901475  0.0186971  0.901475  0.0186971
``````

I’m not explicitly using `@simd`, but it looks like Julia is doing something to the loop? Note that for a single block (e.g. block (1,1)), this works:

``````
a = rand(2,2)
p = rand(2,2)
rx = BlockArray(rand(4,4),[2,2],[2,2])

p = rx[Block(1,1)]; transpose!(a,p); rx[Block(1,1)] = a
``````

Also, `partial_transpose_np` works again if I switch to an in-place `PseudoBlockArray` (by convention I should then append a `!` to the name).

In your `partial_transpose_np` you are missing a dot. Change the line

``````blkm[Block(i, j)] = trm
``````

to

``````blkm[Block(i, j)] .= trm
``````

and it will work.

The reason is that you were reassigning each block (each of which is internally stored as a separate array) to `trm`. As a result, all the blocks were identical (the very same array, not just equal) after the loop. Instead you should overwrite the contents of each block, hence the dot above.
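
The failure mode is easy to reproduce without BlockArrays at all. A minimal sketch (the names `buf` and `blocks` are mine) showing that storing one reused buffer into several slots leaves every slot aliasing the same memory:

``````
# One shared buffer is stored into every slot, so after the loop
# all slots are === the same array and show the last-written values.
buf = zeros(2, 2)
blocks = Matrix{Matrix{Float64}}(undef, 2, 2)
for i in 1:2, j in 1:2
    buf .= i + 10 * j    # overwrite the shared buffer in place
    blocks[i, j] = buf   # stores a reference, not a copy
end
all(b === buf for b in blocks)   # every slot is the very same array
``````

This is exactly what happens with the reused `trm` buffer; `.=` copies the elements into the block instead of storing the reference.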

BTW, if you’re after performance, I’d work with `PseudoBlockArrays`:

``````
using BenchmarkTools

function partial_transpose_new!(rho, d1, d2)
    idx = [d2 for i = 1:d1]
    blkm = PseudoBlockArray(rho, idx, idx)
    bfm = Array{eltype(rho)}(undef, d2, d2)
    trm = Array{eltype(rho)}(undef, d2, d2)
    for i = 1:d1
        for j = 1:d1
            getblock!(bfm, blkm, i, j)
            transpose!(trm, bfm)
            setblock!(blkm, trm, i, j)
        end
    end
    rho
end

partial_transpose_new(rho, d1, d2) = partial_transpose_new!(copy(rho), d1, d2)

r = rand(4,4)
@btime partial_transpose($r,2,2);
@btime partial_transpose_np($r,2,2);
@btime partial_transpose_new($r,2,2);
@btime partial_transpose_new!($r,2,2);
``````

which on my machine gives

``````
1.660 μs (20 allocations: 1.64 KiB)
1.490 μs (14 allocations: 1.36 KiB)
529.762 ns (9 allocations: 832 bytes)
424.599 ns (8 allocations: 624 bytes)
``````

Thanks a lot! I thought a `Block` was just a translation of the block index to a `CartesianIndex`, so I missed the dot… which raises the question: why did the first version, without the `.=`, work?

Because the `trm` that you assign is a different object in each loop iteration (it is a lazy transpose of the current `bfm`).
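
The laziness is easy to see directly; a minimal sketch:

``````
using LinearAlgebra

A = [1 2; 3 4]
T = transpose(A)   # lazy wrapper around A; no copy is made
A[1, 2] = 99       # mutate the parent array...
T[2, 1] == 99      # ...and the wrapper sees the change (shared memory)
``````

So in the first version every iteration stored a freshly created wrapper object, and the blocks never aliased one another.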

This is a bit unrelated to the question about SIMD, but in case it’s helpful: there is another method for computing partial transposes, which we use in Convex.jl (https://github.com/JuliaOpt/Convex.jl/blob/cba179feeaa7934e0e70011ef74c181a7e88ccbd/src/atoms/affine/partialtranspose.jl#L44-L69):

``````
function partial_transpose_per(x, sys, dims)
    n = length(dims)
    d = prod(dims)
    s = n - sys + 1
    p = collect(1:2n)
    p[s] = n + s
    p[n + s] = s
    rdims = reverse(dims)
    r = reshape(x, (rdims..., rdims...))
    return reshape(permutedims(r, p), (d, d))
end
``````

where `sys` is the index of the subsystem you want to partially transpose (2 in your case), and `dims = (d1, d2)` in this case.
This method seems a fair bit faster for `d1 = d2 = 10` and `20`:

``````
julia> d1 = d2 = 10;

julia> r = rand(d1*d2, d1*d2);

julia> @btime partial_transpose_per($r, 2, ($d1, $d2));
  8.688 μs (12 allocations: 78.83 KiB)

julia> @btime partial_transpose_new($r, $d1, $d2);
  29.467 μs (10 allocations: 80.53 KiB)

julia> d1 = d2 = 20;

julia> r = rand(d1*d2, d1*d2);

julia> @btime partial_transpose_per($r, 2, ($d1, $d2));
  188.248 μs (12 allocations: 1.22 MiB)

julia> @btime partial_transpose_new($r, $d1, $d2);
  505.763 μs (10 allocations: 1.23 MiB)
``````

You can also do it lazily by using `PermutedDimsArray` in place of `permutedims`. That can be much faster to construct, since it doesn’t actually move any memory around (when you ask for an element, it traces back through the reshapes and permutations to find the correct one), but it is slower on accesses, and in my tests it seems slow to instantiate into an ordinary `Matrix` (via `copyto!` or `collect`). It could still be useful, depending on what you need to do with the partially transposed matrix afterwards.
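
For instance, a sketch of such a lazy variant (the name `partial_transpose_lazy` is mine; it mirrors `partial_transpose_per` above and only swaps the permutation step):

``````
# Hypothetical lazy variant: identical index bookkeeping to
# partial_transpose_per, but PermutedDimsArray makes a view instead of
# moving data; the outer reshape wraps that view lazily as well.
function partial_transpose_lazy(x, sys, dims)
    n = length(dims)
    d = prod(dims)
    s = n - sys + 1
    p = collect(1:2n)
    p[s] = n + s
    p[n + s] = s
    rdims = reverse(dims)
    r = reshape(x, (rdims..., rdims...))
    return reshape(PermutedDimsArray(r, Tuple(p)), (d, d))
end
``````

Calling `collect` on the result materializes it into an ordinary `Matrix` when (and if) you need one.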
