Partial transpose / block transpose of matrix: transpose!() overwritten

I’m writing a function to compute the partial transpose of a matrix. Mathematically, consider a d1-by-d1 block matrix where each block is d2-by-d2, so that the whole matrix is n-by-n with n = d1*d2. The partial transpose I want transposes each of the d2-by-d2 blocks; for example, the partial transpose of this block matrix

 0.473994  0.283237   │  0.43439    0.342265
 0.937677  0.0781005  │  0.0204182  0.260954
 ─────────────────────┼─────────────────────
 0.55918   0.0600351  │  0.753252   0.25546 
 0.275325  0.936636   │  0.373879   0.329917

would be

 0.473994   0.937677   │  0.43439   0.0204182
 0.283237   0.0781005  │  0.342265  0.260954 
 ──────────────────────┼─────────────────────
 0.55918    0.275325   │  0.753252  0.373879 
 0.0600351  0.936636   │  0.25546   0.329917 

This function works as expected:

using BlockArrays
using LinearAlgebra

function partial_transpose(rho, d1, d2)
    idx = [d2 for i = 1:d1]
    blkm = BlockArray(rho, idx, idx)
    for i = 1:d1
        for j = 1:d1
            bfm = blkm[Block(i, j)]
            trm = transpose(bfm)
            blkm[Block(i, j)] = trm
        end
    end
    Array(blkm)
end

e.g. with

r = rand(4,4)
partial_transpose(r,2,2)

To speed things up and cut down on allocations, since it will potentially be used with big matrices, I opted for transpose!() with pre-allocated memory. However, this function does not give the correct result:

function partial_transpose_np(rho, d1, d2)
    idx = [d2 for i = 1:d1]
    blkm = BlockArray(rho, idx, idx)
    bfm = Array{eltype(rho)}(undef, d2, d2)
    trm = Array{eltype(rho)}(undef, d2, d2)
    for i = 1:d1
        for j = 1:d1
            bfm = blkm[Block(i, j)]
            transpose!(trm, bfm)
            blkm[Block(i, j)] = trm
        end
    end
    Array(blkm)
end

e.g.

julia> rho = rand(4,4)
4×4 Array{Float64,2}:
 0.549134  0.87116   0.180123  0.619095 
 0.949157  0.221436  0.915494  0.0948207
 0.607774  0.835062  0.549872  0.901475 
 0.313521  0.25584   0.117484  0.0186971

julia> partial_transpose(rho,2,2)
4×4 Array{Float64,2}:
 0.549134  0.949157  0.180123  0.915494 
 0.87116   0.221436  0.619095  0.0948207
 0.607774  0.313521  0.549872  0.117484 
 0.835062  0.25584   0.901475  0.0186971

julia> partial_transpose_np(rho,2,2)
4×4 Array{Float64,2}:
 0.549872  0.117484   0.549872  0.117484 
 0.901475  0.0186971  0.901475  0.0186971
 0.549872  0.117484   0.549872  0.117484 
 0.901475  0.0186971  0.901475  0.0186971

I’m not explicitly using @simd, but it looks like Julia is doing something to the loop? Note that for a single block (e.g. block (1,1)), this works:

a = rand(2,2)
p = rand(2,2)
rx = BlockArray(rand(4,4),[2,2],[2,2])

p = rx[Block(1,1)]; transpose!(a,p); rx[Block(1,1)] = a

Also, for partial_transpose_np, if I use an in-place PseudoBlockArray it works again (by convention I should append ! to partial_transpose_np, since it then mutates its argument).
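Something like this is what I mean (a rough sketch; the ! name is just the mutating convention, since the PseudoBlockArray writes straight into rho):

function partial_transpose_np!(rho, d1, d2)
    idx = [d2 for i = 1:d1]
    blkm = PseudoBlockArray(rho, idx, idx)   # wraps rho, no separate per-block storage
    trm = Array{eltype(rho)}(undef, d2, d2)
    for i = 1:d1
        for j = 1:d1
            transpose!(trm, blkm[Block(i, j)])
            blkm[Block(i, j)] = trm          # copies the values back into rho's storage
        end
    end
    rho
end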

In your partial_transpose_np you are missing a dot. Change the line

blkm[Block(i, j)] = trm

to

blkm[Block(i, j)] .= trm

and it will work.

The reason is that you were rebinding each block (which is internally stored as a separate array) to trm itself. After the loop all the blocks were therefore identical (the very same array, not just equal), holding whatever trm last contained, namely the transpose of the last block. Instead you should overwrite the contents of each block, hence the dot above.
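A minimal illustration of the difference (toy example, made-up variable names):

buf = zeros(2, 2)
blk = BlockArray(rand(4, 4), [2, 2], [2, 2])

blk[Block(1, 1)] = buf     # without the dot: block (1,1) is rebound to buf itself
blk[Block(2, 2)] .= buf    # with the dot: the values of buf are copied into block (2,2)

buf .= 1.0                 # now mutate the shared buffer
blk[Block(1, 1)]           # all ones: this block aliases buf
blk[Block(2, 2)]           # still all zeros: this block has its own storage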


BTW, if you’re after performance, I’d work with PseudoBlockArrays:

function partial_transpose_new!(rho, d1, d2)
    idx = [d2 for i = 1:d1]
    blkm = PseudoBlockArray(rho, idx, idx)   # wraps rho without copying
    bfm = Array{eltype(rho)}(undef, d2, d2)
    trm = Array{eltype(rho)}(undef, d2, d2)
    for i = 1:d1
        for j = 1:d1
            getblock!(bfm, blkm, i, j)       # copy block (i, j) into the buffer
            transpose!(trm, bfm)
            setblock!(blkm, trm, i, j)       # copy the transposed values back
        end
    end
    rho
end


partial_transpose_new(rho, d1, d2) = partial_transpose_new!(copy(rho), d1, d2)

using BenchmarkTools

r = rand(4,4)
@btime partial_transpose($r,2,2);
@btime partial_transpose_np($r,2,2);
@btime partial_transpose_new($r,2,2);
@btime partial_transpose_new!($r,2,2);

which on my machine gives

  1.660 μs (20 allocations: 1.64 KiB)
  1.490 μs (14 allocations: 1.36 KiB)
  529.762 ns (9 allocations: 832 bytes)
  424.599 ns (8 allocations: 624 bytes)

Thanks a lot! I thought indexing with Block was just a translation of the block index to the corresponding Cartesian indices, so I missed the dot… which raises the question of why the first version, without the .=, worked?

Because the trm you assign is a different object in each loop iteration: it is a new lazy transpose wrapping the current bfm.
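In other words (a tiny illustration):

trm1 = transpose(rand(2, 2))   # lazy wrapper around one parent array
trm2 = transpose(rand(2, 2))   # lazy wrapper around a different parent array
trm1 === trm2                  # false: each iteration binds the block to a distinct object

whereas in partial_transpose_np, trm is a single preallocated Matrix that every block ends up pointing at.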


This is a bit unrelated to the question about SIMD, but in case it’s helpful: there is another method for computing partial transposes, which we use in Convex.jl (https://github.com/JuliaOpt/Convex.jl/blob/cba179feeaa7934e0e70011ef74c181a7e88ccbd/src/atoms/affine/partialtranspose.jl#L44-L69):

function partial_transpose_per(x, sys, dims)
    n = length(dims)
    d = prod(dims)
    s = n - sys + 1
    p = collect(1:2n)    # permutation of the 2n tensor indices
    p[s] = n + s         # swap the row and column indices of subsystem `sys`
    p[n + s] = s
    rdims = reverse(dims)
    r = reshape(x, (rdims..., rdims...))
    return reshape(permutedims(r, p), (d, d))
end

where sys is the index of the subsystem you want to transpose (2 in your case), and dims = (d1, d2) in this case.
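For instance, a quick check against your block-wise version (assuming both functions as defined above):

julia> r = rand(4, 4);

julia> partial_transpose_per(r, 2, (2, 2)) == partial_transpose(r, 2, 2)
true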
This method seems a fair bit faster for d1=d2=10 and 20:

julia> d1=d2=10;

julia> r = rand(d1*d2, d1*d2);

julia> @btime partial_transpose_per($r, 2, ($d1, $d2));
  8.688 μs (12 allocations: 78.83 KiB)

julia> @btime partial_transpose_new($r, $d1, $d2);
  29.467 μs (10 allocations: 80.53 KiB)

julia> d1=d2=20;

julia> r = rand(d1*d2, d1*d2);

julia> @btime partial_transpose_per($r, 2, ($d1, $d2));
  188.248 μs (12 allocations: 1.22 MiB)

julia> @btime partial_transpose_new($r, $d1, $d2);
  505.763 μs (10 allocations: 1.23 MiB)

You can also do it lazily by using PermutedDimsArray in place of permutedims. That can be much faster to construct initially, since it doesn’t actually move any memory around (when you ask for an element, it traces back through the reshapes and permutations to find the correct one), but it is slower on accesses, and in my tests it seems slow to instantiate into a plain Matrix (via copyto! or collect). Still, it could be useful depending on what you need to do with the partially transposed matrix afterwards.
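Something along these lines (a sketch; the name is made up, and it’s the same permutation logic as partial_transpose_per, just with the lazy wrapper):

function partial_transpose_lazy(x, sys, dims)
    n = length(dims)
    d = prod(dims)
    s = n - sys + 1
    p = collect(1:2n)
    p[s] = n + s
    p[n + s] = s
    rdims = reverse(dims)
    r = reshape(x, (rdims..., rdims...))
    # PermutedDimsArray is a view: no data is moved until you index into it
    return reshape(PermutedDimsArray(r, p), (d, d))
end

The result shares memory with x; call collect on it (or copyto! into a preallocated matrix) if you need a plain Matrix.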
