I’m writing a function to do partial transpose of some matrix. Mathematically, consider a d1-by-d1 blocked matrix, where each block is d2-by-d2, such that the whole matrix is n-by-n, with n=d1*d2. The partial transpose I want is to transpose each of the d2-by-d2 blocks; for example, partial transpose of this block matrix

```
0.473994 0.283237 │ 0.43439 0.342265
0.937677 0.0781005 │ 0.0204182 0.260954
─────────────────────┼─────────────────────
0.55918 0.0600351 │ 0.753252 0.25546
0.275325 0.936636 │ 0.373879 0.329917
```

would be

```
0.473994 0.937677 │ 0.43439 0.0204182
0.283237 0.0781005 │ 0.342265 0.260954
──────────────────────┼─────────────────────
0.55918 0.275325 │ 0.753252 0.373879
0.0600351 0.936636 │ 0.25546 0.329917
```

this function works as expected:

```
using BlockArrays
using LinearAlgebra
function partial_transpose(rho, d1, d2)
idx = [d2 for i = 1:d1]
blkm = BlockArray(rho, idx, idx)
for i = 1:d1
for j = 1:d1
bfm = blkm[Block(i, j)]
trm = transpose(bfm)
blkm[Block(i, j)] = trm
end
end
Array(blkm)
end
```

e.g. with

```
r = rand(4,4)
partial_transpose(r,2,2)
```

in order to speedup/cut down allocation since it’s potentially to be used with big matrices, I’m opting for using `transpose!()`

with pre-allocated memory. However, this function does not give the correct result:

```
function partial_transpose_np(rho, d1, d2)
idx = [d2 for i = 1:d1]
blkm = BlockArray(rho, idx, idx)
bfm = Array{eltype(rho)}(undef, d2, d2)
trm = Array{eltype(rho)}(undef, d2, d2)
for i = 1:d1
for j = 1:d1
bfm = blkm[Block(i, j)]
transpose!(trm, bfm)
blkm[Block(i, j)] = trm
end
end
Array(blkm)
end
```

e.g.

```
julia> rho = rand(4,4)
4×4 Array{Float64,2}:
0.549134 0.87116 0.180123 0.619095
0.949157 0.221436 0.915494 0.0948207
0.607774 0.835062 0.549872 0.901475
0.313521 0.25584 0.117484 0.0186971
julia> partial_transpose(rho,2,2)
4×4 Array{Float64,2}:
0.549134 0.949157 0.180123 0.915494
0.87116 0.221436 0.619095 0.0948207
0.607774 0.313521 0.549872 0.117484
0.835062 0.25584 0.901475 0.0186971
julia> partial_transpose_np(rho,2,2)
4×4 Array{Float64,2}:
0.549872 0.117484 0.549872 0.117484
0.901475 0.0186971 0.901475 0.0186971
0.549872 0.117484 0.549872 0.117484
0.901475 0.0186971 0.901475 0.0186971
```

I’m not explicitly using `@simd`

but looks like julia is doing something to the loop? Note that for a single block (e.g. block 1,1), this works

```
a = rand(2,2)
p = rand(2,2)
rx = BlockArray(rand(4,4),[2,2],[2,2])
p = rx[Block(1,1)]; transpose!(a,p); rx[Block(1,1)] = a
```

Also, for the `partial_transpose_np`

, if I use in-place `PseudoBlockArray`

it works again (by convention should append `!`

to `partial_transpose_np`

)