Efficient and in-place computation of A + adjoint(A)

If A is any complex-valued matrix, is there a more efficient way of computing A + adjoint(A) in-place than the following ?

function hermitian(A, A_tmp)
  adjoint!(A_tmp, A)
  A .+= A_tmp
end

Benchmarking with

using LinearAlgebra
using BenchmarkTools

const N = 200
A = rand(ComplexF64, N, N)
A_tmp = zeros(ComplexF64, N, N)
@btime hermitian(A, A_tmp)

yields 70.243 μs (0 allocations: 0 bytes), which seems quite slow for such a small matrix. Is there any other more efficient implementation ?

1 Like

See https://github.com/JuliaLang/julia/pull/31836

1 Like

Thanks for the link. So, just for anyone else who might be interested. It seems that the following is the most efficient if you want the full matrix to be updated at the end.

function hermitian!(A)
    @inbounds for j in 1:N
        A[j, j] = 2 * real(A[j, j])
        for i in 1:(j-1)
            A[i, j] = A[i, j] + conj(A[j, i])
        end
    end
    @inbounds for j in 1:N
        for i in j+1:N
            A[i, j] = conj(A[j, i])
        end
    end
end

If you’re only interested in the upper diagonal part of the matrix, you can skip the second for loop, and gain a factor of 2 on efficiency.

A little bit faster (on my machine) is:

function hermitian!(A::Matrix{Complex{T}}) where T
    ( N = size(A,1) ) == size(A, 2) || error("Matrix must be square")
    @inbounds for j in 1:N
        A[j, j] = 2 * real(A[j, j])
        for i in 1:(j-1)
            r1, i1 = reim(A[i,j])
            r2, i2 = reim(A[j,i])
            rc, ic = r1+r2, i1-i2
            A[i, j], A[j, i] = Complex{T}(rc,ic), Complex{T}(rc,-ic)
        end
    end
end

Once A[j,i] is touched in the first loop, it is better to write to it while it is hot in the cache.

6 Likes

This simpler version would be as fast as Dan’s solution:

function hermitian!(A)
    (N = size(A,1)) == size(A,2) || error("Matrix must be square")
    @inbounds for i = 1:N, j = i:N
        A[i, j] = A[i, j] + A[j, i]'
        A[j, i] = A[i, j]'
    end
end
6 Likes

Note that if A is an OffsetArray, Julia may terminate with this error:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffba590f6c1 -- RtlAllocateHeap at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
in expression starting at none:0
RtlAllocateHeap at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
RtlAllocateHeap at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
malloc at C:\WINDOWS\System32\msvcrt.dll (unknown line)
aligned_offset_malloc at C:\WINDOWS\System32\msvcrt.dll (unknown line)

@inbounds together with 1-based indexing is not advisable. Use axes. Many have argued lately that it’s fine to use 1:size(), but then remove @inbounds.

for i in axes(A,1)

but how to index j?

for i in axes(A, 1), j in i:lastindex(A, 2)

Or, simply, the way it is already, but without the @inbounds annotation.

@inbounds is for when the indices are provably inbounds. That could be achieved by using eachindex, axes, etc. Or by only accepting Array as input type, or by calling Base.require_one_based_indexing if you strongly prefer 1:N indexing.

1 Like

Somebody came up with a really clever solution to this problem, it just needs to be fleshed out and implemented

Without restarting the whole offset array discussion again, reshape(A, size(A)) should return a 1-based view for all arrays

julia> A = zeros(3:4, 4:5)
2×2 OffsetArray(::Matrix{Float64}, 3:4, 4:5) with eltype Float64 with indices 3:4×4:5:
 0.0  0.0
 0.0  0.0

julia> reshape(A, size(A))
2×2 Matrix{Float64}:
 0.0  0.0
 0.0  0.0

There might be missing cases that need to be fixed, but this is what the function is expected to return.

For those of us who like 1-based indexing (:raising_hand_man:) but want performance and genericism without ripping up our code and making it unrecognizeable, shall this be the preferred idiom?

hermitian!(A::AbstractArray) = let A=reshape(A, size(A))
    (N = size(A,1)) == size(A,2) || error("Matrix must be square")
    @inbounds for i = 1:N, j = i:N
        A[i, j] = A[i, j] + A[j, i]'
        A[j, i] = A[i, j]'
    end
end

Is this the notation for creating an OffsetArray of zeros? Isn’t that type piracy?

1 Like

I argue that OffsetArrays should be added to Base, in which case it wouldn’t be piracy anymore :wink:

Yes, it’s type-piracy, and is expected to be removed in the next breaking release of OffsetArrays.

2 Likes