Why is the ' operator the conjugate transpose operator and not just the transpose operator or the conjugate operator?
The conjugate transpose is really natural for complex matrices.
Because the conjugate transpose (“adjoint”) is more important than either transpose or conjugation in linear algebra, thanks to its relationship to Euclidean inner products. The Euclidean inner product of vectors is \langle x, y \rangle = x' y = \overline{x^T} y (or x ⋅ y or x'y in Julia), in which case \langle x, Ay \rangle = \langle A'x, y \rangle.
(This is the main reason why transposition is an important operation for real matrices in linear algebra, and the reason why the operation has a name and a symbol, as opposed to some other rearrangement of a matrix like flipping it upside down or rotating it 90°. When you go to complex matrices, the analogous operation is the conjugate transpose.)
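For instance, here is a quick REPL check of that identity with an arbitrary complex matrix and vectors (LinearAlgebra's dot is the ⟨·,·⟩ above); note that the unconjugated transpose does not satisfy it:
julia> using LinearAlgebra
julia> A = [1+2im 3; 4im 5-im]; x = [1+im, 2]; y = [3, 1-2im];
julia> dot(x, A*y) ≈ dot(A'*x, y)            # ⟨x, Ay⟩ == ⟨A'x, y⟩ with the conjugate transpose
true
julia> dot(x, A*y) ≈ dot(transpose(A)*x, y)  # the unconjugated transpose does not satisfy it
false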
In general for complex matrices, the transpose alone or the conjugate alone has very limited meaning and utility. You almost always want to talk about the Hermitian adjoint for practical purposes.
One might then wonder why. Well, the way I like to think about it is that the reason is actually the same reason that transpose is recursive in Julia. I.e., if I have the block matrix
julia> M = reshape([[1 2; 3 4], [9 0; 1 2], [5 6; 7 8], [3 4; 5 6]], 2, 2)
2×2 Matrix{Matrix{Int64}}:
 [1 2; 3 4]  [5 6; 7 8]
 [9 0; 1 2]  [3 4; 5 6]
taking the transpose of this matrix also takes the transpose of the elements:
julia> transpose(M)
2×2 transpose(::Matrix{Matrix{Int64}}) with eltype Transpose{Int64, Matrix{Int64}}:
 [1 3; 2 4]  [9 1; 0 2]
 [5 7; 6 8]  [3 5; 4 6]
The reason we take this transpose recursively is that the block matrix M can somewhat straightforwardly be reinterpreted as a regular matrix N:
julia> N = [[1 2; 3 4] [5 6; 7 8]
            [9 0; 1 2] [3 4; 5 6]]
4×4 Matrix{Int64}:
 1  2  5  6
 3  4  7  8
 9  0  3  4
 1  2  5  6
with the same algebraic properties. For instance,
julia> M^2
2×2 Matrix{Matrix{Int64}}:
[58 22; 86 38] [64 78; 104 126]
[40 26; 58 22] [74 90; 64 78]
julia> N^2
4×4 Matrix{Int64}:
 58  22   64   78
 86  38  104  126
 40  26   74   90
 58  22   64   78
Hence, we need the transpose of N and M to match if they’re going to share algebraic properties in a ‘faithful’ way.
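One way to make this concrete is to flatten the blocks back into a plain matrix and compare; flatten below is just an ad-hoc helper for this check, not anything in Base:
julia> flatten(B) = hvcat(size(B, 2), permutedims(B)...);  # ad-hoc helper: paste the blocks together row by row
julia> flatten(M) == N
true
julia> flatten(transpose(M)) == transpose(N)  # recursive transpose of M matches the flat transpose of N
true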
What does this have to do with complex numbers? Well, we typically treat complex numbers as their own fundamental algebraic objects, but we actually do have the freedom to reinterpret them in terms of matrix algebra over real numbers. Observe:
julia> 𝟙 = [1 0; 0 1];
julia> 𝕀 = [0 1; -1 0];
julia> 𝟙^2 == 𝟙 # just like 1^2 == 1
true
julia> 𝕀^2 == -𝟙 # just like im^2 == -1
true
and all sorts of things ‘just work’:
julia> exp(𝕀 * π) ≈ -𝟙 # just like exp(im*π) ≈ -1 see https://en.wikipedia.org/wiki/Euler%27s_identity
true
Now a natural question becomes: what is the equivalent of the complex conjugate in this representation? It’s exactly the transpose!
julia> transpose(𝟙) == 𝟙
true
julia> transpose(𝕀) == -𝕀 # conj(im) == -im
true
e.g.
julia> z = 3 - 4im;
julia> √(z * conj(z)) # absolute value of z
5.0 + 0.0im
julia> Z = 3𝟙 - 4𝕀;
julia> √(Z*transpose(Z)) ≈ 5𝟙
true
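More generally, the map a + b*im ↦ a𝟙 + b𝕀 preserves all of the arithmetic, which is why none of these analogies are a coincidence (tomat is just an illustrative name for this embedding):
julia> tomat(w::Complex) = real(w) * 𝟙 + imag(w) * 𝕀;  # illustrative embedding of ℂ into 2×2 real matrices
julia> w = 1 + 2im;
julia> tomat(z * w) == tomat(z) * tomat(w)    # multiplication carries over
true
julia> tomat(conj(z)) == transpose(tomat(z))  # conjugation carries over to transposition
true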
So for the same reason that it’s very natural for transpose to be recursive on matrices, transpose really is only meaningful from an algebraic point of view if it is also accompanied by a complex conjugate. Hence, in Julia we don’t really use transpose much for linear algebra, and instead always use the algebraically more meaningful adjoint.
Note that the conjugate transpose operation is the same as the transpose operation for real matrices. This is mentioned in the help docstring:
For real matrices, the adjoint operation is equivalent to a transpose.
julia> A = reshape([x for x in 1:4], 2, 2)
2×2 Matrix{Int64}:
1 3
2 4
julia> A'
2×2 adjoint(::Matrix{Int64}) with eltype Int64:
1 2
3 4
julia> adjoint(A) == transpose(A)
true
Defining ' as adjoint rather than just transpose makes extension to complex numbers work naturally as well.
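For a complex matrix the two operations genuinely differ, e.g. with an arbitrary complex matrix:
julia> B = [1+im 2; 3 4-im];
julia> B' == transpose(conj(B))  # adjoint conjugates as well as transposes
true
julia> B' == transpose(B)
false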
This seems to me to be making things more complicated than necessary. As I said above, the underlying reason for conjugate-transposition stems from how you define an inner product of complex vectors.
You want an inner (“dot”) product of a vector with itself to be a norm², which should always be ≥ 0 (and = 0 only if the vector is zero). But for complex vectors this means you need a conjugation.
Consider the “column” vector z = [1, i]. Then

z^T z = 1 \cdot 1 + i \cdot i = 1 - 1 = 0

does not make sense as a norm because z \ne 0. Whereas if you conjugate,

z' z = \overline{z}^T z = 1 \cdot 1 + \overline{i} \cdot i = 1 + 1 = 2,

you get a reasonable “Euclidean” length² ≥ 0.
In general, for an n-component complex vector z \in \mathbb{C}^n, you have

z' z = \overline{z}^T z = \sum_{k=1}^n \overline{z_k} z_k = \sum_{k=1}^n |z_k|^2,

which is clearly ≥ 0 for all z, and = 0 only if z = 0.
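In Julia this is exactly the difference between transpose(z) * z and z'z for the z = [1, i] example above:
julia> z = [1, im];
julia> transpose(z) * z  # unconjugated: gives 0 even though z ≠ 0, so it is not a norm²
0 + 0im
julia> z'z               # conjugated: the Euclidean length²
2 + 0im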
Once you accept that complex conjugation is necessary for inner products of complex vectors, then it follows that you want the conjugate transpose for matrices too (to move the matrix from one side to the other of an inner product).
(It is for the same reason that you want the conjugate-transposition to be recursive for matrices of matrices, in order that x'y is an inner product for vectors of vectors.)
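For example, a small illustrative check with a vector of vectors:
julia> x = [[1+im, 2], [3, 4im]];  # a vector of vectors
julia> x'x                         # recursive conjugate transpose ⇒ a genuine norm²: |1+im|² + |2|² + |3|² + |4im|²
31 + 0im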
Sure, and I quite liked your argument. I just wanted to give a slightly different perspective which I felt was complementary.
In particular, the stuff about inner products comes out for free if we work in this matrix representation of complex numbers (though we need to be careful about the fact that my representation was over-complete, so we need to divide by the matrix dimension):
julia> using LinearAlgebra  # for tr
julia> function mdot(x::Matrix, y::Matrix)
           @assert size(x, 1) == size(x, 2) == size(y, 1) == size(y, 2)
           tr(transpose(x) * y) / size(x, 1)
       end;
julia> mdot(𝟙 + 𝕀, 𝟙 + 𝕀)
2.0
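And, just to illustrate the correspondence, this mdot reproduces the real part of the usual complex inner product:
julia> mdot(3𝟙 - 4𝕀, 𝟙 + 2𝕀)
-5.0
julia> real(dot(3 - 4im, 1 + 2im))  # real(⟨z, w⟩) = real(conj(z) * w) for scalars
-5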
Collection{Vector{Int}} isn’t typically interpreted as Vector{Int}, unless the collection is a type that’s specifically for that purpose, like Iterators.Flatten. Likewise, Matrix{Matrix{Float64}} isn’t the same type as Matrix{Float64}; I don’t see why it should be reinterpreted that way.
If a type that treats an AbstractMatrix{Matrix{Float64}} as a single flat matrix is needed, let it be BlockMatrix{Matrix{Float64}}: the fact that there are two levels of matrices here is an implementation detail; from an interface point of view, it’s just a big matrix. That way transpose(::Matrix) can just be normal, since there is no reason for it to be recursive.
If you want non-recursive transpose, i.e. if you are not doing linear algebra, you can use permutedims(matrix).
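A small example of the difference, using an arbitrary block matrix:
julia> B = [[1 2; 3 4] for _ in 1:2, _ in 1:2];  # every block is the same 2×2 matrix
julia> permutedims(B) == B  # only block positions move; the blocks themselves are untouched
true
julia> transpose(B) == B    # transpose also transposes each block, so this differs
false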
If you have a vector of vectors, the motivation for the current design (IIRC) was that the algebraic operation x'y should still be an inner product, i.e. x' should be a covector, which means it should still be recursive conjugate-transpose. Then it follows that matrices should do the same thing.
' is the conjugate transpose, which I’m not opining on here. I’m saying I don’t see why transpose should be recursive. I think my post applies just fine to linear algebra, which AFAIK doesn’t typically use Matrix{Matrix{<:Number}}; rather it uses Matrix{<:Number} or BlockMatrix{Matrix{T<:Number}} <: AbstractMatrix{T}, where the inner matrices are an implementation detail, not part of the interface.
This is slightly begging the question. I would say that ' is the adjoint, and in fact x' in Julia is a synonym for a call to adjoint(x): adjoint turns a vector into a covector (an operator that takes the inner product with x), and moves operators from one side of the inner product to the other.
For column vectors and matrices of scalars, this corresponds to the conjugate-transpose operation, but that’s a consequence of being the adjoint and not the other way around. For vectors of vectors, adjoint is necessarily a recursive operation.
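For example, a quick illustrative check (dot is LinearAlgebra's inner product):
julia> using LinearAlgebra
julia> x = [1 + im, 2]; y = [3, 4im];
julia> x' * y == dot(x, y)  # x' acts by taking the inner product with x
true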
It’s a question of whether you want transpose to be an unconjugated adjoint, i.e. the linear-algebraic operation, or permutedims. I think of “transpose” as an algebraic term in this context, but tastes can vary.
And of course, there are ways to define the non-recursive transpose or conjugate-transpose in algebraic terms. But Julia had to pick one definition for each function.
I’m not talking about general language semantics, I’m talking about linear algebra and what actually makes sense from an algebraic point of view.
I got curious what C++ (and Python/NumPy) does, or rather what C++26 will do (in its standard, that is, over a decade after Julia), and it seemingly chose the wrong, or opposite, default for the shorter function name compared with Julia’s ' (its transposed is the same as the transpose function in Julia):
https://en.cppreference.com/w/cpp/numeric/linalg
transposed (C++26): returns a new std::mdspan representing the transpose of the input matrix by the given std::mdspan (function template)
conjugate_transposed (C++26): returns a conjugate transpose view of an object (function template)
Does C++ have anything in this general area that Julia does not (or does Julia have things they don’t? I think so)?
I don’t care enough to check whether they have ' for natural short syntax. I guess the Julia docs should mention this in the section comparing Julia with C++. At least C++ is finally getting A[1] syntax.
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.T.html