For reference, here is an implementation of this strategy:
Wᵀ = reshape([conj(X[i,k] * Z[i,ℓ]) for ℓ in 1:m, k in 1:n, i in 1:N], m*n, N)
reshape(Wᵀ' \ Y, m, n)
I’ve verified that it gives the same answer (to 12 digits) as the iterative lsqr
example above, which gives me some confidence that my code is correct.
The complexity of the direct OLS approach is O(N(mn)^2). In contrast, each iteration of the lsqr
approach with implicit (matrix-free) operators is O(Nmn). So, if the iteration converges quickly enough, and mn is large enough, the latter should be faster