Oops, I see. I’m sorry for the confusion. Thanks for the effort anyway! Do you think a modification of your method could still work, or does it crucially exploit all of the Kronecker-product structure?
To be honest, I still can’t see why
\min_b \| \text{vec}(\text{diag}(Y)) - (X \otimes Z) \text{vec}(b) \|_2
(as suggested in the second post) isn’t the problem you’re trying to solve. So I don’t understand the problem and definitely can’t recommend how to solve it. Sorry.
In fact, this was not correct. Minimizing \Vert \operatorname{diagm}(Y) - Z B X^T \Vert (or the equivalent expression with Kronecker products) penalizes all of the elements of Z B X^T, when in fact you only want to minimize the differences in the diagonal elements.
So, @mikmoore’s solution is actually not applicable here because it solves the wrong problem (my fault), in addition to being too slow (because \operatorname{diagm}(Y) is too big here).
The iterative algorithm and the direct algorithm above both solve the correct problem, though.
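To spell out what I mean by the correct problem (as I understand the setup): it is the diagonal-only fit
\min_B \sum_{i=1}^N \left| Y_i - z_i^T B\, x_i \right|^2,
where x_i and z_i are the i-th rows of X and Z. Equivalently, it is a least-squares problem whose design matrix has x_i \otimes z_i as its i-th row; the i-th column of the Wᵀ matrix used in the code is the complex conjugate of this row.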
Unfortunately, in my use case the method does not converge in a reasonable time, so I’ll need to use an approximate method that I developed earlier. I’ll try to think about whether it is possible to develop a more direct method at a later stage.
That may mean that your problem is badly conditioned — have you considered using a Tikhonov (ridge regression) regularization? That corresponds to setting a damping parameter \lambda > 0 in IterativeSolvers.lsmr. This should improve the conditioning and convergence rate, and make the solutions more robust to noise, at the expense of a larger residual — you’ll have to experiment to find an appropriate regularization strength for your application.
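For example, a minimal sketch with small, made-up dimensions and an arbitrary λ (it rebuilds the Wᵀ matrix used in the code in this thread; note that lsmr’s damping enters the objective squared, so I pass \sqrt{\lambda} to match the normal-equations form below):
using IterativeSolvers, LinearAlgebra

m, n, N = 10, 20, 50                      # small sizes, just for illustration
X, Z, Y = rand(N, n), rand(N, m), rand(N)
Wᵀ = reshape([conj(X[i,k] * Z[i,ℓ]) for ℓ in 1:m, k in 1:n, i in 1:N], m*n, N)

λ = 1e-2                                  # regularization strength; tune for your data
b = lsmr(Wᵀ', Y; λ = sqrt(λ))             # minimizes ‖Wᵀ'*b − Y‖² + λ‖b‖²
B = reshape(b, m, n)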
The corresponding “direct”/non-iterative algorithm (à la the code above) is a variant of the normal equations:
B = reshape(cholesky(Hermitian(Wᵀ * Wᵀ' + λ * I)) \ (Wᵀ * Y), m, n)
(which is dominated by the O(N(mn)^2) cost of computing W^T W).
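Continuing the sketch above, the iterative and direct ridge solutions should agree as a sanity check:
B_direct = reshape(cholesky(Hermitian(Wᵀ * Wᵀ' + λ * I)) \ (Wᵀ * Y), m, n)
@show norm(B - B_direct) / norm(B_direct)   # should be small, limited by lsmr's tolerances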
I originally thought my problem would not be badly conditioned, but that may be a very good direction to look into. Thanks!
I thought I’d try another method. Inverse regression methods are useful if one is interested in a low-rank predictor subspace:
using DataFrames, Gadfly, LinearAlgebra, Statistics  # Gadfly for the plot at the end
function indmat(Y::AbstractVector; nq=4)
# Slice Y using quantiles
q = quantile(Y, range(0, 1.0, length=nq+1))
@show q
z = [zeros(nq-1); 0.1] # widen the last bin slightly so the maximum of Y is included
ind = (Y .>= q[1:end-1]') .& (Y .< (q[2:end] .+ z)')
return ind*1.0 # N × nq slice-membership matrix, as Float64
end
function lda(X, Y)
# Inverse regression
# similar to linear discriminant analysis
μX = mean(X, dims=1)
Xw = svd(X.-μX).U # whiten X
Xw ./= std(Xw, dims=1) # make unit variance
Ya = indmat(Y) # slice Y
Xhat = Ya*(cov(Ya)\cov(Ya,Xw))
# Try a truncated or approx SVD here instead?
F = svd(Xhat)
DC = Xw*F.V[:, axes(Ya, 2)] # Features or Direction Components
df = DataFrame(DC.-mean(DC, dims=1), [:DC1, :DC2, :DC3, :DC4]) # column names assume the default nq=4
df.Y = Y
return df
end
m, n, N = 100, 500, 100
X, Z, Y = rand(N, n), rand(N, m), rand(N)
Wᵀ = reshape([conj(X[i,k] * Z[i,ℓ]) for ℓ in 1:m, k in 1:n, i in 1:N], m*n, N)
@show size(Wᵀ);
df = lda(Wᵀ', Y)
dfs = stack(df, Not(:Y))
plot(dfs, x=:value, y=:Y, xgroup=:variable, Geom.subplot_grid(
Geom.point, free_x_axis=true), Guide.xlabel(nothing))
These random numbers have interesting features! If you want a predictive model, then simply build it from the DCs.
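For example (continuing from the df returned by lda above, purely as a hypothetical illustration), an ordinary least-squares fit of Y on the direction components:
A = [ones(nrow(df)) Matrix(df[:, Not(:Y)])]   # intercept plus the DC columns
coef = A \ df.Y                               # OLS coefficients
Ŷ = A * coef                                  # in-sample predictions
@show cor(Ŷ, df.Y)                            # rough measure of in-sample fit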
Thanks! Unfortunately, in my problem I can’t do data reduction (I know that all basis functions may matter, and the problem is in fact one of deterministic least squares).
It’s simple to transfer from the subspace back to the original space (and it would be interesting to see how close the prediction is). It should also be unbiased, whereas the ridge method is biased, though that bias can be iteratively corrected (https://doi.org/10.1080/13504850500425840).