I have a scalar-valued function whose gradient and Hessian I can compute. However, the Hessian involves a Kronecker product. I've been using the Optim.jl package for optimization. Is there an efficient way of passing the Hessian to the optimize call that avoids forming the full Kronecker product? Or do I just need to use a different Julia library? The Hessian has the form H_x = \operatorname{Diag}(v) + U \otimes V, where v is a vector. This matrix can get really large depending on the inputs, and I'm trying to avoid forming it explicitly as much as possible.
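For example (a small sketch with illustrative names, assuming U and V are square), I can apply this Hessian to a vector without ever building U \otimes V, using the column-major identity (U \otimes V)\,\mathrm{vec}(X) = \mathrm{vec}(V X U^\top):

using LinearAlgebra

# x has length size(U,1) * size(V,1); v is the diagonal vector
function hess_mul(v, U, V, x)
    k, m = size(V, 1), size(U, 1)
    X = reshape(x, k, m)                          # column-major reshape matches vec
    return v .* x .+ vec(V * X * transpose(U))    # Diag(v)*x + (U ⊗ V)*x
end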
Which algorithm are you trying to use? A lot of optimization algorithms don't expect you to supply an explicit Hessian at all; some, like L-BFGS, build a low-rank approximation of it internally instead.
If you want to exploit explicit knowledge of the Hessian, what one ultimately wants is typically the Newton step H^{-1} \nabla f or a similar inverse-Hessian–vector product. In principle you could use an iterative solver for this, like a Newton–Krylov method, since you can multiply by your Hessian matrix quickly (without constructing it explicitly), though you might have to play with preconditioners. However, I’m not sure offhand if there is any optimization library available that allows you to supply an implicit inverse-Hessian in this way.
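To make that concrete, here is a rough sketch of computing a single Newton step matrix-free with a Krylov solver (the sizes and the hessvec helper are made up, and using LinearMaps.jl with IterativeSolvers.jl is just one way to do it; cg assumes the Hessian is positive definite, otherwise substitute minres or gmres):

using LinearAlgebra, LinearMaps, IterativeSolvers

m, k = 50, 40                            # illustrative sizes
U = Symmetric(randn(m, m)) + 2m*I        # made positive definite so U ⊗ V (and H) is too
V = Symmetric(randn(k, k)) + 2k*I
v = rand(m*k) .+ 1.0
g = randn(m*k)                           # stand-in for the gradient ∇f(x)

# (Diag(v) + U ⊗ V) * x via the identity (U ⊗ V) vec(X) = vec(V X Uᵀ),
# so the (m*k)×(m*k) Hessian is never formed
hessvec(x) = v .* x .+ vec(V * reshape(x, k, m) * transpose(U))

Hop = LinearMap(hessvec, m*k; issymmetric = true)

# Newton step: solve H p = -∇f iteratively, using only Hessian-vector products
p = cg(Hop, -g)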
LazyArrays.jl has a lazy Kron that might help
Just trying my luck here, even though it is not the best approach: is either U or V sparse?
I agree that the first question is what you need to do with that Hessian
Unfortunately, they need not be sparse! I've seen some really big improvements in speed and memory usage with sparse matrices from other forum posts. But I'm not sure if I could pass them to an optimizer like the one in Optim.jl. It seems like if I were writing the optimization loop myself, then this would work well?
I will look at this library. I was looking at Kronecker.jl earlier today, but I do not think it is compatible with the Optim.jl optimizer.
I was using just the Newton() method, since I had derived the Hessian. I've used BFGS and L-BFGS before, but I've noticed better results with the full Hessian supplied, and I'm also interested in the Hessian itself numerically.
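Roughly, this is the pattern I mean (a toy sketch with made-up data; on the real problem, h! is where Diagonal(v) + kron(U, V) gets materialized, which is exactly the allocation I'd like to avoid):

using Optim, LinearAlgebra

n = 10
U = Symmetric(randn(n, n)) + 2n*I
V = Symmetric(randn(n, n)) + 2n*I
v = rand(n^2) .+ 1.0
K = kron(U, V)                         # small here, but huge on the real problem
b = randn(n^2)

f(x)     = 0.5 * dot(x, v .* x .+ K * x) - dot(b, x)   # toy objective with Hessian Diag(v) + U ⊗ V
g!(G, x) = (G .= v .* x .+ K * x .- b)
h!(H, x) = (H .= Diagonal(v) .+ K)     # full dense Hessian handed to Newton()

res = optimize(f, g!, h!, zeros(n^2), Newton())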
I see. I think that is what I was looking for, i.e. whether there are any libraries that would let me supply the Hessian implicitly this way. I will look at the Newton–Krylov method and combine it with the other suggestions. But I do wish it were possible to do things like this.
FYI, here's how you can make a lazy version of that operator; the matrix-vector product is about 100x faster than with the materialized matrix:
julia> using LinearAlgebra, LazyArrays

julia> n = 100; v = randn(n^2); U = randn(n,n); V = randn(n,n); A = Diagonal(v) .+ Kron(U,V)  # lazy sum, never materialized
(10000×10000 Diagonal{Float64, Vector{Float64}}) .+ (kron(100×100 Matrix{Float64}, 100×100 Matrix{Float64})):
 0.259187  -0.167685   0.271168   …  -0.594093  -0.381805
 0.428446  -0.288475   0.0564482      2.80417   -0.312597
  ⋮                                ⋱               ⋮
julia> @time A * randn(n^2); # efficient matrix*vec
0.000171 seconds (19 allocations: 400.484 KiB)
julia> @time M = Matrix(A); @time M * randn(n^2); # slow full version
0.304267 seconds (11 allocations: 762.954 MiB, 1.05% gc time)
0.016047 seconds (7 allocations: 160.141 KiB)