I have a scalar-valued function whose gradient and Hessian I can compute. However, the Hessian involves a Kronecker product. I've been using the Optim.jl package for optimization. Is there an efficient way of passing the Hessian to the optimize call that avoids forming the full Kronecker product? Or do I just need to use a different Julia library? The Hessian has the form H_x = \operatorname{Diag}(v) + U \otimes V, where v is a vector. This matrix can get really large depending on the inputs, and I'm trying to avoid forming it explicitly as much as possible.
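For example (a small sketch with illustrative names, assuming U and V are square), I can apply this Hessian to a vector without ever building U \otimes V, using the column-major identity (U \otimes V)\,\mathrm{vec}(X) = \mathrm{vec}(V X U^\top):

using LinearAlgebra

# x has length size(U,1) * size(V,1); v is the diagonal vector
function hess_mul(v, U, V, x)
    k, m = size(V, 1), size(U, 1)
    X = reshape(x, k, m)                          # column-major reshape matches vec
    return v .* x .+ vec(V * X * transpose(U))    # Diag(v)*x + (U ⊗ V)*x
end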
Which algorithm are you trying to use? A lot of optimization algorithms don't expect you to supply an explicit Hessian at all; some, like L-BFGS, build a low-rank approximation of it internally instead.
If you want to exploit explicit knowledge of the Hessian, what one ultimately wants is typically the Newton step H^{-1} \nabla f or a similar inverse-Hessian–vector product. In principle you could use an iterative solver for this, like a Newton–Krylov method, since you can multiply by your Hessian matrix quickly (without constructing it explicitly), though you might have to play with preconditioners. However, I’m not sure offhand if there is any optimization library available that allows you to supply an implicit inverse-Hessian in this way.
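To make that concrete, here is a rough sketch of computing a single Newton step matrix-free with a Krylov solver (the sizes and the hessvec helper are made up, and using LinearMaps.jl with IterativeSolvers.jl is just one way to do it; cg assumes the Hessian is positive definite, otherwise substitute minres or gmres):

using LinearAlgebra, LinearMaps, IterativeSolvers

m, k = 50, 40                            # illustrative sizes
U = Symmetric(randn(m, m)) + 2m*I        # made positive definite so U ⊗ V (and H) is too
V = Symmetric(randn(k, k)) + 2k*I
v = rand(m*k) .+ 1.0
g = randn(m*k)                           # stand-in for the gradient ∇f(x)

# (Diag(v) + U ⊗ V) * x via the identity (U ⊗ V) vec(X) = vec(V X Uᵀ),
# so the (m*k)×(m*k) Hessian is never formed
hessvec(x) = v .* x .+ vec(V * reshape(x, k, m) * transpose(U))

Hop = LinearMap(hessvec, m*k; issymmetric = true)

# Newton step: solve H p = -∇f iteratively, using only Hessian-vector products
p = cg(Hop, -g)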
LazyArrays.jl has a lazy Kron that might help
Just trying my luck here, even though it is not the best approach: is either U or V sparse?
I agree that the first question is what you need to do with that Hessian
Unfortunately, they need not be sparse! I've seen some really big improvements in speed and memory usage with sparse matrices from other forum posts. But I'm not sure if I could pass them to an optimizer like the one in Optim.jl. It seems like if I were writing the optimization loop myself, then this would work well?
I will look at this library. I was looking at Kronecker.jl earlier today, but I do not think it is compatible with the Optim.jl optimizer.
I was using just the Newton() method, since I had derived the Hessian. I've used BFGS and L-BFGS before, but I've noticed better results with the full Hessian supplied, and I'm also interested in the Hessian itself numerically.
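Roughly, this is the pattern I mean (a toy sketch with made-up data; on the real problem, h! is where Diagonal(v) + kron(U, V) gets materialized, which is exactly the allocation I'd like to avoid):

using Optim, LinearAlgebra

n = 10
U = Symmetric(randn(n, n)) + 2n*I
V = Symmetric(randn(n, n)) + 2n*I
v = rand(n^2) .+ 1.0
K = kron(U, V)                         # small here, but huge on the real problem
b = randn(n^2)

f(x)     = 0.5 * dot(x, v .* x .+ K * x) - dot(b, x)   # toy objective with Hessian Diag(v) + U ⊗ V
g!(G, x) = (G .= v .* x .+ K * x .- b)
h!(H, x) = (H .= Diagonal(v) .+ K)     # full dense Hessian handed to Newton()

res = optimize(f, g!, h!, zeros(n^2), Newton())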
I see. I think that is what I was looking for, i.e. whether there are any libraries that would let me supply the Hessian implicitly this way. I will look at the Newton–Krylov method and combine it with the other suggestions. But I do wish it were possible to do things like this.
FYI, here's how you can make a lazy version of that operator; the matrix-vector product is about 100x faster than with the materialized matrix:
julia> using LinearAlgebra, LazyArrays

julia> n = 100; v = randn(n^2); U = randn(n,n); V = randn(n,n); A = Diagonal(v) .+ Kron(U,V)  # lazy sum, never materialized
(10000×10000 Diagonal{Float64, Vector{Float64}}) .+ (kron(100×100 Matrix{Float64}, 100×100 Matrix{Float64})):
 0.259187  -0.167685   0.271168   …  -0.594093  -0.381805
 0.428446  -0.288475   0.0564482      2.80417   -0.312597
  ⋮                                ⋱               ⋮
julia> @time A * randn(n^2); # efficient matrix*vec
0.000171 seconds (19 allocations: 400.484 KiB)
julia> @time M = Matrix(A); @time M * randn(n^2); # slow full version
0.304267 seconds (11 allocations: 762.954 MiB, 1.05% gc time)
0.016047 seconds (7 allocations: 160.141 KiB)