Extremely low values of barrier term mu in Fminbox

I am puzzled by Fminbox decreasing mu (the barrier penalty term) to extremely low values. I do not quite understand why this happens or how I can control it.

I run something like the following (note, in case it's relevant, that my_obj_function involves simulation, which may introduce noise and hence keep the gradient "bouncing" around):

using Optim, LineSearches

result = optimize(
		my_obj_function,
		theta_lower, theta_upper, theta_initial,
		# inner optimizer: BFGS with cubic backtracking; mufactor controls how fast
		# the barrier coefficient mu is decreased between outer iterations
		Fminbox(BFGS(linesearch=BackTracking(order=3)), mufactor=1e-6),
			Optim.Options(outer_iterations=100,
						  iterations=2000,
						  show_trace=true,
						  show_every=1,
						  f_tol=1e-7,
						  g_tol=1e-7,
						  time_limit=10000))

This runs for a very long time, as shown below. Note that the current solution is extremely far from the barrier, and the inner loop (presumably) stops because the gradient norm falls below 1e-7. So why continue to decrease mu?

How does Fminbox decide when to stop? And how can I control this?

(the numbers below include the barrier contribution)
Iter     Function value   Gradient norm
    14     1.663502e-05     1.795406e-07
 * time: 11.428999900817871  # N.B.: this is the time within the current loop, not overall

Exiting inner optimizer with x = [12.338068789784117, 66.91942126907855, 72.79429518182737, 67.47383322425802, -11.229420401921177, 52.853988010777655, -15.984116277134515, 10.016841794544156, -7.131259718778112, -15.626120659744577]
Current distance to box: 12.3381
Decreasing barrier term μ.

Fminbox iteration 33
--------------------
Calling inner optimizer with mu = 1.75391e-202
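To make the puzzle concrete, here is a rough order-of-magnitude check, assuming Fminbox uses the standard log-barrier described in the docs (I have not verified the exact form against the source):

    # Rough check, assuming the barrier term added to the objective is roughly
    #   mu * sum(-log.(x .- theta_lower) .- log.(upper .- x))
    # so its per-coordinate gradient is on the order of mu / (distance to the bound).
    mu   = 1.75391e-202    # barrier coefficient at Fminbox iteration 33
    dist = 12.3381         # "Current distance to box" from the trace above

    println(mu / dist)     # ~1.4e-203, astronomically below g_tol = 1e-7

If that is right, the barrier cannot possibly matter at these values of mu, which is why I don't see the point of shrinking it further.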

Looking at the output more closely, I notice that the solution x does change across Fminbox iterations.

This happens even after mu is displayed as 0.0, which underscores my lack of understanding of how the penalty is computed (and whether/how it can be significant even when x is far from the box and mu is tiny).

For example:

iteration 64: [11.789740042511356, 73.0305751619422, 80.07202116397323, 72.3569116263531, -11.349448995109983, 50.35336813482172, -25.54179509687858, 14.512741054995919, -3.2278658334457613, -23.614292935648006]

iteration 65: [11.763470947949374, 73.04787442224753, 80.04497452516104, 72.58847540301744, -11.55593030740027, 49.96214889581432, -25.975397902921237, 14.6401259976849, -3.032635328662871, -23.862001677333343]

iteration 68: [11.718416632008068, 73.56419232538107, 80.93552128259725, 73.01798805714127, -11.365053898739728, 49.96400206567676, -26.55680266454123, 14.958619179359546, -2.640042754379732, -24.42249055658134]
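To quantify this, here is a quick check of how much x moves between two consecutive outer iterations (using just the numbers above); since mu is effectively zero at this point, I suspect this movement comes from the objective itself (perhaps the simulation noise) rather than from the barrier:

    using LinearAlgebra

    x64 = [11.789740042511356, 73.0305751619422, 80.07202116397323, 72.3569116263531, -11.349448995109983, 50.35336813482172, -25.54179509687858, 14.512741054995919, -3.2278658334457613, -23.614292935648006]
    x65 = [11.763470947949374, 73.04787442224753, 80.04497452516104, 72.58847540301744, -11.55593030740027, 49.96214889581432, -25.975397902921237, 14.6401259976849, -3.032635328662871, -23.862001677333343]

    println(norm(x65 - x64, Inf))   # ~0.43: the inner optimizer still moves x noticeably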

It’s hard to figure out what’s happening without a reproducible example, but it’s possible the line search is causing convergence issues in each barrier subproblem. This can happen if your gradient has a very high or very low norm and your function has many minima: a bad line search could then fail to improve your solution, cause slowly decaying oscillations, or produce other convergence issues. This could be a property of your objective rather than of the constraints, given that the constraints are not even active and mu is so small. Try using an unconstrained optimiser, starting near the interior solution you already have. If it behaves the same way, that probably confirms my suspicion.
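Something along these lines (an untested sketch; x_current stands in for the last interior iterate you printed, and my_obj_function is your objective as before):

    using Optim, LineSearches

    # last interior point reported by Fminbox (iteration 68 above)
    x_current = [11.718416632008068, 73.56419232538107, 80.93552128259725,
                 73.01798805714127, -11.365053898739728, 49.96400206567676,
                 -26.55680266454123, 14.958619179359546, -2.640042754379732,
                 -24.42249055658134]

    # same inner optimizer, but without the box / barrier wrapper
    result_unc = optimize(my_obj_function, x_current,
                          BFGS(linesearch=BackTracking(order=3)),
                          Optim.Options(iterations=2000, f_tol=1e-7, g_tol=1e-7,
                                        show_trace=true))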