Hi,
I encountered a weird phenomenon when measuring solve times in numerical optimisation.
Background
There is a set of approximators, \{ f_j \}_{j \in J}. For example, f_1 is a feedforward (dense) neural network, f_2 is another approximator, …, and f_{|J|} is an approximator that I proposed.
In my problem setting, each approximator receives two arguments, as f_j(x, u).
Each approximator is trained in advance to approximate a given target function g (the symbol for the approximator parameters is omitted here, e.g., \theta in f^{\theta}_j(x, u)).
Then, I want to compare the solve times for minimising each approximator f_j over the second argument while fixing the first, that is, for a given x \in X,

\min_{u \in U} f_j(x, u). \qquad (1)
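To make (1) concrete, here is a minimal sketch (not my actual script) of how such a problem could be posed with Convex.jl and SCS.jl for a convex approximator of log-sum-exp type; W and b stand for hypothetical trained parameters, and U is assumed to be the box [-1, 1]^m.

```julia
using Convex, SCS

# Minimal sketch of problem (1) for an LSE-type convex approximator
# f(x, u) = logsumexp(W * [x; u] + b); W and b are hypothetical trained
# parameters, and U is assumed to be the box [-1, 1]^m.
function solve_one(W, b, x, m)
    u = Variable(m)
    # Split W so that the expression stays affine in the decision variable u.
    z = W[:, 1:length(x)] * x + W[:, length(x)+1:end] * u + b
    problem = minimize(logsumexp(z), u >= -1, u <= 1)
    solve!(problem, SCS.Optimizer)
    return evaluate(u), problem.optval
end
```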
I found that, when minimising some approximators, the solve time decreases as the dimensions of x and u increase.
For concreteness, suppose that X \subset \mathbb{R}^n and U \subset \mathbb{R}^m.
I calculated the mean solve time over a given set \{ x_i \in X \}_{i \in I}.
Let t_i be the solve time of (1) for x_i.
Then, the mean solve time is \frac{1}{|I|} \sum_{i \in I} t_i (with |I| = 500 for the simulation results below).
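For clarity, the mean solve time could be computed along these lines (again a sketch reusing the hypothetical solve_one above, not the actual script):

```julia
# Sketch of the mean-solve-time computation over the sample set xs = { x_i },
# reusing the hypothetical solve_one sketched above.
function mean_solve_time(W, b, xs, m)
    ts = [(@elapsed solve_one(W, b, x, m)) for x in xs]  # t_i for each x_i
    return sum(ts) / length(ts)                          # (1/|I|) * sum of t_i
end
```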
I expected the mean solve time to increase as n and m increase, but it didn't! In particular, n=1, m=1 took longer than n=13, m=4 (of course, the solve time for very large n and m, e.g., n=376, m=17, is longer than that for n=m=1).
In short, n=m=1 took longer than a comparable case with moderate dimensions.
Questions
I wonder whether this is possible and, if so, why.
So far, I suspect the following:
- Because each approximator is trained to approximate the target function g, the training may affect the "conditioning" of the trained approximator for numerical optimisation, and for some reason the n=m=1 case ends up taking an extraordinarily long time compared with the others.
- For some reason, optimisation solvers (especially convex optimisation solvers) use different strategies, tolerances, etc. for the scalar case (m=1), or they need some special care that I have not figured out.
- It is simply my mistake: the comparison script is invalid because I lack background in benchmarking solve times.
Please share your opinion
For now, I am sharing the original script for the solve-time calculation, both to gather ideas for interpreting this phenomenon and to buy some time to describe my situation properly to other users. To run it, include the file and call the main function with the dimensions n and m specified.
If necessary, I will write other, simpler scripts as requested.
Notes
How to run?
- Run as
include("test/basic.jl")
main(1, 1); main(13, 4); main(376, 17) # to avoid JIT compilation time, I measured the solve time from the second run for each dimension pair
Results
As you can see in the results below, for the approximators LSE, PICNN, PMA, and PLSE, the mean solve time (denoted by optimise_time_mean) for n=m=1 is longer than that for n=13, m=4.
- (n, m) = (1, 1)
 Row │ n      m      epochs  approximator  optimise_time_mean  minimisers_diff_norm_mean  optvals_diff_abs_mean  no_of_minimiser_success  no_of_optval_success  number_of_parameters
     │ Int64  Int64  Int64   String        Float64             Float64                    Float64                Int64                    Int64                 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1      1      100     FNN           0.000669363         0.0339567                  0.00835098             500                      500                   4417
   2 │ 1      1      100     MA            0.000423145         0.0517864                  0.131502               500                      500                   90
   3 │ 1      1      100     LSE           0.0377195           0.00626931                 0.126809               500                      500                   90
   4 │ 1      1      100     PICNN         0.0104589           0.0512426                  0.100103               500                      498                   25608
   5 │ 1      1      100     PMA           0.00109487          0.320571                   0.0189876              500                      500                   8188
   6 │ 1      1      100     PLSE          0.00626701          0.00967128                 0.00171072             500                      500                   8188
- (n, m) = (13, 4)
 Row │ n      m      epochs  approximator  optimise_time_mean  minimisers_diff_norm_mean  optvals_diff_abs_mean  no_of_minimiser_success  no_of_optval_success  number_of_parameters
     │ Int64  Int64  Int64   String        Float64             Float64                    Float64                Int64                    Int64                 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 13     4      100     FNN           0.01256             0.994461                   0.178648               500                      500                   5377
   2 │ 13     4      100     MA            0.000882126         0.219979                   0.0636181              500                      500                   540
   3 │ 13     4      100     LSE           0.0222627           0.056159                   0.0360104              500                      500                   540
   4 │ 13     4      100     PICNN         0.00859685          0.366883                   0.0343947              500                      500                   27987
   5 │ 13     4      100     PMA           0.000694896         0.498253                   0.015404               500                      500                   14806
   6 │ 13     4      100     PLSE          0.00622943          0.0568668                  0.00995621             500                      500                   14806
- (n, m) = (376, 17)
 Row │ n      m      epochs  approximator  optimise_time_mean  minimisers_diff_norm_mean  optvals_diff_abs_mean  no_of_minimiser_success  no_of_optval_success  number_of_parameters
     │ Int64  Int64  Int64   String        Float64             Float64                    Float64                Int64                    Int64                 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 376    17     100     FNN           0.0292232           2.95216                    0.165991               500                      500                   29441
   2 │ 376    17     100     MA            0.010935            4.09744                    0.125646               500                      500                   11820
   3 │ 376    17     100     LSE           0.243563            4.01312                    0.115364               500                      500                   11820
   4 │ 376    17     100     PICNN         0.0734538           3.78249                    0.131185               500                      500                   84534
   5 │ 376    17     100     PMA           0.0164681           3.51627                    0.0878748              500                      500                   63388
   6 │ 376    17     100     PLSE          0.00690463          0.5204                     0.0815478              500                      500                   63388
Brief summary for each approximator
Approximator list
FNN: a standard feedforward neural network
MA: max-affine (MA) neural network [1]
LSE: log-sum-exp (LSE) neural network [1]
PICNN: partially input convex neural network
PMA: parametrised MA (proposed from my research)
PLSE: parametrised LSE (proposed from my research)
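For reference, the MA and LSE networks of [1] are roughly of the following form (written here on the concatenated input z = (x, u), with \kappa > 0 a temperature-like parameter; this is my paraphrase, so see [1] for the exact definitions):

f_{\mathrm{MA}}(z) = \max_{k = 1, \dots, K} \left( a_k^{\top} z + b_k \right), \qquad f_{\mathrm{LSE}}(z) = \frac{1}{\kappa} \log \sum_{k = 1}^{K} \exp\left( \kappa \left( a_k^{\top} z + b_k \right) \right).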
Notes
To calculate the solve time for FNN, the interior-point Newton method with box constraints from Optim.jl is used, because an FNN is non-convex with respect to the second argument in general (a sketch of this setup is given after these notes).
For the other approximators, convex optimisation is used via Convex.jl with SCS.jl.
Note that MA and LSE are convex approximators and were not proposed for the two-argument setting, but each is obviously convex with respect to the second argument, since fixing one argument of a jointly convex function leaves a convex function of the remaining argument.
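For reference, the kind of Optim.jl setup I have in mind for the FNN case is sketched below; fnn_u is a hypothetical closure u -> f_FNN(x, u) for a fixed x, and lb, ub, u0 are hypothetical box bounds and an initial guess (again, not my actual script).

```julia
using Optim

# Sketch of minimising a (generally non-convex) u -> f_FNN(x, u) over a box
# with Optim.jl's interior-point Newton method (IPNewton); fnn_u, lb, ub, u0
# are hypothetical stand-ins for the trained FNN slice and the box U.
function solve_fnn(fnn_u, lb, ub, u0)
    df  = TwiceDifferentiable(fnn_u, u0; autodiff = :forward)
    dfc = TwiceDifferentiableConstraints(lb, ub)
    res = optimize(df, dfc, u0, IPNewton())
    return Optim.minimizer(res), Optim.minimum(res)
end
```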