Nested Multivariate Regression

Title: Nested Multivariate Regression
Appreciate some guidance and sample codes. I’m new to Julia language, but have used Python and R a bit. This is a curve fitting / regression of data (csv) exercise. The equation to fit is as below.

w(x, h) = s(x) + (1 - s(x))*(e(x) / h) ^ (1 / n(x) 

whereby s(x), e(x) and n(e) can be a constant, linear, power, exponential or logarithmic format, as below.

	s_con(x) = a
	s_lin(x) = a * x + b
	s_log(x) = a * log10(x) + b
	s_pow(x) = a * x ^ b
	s_exp(x) = a * exp10(x * b)
			 
	e_con(x) = c
	e_lin(x) = c * x + d
	e_log(x) = c * log10(x) + d
	e_pow(x) = c * x ^ d
	e_exp(x) = c * exp10(x * d)
	
	n_con(x) = e
	n_lin(x) = e * x + f
	n_log(x) = e * log10(x) + f
	n_pow(x) = e * x ^ f
	n_exp(x) = e * exp10(x * f)		

The intended workflow/algo is a below.

s_list = [s_con, s_lin, s_log, s_pow, s_exp]
e_list = [e_con, e_lin, e_log, e_pow, e_exp]
n_list = [n_con, n_lin, n_log, n_pow, n_exp]

for s in s_list
	for e in e_list
		for n in n_list
			w(x, h) = min(1, max(0, s + (1 - s) * (e / h) ^ ( 1 / n)))
			-- doing the solver to determine paramater of a, b, c, d, e, f
			-- reuse solved w(x, h) for later analysis, e.g. plot and reports.
		end
	end
end

The plot & report will be analyzed to determine best match s(x) e(x) and n(x) for w(x, h). Thank you.

I am not sure that there is an existing package that can do all of this, but you could try looking at

for fitting some of the models. Also, you can just write a likelihood and optimize with

That said, my major concern is conceptual: a fishing expedition where both log and exp are considered may not be a good starting point for data analysis. Also the nesting of highly nonlinear forms will yield parameters that will be very sensitive to outliers, or even just noise.

The real question is how to iterate array/ list of possible functions and aggregate into complete function for the optimisation step. I’ve done a bit with some of the solvers, e.g. LsqFit or Optim. I may try others, e.g. JuMP and GLM, as suggested. This is not social experiment data, but physics. The model is empirical at best, thus the quest for best combination (of 125 combinations) to fit the data.

I would consider something like

function do_everything(data, s, e, n)
    # do the optimization
    # return whatever you need as NamedTuple
end

results = [do_everything(data, s, e, n)
           for s in s_list, e in e_list, n in n_list]

Sorry for my ignorance ← Julia newbie. Is it possible to do list/array of functions like below?

s_list = [s_con, s_lin, s_log, s_pow, s_exp]

This is just my conceptual algorithm.

Yes, you can create a vector of arbitrary objects.

Thanks very much. Will try it out later this week (or next).

After many attempts, I ended up got it working in R. Will retry Julia when time is more forgiving.

Sharing the solution to my initial problem.

using DataFrames, CSV, Cairo, Gadfly, Dates, Optim, Compose, Statistics

# Julia ver. 1.4.0, Optim ver. 0.20.1 # required for code below

struct MySimplexer <: Optim.Simplexer end
Optim.simplexer(S::MySimplexer, initial_x) = [rand(length(initial_x)) for i = 1:length(initial_x)+1]

function loop_all(loop_flag)

	function ww(param, x1, x2, loop_flag)

		if loop_flag[1] == 1
			ss = param[1] * x1 + param[2] # linear
		elseif loop_flag[1] == 2
			ss = param[1] * log10(x1) + param[2] # logarithmic
		elseif loop_flag[1] == 3
			ss = param[1] * x1 ^ param[2] # power
		elseif loop_flag[1] == 4
			ss = param[1] * exp(x1 * param[2]) # exponnenntial
		else
			ss = param[1] + param[2] # connstannt
		end
		ss = min(1, max(0, ss))

		if loop_flag[2] == 1
			ee = param[3] * x1 + param[4] # linear
		elseif loop_flag[2] == 2
			ee = param[3] * log10(x1) + param[4] # logarithmic
		elseif loop_flag[2] == 3
			ee = param[3] * x1 ^ param[4] # power
		elseif loop_flag[2] == 4
			ee = param[3] * exp(x1 * param[4]) # exponnenntial
		else
			ee = param[3] + param[4] # connstannt
		end
		ee = max(0, ee)

		if loop_flag[3] == 1
			nn = param[5] * x1 + param[6] # linear
		elseif loop_flag[3] == 2
			nn = param[5] * log10(x1) + param[6] # logarithmic
		elseif loop_flag[3] == 3
			nn = param[5] * x1 ^ param[6] # power
		elseif loop_flag[3] == 4
			nn = param[5] * exp(x1 * param[6]) # exponnenntial
		else
			nn = param[5] + param[6] # connstannt
		end

		ww_ww = ss + (1 - ss) * (ee / x2) ^ (1 /  nn)
		return min(1, max(0, ww_ww))
	end

	function sqerror(param)
		err = 0.0
		for i in 1:length(yy)
			pred_i = ww(param, x1[i], x2[i], loop_flag)
			err += (yy[i] - pred_i)^2
		end
		return err
	end

	init_param = [-4.2, 1.6, 4.0, -2.2, 2.5, 4.8] 

	res = optimize(sqerror, init_param, nnelderMead(initial_simplex = MySimplexer()))
	par = Optim.minimizer(res)

	return 0
end

for i in 1:5
	for j in 1:5
		for k in 1:5
			loop_flag = [i, j, k]
			loop_all(loop_flag)
		end
	end
end