Regression with variables in array

Agnes_H · December 28, 2022, 8:30pm

I apologize for reviving this old topic, but how do I handle the case where both variables come from a list? I tried using term but could not get it to work. The only way I got meaningful output was by using @eval.

I am trying to use RegressionTables.jl to convert the results into a format that I can export, but to no avail so far.

for i in list
    for j in list_2
        calc = reg(DF1, @eval @formula($i ~ $j))
        print(calc)
    end
end

nilshg · December 29, 2022, 5:35pm

It’s not clear to me what list and list_2 are, and the fact that you are calling reg suggests that you are using some additional package (although as long as it relies on StatsModels you’ll be fine).

The case where both the left-hand-side and right-hand-side variables come from some sort of iterator is a straightforward extension of what I posted above:

julia> using DataFrames, GLM

julia> df = DataFrame([:y1, :y2, :x1, :x2, :x3] .=> eachcol(rand(3, 5)))
3×5 DataFrame
 Row │ y1        y2        x1        x2         x3       
     │ Float64   Float64   Float64   Float64    Float64  
─────┼───────────────────────────────────────────────────
   1 │ 0.406401  0.974452  0.57843   0.553727   0.816638
   2 │ 0.429666  0.873084  0.808852  0.0404561  0.728702
   3 │ 0.998661  0.865624  0.269401  0.6146     0.630912

julia> lhs = ["y1", "y2"]; rhs = ["x1", "x2", "x3"];

julia> for y ∈ lhs
           for x ∈ rhs
               println(lm(term(y) ~ term(x), df))
           end
       end
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y1 ~ 1 + x1

Coefficients:
────────────────────────────────────────────────────────────────────────
                Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)   1.22034    0.336678   3.62    0.1714   -3.05755    5.49824
x1           -1.10239    0.566025  -1.95    0.3020   -8.29441    6.08964
────────────────────────────────────────────────────────────────────────
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y1 ~ 1 + x2

Coefficients:
───────────────────────────────────────────────────────────────────────
                Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept)  0.374734    0.423848  0.88    0.5391   -5.01076    5.76023
x2           0.587803    0.886367  0.66    0.6272  -10.6746    11.8502
───────────────────────────────────────────────────────────────────────
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y1 ~ 1 + x3

Coefficients:
────────────────────────────────────────────────────────────────────────
                Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)   2.96036     1.16501   2.54    0.2387   -11.8425    17.7632
x3           -3.23784     1.59728  -2.03    0.2918   -23.5332    17.0575
────────────────────────────────────────────────────────────────────────
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y2 ~ 1 + x1

Coefficients:
────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)  0.886426     0.132183  6.71    0.0942  -0.793119    2.56597
x1           0.0325252    0.222227  0.15    0.9075  -2.79114     2.85619
────────────────────────────────────────────────────────────────────────
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y2 ~ 1 + x2

Coefficients:
─────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept)  0.876624    0.0860848  10.18    0.0623  -0.217188    1.97044
x2           0.0689036   0.180024    0.38    0.7673  -2.21852     2.35632
─────────────────────────────────────────────────────────────────────────
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

y2 ~ 1 + x3

Coefficients:
───────────────────────────────────────────────────────────────────────
                Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept)  0.4862      0.225802  2.15    0.2768   -2.38289    3.35529
x3           0.576478    0.309584  1.86    0.3137   -3.35716    4.51012
───────────────────────────────────────────────────────────────────────

As a general rule of thumb, if you find yourself using @eval you’re likely doing it wrong.
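
To get the results into something exportable, one option is to collect the fitted models in a vector instead of printing them, then pass them all to regtable. This is only a minimal sketch: it assumes RegressionTables.jl is installed and accepts GLM.jl models (as its README states), the names responses, predictors, and models are purely illustrative, and it uses 100 random rows rather than 3.

```julia
using DataFrames, GLM, RegressionTables

# Illustrative data: two response columns and three predictor columns.
df = DataFrame([:y1, :y2, :x1, :x2, :x3] .=> eachcol(rand(100, 5)))

responses  = ["y1", "y2"]          # hypothetical stand-in for the first block
predictors = ["x1", "x2", "x3"]    # hypothetical stand-in for the second block

# Fit one model per (response, predictor) pair and keep them all.
models = [lm(term(y) ~ term(x), df) for y in responses for x in predictors]

# Print all six models side by side in a single table.
regtable(models...)
```

The keyword arguments for writing the table to a LaTeX or HTML file differ between RegressionTables.jl versions, so check the documentation for the version you have installed.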

Agnes_H · December 30, 2022, 10:51am

Basically, I have a dataframe with two blocks of columns, and every column in one block should be regressed on every column in the other, one pair at a time.

list and list_2 refer to those two blocks. reg is a command used in RegressionTables.jl, which is based on GLM.jl.

Thank you very much, the performance is so much better with your solution.
