Iteration over 10 columns to calculate LinearRegression

I have a data frame with 10 feature columns, COL0 to COL9, and a response column RESP. How do I fit a linear regression model for each pair, (COL0, RESP) through (COL9, RESP)?
I expect to get 10 graphs showing the fitted models, plus a table with the model coefficients for each column.

What I have tried so far:

x = df.COL0
y = df.RESP

data = DataFrame(X=x,Y=y)

model = lm(@formula(Y~X),data)

This gives me what I want for the first pair, COL0 and RESP, but I still need to plot it.

Now I need to repeat the same steps 9 more times. I want to automate this, since I could have over 100 columns.

I am new to Julia and really don't have a clue how to get this done. Any help?


See "Constructing a formula programmatically" in the StatsModels docs.
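
For the per-column fits asked about in the question, a minimal sketch along those lines might look like the following. It assumes the column names COL0 to COL9 and RESP from the question; the random data, the row count `n`, and the layout of the `results` table are made up for illustration.

```julia
using DataFrames, GLM  # GLM makes `term`, `lm`, and `coef` available

# Toy data standing in for the real table: 10 feature columns plus RESP.
n = 50                          # hypothetical number of rows
df = DataFrame(RESP = rand(n))
for i in 0:9
    df[!, "COL$i"] = rand(n)
end

# Fit RESP ~ COLi for each feature and collect one row of coefficients per fit.
features = ["COL$i" for i in 0:9]
results = DataFrame(feature = String[], intercept = Float64[], slope = Float64[])
for name in features
    model = lm(term(:RESP) ~ term(name), df)
    b0, b1 = coef(model)        # intercept first, then the slope
    push!(results, (name, b0, b1))
end

results
```

Plotting is left out here. One option would be to keep each fitted model in a vector inside the loop and pass its predictions to whichever plotting package is in use.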

I got this:

model2 = fit(LinearModel, @formula(RESP ~ EXPL_0 + EXPL_1 + EXPL_2 + EXPL_3 + EXPL_4 + EXPL_5 + EXPL_6 + EXPL_7 + EXPL_8 + EXPL_9 + EXPL_10), df)

But I have EXPL_0 to EXPL_1000, for example. How can I avoid typing out every term from EXPL_0 to EXPL_1000?

Did you read the link I sent you? It shows you how to use term to construct a term.

You want

julia> using DataFrames, GLM;

julia> N = 1_000;  # number of rows (assumed value; must exceed the 102 coefficients being fit)

julia> df = DataFrame();

julia> df.y = rand(N);

julia> for i in 0:100
           df[!, "EXPL_$i"] = rand(N)
       end

julia> lhs = term(:y);

julia> rhs = sum([term(Symbol(:EXPL_, i)) for i in 0:100]);

julia> lm(lhs ~ rhs, df)

Note that term these days accepts strings (I PR’ed this after DataFrames moved to returning strings from names(df)), so this is slightly more concise:

sum([term("EXPL_$i") for i in 0:100])
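
Since names(df) returns strings, the predictor list could also be taken from the data frame itself rather than from a hard-coded index range. A small sketch, assuming every predictor column shares the EXPL_ prefix (the toy data and the row count `n` are illustrative only):

```julia
using DataFrames, GLM

n = 200                          # hypothetical row count, larger than the number of predictors
df = DataFrame(y = rand(n))
for i in 0:10
    df[!, "EXPL_$i"] = rand(n)
end

# Pick up every predictor column by its prefix instead of spelling out 0:10.
predictors = filter(startswith("EXPL_"), names(df))
rhs = sum(term.(predictors))
model = lm(term(:y) ~ rhs, df)
```

This way the formula follows the data frame if columns are added or removed, as long as the naming convention holds.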