In Julia, indexing into a data structure does not create the data structure or the index location if they don't exist. That's what you're doing in your example, which looks like R, I think?
You can either make an empty vector first, and then push! into it:
v = []
for i in 1:10
    push!(v, rand())
end
Above I used an untyped vector, Vector{Any}. That's usually bad for performance if your loop runs many iterations. In this example it would be better to explicitly create an empty vector of floats:
v = Float64[]
for i in 1:10
    push!(v, rand())
end
But for this you need to know the element type you'll get, which is sometimes not easy. For example, I don't know what the resulting type of your lm call would be. So my preferred solution is usually map, which goes over the elements of one or more iterables, calls a function on each element, and automatically collects the results.
With your example:
list = [:A, :B]
result = map(1:length(list)) do i
    lm((@eval @formula($i ~ x1 + x2)), data)
end
The do i means we're passing a function to map as its first argument, ahead of 1:length(list). This function will be called for every element in 1:length(list), and the current number will be the argument named i inside that function. It takes a moment to wrap your head around if you haven't seen it, but it's a really convenient syntax used widely in Julia wherever a function is passed as the first argument.
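To make the do-block less mysterious, here is a minimal sketch showing that it is just another way of writing an ordinary map call, and that map works out the element type for you. The function half_of is a made-up stand-in for an expensive call such as a model fit; it is not part of the original example.

```julia
# Hypothetical stand-in for the function you want to apply to each element.
half_of(i) = i / 2

# Function passed as an ordinary first argument.
a = map(half_of, 1:3)

# The same call written with a do-block; the block becomes the first argument.
b = map(1:3) do i
    half_of(i)
end

println(a == b)       # both forms give the same result
println(eltype(a))    # map picked a concrete element type from the results
println(eltype([]))   # an empty untyped literal is a Vector{Any}
```

So you get a concretely typed vector without having to name the element type up front.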
I see you're still using @eval to construct multiple regressions rather than terms...
I think the main issue that you will have to deal with is how to turn your regression result into a tabular format, given that you want to write out the results to csv. I assume at a minimum you'd want your table to have (1) the dependent variable and (2) the estimated coefficients on your covariates. In that case you can do:
julia> using DataFrames
julia> list = [:A, :B];
julia> results = DataFrame("y" => list, (["x1", "x2"] .=> [Vector{Float64}(undef, length(list)) for _ ∈ 1:2])...)
2×3 DataFrame
 Row │ y       x1            x2
     │ Symbol  Float64       Float64
─────┼─────────────────────────────────────
   1 │ A       1.53e-322     5.77183e-312
   2 │ B       6.95011e-310  0.0
You can then go through and fill your dataframe in a loop (this is pseudocode but should be more or less correct):
julia> for depvar ∈ list
           ols_result = lm(term(depvar) ~ term(:x1) + term(:x2), data)
           results[results.y .== depvar, :x1] = coef(ols_result)[1]
           results[results.y .== depvar, :x2] = coef(ols_result)[2]
       end
Ah I see, you wanted the symbols from list there. In that case, instead of interpolating i, you would interpolate list[i], therefore the expression must be $(list[i]) instead of $i. But you can have it even easier if you map over the symbols directly.
list = [:A, :B]
result = map(list) do sym
    lm((@eval @formula($sym ~ x1 + x2)), data)
end
Sure - my example above was incomplete, as you need using DataFrames. If you don't want to use DataFrames, you could also construct something similar based on NamedTuples, which can also be written to CSV.
I also realise I missed out the all important CSV.write("results.csv", results) as the final step after the loop.
Finally, on the _: that's just a placeholder for the variable that holds the value of the current iteration (here, either 1 or 2). People tend to use _ to signify that the value is not actually used, i.e. you could use any valid identifier here and it wouldn't change the output of the comprehension (as we're creating a Vector{Float64}(undef, length(list)) in every iteration step).
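For the NamedTuple route mentioned above, a rough sketch might look like the following. The numbers in placeholder_coefs are invented stand-ins for what coef(ols_result) would return, and the names rows and placeholder_coefs are only for illustration.

```julia
list = [:A, :B]

# Placeholder values standing in for coef(ols_result); these are NOT real estimates.
placeholder_coefs = Dict(:A => [5.0, 0.5, -1.5], :B => [1.0, 2.5, 3.5])

# Build one NamedTuple per dependent variable.
rows = map(list) do depvar
    c = placeholder_coefs[depvar]
    (y = depvar, intercept = c[1], x1 = c[2], x2 = c[3])
end

println(rows)

# A Vector of NamedTuples is a valid table for CSV.jl, so with CSV loaded
# the final step would be:
# using CSV
# CSV.write("results.csv", rows)
```

This avoids preallocating columns with undef, since every row is created already filled in.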
Sorry my fault for writing pseudocode - the DataFrame indexing in the loop returns a one-element array on the left hand side, so assignment needs to be broadcasted.
Here's a full MWE (note the .= in the loop):
julia> using CSV, DataFrames, GLM
julia> data = DataFrame(x1 = rand(500), x2 = rand(500));
julia> data[!, :A] = 5 .+ 0.5*data.x1 - 1.5*data.x2 .+ randn.();
julia> data[!, :B] = 1 .+ 2.5*data.x1 + 3.5*data.x2 .+ randn.();
julia> list = [:A, :B];
julia> results = DataFrame("y" => list, (["x1", "x2"] .=> [Vector{Float64}(undef, length(list)) for _ ∈ 1:2])...);
julia> for depvar ∈ list
           ols_result = lm(term(depvar) ~ term(:x1) + term(:x2), data)
           results[results.y .== depvar, :x1] .= coef(ols_result)[1]
           results[results.y .== depvar, :x2] .= coef(ols_result)[2]
       end
julia> results
2×3 DataFrame
 Row │ y       x1        x2
     │ Symbol  Float64   Float64
─────┼────────────────────────────
   1 │ A       5.17577   0.495342
   2 │ B       0.975703  2.6233
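One thing to watch in the output above: lm fits an intercept by default and coef returns it first, so coef(ols_result)[1] is the intercept, and the slopes on x1 and x2 sit at positions 2 and 3. That is why the x1 column comes out close to the simulated intercepts 5 and 1. A minimal sketch of looking coefficients up by name instead, assuming the same packages as the MWE (the names m and by_name are just for illustration):

```julia
using DataFrames, GLM

# Same kind of simulated data as in the MWE, for one dependent variable only.
data = DataFrame(x1 = rand(500), x2 = rand(500))
data.A = 5 .+ 0.5 .* data.x1 .- 1.5 .* data.x2 .+ randn(500)

m = lm(term(:A) ~ term(:x1) + term(:x2), data)

# coefnames and coef come back in the same order, so pair them up by name.
by_name = Dict(coefnames(m) .=> coef(m))

println(coefnames(m))                        # intercept first, then x1, x2
println(by_name["x1"], " ", by_name["x2"])   # slopes on x1 and x2
```

In the loop you could then fill the table from by_name["x1"] and by_name["x2"], or equivalently from coef(ols_result)[2] and coef(ols_result)[3].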