How to save results for each outcome?

younghan-lee · February 24, 2021, 6:51am

I am trying to save results for each outcome.
For example,

list = [:A, :B]
for i in 1:length(list)
    result$i = lm((@eval @formula($i ~ x1 + x2)), data)
end

So that I can save the results for each outcome like result1, result2, … ,
then export each result to csv file.
Any ideas? Thank you

jules · February 24, 2021, 7:36am

In Julia, you can’t index into a data structure at some index and have both the data structure and the index location created if they don’t exist yet. That’s what your example is doing, which is something like R, I think?

You can either make an empty vector first, and then push! into it:

v = []
for i in 1:10
    push!(v, rand())
end

Above I used an untyped vector, Vector{Any}. That’s usually not good for performance if your loop runs many times. In this example it would be better to explicitly make an empty vector of floats:

v = Float64[]
for i in 1:10
    push!(v, rand())
end
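To see the difference concretely, one can compare the element types of the two vectors. This is a small sketch using only Base functions; the names untyped and typed are just for illustration.

```julia
untyped = []          # no element type given, so this is a Vector{Any}
typed = Float64[]     # empty vector that can only hold Float64 values

for i in 1:10
    push!(untyped, rand())
    push!(typed, rand())
end

eltype(untyped)  # Any
eltype(typed)    # Float64
```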

But for this you need to know the element type you’ll get, which is sometimes not easy. For example, I don’t know what the resulting type of your lm call would be. So my preferred solution is usually map, which goes over the elements of one or more iterables and automatically stores the result of a function called with those elements.

With your example:

list = [:A, :B]
result = map(1:length(list)) do i
    lm((@eval @formula($i ~ x1 + x2)), data)
end

The do i means we’re passing a function to map as the first argument, before 1:length(list). This function will be called for every element in 1:length(list) and the current number will be the argument named i inside that function. It takes a moment to wrap your head around if you haven’t seen it, but it’s a really convenient syntax used widely in Julia wherever you want to pass a function first.
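As a minimal illustration of the do syntax on its own (with a toy function instead of lm, and hypothetical variable names), the two calls below should be equivalent:

```julia
# Passing an anonymous function as the first argument...
squares_a = map(i -> i^2, 1:3)

# ...is the same as writing the function body in a do block.
squares_b = map(1:3) do i
    i^2
end

squares_a == squares_b  # true
```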

nilshg · February 24, 2021, 7:57am

I see you’re still using @eval to construct multiple regressions rather than terms…

I think the main issue that you will have to deal with is how to turn your regression result into a tabular format, given that you want to write out the results to csv. I assume at a minimum you’d want your table to have (1) dependent variable and (2) estimated coefficients on your covariates. In that case you can do:

julia> using DataFrames

julia> list = [:A, :B];

julia> results = DataFrame("y" => list, (["x1", "x2"] .=>  [Vector{Float64}(undef, length(list)) for _ ∈ 1:2])...)
2×3 DataFrame
 Row │ y       x1            x2           
     │ Symbol  Float64       Float64      
─────┼────────────────────────────────────
   1 │ A       1.53e-322     5.77183e-312
   2 │ B       6.95011e-310  0.0
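
The numbers in the x1 and x2 columns are not meaningful: undef leaves the vectors uninitialized, so they show whatever happened to be in memory. If recognizable placeholders are preferred, one possible variation (an assumption on top of the code above, not something it requires) is to fill the columns with NaN instead:

```julia
list = [:A, :B]

# Uninitialized: contents are arbitrary until they are overwritten.
uninit = Vector{Float64}(undef, length(list))

# Explicit placeholder: every entry is NaN until it is overwritten.
placeholder = fill(NaN, length(list))
```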

You can then go through and fill your dataframe in a loop (this is pseudocode but should be more or less correct):

julia> for depvar ∈ list
           ols_result = lm(term(depvar) ~ term(:x1) + term(:x2), data)
           results[results.y .== depvar, :x1] = coef(ols_result)[1]
           results[results.y .== depvar, :x2] = coef(ols_result)[2]
       end

younghan-lee · February 24, 2021, 8:19am

@nilshg
Wow this is very helpful. Thank you!
I won’t use @eval anymore
I just have a quick question.
When I run this part,

results = DataFrame( ...

I get an error saying no method matching DataFrame.
Also, could you explain what the underscore means here? for _ in 1:2
Thank you!

younghan-lee · February 24, 2021, 8:22am

Thanks for your kind explanation!
For my example,
Should I replace $i with list[i]?

jules · February 24, 2021, 8:25am

Ah I see, you wanted the symbols from list there. In that case, instead of interpolating i, you would interpolate list[i], therefore the expression must be $(list[i]) instead of $i. But you can have it even easier if you map over the symbols directly.

list = [:A, :B]
result = map(list) do sym
    lm((@eval @formula($sym ~ x1 + x2)), data)
end
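
To see why $i and $(list[i]) behave differently, it can help to look at plain quoted expressions, since @eval interpolates values the same way a quote does. A small sketch, with wrong and right as illustrative names:

```julia
list = [:A, :B]
i = 1

# Interpolating i inserts the integer itself into the expression.
wrong = :($i ~ x1 + x2)

# Interpolating list[i] inserts the symbol stored at that position.
right = :($(list[i]) ~ x1 + x2)

wrong == :(1 ~ x1 + x2)  # true
right == :(A ~ x1 + x2)  # true
```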

younghan-lee · February 24, 2021, 8:29am

Thank you very much!

nilshg · February 24, 2021, 8:31am

Sure - my example above was incomplete, as you need using DataFrames. If you don’t want to use DataFrames, you could also construct something similar based on NamedTuples, which can also be written to CSV.

I also realise I missed out the all important CSV.write("results.csv", results) as the final step after the loop.
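
For the NamedTuple route, a rough sketch could look like this. The coefficient values are hypothetical placeholders just to show the shape of the table, and the temporary directory is only there to avoid cluttering the working directory:

```julia
using CSV

# Hypothetical placeholder values; in practice these come from the regressions.
rows = [
    (y = :A, x1 = 0.0, x2 = 0.0),
    (y = :B, x1 = 0.0, x2 = 0.0),
]

# A vector of NamedTuples with identical fields is treated as a table by CSV.write.
path = joinpath(mktempdir(), "results.csv")
CSV.write(path, rows)
```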

Finally, on the _: that’s just a placeholder for the variable that holds the value of the current iteration (here, either 1 or 2). People tend to use _ to signify that the value is not actually used, i.e. you could use any valid identifier here and it wouldn’t change the output of the comprehension (as we’re creating a Vector{Float64}(undef, length(list)) in every iteration step).
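
A quick way to convince yourself, using zeros instead of undef so the results are comparable (n, with_underscore and with_name are illustrative names):

```julia
n = 2

# The loop variable is never used in the body, so `_` and `k` give the same result.
with_underscore = [zeros(n) for _ in 1:2]
with_name = [zeros(n) for k in 1:2]

with_underscore == with_name  # true
```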

younghan-lee · February 24, 2021, 8:47am

Thank you for your clarification!
The first part of your code works perfectly now.
The second part,

for depvar ∈ list
    ols_result = lm(term(depvar) ~ term(:x1) + term(:x2), data)
    results[results.y .== depvar, :x1] = coef(ols_result)[1]
    results[results.y .== depvar, :x2] = coef(ols_result)[2]
end

I get this error.

ERROR: MethodError: no method matching setindex!(::DataFrame, ::Float64, ::BitArray{1}, ::Symbol)

What could be the problem?
Thank you!

nilshg · February 24, 2021, 11:11am

Sorry, my fault for writing pseudocode. The DataFrame indexing on the left-hand side selects a one-element array, so the assignment needs to be broadcast.
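
The same idea with plain vectors, as a minimal sketch (y, x1 and mask are stand-ins for the DataFrame columns and the row selector):

```julia
y = [:A, :B]
x1 = zeros(2)

# y .== :A is a Bool mask, so x1[mask] selects a (one-element) array.
mask = y .== :A

# Broadcast the scalar into every selected position.
x1[mask] .= 1.5

x1  # [1.5, 0.0]
```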

Here’s a full MWE (note the .= in the loop):

julia> using CSV, DataFrames, GLM

julia> data = DataFrame(x1 = rand(500), x2 = rand(500));

julia> data[!, :A] = 5 .+ 0.5*data.x1 - 1.5*data.x2 .+ randn.();

julia> data[!, :B] = 1 .+ 2.5*data.x1 + 3.5*data.x2 .+ randn.();

julia> list = [:A, :B];

julia> results = DataFrame("y" => list, (["x1", "x2"] .=>  [Vector{Float64}(undef, length(list)) for _ ∈ 1:2])...);

julia> for depvar ∈ list
           ols_result = lm(term(depvar) ~ term(:x1) + term(:x2), data)
           results[results.y .== depvar, :x1] .= coef(ols_result)[1]
           results[results.y .== depvar, :x2] .= coef(ols_result)[2]
       end

julia> results
2×3 DataFrame
 Row │ y       x1        x2       
     │ Symbol  Float64   Float64  
─────┼────────────────────────────
   1 │ A       5.17577   0.495342
   2 │ B       0.975703  2.6233
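
One thing to watch in the output above: the x1 column is close to 5 and 1, the intercepts used to generate A and B, and the x2 column is close to the x1 slopes (0.5 and 2.5). That is because coef(ols_result)[1] is the intercept. A sketch of one way around this is to look coefficients up by name via coefnames. The intercept column, the Dict lookup, and the fixed seed are additions for illustration and not part of the MWE:

```julia
using DataFrames, GLM, Random

Random.seed!(1)  # only to make the simulated data reproducible

data = DataFrame(x1 = rand(500), x2 = rand(500))
data[!, :A] = 5 .+ 0.5 .* data.x1 .- 1.5 .* data.x2 .+ randn(500)
data[!, :B] = 1 .+ 2.5 .* data.x1 .+ 3.5 .* data.x2 .+ randn(500)

list = [:A, :B]

# One NamedTuple per outcome; map collects them into a vector.
rows = map(list) do depvar
    m = lm(term(depvar) ~ term(:x1) + term(:x2), data)
    # Pair each coefficient name with its estimate, e.g. "(Intercept)" => 5.1
    est = Dict(coefnames(m) .=> coef(m))
    (y = depvar, intercept = est["(Intercept)"], x1 = est["x1"], x2 = est["x2"])
end

# A vector of NamedTuples converts directly into a DataFrame.
results = DataFrame(rows)
```

The resulting table can be passed to CSV.write in the same way as before.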

younghan-lee · February 24, 2021, 11:43am

Oh! I just needed a dot (.) before the equal sign (=). Now it works!
Thank you so much for your time and explanation.
I learned a lot !
