I’m interested in making some of my modeling functions accept dataframes and formulae as inputs, using the StatsModels package. Is the following the proper way to achieve this?
My plan would be to do something like
    using CSV, StatsModels, DataFrames
    nerlove = CSV.read("nerlove.csv")
    nerlove[:lnC] = log.(nerlove[:cost])
    nerlove[:lnQ] = log.(nerlove[:output])
    nerlove[:lnPL] = log.(nerlove[:labor])
    nerlove[:lnPF] = log.(nerlove[:fuel])
    nerlove[:lnPK] = log.(nerlove[:capital])
    f = @formula(lnC ~ 1 + lnQ + lnPL + lnPF + lnPK)
At this point, I have a formula and a dataframe. Then, I would call a fitting function, e.g., a linear regression function that accepts a formula and a dataframe, like this:
    ols(f, nerlove)
Inside ols(), I would get the dependent variable and the regressor matrix with something like
    m = ModelFrame(f, nerlove)
    mm = ModelMatrix(m)
    y = Float64.(nerlove[f.lhs])
    x = mm.m
    b = x\y
    # etc.
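Put together, the steps above could be wrapped in a single function along these lines. This is only a minimal sketch, and it makes assumptions that are not part of what I tried: it assumes a recent StatsModels release in which `@formula` returns a `FormulaTerm` and `response` is available for a `ModelFrame`, and it takes the dependent variable from the model frame rather than indexing the dataframe with `f.lhs`. The toy dataframe is made up purely for illustration.

```julia
using DataFrames, StatsModels

# Hypothetical ols(): takes a formula and a table and returns the
# least-squares coefficient vector. Assumes a recent StatsModels API.
function ols(f::FormulaTerm, data)
    mf = ModelFrame(f, data)   # resolve the formula against the data
    x = ModelMatrix(mf).m      # regressor matrix, including the intercept column
    y = response(mf)           # dependent variable as a plain vector
    return x \ y               # least-squares solution
end

# Made-up data in which y = 1 + 2x holds exactly
toy = DataFrame(x = [1.0, 2.0, 3.0, 4.0], y = [3.0, 5.0, 7.0, 9.0])
b = ols(@formula(y ~ 1 + x), toy)
```

Taking both `x` and `y` from the same model frame should keep their rows aligned, which is the part I am least sure about when I index the dataframe directly.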
I have tried this, and it works. My doubts are whether or not the last block is the best way to create ordinary arrays for computing regression coefficients, etc. I also don’t know how to deal with the possibility that the dataframe has missings in it, which is why I do the type conversion, assuming that there aren’t any. Pointers to examples would be very welcome.