JuliaDB & OnlineStats syntax for linear regression



I can't figure out how to run a LinReg OnlineStat on my JuliaDB table. In particular, how do I tell the reduce function which variable is the left-hand side and which are the right-hand side? The following runs a regression of y on x, and z is ignored:

using JuliaDB, OnlineStats

# three-column table; the goal is to regress y on both x and z
t = table(@NT(x = randn(1000), y = randn(1000), z = rand(1000)))
reduce(LinReg(), t, select = (:x, :y, :z))


Found it here.


You can also use LinRegBuilder, which lets you fit any regression model on the data after a single pass.

o = reduce(LinRegBuilder(), t, select = (:x, :y, :z))

# y ~ x + z with no intercept (bias=false); x and y are positions in the selected columns
coef(o, x=[1,3], y=2, bias=false)
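
One way to see why a single pass can be enough is that least squares depends on the data only through sums of cross-products. The following is a sketch of that math, not a description of the package's internals; the symbols w_i, A, S and r are introduced here for illustration.

```latex
% Accumulate one matrix over the rows (a single pass):
\[
  w_i = (x_i,\; y_i,\; z_i,\; 1)^\top, \qquad
  A = \sum_{i=1}^{n} w_i w_i^\top .
\]
% For any response index r and any predictor index set S (with r \notin S),
% the least-squares coefficients use only blocks of A:
\[
  \hat\beta_{S \to r} = A_{SS}^{-1} A_{Sr} .
\]
% Example: S = \{1, 3\},\ r = 2 corresponds to y ~ x + z without an intercept;
% adding index 4 to S includes the intercept.
```

Because A does not depend on which column is the response, the choice of x and y can be made after the data has been read.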


LinRegBuilder seems cool. Can it be used to create any arbitrary model compatible with StatsModels?


Can it do categorical variables/dummy variables?


In order to fit any given term (dummy variable, interaction term, etc.), you would need to specifically select it. There’s no support (yet) for formulas.

Here’s an example of making an interaction term between :x and :z, which I’ll admit isn’t the cleanest syntax.

reduce(LinRegBuilder(), t, select=(:x, :y, :z, (:x, :z) => xz -> *(xz...)))
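
To make "specifically select it" more concrete: each extra term is just another column computed from the existing ones. Here is a minimal sketch in plain Julia (1.x syntax, no JuliaDB or OnlineStats) that builds an interaction column and dummy columns by hand and fits them with ordinary least squares. The categorical column g, its levels, and the coefficients used to simulate y are all made up for illustration and are not part of the table above.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 1000
x = randn(n)
z = rand(n)
g = rand(["a", "b", "c"], n)      # hypothetical categorical column

# Derived terms are extra columns computed element-wise.
xz  = x .* z                      # interaction term between x and z
g_b = Float64.(g .== "b")         # dummy for level "b" ("a" is the baseline)
g_c = Float64.(g .== "c")         # dummy for level "c"

# Simulated response (assumed coefficients, for illustration only).
y = 1.0 .+ 2.0 .* x .- 0.5 .* xz .+ 3.0 .* g_b .+ 0.1 .* randn(n)

# Design matrix: intercept, x, z, x*z, and the two dummies.
X = hcat(ones(n), x, z, xz, g_b, g_c)
β = X \ y                         # least-squares coefficients, one per column of X
```

The same idea carries over to the table: a derived column (like the x*z pair above) has to appear in the selection before the regression can use it.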