Good Day Folks,
I have a DataFrame constructed as
using DataFrames
df = DataFrame(Price = rand(70:5:170, 10), Col1 = rand(1:1:50, 10), Col2 = rand(1:1:30, 10), Col3 = rand(1:5:100, 10))
I would like to treat Price as the outcome variable and the other columns as the inputs, so that
Price = [Coefficient1]*Col1 + [Coefficient2]*Col2 + [Coefficient3]*Col3
How can I fit this regression and obtain the coefficients?
Thank you,
Or if for some reason you don’t want to use a package:
X = [df.Col1 df.Col2 df.Col3]
Y = df.Price
β = X \ Y
and if you want the intercept
β = hcat(ones(size(X,1)), X) \ Y
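To make the two one-liners above concrete, here is a minimal self-contained sketch. It assumes made-up data in the same ranges as the question (the seed and the names β_qr, β_ne, Xi and β_int are only illustrative) and checks that X \ Y gives the same coefficients as the textbook normal equations.

```julia
using LinearAlgebra, Random

Random.seed!(1)                     # arbitrary seed, only for reproducibility
n = 10
# Stand-ins for Col1, Col2, Col3 and Price (made-up data, as in the question)
X = Float64[rand(1:50, n) rand(1:30, n) rand(1:5:100, n)]
Y = Float64.(rand(70:5:170, n))

β_qr = X \ Y                        # least-squares solution for a tall matrix
β_ne = (X' * X) \ (X' * Y)          # same solution via the normal equations

Xi = hcat(ones(n), X)               # prepend a column of ones for the intercept
β_int = Xi \ Y                      # β_int[1] is the intercept, β_int[2:4] the slopes
```

The backslash form is usually preferable to forming X' * X explicitly, since it tends to be better behaved numerically.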
Thank you for this.
Could I then apply the coefficients as
β*Col1 + β*Col2 + β*Col3 + randn()
Does this also work for polynomial functions?
Thank you again,
β is a vector, so you can index into it with square brackets [], as discussed here. So, after β = hcat(ones(size(X,1)), X) \ Y you could do something like:
ŷ(x) = β[1] + sum(β[2:4] .* x)
julia> ŷ([49, 6, 96])
85.40338136038268
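As a sanity check on the indexing, here is a small sketch with noise-free data built from known coefficients. The values in β_true and X are hypothetical and chosen only for illustration; the fit should recover them, and the prediction can be verified by hand.

```julia
# Hypothetical "true" coefficients, chosen only for illustration
β_true = [2.0, 3.0, -1.0, 0.5]                  # intercept, Col1, Col2, Col3

# Made-up inputs: one row per observation, one column per predictor
X = Float64[1 2 3; 4 5 7; 7 8 8; 2 9 4; 6 1 5; 3 3 9]
Xi = hcat(ones(size(X, 1)), X)                  # add the intercept column
Y = Xi * β_true                                 # noise-free outcome

β = Xi \ Y                                      # recovers β_true up to rounding
ŷ(x) = β[1] + sum(β[2:end] .* x)                # prediction for one row of inputs

ŷ([1, 2, 3])                                    # ≈ 4.5 (= 2 + 3*1 - 1*2 + 0.5*3)
```

Using β[2:end] instead of β[2:4] keeps the function valid if the number of predictors changes.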
The \ operator in this case is solving a linear system of equations; because X has more rows than columns, it returns the least-squares solution. For polynomials, have a look at fit in Polynomials.jl.
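A polynomial is still linear in its coefficients, so the same \ approach works once the powers of x are placed in the columns of the design matrix. The following is a package-free sketch of that idea, not the Polynomials.jl API; the quadratic and the names V, c and deg are made up for illustration.

```julia
x = collect(0.0:0.5:4.0)
y = 1 .+ 2 .* x .- 0.5 .* x .^ 2        # made-up quadratic, noise-free

deg = 2
V = [xi^p for xi in x, p in 0:deg]      # design matrix with columns 1, x, x^2
c = V \ y                               # coefficients, lowest degree first: ≈ [1.0, 2.0, -0.5]
```

For high degrees this matrix becomes badly conditioned, which is one reason to prefer a dedicated package for serious polynomial fitting.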
As:
using GLM
ols = lm(@formula("y ~.", data)
This assumes Y is in the left-most column, yes?
Thanks,
Did you read the documentation? That will throw an error.
julia> df = DataFrame(y = rand(10), x1 = rand(10), x2 = rand(10));
julia> ols = lm(@formula("y ~ .", df))
ERROR: LoadError: MethodError: no method matching var"@formula"(::LineNumberNode, ::Module, ::String, ::Symbol)
Closest candidates are:
var"@formula"(::LineNumberNode, ::Module, ::Any) at /Users/peterdeffebach/.julia/packages/StatsModels/m1jYD/src/formula.jl:60
in expression starting at REPL[5]:1
Please read the documentation before trying out code. And when you try out code, please think about why it errors.
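The MethodError says that @formula received two arguments, a String and a Symbol, whereas the macro takes a single unquoted expression; the data is passed to lm as a separate second argument. A minimal sketch of a call in that form, assuming the column names from the original question and spelling out each predictor explicitly:

```julia
using DataFrames, GLM

# Made-up data with the column names from the original question
df = DataFrame(Price = rand(70:5:170, 10), Col1 = rand(1:50, 10),
               Col2 = rand(1:30, 10), Col3 = rand(1:5:100, 10))

# The formula is a bare expression (no quotes); df is the second argument to lm
ols = lm(@formula(Price ~ Col1 + Col2 + Col3), df)

coef(ols)    # intercept first, then the coefficients for Col1, Col2, Col3
```

The outcome is identified by name on the left of ~, so its position among the columns of df does not matter.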
Issue identified, from looking only at the instructions I wrote: the symbols (column names) were numeric and needed to be renamed. Then implement the @formula(Y~X…) using those symbol names.