Generating the equation of a curve across attributes

Good Day Folks,

I have a dataframe constructed as

using DataFrames
df = DataFrame(Price = rand(70:5:170, 10), Col1 = rand(1:1:50, 10), Col2 = rand(1:1:30, 10), Col3 = rand(1:5:100, 10))

I would like to treat Price as the outcome variable and the other attributes as the inputs, so that

Price = [Coefficient1]*Col1 + [Coefficient2]*Col2 + [Coefficient3]*Col3

How might I create the regression line with coefficients?

Thank you,

GLM.jl will do that
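
For later readers, here is a minimal sketch of what that could look like. The data is synthetic and made up for illustration: Price is built from known coefficients (10, 2, -3, 0.5) so the result is easy to check. The column names follow the question.

```julia
using DataFrames, GLM

# Synthetic predictors (illustrative values, not from the original post)
col1 = collect(1.0:10.0)
col2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
col3 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0, 2.0, 8.0]

# Price generated from known coefficients so the fit can be verified
price = 10 .+ 2 .* col1 .- 3 .* col2 .+ 0.5 .* col3

df = DataFrame(Price = price, Col1 = col1, Col2 = col2, Col3 = col3)

# Ordinary least squares; an intercept is included by default
model = lm(@formula(Price ~ Col1 + Col2 + Col3), df)

# Coefficients in the order: intercept, Col1, Col2, Col3
β = coef(model)
```

With real, noisy data the coefficients will not match any "true" values exactly; printing `model` also shows standard errors and p-values.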

Or if for some reason you don’t want to use a package:

X = [df.Col1 df.Col2 df.Col3]
Y = df.Price

β = X \ Y

And if you want an intercept:

β = hcat(ones(size(X,1)), X) \ Y

Thank you for this.

Could I then apply the coefficients as

βCol1 + βCol2 + βCol3 + randn()

Does this also work for polynomial functions?

Thank you again,

β is a vector, so you can index into it with square brackets []. So, after β = hcat(ones(size(X,1)), X) \ Y, you could do something like:

ŷ(x) = β[1] + sum(β[2:4] .* x)

julia> ŷ([49, 6, 96])

The \ operator in this case is solving a linear system of equations. For polynomials, have a look at fit in Polynomials.jl.
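
On the polynomial question: a polynomial is nonlinear in x but still linear in its coefficients, so the same \ approach should work once the powers of x are added as columns. Below is a rough sketch without any package, on made-up data from a known quadratic (y = 1 + 2x + 3x^2). The data and the variable names are assumptions for illustration only.

```julia
# Synthetic data from a known quadratic (illustrative only)
x = collect(0.0:0.5:5.0)
y = 1 .+ 2 .* x .+ 3 .* x .^ 2

# Design matrix with columns 1, x, x^2
X = [ones(length(x)) x x .^ 2]

# Least-squares solve; β holds the constant, linear and quadratic coefficients
β = X \ y

# Prediction for a new x value
ŷ(x) = β[1] + β[2] * x + β[3] * x^2

ŷ(2.0)  # close to 17, since 1 + 2*2 + 3*4 = 17
```

For higher degrees the powers of x can make this matrix badly conditioned, which is one reason to reach for a dedicated package instead.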


using GLM
ols = lm(@formula("y ~.", data)

This assumes Y is in the left-most column, yes?


Did you read the documentation? That will throw an error.

julia> df = DataFrame(y = rand(10), x1 = rand(10), x2 = rand(10));

julia> ols = lm(@formula("y ~ .", df))
ERROR: LoadError: MethodError: no method matching var"@formula"(::LineNumberNode, ::Module, ::String, ::Symbol)
Closest candidates are:
  var"@formula"(::LineNumberNode, ::Module, ::Any) at /Users/peterdeffebach/.julia/packages/StatsModels/m1jYD/src/formula.jl:60
in expression starting at REPL[5]:1

Please read the documentation before trying out code. And when you try out code, please think about why it errors.

Issue identified, going only by the instructions I wrote: the SYMBOLS (column names) were numeric and needed to be renamed. THEN implement the @formula(Y~X…) using those SYMBOL names.
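
For anyone landing here later, a rough sketch of that fix. The numeric column names "1", "2", "3" and the replacement names y, x1, x2 are assumptions for illustration. The MethodError above comes from passing a string and the data into @formula; the macro takes a single bare expression, and the data goes to lm. As far as I know there is no "." shorthand for "all other columns", so the second fit builds the formula from the column names using term, which GLM appears to re-export from StatsModels.

```julia
using DataFrames, GLM

# Hypothetical data whose column names are numeric strings
df = DataFrame("1" => rand(10), "2" => rand(10), "3" => rand(10))

# Rename the columns to valid identifiers (names chosen for illustration)
rename!(df, [:y, :x1, :x2])

# @formula gets a bare expression; the data frame is the second argument to lm
ols = lm(@formula(y ~ x1 + x2), df)

# Same model, with the formula built from the column names
# so the predictors do not have to be typed out by hand
predictors = [n for n in propertynames(df) if n != :y]
ols2 = lm(term(:y) ~ sum(term.(predictors)), df)

coef(ols)  # intercept, x1, x2
```

Both calls should give the same coefficients; the programmatic form is mainly useful when there are many columns.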