Hello, I am stuck with a problem regarding Linear Regression and Ridge Regression

The following labeled data is given: x={10.9, 12.4, 13.5, 14.6, 14.8, 16.5, 19.4} and y={24.8, 30.0, 31.0, 29.3, 35.9, 39.6, 40.7}.
The task is to perform linear and ridge regression.

I used the following code:

using DataFrames
using Plots   # note: CSV was unused, and loading PyPlot alongside Plots causes name clashes
using GLM

x = [10.9, 12.4, 13.5, 14.6, 14.8, 16.5, 19.4] #x-values
y = [24.8, 30.0, 31.0, 29.3, 35.9, 39.6, 40.7] #Y-values

num_tr_ex = length(y)  # 7 training examples

data = DataFrame(X=x, Y=y)

linearRegressor = lm(@formula(Y ~ X), data)


linearFit = predict(linearRegressor)
plot(x, linearFit, label="fit")
display(scatter!(x, y, label="data"))

I have no idea how to get started with Ridge Regression.

Is this a homework exercise?


OP didn’t even bother to remove certain indicative phrases :wink:

Ridge Regression is described here for example.
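In case a quick recap helps: ridge regression adds an L2 penalty to the least-squares objective, which has a closed-form solution:

```latex
\hat{\beta}_{\text{ridge}}
  = \arg\min_{\beta} \, \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = (X^\top X + \lambda I)^{-1} X^\top y
```

With λ = 0 this reduces to ordinary least squares; larger λ shrinks the coefficients toward zero.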


Yes, but I wrote the code myself; I am currently stuck on ridge regression. Do you know any resources that might help?

I didn’t quite understand. What do I need to remove?

This reads like a homework task.

Fair enough. Maybe Regression · Julia Packages could help?

I’m looking for resources that explain ridge regression. I know the statistics; I just need to convert that into code.

Thanks, this link helps, but I wanted to look at a few examples to fully understand the coding part, as I’m fairly new to Julia syntax. It would be great if you could help me with that!

It looks to me like GLM doesn’t support ridge regression, see https://github.com/JuliaStats/GLM.jl/issues/205. I’ll let others speak about the recommended replacement, because I’m not qualified…

Edit: a naive implementation could look like

using LinearAlgebra  # for the identity operator I

function linear_regression(lambda, X, y)
    Xt = transpose(X)
    (Xt * X + lambda * I) \ (Xt * y)
end
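For reference, here is a self-contained version applied to the data from this thread. The intercept column and the value λ = 0.1 are illustrative choices, not part of the original post:

```julia
using LinearAlgebra  # provides the identity operator I

# Naive closed-form ridge estimator: (XᵀX + λI)⁻¹ Xᵀy
function linear_regression(lambda, X, y)
    Xt = transpose(X)
    (Xt * X + lambda * I) \ (Xt * y)
end

x = [10.9, 12.4, 13.5, 14.6, 14.8, 16.5, 19.4]
y = [24.8, 30.0, 31.0, 29.3, 35.9, 39.6, 40.7]
X = [ones(length(x)) x]   # column of ones for the intercept

b_ols   = linear_regression(0.0, X, y)  # λ = 0 recovers ordinary least squares
b_ridge = linear_regression(0.1, X, y)  # λ > 0 shrinks the coefficients
```

Note that with λ = 0 this is exactly the OLS normal-equation solution, so you can sanity-check it against GLM's `lm` fit.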

Thanks a lot, I’ll look into it!

A ridge regression can be done as follows:


using LinearAlgebra

"""
Calculate the ridge regression estimate with target vector β₀.
"""
function RidgeRegression(Y, X, λ, β₀=0)
    K = size(X, 2)
    isa(β₀, Number) && (β₀ = fill(β₀, K))
    b = (X'X + λ*I) \ (X'Y + λ*β₀)    # same as inv(X'X+λ*I)*(X'Y+λ*β₀), but more stable
    return b
end

This is from my lecture notes.
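Applied to the data from this thread, with β₀ = 0 and arbitrary λ values just to show the effect of the penalty (not tuned choices):

```julia
using LinearAlgebra

# Ridge estimate shrinking toward the target vector β₀ (formula from the lecture notes above)
function RidgeRegression(Y, X, λ, β₀=0)
    K = size(X, 2)
    isa(β₀, Number) && (β₀ = fill(β₀, K))
    (X'X + λ*I) \ (X'Y + λ*β₀)
end

x = [10.9, 12.4, 13.5, 14.6, 14.8, 16.5, 19.4]
Y = [24.8, 30.0, 31.0, 29.3, 35.9, 39.6, 40.7]
X = [ones(length(x)) x]                  # intercept column plus the regressor

b_small = RidgeRegression(Y, X, 0.1)     # mild shrinkage, close to OLS
b_large = RidgeRegression(Y, X, 1e6)     # heavy shrinkage: b approaches β₀ = 0
```

The target vector β₀ is what makes this version more general than the plain ridge formula: the penalty pulls the estimate toward β₀ rather than toward zero.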


An alternative is the LinearRegressionKit package.
There is an example here of how to use ridge regression.

Hereafter is a similar example with your data:

using LinearRegressionKit, StatsModels, DataFrames
using VegaLite

# Dataset
x = [10.9, 12.4, 13.5, 14.6, 14.8, 16.5, 19.4]
y = [24.8, 30.0, 31.0, 29.3, 35.9, 39.6, 40.7]
df = DataFrame(x=x, y=y)
f = @formula(y ~ x)

# traditional regression - as a baseline
lm = regress(f, df)

# ridge regression for some ks
rdf, ps = ridge(f, df, 0.0:0.005:1, traceplots=true)
rdf[1:5, :]

# plot to see the impact of k on the coefs
ps["coefs traceplot"]

# plot to see the impact of k on the VIF
ps["vifs traceplot log"]

# the ridge regression for a selected k
rlm = ridge(f, df, 0.01)