I’m trying to understand how generalized linear models (GLMs) work by coding a simple example from scratch, and I’m hoping some of the stats experts here (I know there are several of you lurking around) can help. In the case of simple linear regression, the code is super simple:
```julia
using Random
using StatsPlots

X = 1:10
Y = rand(collect(-1:0.1:1), 10) .+ collect(1:2:20)
β = hcat(ones(length(X)), X) \ Y   # estimate coefficients via least squares
Yᵢ(xᵢ) = β[1] + β[2] * xᵢ          # function to estimate Yᵢ
scatter(X, Y, legend=false, xlabel="X", ylabel="Y")
plot!(X, Yᵢ.(X))
```
Now, I’m trying to understand how to code a simple GLM for something like this:
```julia
using DataFrames
using GLM
using Random
using StatsPlots

X = 1:10
Y2 = Y.^4 .+ Y   # Y from the previous example
scatter(X, Y2, legend=false, xlabel="X", ylabel="Y")
```
With the GLM package, it’s obviously very easy to fit a model to this:
```julia
model = glm(@formula(Y2 ~ X), DataFrame(X=X, Y2=Y2), Normal(), LogLink())
```
But I’m trying to understand how to code something like I’ve done above, but for this example.
Wikipedia has a decent example of how to compute the expected value of Yᵢ: for a GLM with link function g, E(Yᵢ) = μᵢ = g⁻¹(xᵢᵀβ), which with a log link is exp(xᵢᵀβ).
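For concreteness, here is how I picture computing those expected values under a log link. The coefficients `β` below are made up purely for illustration:

```julia
# With a log link, E(Yᵢ) = g⁻¹(xᵢᵀβ) = exp(xᵢᵀβ).
# These coefficients are hypothetical, just to show the computation.
β = [0.5, 0.2]
Xmat = hcat(ones(10), 1:10)   # design matrix with an intercept column
μ = exp.(Xmat * β)            # expected value of each Yᵢ under the model
```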
I guess my specific questions are:

1. Do I just apply the `log()` function in this case?
2. My understanding is that `β = hcat(ones(length(X)), X) \ Y` no longer holds true for this case and I need to estimate `β` via MLE. Is this correct? Since Distributions.jl uses MLE for fitting distributions, is it possible to do what I need with that package, or do I need to use Optim.jl to solve this problem (or am I totally wrong about all of this)?
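To make the MLE question concrete, here is a rough sketch of the iteratively reweighted least squares (IRLS) update I think is involved for a Normal response with a log link. All the names here are my own and the details may well be wrong, but this is the kind of from-scratch code I’m after:

```julia
using LinearAlgebra

# Sketch of IRLS for a Normal response with a log link, i.e.
# E(Yᵢ) = exp(xᵢᵀβ). This is an attempt, not a reference implementation.
function irls_normal_log(Xmat, y; iters=25)
    β = Xmat \ log.(y)              # initialize via OLS on the log scale (assumes y .> 0)
    for _ in 1:iters
        η = Xmat * β                # linear predictor
        μ = exp.(η)                 # mean via the inverse link
        w = μ .^ 2                  # working weights: (dμ/dη)²/Var(Y); Var is constant for Normal
        z = η .+ (y .- μ) ./ μ      # working response: η + (y - μ)·dη/dμ
        W = Diagonal(w)
        β = (Xmat' * W * Xmat) \ (Xmat' * W * z)   # weighted least squares step
    end
    return β
end

# Noiseless toy data where log(E[Y]) = 0.5 + 0.2x,
# so the fit should recover β ≈ [0.5, 0.2].
x = collect(1.0:10.0)
Xmat = hcat(ones(length(x)), x)
y = exp.(0.5 .+ 0.2 .* x)
β̂ = irls_normal_log(Xmat, y)
```

If this is roughly right, then neither Distributions.jl nor Optim.jl is strictly required, since each IRLS step is just a weighted version of the `\` solve from the linear-regression example.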