I’m trying to understand how generalized linear models (GLMs) work by coding a simple example from scratch, and I’m hoping some of the stats experts here (I know there are several of you lurking around) can help. In the case of simple linear regression, the code is super simple:

```
using Random
using StatsPlots
Random.seed!(42)                       # reproducible noise
X = 1:10
Y = rand(-1:0.1:1, 10) .+ (1:2:20)     # noisy points along a line
β = hcat(ones(length(X)), X) \ Y       # least-squares estimate of the coefficients
Yᵢ(xᵢ) = β[1] + β[2] * xᵢ              # function to estimate Yᵢ
scatter(X, Y, legend=false, xlabel="X", ylabel="Y")
plot!(X, Yᵢ.(X))
```

Now, I’m trying to understand how to code a simple GLM for something like this:

```
using DataFrames
using GLM
using Random
using StatsPlots
X = 1:10
Y2 = Y.^4 .+ Y
scatter(X, Y2, legend=false, xlabel="X", ylabel="Y")
```

With the GLM package, it’s obviously very easy to fit a model to this:

```
model = glm(@formula(Y2 ~ X), DataFrame(X=X, Y2=Y2), Normal(), LogLink())
```
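For reference, the package fit can be inspected with `coef` and `predict`, which gives me something to compare a hand-rolled fit against. Here's a self-contained version using made-up log-linear data (the `0.2` and `0.3` coefficients are just illustrative):

```julia
using DataFrames, GLM

X = 1:10
Y2 = exp.(0.2 .+ 0.3 .* X)   # made-up data that is exactly log-linear

# Normal response with a log link, i.e. E(Y2) = exp(β₁ + β₂X)
model = glm(@formula(Y2 ~ X), DataFrame(X=X, Y2=Y2), Normal(), LogLink())

coef(model)      # fitted coefficients β̂ — should recover ≈ [0.2, 0.3] here
predict(model)   # fitted means μ̂ᵢ = exp(β̂₁ + β̂₂xᵢ)
```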

But I’m trying to understand how to code something like I’ve done above, but for this example.

Wikipedia has a decent example of how to compute the expected value of Yᵢ: E(Yᵢ) = μᵢ = g⁻¹(xᵢᵀβ), where g is the link function.
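If I'm reading that right, the linear predictor gets pushed through the *inverse* link to produce the mean. With a log link that would look something like this (the β values here are made up purely for illustration):

```julia
# My understanding of the GLM mean structure: η = xᵀβ, then μ = g⁻¹(η).
ginv(η) = exp(η)          # inverse of the log link g = log
β = [0.5, 0.3]            # made-up coefficients, just to illustrate
xᵢ = 2.0
η = β[1] + β[2] * xᵢ      # linear predictor ηᵢ
μ = ginv(η)               # expected value E(Yᵢ)
```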

I guess my specific questions are:

- Is `g` simply the `log()` function in this case?
- Does `β = hcat(ones(length(X)), X) \ Y` no longer hold for this case, meaning I need to estimate `β` via MLE instead? Since Distributions.jl uses MLE for fitting distributions, is it possible to do what I need with that package, or do I need to use Optim.jl to solve this problem (or am I totally wrong about all of this)?
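For context, here's roughly what I imagine the Optim.jl route would look like: write down the Normal negative log-likelihood with a log-linked mean and minimize it numerically. This is just my sketch of the idea (starting values and the `logσ` reparameterization are my own choices, not anything from GLM.jl):

```julia
using Optim, Random

Random.seed!(42)
X = 1:10
Y = rand(-1:0.1:1, 10) .+ (1:2:20)
Y2 = Y .^ 4 .+ Y

# Negative log-likelihood for Yᵢ ~ Normal(μᵢ, σ²) with μᵢ = exp(β₁ + β₂xᵢ).
# Optimizing over logσ instead of σ keeps the scale parameter positive.
function negloglik(θ)
    β1, β2, logσ = θ
    σ = exp(logσ)
    μ = exp.(β1 .+ β2 .* X)            # inverse log link applied to Xβ
    return sum(@. 0.5 * log(2π * σ^2) + (Y2 - μ)^2 / (2σ^2))
end

res = optimize(negloglik, [0.0, 0.1, 0.0])   # Nelder–Mead by default
β̂ = Optim.minimizer(res)[1:2]                # estimated coefficients
```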