Very basic GLM help

Hello, I have been trying to learn how to use GLM.jl but I really don’t understand it. My goal is to show a linear model on a plot. I know how to create the base plot, but not how to fit the linear model or how to draw it onto the plot. Please help me. It seems I really need baby-step instructions for this. (Or with AlgebraOfGraphics, I’m not picky, I just need to learn how to do it with something.)

``````
df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner=50), C = 1:500)
lm(@formula(:A ~ :C), df)  # doesn't work
``````

I don’t know too much about linear models, but following the manual of GLM you should try:

``````
df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner=50), C = 1:500)
lm(@formula(A ~ C), df)
``````

Note the missing `:` in front of the variable names in the formula.
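
To make the formula syntax concrete, here is a minimal sketch using the same data frame as in the question. The variable name `m` is just my choice for illustration. The names inside `@formula` are the column names of the table, written bare, with the response on the left of `~` and the predictor on the right.

```julia
using DataFrames, GLM

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner=50), C = 1:500)

# Bare column names inside @formula: response ~ predictor
m = lm(@formula(A ~ C), df)

coef(m)  # intercept first, then the slope for C
r2(m)    # coefficient of determination of the fit
```

In this particular data set `A` equals `2C - 1` exactly, so the fitted intercept and slope should come out very close to -1 and 2.
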
If you are only interested in linear regression, most of the plotting packages can do that for you.

As @Rudi79 points out, the error in your linear model is a syntax error. If all you want is to add a regression line, you can use the `smooth` keyword argument in `Plots.jl`, for example. Have a look at this example:

``````
using DataFrames
using GLM
using StatsPlots

x = rand(10:0.1:30, 30)
f(x) = x + rand(-5:0.1:5)
df = DataFrame(y = f.(x), x = x)

scatter(df.x, df.y, legend=false, smooth=true)
``````

That being said, you stated you are trying to learn GLM.jl, so you can also do something like this:

``````
ols = lm(@formula(y ~ x), df)
``````

You can then get the coefficients by calling `coef(ols)` which you can use to plot in a couple of different ways:

``````
# create your own function from your coefficients:
model(x) = coef(ols)[1] + coef(ols)[2] * x
scatter(df.x, df.y)
plot!(x, model.(x), legend=false)

# make use of linear algebra to compute all the y values
# (design matrix [1 x] times the coefficient vector):
scatter(df.x, df.y)
plot!(x, hcat(ones(length(x)), x) * coef(ols), legend=false)
``````

Or, as @nilshg points out:

``````
scatter(df.x, df.y)
plot!(x, predict(ols, DataFrame(x = x)), legend=false)
``````

I’ve personally always found it very annoying that you have to pass in some `Table` type to the `predict` function but it is of course still more convenient than the two alternatives I showed.

Note that this basically re-invents the `predict` function in GLM, which can also give you confidence or prediction intervals. Cf. this thread: "Plot the confidence interval for a model fit".
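
As a rough sketch of what that could look like (not taken from the linked thread): the data below are synthetic and made up for illustration, and the names `newx`, `p`, `yhat`, `lo`, and `hi` are my own. It assumes that `predict` on a fitted `lm` accepts the `interval = :confidence` keyword and returns a table with `prediction`, `lower`, and `upper` columns, which is how I understand GLM.jl to behave.

```julia
using DataFrames, GLM, Plots

# Synthetic data, made up for illustration only
x = collect(10:0.5:30)
y = 2 .+ 0.8 .* x .+ randn(length(x))
df = DataFrame(x = x, y = y)

ols = lm(@formula(y ~ x), df)

# Predict on a sorted grid so the band is drawn from left to right
newx = DataFrame(x = range(minimum(x), maximum(x); length = 100))
p = predict(ols, newx; interval = :confidence)

# Convert defensively to plain Float64 vectors before plotting
yhat = Float64.(p.prediction)
lo = Float64.(p.lower)
hi = Float64.(p.upper)

scatter(df.x, df.y, legend = false)
# ribbon takes the distances below and above the fitted line
plot!(newx.x, yhat, ribbon = (yhat .- lo, hi .- yhat))
```

Swapping `:confidence` for `:prediction` should give the wider band for new observations rather than for the mean response.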

Thank you for your reply, removing `:` let it calculate the formula. I thought `y` and `x` could be replaced by column headings.

Now I am trying to plot it but am having difficulties. Basically, what I want to do is what R does with the commands `geom_smooth(method = "lm")` and `geom_smooth(method = "loess")`.

Following the example here

``````
df = DataFrame(A = rand(500), B = repeat(1:10, inner=50), C = 1:500)
linear_model_df = lm(@formula(A ~ C), df)
pred = DataFrame(df(C)); # this does not work
pred.y = predict(model, pred)
``````

Edit: I guess I am confused when to use `x` and `y` and when to use column headings.

Does one always need to state the intervals for the x axis?

It seems to me that your problems come from a more fundamental misunderstanding of Julia’s syntax rather than the specifics of GLM or Plots.jl (or any other plotting library).

In the above, you are calling `DataFrame(df(C))`, which probably should be `DataFrame(C = df.C)`. In Julia, round brackets `()` denote function calls, so `df(C)` means "call function `df` with argument `C`", which doesn’t really make sense as `df` isn’t a function.
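
On the `x`/`y` versus column headings question: in the earlier example `x` and `y` were simply the names of the columns in that data frame, so in yours you use `A` and `C` instead. A small sketch of the usual ways to get at a column, using your own `df` (nothing here is specific to GLM):

```julia
using DataFrames

df = DataFrame(A = rand(500), B = repeat(1:10, inner=50), C = 1:500)

df.C          # column C as a vector (property access)
df[!, :C]     # the same column, selected by Symbol
df[:, [:C]]   # a new one-column DataFrame holding a copy of C

# A table passed to predict needs a column named like the predictor in the formula
pred = DataFrame(C = df.C)
```

Also note that in your snippet the fitted model is stored in `linear_model_df`, so that is the name to pass to `predict`, not `model`.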

Thank you for your replies. I acknowledge that I have only a basic understanding of programming syntax and often make mistakes. I will continue to try and learn.

``````
df = DataFrame(A = rand(500), B = repeat(1:10, inner=50), C = 1:500)
linear_model_df = lm(@formula(A ~ C), df)
pred = DataFrame(C = 1:500);
pred.y = predict(linear_model_df, pred)

plot(df.C, df.A)
plot!(pred.C, pred.y)
``````