Very basic GLM help

Hello, I have been trying to learn how to use GLM.jl but I really don’t understand it. My goal is to show a linear model on a plot. I know how to create the base plot, but not how to fit the linear model or how to draw it on the plot. Please help me; it seems I really need baby-step instructions for this (or for AlgebraOfGraphics; I’m not picky, I just need to learn how to do it with something).

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner=50), C = 1:500)
lm(@formula(:A ~ :C),df)  #doesn't work

I don’t know too much about linear models, but following the manual of GLM you should try:

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner=50), C = 1:500)
lm(@formula(A ~ C),df)

Note that there is no : in front of the variable names in the formula.
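To make the syntax rule concrete, here is a minimal sketch using the data frame from the question. Inside @formula the column names are written as bare identifiers (response on the left of ~, predictor on the right). The names model, intercept and slope are only illustrative. In this particular data A = 2C - 1 holds exactly, so the fit should recover an intercept of about -1 and a slope of about 2.

```julia
using DataFrames, GLM

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner=50), C = 1:500)

# Column names go into @formula as bare identifiers: response ~ predictor.
model = lm(@formula(A ~ C), df)

# coef returns the estimates in the order [intercept, slope].
intercept, slope = coef(model)
```

The same pattern should carry over to any pair of columns: swap A and C for the column names in your own data frame.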
But if you are only interested in a linear regression line, most plotting packages can draw that for you, e.g.,
http://gadflyjl.org/stable/gallery/statistics/#[Stat.smooth](@ref)-1
or
https://www.queryverse.org/VegaLite.jl/stable/examples/examples_advancedcalculations/#Linear-Regression-1

As @Rudi79 points out, the error in your linear model is a syntax error. If all you want is to add a regression line, you can use the smooth keyword argument in Plots.jl, for example. Have a look at this example:

using DataFrames
using GLM
using StatsPlots

x = rand(10:0.1:30, 30)
f(x) = x + rand(-5:0.1:5)
df = DataFrame(y = f.(x), x = x)

scatter(df.x, df.y, legend=false, smooth=true)

That being said, you stated you are trying to learn GLM.jl, so you can also do something like this:

ols = lm(@formula(y ~ x), df)

You can then get the coefficients by calling coef(ols), which you can use to plot the fitted line in a couple of different ways:

# create your own function from your coefficients:
model(x) = coef(ols)[1] + coef(ols)[2] * x
scatter(df.x, df.y)
plot!(x, model.(x), legend=false)

# make use of linear algebra to compute all the y values:
scatter(df.x, df.y)
plot!(x, hcat(ones(length(x)), x) * coef(ols), legend=false)
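For intuition on the linear-algebra variant, here is a small sketch that needs no packages beyond the standard library. It builds the design matrix by hand and solves the least-squares problem with the backslash operator. The toy data and the names X, beta and yhat are made up for illustration and are not from the thread. For a simple full-rank problem like this, the result should agree with what coef(ols) returns from lm.

```julia
using LinearAlgebra

# Hypothetical toy data: y = 1 + 2x exactly, so the answer is known in advance.
x = collect(1.0:10.0)
y = 1 .+ 2 .* x

# Design matrix: a column of ones for the intercept, then the predictor.
X = hcat(ones(length(x)), x)

# Least-squares solution of X * beta ≈ y, ordered [intercept, slope].
beta = X \ y

# Fitted values are a single matrix-vector product.
yhat = X * beta
```

This is only meant to show where the numbers come from; for real work, lm and predict also handle formulas and missing values for you.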

Or, as @nilshg points out:

scatter(df.x, df.y)
plot!(x, predict(ols, DataFrame(x = x)), legend=false)

I’ve personally always found it very annoying that you have to pass a table type to the predict function, but it is of course still more convenient than the two alternatives I showed.

Note that computing the line from the coefficients by hand basically reinvents the predict function in GLM, which can also give you confidence or prediction intervals. See this thread: "Plot the confidence interval for a model fit", post #4 by nilshg.
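As a rough sketch of the interval feature mentioned here: predict accepts an interval keyword (:confidence or :prediction) and a level, and the result then has prediction, lower and upper columns. The data below is randomly generated for illustration, and the names grid, pred, mid, lo and hi are assumptions of this sketch. Drawing the band with the ribbon keyword of Plots.jl is one option among several.

```julia
using DataFrames, GLM, Plots

# Hypothetical noisy data, in the spirit of the example above.
x = collect(10:0.5:30)
y = x .+ 5 .* randn(length(x))
df = DataFrame(x = x, y = y)
ols = lm(@formula(y ~ x), df)

# Predict on a sorted, evenly spaced grid so the band is drawn left to right.
grid = DataFrame(x = range(minimum(x), maximum(x), length = 100))
pred = predict(ols, grid; interval = :confidence, level = 0.95)

# Convert to plain Float64 vectors before plotting.
mid = Float64.(pred.prediction)
lo = Float64.(pred.lower)
hi = Float64.(pred.upper)

scatter(df.x, df.y, legend = false)
# ribbon takes the distances below and above the central line.
plot!(grid.x, mid; ribbon = (mid .- lo, hi .- mid))
```

Switching interval to :prediction should give the wider band for individual new observations rather than for the mean.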

Thank you for your reply. Removing the : let it calculate the formula. I thought y and x could be replaced by column headings.

Now I am trying to plot it but am having difficulties. Basically, what I want to do is what is done in R with the commands geom_smooth(method = "lm") and geom_smooth(method = "loess").

Following the example here

df = DataFrame(A = rand(500), B = repeat(1:10, inner=50), C = 1:500)
linear_model_df = lm(@formula(A ~ C),df)
pred = DataFrame(df(C)); #this does not work
pred.y = predict(model, pred)

Edit: I guess I am confused about when to use x and y and when to use column headings.

Does one always need to state the intervals for the x axis?

It seems to me that your problems come from a more fundamental misunderstanding of Julia’s syntax rather than the specifics of GLM or Plots.jl (or any other plotting library).

In the above, you are calling DataFrame(df(C)), which probably should be DataFrame(C = df.C). In Julia, round brackets () denote function calls, so df(C) means “call function df with argument C”, which doesn’t really make sense as df isn’t a function.
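A short sketch of the difference, using a cut-down version of the data frame from the first post. Columns are reached with a dot or with square brackets, never with round brackets. It also touches on the question about x intervals: as far as I know, calling predict with only the model returns the fitted values for the rows used in the fit, so a separate grid is only needed when you want predictions at other x values. The variable names c1, c2, m, fitted_default and fitted_newdata are just illustrative.

```julia
using DataFrames, GLM

df = DataFrame(A = 1:2:1000, C = 1:500)

# Two equivalent ways to get at column C; df(C) would try to call df as a function.
c1 = df.C
c2 = df[!, :C]

# A new data frame holding only that column, under the same name.
pred = DataFrame(C = df.C)

m = lm(@formula(A ~ C), df)

# Without new data, predict returns the fitted values for the original rows.
fitted_default = predict(m)
# With new data, the column name must match the name used in the formula.
fitted_newdata = predict(m, pred)
```

The x and y in the earlier replies were simply the column names of that example's data frame; in your own formula you use whatever your columns are called.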

Thank you for your replies. I acknowledge that I have only a basic understanding of programming syntax and often make mistakes. I will continue to try and learn.

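using DataFrames, GLM, Plots

df = DataFrame(A = rand(500), B = repeat(1:10, inner=50), C = 1:500)
linear_model_df = lm(@formula(A ~ C), df)
pred = DataFrame(C = 1:500)  # the x values at which to evaluate the fitted line
pred.y = predict(linear_model_df, pred)

plot(df.C, df.A)       # the raw data
plot!(pred.C, pred.y)  # the fitted regression line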
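For the geom_smooth(method = "loess") half of the question, which the thread does not answer, one possible route is the separate Loess.jl package. The sketch below assumes that package is installed and uses made-up data. The span value of 0.5 is an arbitrary choice that controls how much smoothing is applied. predict is written as Loess.predict so it cannot be confused with the GLM function of the same name.

```julia
using Loess, Plots

# Hypothetical noisy, non-linear data.
xs = 10 .* rand(100)
ys = sin.(xs) .+ 0.5 .* rand(100)

# Fit the local regression; a larger span gives a smoother curve.
model = loess(xs, ys, span = 0.5)

# Evaluate the smooth curve on a sorted grid inside the observed x range.
us = range(extrema(xs)...; step = 0.1)
vs = Loess.predict(model, us)

scatter(xs, ys, legend = false)
plot!(us, vs)
```

Unlike lm, this does not give you coefficients, only the smoothed values at the points you ask for.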